
Commit fdb503d

Authored and committed by MFC Action
Docs @ f2ef560
1 parent 5f4aec5 commit fdb503d


3 files changed: +56 -81 lines changed


documentation/md_gpuDebugging.html

Lines changed: 24 additions & 49 deletions
@@ -165,59 +165,47 @@ <h2><a class="anchor" id="autotoc_md106"></a>
 Cray General Options</h2>
 <div class="fragment"><div class="line">CRAY_ACC_DEBUG: 0 (off), 1, 2, 3 (very noisy)</div>
 </div><!-- fragment --><ul>
-<li>Dumps a time-stamped log line (<code>"ACC: ...&lt;/tt&gt;) for every allocation, data transfer, kernel launch, wait, etc. Great first stop when "nothing seems to run on the GPU.</code></li>
-<li><code>Outputs on STDERR by default. Can be changed by setting <code>CRAY_ACC_DEBUG_FILE</code>.<ul>
+<li>Dumps a time-stamped log line (<code>ACC: ...</code>) for every allocation, data transfer, kernel launch, wait, etc. Great first stop when "nothing seems to run on the GPU".</li>
+<li>Outputs on STDERR by default. Can be changed by setting <code>CRAY_ACC_DEBUG_FILE</code>.<ul>
 <li>Recognizes <code>stderr</code>, <code>stdout</code>, and <code>process</code>.</li>
 <li><code>process</code> automatically generates a new file based on <code>pid</code> (each MPI process will have a different file)</li>
 </ul>
-</code></li>
-<li><code>While this environment variable specifies ACC, it can be used for both OpenACC and OpenMP</code></li>
+</li>
+<li>While this environment variable specifies ACC, it can be used for both OpenACC and OpenMP</li>
 </ul>
-<p><code></p><div class="fragment"><div class="line">CRAY_ACC_FORCE_EARLY_INIT=1</div>
+<div class="fragment"><div class="line">CRAY_ACC_FORCE_EARLY_INIT=1</div>
 </div><!-- fragment --><ul>
 <li>Force full GPU initialization at program start so you can see start-up hangs immediately</li>
 <li>Default behavior without an environment variable is to defer initialization on first use</li>
 <li>Device initialization includes initializing the GPU vendor’s low-level device runtime library (e.g., libcuda for NVIDIA GPUs) and establishing all necessary software contexts for interacting with the device</li>
 </ul>
-<p></code></p>
-<p><code></code></p>
 <h2><a class="anchor" id="autotoc_md107"></a>
-<code>Cray OpenACC Options</code></h2>
-<p><code></code></p>
-<p><code></p><div class="fragment"><div class="line">CRAY_ACC_PRESENT_DUMP_SAVE_NAMES=1</div>
+Cray OpenACC Options</h2>
+<div class="fragment"><div class="line">CRAY_ACC_PRESENT_DUMP_SAVE_NAMES=1</div>
 </div><!-- fragment --><ul>
 <li>Will cause <code>acc_present_dump()</code> to output variable names and file locations in addition to variable mappings</li>
 <li>Add <code>acc_present_dump()</code> around hotspots to help find problems with data movements<ul>
 <li>Helps more if adding <code>CRAY_ACC_DEBUG</code> environment variable</li>
 </ul>
 </li>
 </ul>
-<p></code></p>
-<p><code></code></p>
 <h1><a class="anchor" id="autotoc_md108"></a>
-<code>NVHPC Compiler Options</code></h1>
-<p><code></code></p>
-<p><code></code></p>
+NVHPC Compiler Options</h1>
 <h2><a class="anchor" id="autotoc_md109"></a>
-<code>NVHPC General Options</code></h2>
-<p><code></code></p>
-<p><code></p><div class="fragment"><div class="line">STATIC_RANDOM_SEED=1</div>
+NVHPC General Options</h2>
+<div class="fragment"><div class="line">STATIC_RANDOM_SEED=1</div>
 </div><!-- fragment --><ul>
 <li>Forces the seed returned by <code>RANDOM_SEED</code> to be constant, so it generates the same sequence of random numbers</li>
 <li>Useful for testing issues with randomized data</li>
 </ul>
-<p></code></p>
-<p><code></p><div class="fragment"><div class="line">NVCOMPILER_TERM=option[,option]</div>
+<div class="fragment"><div class="line">NVCOMPILER_TERM=option[,option]</div>
 </div><!-- fragment --><ul>
 <li><code>[no]debug</code>: Enables/disables just-in-time debugging (debugging invoked on error)</li>
 <li><code>[no]trace</code>: Enables/disables stack traceback on error</li>
 </ul>
-<p></code></p>
-<p><code></code></p>
 <h2><a class="anchor" id="autotoc_md110"></a>
-<code>NVHPC OpenACC Options</code></h2>
-<p><code></code></p>
-<p><code></p><div class="fragment"><div class="line">NVCOMPILER_ACC_NOTIFY= &lt;bitmask&gt;</div>
+NVHPC OpenACC Options</h2>
+<div class="fragment"><div class="line">NVCOMPILER_ACC_NOTIFY= &lt;bitmask&gt;</div>
 </div><!-- fragment --><ul>
 <li>Assign the environment variable to a bitmask to print out information to stderr for the following<ul>
 <li>kernel launches: 1</li>
@@ -229,15 +217,13 @@ <h2><a class="anchor" id="autotoc_md110"></a>
 </li>
 <li>1 (kernels only) is the usual first step.3 (kernels + copies) is great for "why is it so slow?"</li>
 </ul>
-<p></code></p>
-<p><code></p><div class="fragment"><div class="line">NVCOMPILER_ACC_TIME=1</div>
+<div class="fragment"><div class="line">NVCOMPILER_ACC_TIME=1</div>
 </div><!-- fragment --><ul>
 <li>Lightweight profiler</li>
 <li>prints a tidy end-of-run table with per-region and per-kernel times and bytes moved</li>
 <li>Do not use with CUDA profiler at the same time</li>
 </ul>
-<p></code></p>
-<p><code></p><div class="fragment"><div class="line">NVCOMPILER_ACC_DEBUG=1</div>
+<div class="fragment"><div class="line">NVCOMPILER_ACC_DEBUG=1</div>
 </div><!-- fragment --><ul>
 <li>Spews everything the runtime sees: host/device addresses, mapping events, present-table look-ups, etc.</li>
 <li>Great for "partially present" or "pointer went missing" errors.</li>
@@ -246,19 +232,15 @@ <h2><a class="anchor" id="autotoc_md110"></a>
 </ul>
 </li>
 </ul>
-<p></code></p>
-<p><code></code></p>
 <h2><a class="anchor" id="autotoc_md111"></a>
-<code>NVHPC OpenMP Options</code></h2>
-<p><code></code></p>
-<p><code></p><div class="fragment"><div class="line">LIBOMPTARGET_PROFILE=run.json</div>
+NVHPC OpenMP Options</h2>
+<div class="fragment"><div class="line">LIBOMPTARGET_PROFILE=run.json</div>
 </div><!-- fragment --><ul>
 <li>Emits a Chrome-trace (JSON) timeline you can open in chrome://tracing or Speedscope</li>
 <li>Great lightweight profiler when Nsight is overkill.</li>
 <li>Granularity in µs via <code>LIBOMPTARGET_PROFILE_GRANULARITY</code> (default 500).</li>
 </ul>
-<p></code></p>
-<p><code></p><div class="fragment"><div class="line">LIBOMPTARGET_INFO=&lt;bitmask&gt;</div>
+<div class="fragment"><div class="line">LIBOMPTARGET_INFO=&lt;bitmask&gt;</div>
 </div><!-- fragment --><ul>
 <li>Prints out different types of runtime information</li>
 <li>Human-readable log of data-mapping inserts/updates, kernel launches, copies, waits.</li>
@@ -273,44 +255,37 @@ <h2><a class="anchor" id="autotoc_md111"></a>
 </ul>
 </li>
 </ul>
-<p></code></p>
-<p><code></p><div class="fragment"><div class="line">LIBOMPTARGET_DEBUG=1</div>
+<div class="fragment"><div class="line">LIBOMPTARGET_DEBUG=1</div>
 </div><!-- fragment --><ul>
 <li>Developer-level trace (host-side)</li>
 <li>Much noisier than <code>INFO</code></li>
 <li>Only works if the runtime was built with <code>-DOMPTARGET_DEBUG</code>.</li>
 </ul>
-<p></code></p>
-<p><code></p><div class="fragment"><div class="line">LIBOMPTARGET_JIT_OPT_LEVEL=-O{0,1,2,3}</div>
+<div class="fragment"><div class="line">LIBOMPTARGET_JIT_OPT_LEVEL=-O{0,1,2,3}</div>
 </div><!-- fragment --><ul>
 <li>This environment variable can be used to change the optimization pipeline used to optimize the embedded device code as part of the device JIT.</li>
 <li>The value corresponds to the <code>-O{0,1,2,3}</code> command line argument passed to clang.</li>
 </ul>
-<p></code></p>
-<p><code></p><div class="fragment"><div class="line">LIBOMPTARGET_JIT_SKIP_OPT=1</div>
+<div class="fragment"><div class="line">LIBOMPTARGET_JIT_SKIP_OPT=1</div>
 </div><!-- fragment --><ul>
 <li>This environment variable can be used to skip the optimization pipeline during JIT compilation.</li>
 <li>If set, the image will only be passed through the backend.</li>
 <li>The backend is invoked with the <code>LIBOMPTARGET_JIT_OPT_LEVEL</code> flag.</li>
 </ul>
-<p></code></p>
-<p><code></code></p>
 <h1><a class="anchor" id="autotoc_md112"></a>
-<code>Compiler Documentation</code></h1>
-<p><code></code></p>
-<p><code></p><ul>
+Compiler Documentation</h1>
+<ul>
 <li><a href="https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openmp.7.html#environment-variables">Cray &amp; OpenMP Docs</a></li>
 <li><a href="https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openacc.7.html#environment-variables">Cray &amp; OpenACC Docs</a></li>
 <li><a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html?highlight=NVCOMPILER_#environment-variables">NVHPC &amp; OpenACC Docs</a></li>
 <li><a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html?highlight=NVCOMPILER_#id2">NVHPC &amp; OpenMP Docs</a></li>
-<li>[LLVM &amp; OpenMP Docs] (<a href="https://openmp.llvm.org/design/Runtimes.html">https://openmp.llvm.org/design/Runtimes.html</a>)<ul>
+<li><a href="https://openmp.llvm.org/design/Runtimes.html">LLVM &amp; OpenMP Docs</a><ul>
 <li>NVHPC is built on top of LLVM</li>
 </ul>
 </li>
 <li><a href="https://www.openmp.org/spec-html/5.1/openmp.html">OpenMP Docs</a></li>
 <li><a href="https://www.openacc.org/sites/default/files/inline-files/OpenACC.2.7.pdf">OpenACC Docs</a> </li>
 </ul>
-<p></code></p>
 </div></div><!-- contents -->
 </div><!-- PageDoc -->
 </div><!-- doc-content -->
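The Cray and NVHPC variables documented in this diff are normally exported in the shell before a run. The Python below is a minimal sketch, under stated assumptions, of assembling them into an environment for `subprocess.run`. The helper names (`nvhpc_debug_env`, `cray_debug_env`, `run_with_env`) and the `./app` executable are hypothetical. Only the `NVCOMPILER_ACC_NOTIFY` bits visible here are used: 1 for kernel launches, and 2 for copies, which is implied by "3 (kernels + copies)".

```python
import os
import subprocess

# Bit values visible in this diff: kernel launches = 1, and "3 (kernels + copies)"
# implies data copies = 2. Other bits are not shown here, so they are left out.
NOTIFY_KERNEL_LAUNCHES = 1
NOTIFY_DATA_COPIES = 2


def nvhpc_debug_env(kernels=True, copies=True, time_regions=False, base=None):
    """Return a copy of `base` (default: os.environ) with NVHPC OpenACC debug variables set."""
    env = dict(os.environ if base is None else base)
    mask = 0
    if kernels:
        mask |= NOTIFY_KERNEL_LAUNCHES
    if copies:
        mask |= NOTIFY_DATA_COPIES
    if mask:
        # The runtime prints the selected events to stderr.
        env["NVCOMPILER_ACC_NOTIFY"] = str(mask)
    if time_regions:
        # End-of-run timing table; not meant to be combined with a CUDA profiler.
        env["NVCOMPILER_ACC_TIME"] = "1"
    return env


def cray_debug_env(level=1, per_process_file=False, early_init=False, base=None):
    """Return a copy of `base` (default: os.environ) with Cray debug variables set."""
    if level not in (0, 1, 2, 3):
        raise ValueError("CRAY_ACC_DEBUG accepts 0 (off) through 3 (very noisy)")
    env = dict(os.environ if base is None else base)
    env["CRAY_ACC_DEBUG"] = str(level)
    if per_process_file:
        # One log file per process (so each MPI rank gets its own), named from the pid.
        env["CRAY_ACC_DEBUG_FILE"] = "process"
    if early_init:
        # Initialize the GPU at program start so start-up hangs surface immediately.
        env["CRAY_ACC_FORCE_EARLY_INIT"] = "1"
    return env


def run_with_env(cmd, env):
    """Run `cmd` (e.g. ["./app"], a hypothetical executable) under `env`."""
    return subprocess.run(cmd, env=env, check=False)


if __name__ == "__main__":
    print(nvhpc_debug_env(base={}))
    print(cray_debug_env(2, per_process_file=True, base={}))
```

Passing `base={}` keeps the result limited to the debug variables, which makes it easy to inspect. In a real launch one would leave `base` unset so the rest of the environment (paths, modules) is inherited, e.g. `run_with_env(["./app"], cray_debug_env(2, per_process_file=True))`.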
