Commit 3ad99aa

Fix syntax highlighting tags (#77)
1 parent 2a85928 commit 3ad99aa

1 file changed: +10 -10 lines changed

_posts/2025-04-30-neural-graphics-first-principles-performance.md

Lines changed: 10 additions & 10 deletions
@@ -33,7 +33,7 @@ A workgroup is a collection of threads that execute simultaneously on the same c

 Here's how the implementation works in Slang:

-```slang
+```hlsl
 // ----- Constants and definitions --------

 static const int GAUSSIANS_PER_BLOCK = 512;
@@ -101,13 +101,13 @@ Thread 1 checks blobs 1, 33, 65, …
 And so on…

 Whenever a blob is identified that intersects with the current tile, it’s added to the shortlist using this line:
-```slang
+```hlsl
 blobs[blobCountAT++] = i;
 ```

 The shortlist `blobs` and its index incrementor `blobCountAT` didn’t appear in the excerpt above – that’s because they’re using workgroup shared memory, so they’re declared a bit differently, like this:

-```slang
+```hlsl
 // ----- Shared memory declarations --------

 // Note: In Slang, the 'groupshared' identifier is used to define
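As a rough illustration of the strided checking and shortlist append that this hunk's context lines describe, the sketch below simulates the pattern serially in Python. The function names, the bounding-box overlap test, and the stride of 32 (inferred from the sequence 1, 33, 65) are assumptions made for illustration, not code from the post. On a GPU the threads run concurrently, so the counter increment has to be safe against races; this serial loop sidesteps that.

```python
STRIDE = 32  # assumption: inferred from "Thread 1 checks blobs 1, 33, 65, ..."


def blobs_for_thread(thread_id, blob_count, stride=STRIDE):
    """Indices of the blobs that one thread is responsible for checking."""
    return list(range(thread_id, blob_count, stride))


def boxes_overlap(a, b):
    """Hypothetical intersection test on (x0, y0, x1, y1) boxes."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def coarse_shortlist(blob_boxes, tile_box, stride=STRIDE):
    """Serial stand-in for the per-tile shortlist construction."""
    shortlist = [0] * len(blob_boxes)  # plays the role of `blobs`
    count = 0                          # plays the role of `blobCountAT`
    for thread_id in range(stride):    # on a GPU these iterations run in parallel
        for i in blobs_for_thread(thread_id, len(blob_boxes), stride):
            if boxes_overlap(blob_boxes[i], tile_box):
                shortlist[count] = i   # mirrors: blobs[blobCountAT++] = i;
                count += 1
    return shortlist[:count]


tile = (0, 0, 16, 16)
boxes = [(20, 20, 30, 30), (5, 5, 10, 10), (15, 15, 17, 17), (-5, -5, 0, 0)]
print(coarse_shortlist(boxes, tile))            # [1, 2]
print(coarse_shortlist(boxes, tile, stride=2))  # [2, 1]
```

The two calls return the same set of blobs in a different order, which hints at why the shortlist order cannot be relied on before it is sorted.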
@@ -149,7 +149,7 @@ There’s a second problem that was causing poor performance in our simplified e

 Looking back at the simplified implementation, the differentiable function we used to calculate blob colors was `simpleSplatBlobs()`:

-```slang
+```hlsl
 /* simpleSplatBlobs() is a naive implementation of the computation of color for a pixel.
 * It will iterate over all of the Gaussians for each pixel, to determine their contributions
 * to the pixel color, so this will become prohibitively slow with a very small number of
@@ -181,7 +181,7 @@ Because this function is differentiable, we need to be able to propagate its var

 We can avoid needing to do all of this storage of intermediate values if, instead, we provide a way for Slang to recalculate the values as it progresses through the backward propagation. To do this, we provide a user-defined backwards form for part of our rasterization algorithm.

-```slang
+```hlsl
 /*
 * fineRasterize() produces the per-pixel final color from a sorted list of blobs that overlap the current tile.
 *
@@ -264,7 +264,7 @@ Manually providing a backwards derivative form might seem like it defeats the pu

 So, in the code above, the backward form of `fineRasterize()` loops backward over all of our blobs, evaluates each one, and performs an “undo” operation, which we define in `undoPixelState`.

-```slang
+```hlsl
 /*
 * undoPixelState() reverses the alpha blending operation and restores the previous pixel
 * state.
@@ -300,7 +300,7 @@ float4 undoAlphaBlend(float4 pixel, float4 gval)

 One thing to note about undoing an alpha blend: because alpha values are all within the range [0.0, 1.0], our undo is only possible if the pixel never becomes fully opaque. This is handled inside the `transformPixelState` function called by `fineRasterize`:

-```slang
+```hlsl
 /*
 * transformPixelState() applies the alpha blending operation to the pixel state &
 * updates the counter accordingly.
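To make the "undo" constraint concrete, here is a minimal Python sketch of a front-to-back alpha blend and its inverse. It assumes the pixel state is `(r, g, b, transmittance)` and uses a standard "over" compositing formula; the names `blend` and `undo_blend` are hypothetical, and the post's own `undoAlphaBlend` and `transformPixelState` bodies are not visible in this diff, so they may differ in detail.

```python
def blend(pixel, blob):
    """Composite one blob (r, g, b, alpha) onto a pixel (r, g, b, transmittance)."""
    r, g, b, t = pixel
    cr, cg, cb, a = blob
    w = t * a  # how much of this blob is still visible
    return (r + w * cr, g + w * cg, b + w * cb, t * (1.0 - a))


def undo_blend(pixel, blob):
    """Recover the pixel state from before `blob` was composited."""
    r, g, b, t = pixel
    cr, cg, cb, a = blob
    if a >= 1.0:
        # Transmittance was multiplied by zero, so the previous value is lost.
        raise ValueError("cannot undo a fully opaque blend")
    t_prev = t / (1.0 - a)
    w = t_prev * a
    return (r - w * cr, g - w * cg, b - w * cb, t_prev)


start = (0.0, 0.0, 0.0, 1.0)
blobs = [(1.0, 0.0, 0.0, 0.5), (0.0, 1.0, 0.0, 0.25)]

state = start
for blob in blobs:
    state = blend(state, blob)

# Walk the blobs in reverse, undoing each blend, to get back to `start`.
for blob in reversed(blobs):
    state = undo_blend(state, blob)
```

The division by `1.0 - a` is where full opacity breaks the inverse: once transmittance reaches zero, no later step can recover what it was, so the forward pass needs some guard that keeps it above zero.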
@@ -326,7 +326,7 @@ There’s one other notable difference between the simplified and full versions

 In the simplified version, we initiated the backward derivative propagation with this line of SlangPy:

-```Python
+```python
 module.perPixelLoss.bwds(per_pixel_loss,
 spy.grid(shape=(input_image.width,input_image.height)),
 blobs, input_image)
@@ -336,7 +336,7 @@ Recall that the `spy.grid()` function is a generator, which produces a grid-shap

 By contrast, in this more complex version, we want to ensure that the `coarseRasterize()` and `bitonicSort()` functions can operate collaboratively on a set of pixels within a workgroup, so we create a mapping of pixels to thread IDs:

-```Python
+```python
 def calcCompressedDispatchIDs(x_max: int, y_max: int, wg_x: int, wg_y: int):
     local_x = np.arange(0, wg_x, dtype=np.uint32)
     local_y = np.arange(0, wg_y, dtype=np.uint32)
@@ -358,7 +358,7 @@ def calcCompressedDispatchIDs(x_max: int, y_max: int, wg_x: int, wg_y: int):

 What’s happening here is that we’re using some utility functions from NumPy to construct a grid of IDs manually, rather than asking SlangPy to generate it for us. We’re also providing the values in a single array, because, behind the scenes, SlangPy currently only supports a 1D dispatch shape– more general dispatch support is planned to be added soon. `x_max` and `y_max` represent the size of the full image, while `wg_x` and `wg_y` are the dimensions of the tile (and the workgroup that will calculate the pixel values within that tile). The IDs we create tell each thread both where it’s located within its workgroup, and which workgroup it belongs to within the full work dispatch, and from those, what pixel coordinates it’s responsible for calculating. We can then provide this set of IDs directly to our `perPixelLoss` function at dispatch:

-```Python
+```python
 module.perPixelLoss.bwds(per_pixel_loss, dispatch_ids, blobs, input_image)
 ```
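The body of `calcCompressedDispatchIDs` is cut off in this diff, so the following is only a sketch of the general idea the paragraph describes: building a flat, tile-ordered list of pixel coordinates with NumPy so that each consecutive run of `wg_x * wg_y` entries belongs to one workgroup. The function name `tile_ordered_pixel_coords`, the output layout, and the requirement that the image size be a multiple of the tile size are all assumptions; the post's actual ID encoding may differ.

```python
import numpy as np


def tile_ordered_pixel_coords(x_max, y_max, wg_x, wg_y):
    """Return an (N, 2) uint32 array of (x, y) pixel coordinates, grouped tile by tile."""
    if x_max % wg_x or y_max % wg_y:
        raise ValueError("this sketch assumes the image is a whole number of tiles")

    local_x = np.arange(wg_x, dtype=np.uint32)           # position inside the tile
    local_y = np.arange(wg_y, dtype=np.uint32)
    group_x = np.arange(x_max // wg_x, dtype=np.uint32)  # which tile (workgroup)
    group_y = np.arange(y_max // wg_y, dtype=np.uint32)

    # Axis order is slowest to fastest: tile row, tile column, row in tile, column in tile.
    gy, gx, ly, lx = np.meshgrid(group_y, group_x, local_y, local_x, indexing="ij")

    px = gx * wg_x + lx
    py = gy * wg_y + ly

    # Flatten to 1D so the whole image can be issued as a single 1D dispatch.
    return np.stack([px.ravel(), py.ravel()], axis=-1)


ids = tile_ordered_pixel_coords(4, 4, 2, 2)
print(ids[:4].tolist())  # [[0, 0], [1, 0], [0, 1], [1, 1]]
```

With a 4x4 image and 2x2 tiles, the first four entries cover the top-left tile, the next four the tile to its right, and so on, so threads that share a workgroup also share a tile.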
