Whenever a blob is identified that intersects with the current tile, it’s added to the shortlist using this line:
-```slang
+```hlsl
blobs[blobCountAT++] = i;
```

The shortlist `blobs` and its index incrementor `blobCountAT` didn’t appear in the excerpt above – that’s because they’re using workgroup shared memory, so they’re declared a bit differently, like this:

-```slang
+```hlsl
// ----- Shared memory declarations --------

// Note: In Slang, the 'groupshared' identifier is used to define
@@ -149,7 +149,7 @@ There’s a second problem that was causing poor performance in our simplified e
Looking back at the simplified implementation, the differentiable function we used to calculate blob colors was `simpleSplatBlobs()`:

-```slang
+```hlsl
/* simpleSplatBlobs() is a naive implementation of the computation of color for a pixel.
* It will iterate over all of the Gaussians for each pixel, to determine their contributions
* to the pixel color, so this will become prohibitively slow with a very small number of
@@ -181,7 +181,7 @@ Because this function is differentiable, we need to be able to propagate its var
We can avoid needing to do all of this storage of intermediate values if, instead, we provide a way for Slang to recalculate the values as it progresses through the backward propagation. To do this, we provide a user-defined backwards form for part of our rasterization algorithm.

-```slang
+```hlsl
/*
* fineRasterize() produces the per-pixel final color from a sorted list of blobs that overlap the current tile.
*
@@ -264,7 +264,7 @@ Manually providing a backwards derivative form might seem like it defeats the pu
So, in the code above, the backward form of `fineRasterize()` loops backward over all of our blobs, evaluates each one, and performs an “undo” operation, which we define in `undoPixelState`.

-```slang
+```hlsl
/*
* undoPixelState() reverses the alpha blending operation and restores the previous pixel
One thing to note about undoing an alpha blend: because alpha values are all within the range [0.0, 1.0], our undo is only possible if the pixel never becomes fully opaque. This is handled inside the `transformPixelState` function called by `fineRasterize`:
-```slang
+```hlsl
/*
* transformPixelState() applies the alpha blending operation to the pixel state &
* updates the counter accordingly.
@@ -326,7 +326,7 @@ There’s one other notable difference between the simplified and full versions
In the simplified version, we initiated the backward derivative propagation with this line of SlangPy:
@@ -336,7 +336,7 @@ Recall that the `spy.grid()` function is a generator, which produces a grid-shap
By contrast, in this more complex version, we want to ensure that the `coarseRasterize()` and `bitonicSort()` functions can operate collaboratively on a set of pixels within a workgroup, so we create a mapping of pixels to thread IDs:
What’s happening here is that we’re using some utility functions from NumPy to construct a grid of IDs manually, rather than asking SlangPy to generate it for us. We’re also providing the values in a single array because, behind the scenes, SlangPy currently supports only a 1D dispatch shape; more general dispatch support is planned to be added soon. `x_max` and `y_max` represent the size of the full image, while `wg_x` and `wg_y` are the dimensions of the tile (and of the workgroup that will calculate the pixel values within that tile). The IDs we create tell each thread where it’s located within its workgroup and which workgroup it belongs to within the full work dispatch; from those two values, the thread can work out which pixel coordinates it’s responsible for calculating. We can then provide this set of IDs directly to our `perPixelLoss` function at dispatch:
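To make the ID layout described above more concrete, here is a minimal sketch of the idea in plain Python. It is not the tutorial's own code: it uses ordinary lists instead of NumPy, and the helper names `build_thread_ids` and `pixel_coord`, as well as the `(local_x, local_y, group_x, group_y)` tuple layout, are assumptions made purely for illustration. Only `x_max`, `y_max`, `wg_x`, and `wg_y` come from the text.

```python
def build_thread_ids(x_max, y_max, wg_x, wg_y):
    """Return a flat (1D) list of hypothetical thread IDs.

    Each ID is a tuple (local_x, local_y, group_x, group_y). IDs are ordered
    so that every consecutive run of wg_x * wg_y entries belongs to the same
    workgroup, i.e. to the same image tile.
    """
    # Ceiling division: the number of tiles needed to cover the image.
    groups_x = -(-x_max // wg_x)
    groups_y = -(-y_max // wg_y)

    ids = []
    for group_y in range(groups_y):
        for group_x in range(groups_x):
            for local_y in range(wg_y):
                for local_x in range(wg_x):
                    ids.append((local_x, local_y, group_x, group_y))
    return ids


def pixel_coord(thread_id, wg_x, wg_y):
    """Recover the pixel a thread is responsible for from its ID."""
    local_x, local_y, group_x, group_y = thread_id
    return (group_x * wg_x + local_x, group_y * wg_y + local_y)


if __name__ == "__main__":
    # An 8x4 image split into 4x2 tiles: four workgroups of eight threads.
    ids = build_thread_ids(x_max=8, y_max=4, wg_x=4, wg_y=2)
    print(len(ids))
    print(ids[8], pixel_coord(ids[8], 4, 2))
```

In this sketch, if the image size is not a multiple of the tile size, the last row or column of tiles contains IDs that map outside the image, so a real kernel would need a bounds check before writing those pixels.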