Torch integration working with CUDA, Vulkan and D3D12 #362

Open
ccummingsNV wants to merge 51 commits into main

Conversation

Contributor

@ccummingsNV ccummingsNV commented Jul 21, 2025

This is the first version of the rewritten torch integration, which includes fixed interop with Vulkan/D3D and direct device sharing with CUDA. Key additions:

  • Lots of fixes for CUDA interop and the CUDA backend
  • The special 'torch module/function' has been removed - use of torch is now auto-detected from the parameters passed to a function (see the sketch after this list)
  • Reduced the complexity of the autograd integration + moved some of it to native code
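
To make the auto-detection point concrete, here is a hypothetical usage sketch. The device/module setup calls are illustrative assumptions, not APIs confirmed by this PR; the point is only that passing torch tensors as arguments is what triggers the torch interop path, with no dedicated torch wrapper involved.

    # Hypothetical sketch: device/module setup names are illustrative assumptions.
    import torch
    import slangpy as spy

    device = spy.Device()                                     # assumed constructor
    module = spy.Module.load_from_file(device, "ops.slang")   # assumed loader

    a = torch.randn(1024, device="cuda", requires_grad=True)
    b = torch.randn(1024, device="cuda")

    # No special 'torch module/function' is needed any more: the torch tensors
    # in the argument list are detected automatically and marshalled via interop.
    result = module.add(a, b)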

What's missing and will come in the next PR:

  • Shifting more code to the native side, such as the marshalling of TensorRef and aspects of the torch call
  • A heap for interop buffers, to avoid constantly re-allocating buffers for passing data in/out of torch

I'm still not entirely happy with the hoops we have to jump through for the autograd hook, but PyTorch has some pretty rock-solid rules about what you can and can't do when storing/clearing/copying tensors, and I can't find a simpler way to work around them.
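
For readers who have not hit these constraints: custom hooks plug into PyTorch via torch.autograd.Function, and tensors needed by the backward pass must be handed to ctx.save_for_backward rather than stored freely, which is the kind of rule referred to above. A minimal, generic illustration (not this PR's actual hook) follows.

    # Generic torch.autograd.Function example, not taken from this PR.
    import torch

    class Square(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)  # tensors needed in backward must go through here
            return x * x

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            return 2.0 * x * grad_out

    x = torch.randn(4, requires_grad=True)
    Square.apply(x).sum().backward()  # x.grad == 2 * x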

@ccummingsNV ccummingsNV requested a review from a team as a code owner July 25, 2025 14:11
@ccummingsNV ccummingsNV requested a review from Copilot July 29, 2025 13:28
@ccummingsNV ccummingsNV requested a review from Copilot July 30, 2025 13:19
@ccummingsNV ccummingsNV changed the title from "Draft: Working torch integration" to "Torch integration working with CUDA, Vulkan and D3D12" on Jul 30, 2025
@ccummingsNV ccummingsNV requested a review from Copilot July 30, 2025 13:48
@ccummingsNV ccummingsNV requested a review from Copilot July 30, 2025 14:12
@ccummingsNV ccummingsNV requested a review from tunabrain July 30, 2025 21:26
Contributor

@skallweitNV skallweitNV left a comment


LGTM. Some of the comments are on the nitpicky side, so feel free to ignore them. I didn't dig super deep into the marshalling and pytorch autograd implementation, but reading through it, it looked good to me.

@@ -91,8 +91,16 @@ jobs:
run: |
sudo apt update && sudo apt install -y libxinerama-dev libxcursor-dev xorg-dev libglu1-mesa-dev pkg-config

# Setup Python.
- name: Setup Python ${{ matrix.python }}
# Setup Python (no pip cache on unit test windows runners - massive slow down).
Contributor

Why can't we use the cache?

if (command_encoder) {
SGL_CHECK(
!cuda_stream.is_valid(),
"Can not specify cuda stream if appending to a command encoder."
Contributor

nitpick: in general I think we should use CUDA all-caps in text/strings/output

@@ -40,6 +40,8 @@ SGL_PY_EXPORT(device_kernel)
uint3 thread_count,
nb::dict vars,
CommandEncoder* command_encoder,
CommandQueueType queue,
Contributor

we don't support multiple queues, so is this needed?

{
uint8_t buffer[8];
for (int i = 0; i < 8; ++i) {
buffer[7 - i] = HEX_CHARS[(value >> (i * 4)) & 0xF];
Contributor

minor: there is sgl::string::hexlify, but it returns a std::string, whereas this version has no allocation overhead. We could also introduce a hexlify overload that takes an output buffer and use that here.

nb::arg("args"),
nb::arg("kwargs"),
D_NA(NativeCallData, _py_torch_call)
)
.def_prop_rw(
"call_group_shape",
&NativeCallData::get_call_group_shape,
Contributor

unrelated but this should be just call_group_shape

Comment on lines +821 to +840
std::optional<nb::ndarray<nb::pytorch, nb::device::cuda>> tensor() const { return m_tensor; }

void set_tensor(const std::optional<nb::ndarray<nb::pytorch, nb::device::cuda>> tensor) { m_tensor = tensor; }

ref<Buffer> interop_buffer() const { return m_interop_buffer; }

void set_interop_buffer(const ref<Buffer>& interop_buffer) { m_interop_buffer = interop_buffer; }

int32_t id() const { return m_id; }

void set_id(int32_t id) { m_id = id; }

ref<TensorRef> grad_in() const { return m_grad_in; }
void set_grad_in(const ref<TensorRef>& grad_in) { m_grad_in = grad_in; }

ref<TensorRef> grad_out() const { return m_grad_out; }
void set_grad_out(const ref<TensorRef>& grad_out) { m_grad_out = grad_out; }

std::pair<AccessType, AccessType> last_access() const { return m_last_access; }
void set_last_access(const std::pair<AccessType, AccessType>& last_access) { m_last_access = last_access; }
Contributor

it looks a lot like these could all just be public fields instead of getter/setters

Specify a CUDA stream to use for the function. This is useful for synchronizing with other
CUDA operations or ensuring that the function runs on a specific stream.
"""
if stream.type != NativeHandleType.CUstream:
Contributor

you also do that check in FunctionNodeCUDAStream constructor

@@ -82,6 +80,17 @@ def get_device(
"Please set use_cache=False if you want to use existing_device_handles."
)

selected_adaptor_luid = None
Contributor

slightly different than in sglhelpers.py, maybe worth consolidating - e.g. we could select the adapter in conftest.py?

Comment on lines +146 to +147
torch.cuda.current_device()
torch.cuda.current_stream()
Contributor

are these calls necessary?
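
For context on the question: both calls trigger PyTorch's lazy CUDA initialization as a side effect, which is a common reason to invoke them even when the return values are unused. If that is the only intent here (an assumption, not something this PR states), a more explicit alternative would be:

    # Sketch: explicit CUDA initialization instead of relying on side effects.
    import torch

    if torch.cuda.is_available():
        torch.cuda.init()                      # forces lazy CUDA state initialization
        stream = torch.cuda.current_stream()   # keep only if the handle is actually used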

from typing import Any, Optional, cast
from numpy import ScalarType
from slangpy import DataType, Device, BufferUsage, TypeReflection, DeviceType
import torch
Contributor

are we guaranteed that this import is not run at import slangpy time?
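
One common way to guarantee that (a sketch only; the real module may already handle it differently) is to keep torch out of the import-time path with a typing-only import plus a deferred runtime import:

    # Sketch of a deferred torch import so that 'import slangpy' stays torch-free.
    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        import torch  # evaluated only by static type checkers

    def _require_torch():
        # Imported on first use at runtime; fails with a clear error if torch is missing.
        import torch
        return torch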
