Add ORT worker proxy to prevent main-thread locking, and a service worker for cross-origin isolation, which allows wasm threads #9


Open · wants to merge 3 commits into main

Conversation

@josephrocca (Author)

Please feel free to close this if you want to make the changes in a different way, and just use it as a reference - I figured it'd be easier to open a pull request than to explain in #8.
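In case it helps, here's the rough shape of the two pieces (a sketch, not the exact files in this PR; sw.js is a placeholder name):

// main page: run ORT's wasm backend inside a worker so inference
// doesn't block the UI thread.
ort.env.wasm.proxy = true;

// sw.js (placeholder name): re-serve every response with the headers
// that make the page cross-origin isolated, which is what unlocks
// SharedArrayBuffer and therefore wasm threads on hosts (like GitHub
// Pages) where you can't set response headers yourself.
self.addEventListener('fetch', (event) => {
  event.respondWith(
    fetch(event.request).then((response) => {
      const headers = new Headers(response.headers);
      headers.set('Cross-Origin-Opener-Policy', 'same-origin');
      headers.set('Cross-Origin-Embedder-Policy', 'require-corp');
      return new Response(response.body, {
        status: response.status,
        statusText: response.statusText,
        headers,
      });
    })
  );
});

Note that the page has to register the service worker and reload once before crossOriginIsolated flips to true.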

@juharris (Owner) commented Jul 6, 2022

Thanks so much! I'll try this out soon.

josephrocca mentioned this pull request on Jul 6, 2022
@juharris (Owner) commented Jul 6, 2022

I tried running it locally and got this warning 3 times:

c9b1644f-c3b0-49f6-b199-cec1b103569d:866 Tried to spawn a new thread, but the thread pool is exhausted.
This might result in a deadlock unless some threads eventually exit or the code explicitly breaks out to the event loop.
If you want to increase the pool size, use setting `-s PTHREAD_POOL_SIZE=...`.
If you want to throw an explicit error instead of the risk of deadlocking in those cases, use setting `-s PTHREAD_POOL_SIZE_STRICT=2`.

and this error:

App.tsx:293 DOMException: Failed to execute 'postMessage' on 'Worker': ArrayBuffer at index 2 is already detached.
    at http://localhost:3000/onnxruntime_web_build_inference_with_training_ops/ort.js:36387:25
    at new Promise (<anonymous>)
    at Object.run (http://localhost:3000/onnxruntime_web_build_inference_with_training_ops/ort.js:36384:16)
    at OnnxruntimeWebAssemblySessionHandler.run (http://localhost:3000/onnxruntime_web_build_inference_with_training_ops/ort.js:36547:47)
    at InferenceSession.run (http://localhost:3000/onnxruntime_web_build_inference_with_training_ops/ort.js:343:44)
    at runModel (http://localhost:3000/static/js/bundle.js:91:34)
    at train (http://localhost:3000/static/js/bundle.js:242:41)
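(For reference, that DOMException is the generic one you get when re-transferring a buffer; here's a minimal, ORT-agnostic illustration, with worker.js as a placeholder script name:)

// Transferring an ArrayBuffer to a worker detaches it on the sending side.
const worker = new Worker('worker.js'); // placeholder worker script
const buffer = new Float32Array(4).buffer;
worker.postMessage(buffer, [buffer]); // ok: ownership moves to the worker, buffer is now detached
worker.postMessage(buffer, [buffer]); // throws: "ArrayBuffer at index 0 is already detached"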

@josephrocca (Author)

Hmm, this could be related to your custom build of ORT Web using different Emscripten parameters (like PTHREAD_POOL_SIZE) from the normal build, but that seems unlikely if you just followed their usual build instructions.

One other guess is that your local machine has a larger-than-normal number of threads, and the normal build isn't "prepared" for that number. In that case I think it's probably a bug in ORT Web. Can you try putting the following line after the ort.env.wasm.proxy line that I added:

ort.env.wasm.numThreads = 2;

I think the default value is navigator.hardwareConcurrency/2 - i.e. half the number of cores/threads that your OS reports to the browser.
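Putting it together, the config I'm suggesting would look roughly like this (a sketch; 'model.onnx' is a placeholder path):

// Configure the wasm backend before creating any session.
ort.env.wasm.proxy = true;    // run wasm in a worker so the UI stays responsive
ort.env.wasm.numThreads = 2;  // pin the pool instead of the default
                              // (roughly navigator.hardwareConcurrency / 2)
const session = await ort.InferenceSession.create('model.onnx'); // placeholder path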

And also try this demo to see if it works on your machine (it takes a minute or so to init due to the huge number of ops):

https://josephrocca.github.io/super-resolution-js/

@juharris (Owner) commented Jul 6, 2022

Nah that site didn't work either
[screenshot of the error]

@josephrocca (Author) commented Jul 6, 2022

Maybe try commenting out the ort.env.wasm.proxy line for now, in case that's doing something weird. The annoying thing here is that we can't just pass this issue off to the pros on the ORT Web team, because it could have something to do with this custom build of ORT Web.

The error you're getting for the super-resolution demo looks like it could be an out-of-memory error that's unrelated to what we're trying to debug here. Are you testing in Chrome? If so, that's really strange: it works for me, and as long as your system has more than 1.5GB of RAM (the limit it looks like you're hitting), which I assume it does, I don't see why it would give you that error.

@juharris (Owner) commented Jul 8, 2022

I'm using Edge, so yeah, basically Chrome. I have 32GB of RAM.

With ort.env.wasm.numThreads = 2; but without ort.env.wasm.proxy = true;, it works but it's much slower.
With ort.env.wasm.numThreads = 8;, it was about the same speed as before these changes, but the UI seems more responsive.

I do get this output with 8 threads, and some of the same output, though fewer times, with 2 threads:

ort-wasm-threaded.js:850 Tried to spawn a new thread, but the thread pool is exhausted.
This might result in a deadlock unless some threads eventually exit or the code explicitly breaks out to the event loop.
If you want to increase the pool size, use setting `-s PTHREAD_POOL_SIZE=...`.
If you want to throw an explicit error instead of the risk of deadlocking in those cases, use setting `-s PTHREAD_POOL_SIZE_STRICT=2`.
ia @ ort-wasm-threaded.js:850
ort-wasm-threaded.js:1776 Heap resize call from 16777216 to 20185088 took 0.1899999976158142 msecs. Success: true
ort-wasm-threaded.js:850 (×2) Tried to spawn a new thread, but the thread pool is exhausted.
This might result in a deadlock unless some threads eventually exit or the code explicitly breaks out to the event loop.
If you want to increase the pool size, use setting `-s PTHREAD_POOL_SIZE=...`.
If you want to throw an explicit error instead of the risk of deadlocking in those cases, use setting `-s PTHREAD_POOL_SIZE_STRICT=2`.
ia @ ort-wasm-threaded.js:850
ort-wasm-threaded.js:1776 Heap resize call from 20185088 to 24248320 took 0.12999999523162842 msecs. Success: true
ort-wasm-threaded.js:850 Tried to spawn a new thread, but the thread pool is exhausted.
This might result in a deadlock unless some threads eventually exit or the code explicitly breaks out to the event loop.
If you want to increase the pool size, use setting `-s PTHREAD_POOL_SIZE=...`.
If you want to throw an explicit error instead of the risk of deadlocking in those cases, use setting `-s PTHREAD_POOL_SIZE_STRICT=2`.
ia @ ort-wasm-threaded.js:850
ort-wasm-threaded.js:1776 Heap resize call from 24248320 to 29097984 took 0.17499999701976776 msecs. Success: true

@josephrocca (Author)

I think it's no longer using SIMD because, IIUC, it should be loading ort-wasm-simd-threaded.js rather than ort-wasm-threaded.js (the file shown in the output you pasted). That probably explains why it doesn't seem to be giving you any speed boost, but I'm not sure why it's suddenly not using SIMD.

The super-resolution demo I linked earlier loads ort-wasm-simd-threaded.js and runs fine for me in Chrome. So something weird is happening here related to one or more of: Edge, your custom build of ORT Web, and the specific model you're loading.

  • It seems like "ArrayBuffer at index 2 is already detached" is caused by ort.env.wasm.proxy = true. If you could create a minimal example of this, including the ORT Web build instructions that you used, and post an issue on the onnxruntime repo, the pros at ORT Web might be able to take it from there.
  • Another minimal example, but without ort.env.wasm.proxy = true, would show that SIMD is not loading when it should be, plus those "thread pool is exhausted" warnings (see the feature-detection sketch below).
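As a quick sanity check, you can ask the browser directly whether SIMD and threads are supported, independent of ORT Web's own detection - this sketch assumes the wasm-feature-detect npm package, which isn't part of this repo:

// Feature-detect wasm SIMD and threads in the browser itself,
// to rule out ORT Web's binary-selection logic as the culprit.
import { simd, threads } from 'wasm-feature-detect';

console.log('wasm SIMD supported:', await simd());
console.log('wasm threads supported:', await threads());
// If SIMD logs true here but ORT still fetches ort-wasm-threaded.js,
// the problem is in the build/loader rather than the browser.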

Ideally you'd be able to reproduce these bugs/errors somehow without your custom build - I think that would raise the eyebrows of the ORT Web maintainers more. Swapping out the custom build for a normal build, and your current model (with training ops) for one that works without training ops, would tell you whether it's training-ops-related - something like the sketch below.
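A rough shape for that repro (assuming the stock onnxruntime-web package; the model path, input name, and shape are all made up):

// Minimal repro against the stock build: same flags, plain
// inference-only model, no custom ORT Web bundle.
ort.env.wasm.proxy = true; // toggle this line to isolate the detached-buffer error

const session = await ort.InferenceSession.create('./plain_model.onnx'); // placeholder model
const input = new ort.Tensor('float32', new Float32Array(1 * 3 * 224 * 224), [1, 3, 224, 224]); // made-up shape
const results = await session.run({ input: input }); // 'input' is a made-up feed name
console.log(results);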

Also, I realised that this error is probably not OOM-related, or at least not in the same way as that other issue I linked, since the number would indicate ~150MB rather than 1.5GB. So if you could test on Chrome, that would be interesting, because it might be an Edge-related bug. I've tested on Edge (Linux) and it works fine for me.

@juharris (Owner)

BTW, I tried using https://josephrocca.github.io/super-resolution-js/ in Google Chrome and got a similar error to the one I got in Edge:
[screenshot of the error]
