Transform idle browsers into a powerful distributed AI inference network
Build your own browser-based inference infrastructure by turning idle browsers into compute nodes
🚀 Quick Start • 📖 API Reference • 🛠️ Development • 💬 Discord
Woolball Server is an open-source network server that orchestrates AI inference jobs across a distributed network of browser-based compute nodes. Instead of relying on expensive cloud infrastructure, harness the collective power of idle browsers to run AI models efficiently and cost-effectively.
🔗 Client side: Available in woolball-client
📋 Roadmap: Check our next steps
| 🔧 Provider | 🎯 Task | 🤖 Models | 📊 Status |
|---|---|---|---|
| Transformers.js | 🎤 Speech-to-Text | ONNX Models | ✅ Ready |
| Transformers.js | 🔊 Text-to-Speech | ONNX Models | ✅ Ready |
| Kokoro.js | 🔊 Text-to-Speech | ONNX Models | ✅ Ready |
| Transformers.js | 🌐 Translation | ONNX Models | ✅ Ready |
| Transformers.js | 📝 Text Generation | ONNX Models | ✅ Ready |
| WebLLM | 📝 Text Generation | MLC Models | ✅ Ready |
| MediaPipe | 📝 Text Generation | LiteRT Models | ✅ Ready |
Get up and running in under 2 minutes:
```bash
git clone --branch deploy --single-branch --depth 1 https://github.com/woolball-xyz/woolball-server.git
cd woolball-server && docker compose up -d
```
Open http://localhost:9000 to ensure at least one client node is connected.
```bash
curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"user","content":"Hello! Can you explain what Woolball is?"}]' \
  -F "model=https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task" \
  -F "provider=mediapipe" \
  -F "maxTokens=200"
```
Deploy Woolball to DigitalOcean App Platform with a single click:
- 🌐 Woolball Client: Frontend interface accessible via your app URL
- 🔌 Core API: RESTful API for AI inference jobs (`/api` route)
- 🔗 WebSocket Server: Real-time communication with browser nodes (`/ws` route)
- ⚙️ Background Service: Job orchestration and node management
- 📊 Redis Database: Managed Redis instance for caching and queues

- Your app will be available at `https://your-app-name.ondigitalocean.app`
- API endpoint: `https://your-app-name.ondigitalocean.app/api/v1`
- WebSocket: `wss://your-app-name.ondigitalocean.app/ws`
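Once deployed, you can smoke-test the hosted API with the same request used in the Quick Start, pointed at your app URL. The block below is only a sketch; replace `your-app-name` with your actual app name.

```bash
# Verify the deployed /api/v1 route (replace your-app-name with your app's name).
curl -X POST https://your-app-name.ondigitalocean.app/api/v1/text-generation \
  -F 'input=[{"role":"user","content":"Hello from DigitalOcean!"}]' \
  -F "model=https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task" \
  -F "provider=mediapipe" \
  -F "maxTokens=100"
```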
Generate text with powerful language models
🤖 Available Models
| Model | Quantization | Description |
|---|---|---|
| `HuggingFaceTB/SmolLM2-135M-Instruct` | fp16 | Compact model for basic text generation |
| `HuggingFaceTB/SmolLM2-360M-Instruct` | q4 | Balanced performance and size |
| `Mozilla/Qwen2.5-0.5B-Instruct` | q4 | Efficient model for general tasks |
| `onnx-community/Qwen2.5-Coder-0.5B-Instruct` | q8 | Specialized for code generation |
```bash
curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the capital of Brazil?"}]' \
  -F "model=HuggingFaceTB/SmolLM2-135M-Instruct" \
  -F "dtype=fp16" \
  -F "max_new_tokens=250" \
  -F "temperature=0.7" \
  -F "do_sample=true"
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | - | 🤖 Model ID (e.g., "HuggingFaceTB/SmolLM2-135M-Instruct") |
| `dtype` | string | - | 🔧 Quantization level (e.g., "fp16", "q4") |
| `max_length` | number | 20 | 📏 Maximum length the generated tokens can have (includes input prompt) |
| `max_new_tokens` | number | null | 🆕 Maximum number of tokens to generate, ignoring the prompt length |
| `min_length` | number | 0 | 📐 Minimum length of the sequence to be generated (includes input prompt) |
| `min_new_tokens` | number | null | 🔢 Minimum number of tokens to generate, ignoring the prompt length |
| `do_sample` | boolean | false | 🎲 Whether to use sampling; greedy decoding is used otherwise |
| `num_beams` | number | 1 | 🔍 Number of beams for beam search; 1 means no beam search |
| `temperature` | number | 1.0 | 🌡️ Value used to modulate the next-token probabilities |
| `top_k` | number | 50 | 🔝 Number of highest-probability vocabulary tokens to keep for top-k filtering |
| `top_p` | number | 1.0 | 📊 If < 1, only tokens with probabilities adding up to top_p or higher are kept |
| `repetition_penalty` | number | 1.0 | 🔄 Parameter for repetition penalty; 1.0 means no penalty |
| `no_repeat_ngram_size` | number | 0 | 🚫 If > 0, all n-grams of that size can only occur once |
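As an illustration of how these decoding parameters interact, the sketch below turns sampling off and uses a small beam search for more deterministic output. The model and quantization come from the table above; the remaining values are arbitrary example choices, not recommendations.

```bash
# Deterministic decoding: sampling disabled, 3-beam search, repeated trigrams blocked.
curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"user","content":"List three facts about the Amazon rainforest."}]' \
  -F "model=HuggingFaceTB/SmolLM2-360M-Instruct" \
  -F "dtype=q4" \
  -F "max_new_tokens=150" \
  -F "do_sample=false" \
  -F "num_beams=3" \
  -F "no_repeat_ngram_size=3"
```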
🤖 Available Models
| Model | Description |
|---|---|
| `DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC` | DeepSeek R1 distilled model with reasoning capabilities |
| `DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC` | DeepSeek R1 distilled Llama-based model |
| `SmolLM2-1.7B-Instruct-q4f32_1-MLC` | Compact instruction-following model |
| `Llama-3.1-8B-Instruct-q4f32_1-MLC` | Meta's Llama 3.1 8B instruction model |
| `Qwen3-8B-q4f32_1-MLC` | Alibaba's Qwen3 8B model |
```bash
curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the capital of Brazil?"}]' \
  -F "model=DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC" \
  -F "provider=webllm" \
  -F "temperature=0.7" \
  -F "top_p=0.95"
```
| Parameter | Type | Description |
|---|---|---|
| `model` | string | 🤖 Model ID from MLC (e.g., "DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC") |
| `provider` | string | 🔧 Must be set to "webllm" when using WebLLM models |
| `context_window_size` | number | 🪟 Size of the context window for the model |
| `sliding_window_size` | number | 🔄 Size of the sliding window for attention |
| `attention_sink_size` | number | 🎯 Size of the attention sink |
| `repetition_penalty` | number | 🔄 Penalty for repeating tokens |
| `frequency_penalty` | number | 📊 Penalty for token frequency |
| `presence_penalty` | number | 👁️ Penalty for token presence |
| `top_p` | number | 📈 If < 1, only tokens with probabilities adding up to top_p or higher are kept |
| `temperature` | number | 🌡️ Value used to modulate the next-token probabilities |
| `bos_token_id` | number | 🏁 Beginning-of-sequence token ID (optional) |
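For example, the penalty parameters can be combined to discourage repetitive output. The request below is a sketch using one of the listed MLC models; the penalty values are illustrative only.

```bash
# WebLLM request with frequency and presence penalties to reduce repetition.
curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"user","content":"Write a short poem about distributed computing."}]' \
  -F "model=SmolLM2-1.7B-Instruct-q4f32_1-MLC" \
  -F "provider=webllm" \
  -F "temperature=0.8" \
  -F "frequency_penalty=0.5" \
  -F "presence_penalty=0.3"
```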
🤖 Available Models
| Model | Device Type | Description |
|---|---|---|
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma2-2b-it-cpu-int8.task` | CPU | Gemma2 2B model optimized for CPU inference |
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma2-2b-it-gpu-int8.bin` | GPU | Gemma2 2B model optimized for GPU inference |
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task` | CPU/GPU | Gemma3 1B model with INT4 quantization |
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-4b-it-int4-web.task` | Web | Gemma3 4B model optimized for web deployment |
```bash
curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Explain quantum computing in simple terms."}]' \
  -F "model=https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task" \
  -F "provider=mediapipe" \
  -F "maxTokens=500" \
  -F "temperature=0.7" \
  -F "topK=40" \
  -F "randomSeed=12345"
```
| Parameter | Type | Description |
|---|---|---|
| `model` | string | 🤖 Model URL for MediaPipe LiteRT models hosted on DigitalOcean Spaces |
| `provider` | string | 🔧 Must be set to "mediapipe" when using MediaPipe models |
| `maxTokens` | number | 🔢 Maximum number of tokens to generate |
| `randomSeed` | number | 🎲 Random seed for reproducible results |
| `topK` | number | 🔝 Number of highest-probability vocabulary tokens to keep for top-k filtering |
| `temperature` | number | 🌡️ Value used to modulate the next-token probabilities |
Convert audio to text with Whisper models
| Model | Quantization | Description |
|---|---|---|
| `onnx-community/whisper-large-v3-turbo_timestamped` | q4 | 🎯 High accuracy with timestamps |
| `onnx-community/whisper-small` | q4 | ⚡ Fast processing |
```bash
# 📁 Local file
curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=@/path/to/your/file.mp3" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "language=en" \
  -F "return_timestamps=true" \
  -F "stream=false"

# 🔗 URL
curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=https://example.com/audio.mp3" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "language=en" \
  -F "return_timestamps=true" \
  -F "stream=false"

# 📊 Base64
curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=data:audio/mp3;base64,YOUR_BASE64_ENCODED_AUDIO" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "language=en" \
  -F "return_timestamps=true" \
  -F "stream=false"
```
| Parameter | Type | Description |
|---|---|---|
| `model` | string | 🤖 Model ID from Hugging Face (e.g., "onnx-community/whisper-large-v3-turbo_timestamped") |
| `dtype` | string | 🔧 Quantization level (e.g., "q4") |
| `return_timestamps` | boolean \| 'word' | ⏰ Return timestamps ("word" for word-level). Default is `false`. |
| `stream` | boolean | 📡 Stream results in real time. Default is `false`. |
| `chunk_length_s` | number | 📏 Length of audio chunks to process, in seconds. Default is `0` (no chunking). |
| `stride_length_s` | number | 🔄 Length of overlap between consecutive audio chunks, in seconds. If not provided, defaults to `chunk_length_s / 6`. |
| `force_full_sequences` | boolean | 🎯 Whether to force outputting full sequences. Default is `false`. |
| `language` | string | 🌍 Source language (auto-detected if null). Specify it to potentially improve performance when the source language is known. |
| `task` | null \| 'transcribe' \| 'translate' | 🎯 The task to perform. Default is `null`, meaning it is auto-detected. |
| `num_frames` | number | 🎬 The number of frames in the input audio. |
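For long recordings, the chunking parameters above can be combined so the audio is processed in overlapping windows. The values below (30-second chunks with a 5-second stride) are illustrative, and the URL is a placeholder.

```bash
# Chunked transcription of a long recording, keeping segment timestamps.
curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=https://example.com/long-recording.mp3" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "return_timestamps=true" \
  -F "chunk_length_s=30" \
  -F "stride_length_s=5"
```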
Generate natural speech from text
🤖 Available Models
| Language | Model | Flag |
|---|---|---|
| English | `Xenova/mms-tts-eng` | 🇺🇸 |
| Spanish | `Xenova/mms-tts-spa` | 🇪🇸 |
| French | `Xenova/mms-tts-fra` | 🇫🇷 |
| German | `Xenova/mms-tts-deu` | 🇩🇪 |
| Portuguese | `Xenova/mms-tts-por` | 🇵🇹 |
| Russian | `Xenova/mms-tts-rus` | 🇷🇺 |
| Arabic | `Xenova/mms-tts-ara` | 🇸🇦 |
| Korean | `Xenova/mms-tts-kor` | 🇰🇷 |
```bash
# Standard request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test for text to speech." \
  -F "model=Xenova/mms-tts-eng" \
  -F "dtype=q8" \
  -F "stream=false"

# Streaming request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test for streaming text to speech." \
  -F "model=Xenova/mms-tts-eng" \
  -F "dtype=q8" \
  -F "stream=true"
```
| Parameter | Type | Description | Required For |
|---|---|---|---|
| `model` | string | 🤖 Model ID | All providers |
| `dtype` | string | 🔧 Quantization level (e.g., "q8") | All providers |
| `stream` | boolean | 📡 Whether to stream the audio response. Default is `false`. | All providers |
🤖 Available Models
| Model | Quantization | Description |
|---|---|---|
| `onnx-community/Kokoro-82M-ONNX` | q8 | High-quality English TTS with multiple voices |
| `onnx-community/Kokoro-82M-v1.0-ONNX` | q8 | Alternative Kokoro model version |
```bash
# Standard request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test using Kokoro voices." \
  -F "model=onnx-community/Kokoro-82M-ONNX" \
  -F "voice=af_nova" \
  -F "dtype=q8" \
  -F "stream=false"

# Streaming request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test using Kokoro voices with streaming." \
  -F "model=onnx-community/Kokoro-82M-ONNX" \
  -F "voice=af_nova" \
  -F "dtype=q8" \
  -F "stream=true"
```
| Parameter | Type | Description | Required |
|---|---|---|---|
| `model` | string | 🤖 Model ID | Required |
| `dtype` | string | 🔧 Quantization level (e.g., "q8") | Required |
| `voice` | string | 🎭 Voice ID (see below) | Required |
| `stream` | boolean | 📡 Whether to stream the audio response. Default is `false`. | Optional |
🎭 Available Voice Options
🇺🇸 American Voices
- 👩 Female: `af_heart`, `af_alloy`, `af_aoede`, `af_bella`, `af_jessica`, `af_nova`, `af_sarah`
- 👨 Male: `am_adam`, `am_echo`, `am_eric`, `am_liam`, `am_michael`, `am_onyx`
🇬🇧 British Voices
- 👩 Female: `bf_emma`, `bf_isabella`, `bf_alice`, `bf_lily`
- 👨 Male: `bm_george`, `bm_lewis`, `bm_daniel`, `bm_fable`
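To compare voices, a small shell loop can generate one sample per voice ID from the lists above. The `.wav` extension is an assumption; adjust it to match the server's actual audio format.

```bash
# Generate one audio sample per selected Kokoro voice for side-by-side comparison.
for VOICE in af_nova af_bella am_adam bm_george; do
  curl -s -X POST http://localhost:9002/api/v1/text-to-speech \
    -F "input=This is the ${VOICE} voice." \
    -F "model=onnx-community/Kokoro-82M-ONNX" \
    -F "voice=${VOICE}" \
    -F "dtype=q8" \
    -F "stream=false" \
    -o "sample-${VOICE}.wav"
done
```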
Translate between 200+ languages
| Model | Quantization | Description |
|---|---|---|
| `Xenova/nllb-200-distilled-600M` | q8 | 🌍 Multilingual translation model supporting 200+ languages |
```bash
curl -X POST http://localhost:9002/api/v1/translation \
  -F "input=Hello, how are you today?" \
  -F "model=Xenova/nllb-200-distilled-600M" \
  -F "dtype=q8" \
  -F "srcLang=eng_Latn" \
  -F "tgtLang=por_Latn"
```
Uses FLORES200 language codes and supports 200+ languages!
| Parameter | Type | Description |
|---|---|---|
| `model` | string | 🤖 Model ID (e.g., "Xenova/nllb-200-distilled-600M") |
| `dtype` | string | 🔧 Quantization level (e.g., "q8") |
| `srcLang` | string | 🌍 Source language code in FLORES200 format (e.g., "eng_Latn") |
| `tgtLang` | string | 🌍 Target language code in FLORES200 format (e.g., "por_Latn") |
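Because the source and target languages are just form fields, fanning the same text out to several targets is a short shell loop. The FLORES200 codes below (Portuguese, Spanish, French, German) are standard; swap in any of the 200+ supported codes.

```bash
# Translate the same sentence into several target languages.
for TGT in por_Latn spa_Latn fra_Latn deu_Latn; do
  curl -s -X POST http://localhost:9002/api/v1/translation \
    -F "input=Hello, how are you today?" \
    -F "model=Xenova/nllb-200-distilled-600M" \
    -F "dtype=q8" \
    -F "srcLang=eng_Latn" \
    -F "tgtLang=${TGT}"
  echo    # newline between responses
done
```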
```bash
git clone https://github.com/woolball-xyz/woolball-server.git
cd woolball-server && docker compose up --build -d
```
| 🔧 Service | 🚪 Port | 🔗 URL |
|---|---|---|
| 🔌 WebSocket | 9003 | localhost:9003 |
| 🌐 API Server | 9002 | localhost:9002 |
| 👥 Client Demo | 9000 | localhost:9000 |
We welcome contributions! Here's how you can help:
- 🐛 Report bugs via GitHub Issues
- 💡 Suggest features in our Discord
- 🔧 Submit PRs for improvements
- 📖 Improve documentation
This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.
Made with ❤️ by the Woolball team