🧶 Woolball Server


Transform idle browsers into a powerful distributed AI inference network

Build your own browser-based inference infrastructure by turning idle browsers into compute nodes

🚀 Quick Start · 📖 API Reference · 🛠️ Development · 💬 Discord


✨ What is Woolball?

Woolball Server is an open-source network server that orchestrates AI inference jobs across a distributed network of browser-based compute nodes. Instead of relying on expensive cloud infrastructure, it harnesses the collective power of idle browsers to run AI models efficiently and cost-effectively.

🔗 Client side: Available in woolball-client
📋 Roadmap: Check our next steps


🎯 Supported AI Tasks

| 🔧 Provider | 🎯 Task | 🤖 Models | 📊 Status |
|---|---|---|---|
| Transformers.js | 🎤 Speech-to-Text | ONNX Models | ✅ Ready |
| Transformers.js | 🔊 Text-to-Speech | ONNX Models | ✅ Ready |
| Kokoro.js | 🔊 Text-to-Speech | ONNX Models | ✅ Ready |
| Transformers.js | 🌐 Translation | ONNX Models | ✅ Ready |
| Transformers.js | 📝 Text Generation | ONNX Models | ✅ Ready |
| WebLLM | 📝 Text Generation | MLC Models | ✅ Ready |
| MediaPipe | 📝 Text Generation | LiteRT Models | ✅ Ready |

🚀 Quick Start

Get up and running in under 2 minutes:

1️⃣ Clone & Deploy

git clone --branch deploy --single-branch --depth 1 https://github.com/woolball-xyz/woolball-server.git
cd woolball-server && docker compose up -d

2️⃣ Verify Setup

Open http://localhost:9000 to ensure at least one client node is connected.
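If you prefer the terminal, the sketch below assumes the client demo answers plain HTTP on port 9000 (see Service Endpoints). Note this only confirms the demo page is being served; node connectivity is easiest to confirm in the browser itself.

# Expect an HTTP 200 from the client demo page
curl -sS -o /dev/null -w "%{http_code}\n" http://localhost:9000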

3️⃣ Start Using the API

curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"user","content":"Hello! Can you explain what Woolball is?"}]' \
  -F "model=https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task" \
  -F "provider=mediapipe" \
  -F "maxTokens=200"

☁️ One-Click Deploy to DigitalOcean

Deploy Woolball to DigitalOcean App Platform with a single click:

Deploy to DO

🔧 What gets deployed:

  • 🌐 Woolball Client: Frontend interface accessible via your app URL
  • 🔌 Core API: RESTful API for AI inference jobs (/api route)
  • 🔗 WebSocket Server: Real-time communication with browser nodes (/ws route)
  • ⚙️ Background Service: Job orchestration and node management
  • 📊 Redis Database: Managed Redis instance for caching and queues

🚀 After Deployment:

  1. Your app will be available at https://your-app-name.ondigitalocean.app
  2. API endpoint: https://your-app-name.ondigitalocean.app/api/v1
  3. WebSocket: wss://your-app-name.ondigitalocean.app/ws
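The same Quick Start request should then work against the deployed endpoint. A sketch, with your-app-name standing in for your actual app name:

curl -X POST https://your-app-name.ondigitalocean.app/api/v1/text-generation \
  -F 'input=[{"role":"user","content":"Hello from DigitalOcean!"}]' \
  -F "model=https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task" \
  -F "provider=mediapipe" \
  -F "maxTokens=200"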

📖 API Reference

📖 Text Generation

Generate text with powerful language models

🤗 Transformers.js Provider

🤖 Available Models

| Model | Quantization | Description |
|---|---|---|
| HuggingFaceTB/SmolLM2-135M-Instruct | fp16 | Compact model for basic text generation |
| HuggingFaceTB/SmolLM2-360M-Instruct | q4 | Balanced performance and size |
| Mozilla/Qwen2.5-0.5B-Instruct | q4 | Efficient model for general tasks |
| onnx-community/Qwen2.5-Coder-0.5B-Instruct | q8 | Specialized for code generation |

💡 Example Usage

curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the capital of Brazil?"}]' \
  -F "model=HuggingFaceTB/SmolLM2-135M-Instruct" \
  -F "dtype=fp16" \
  -F "max_new_tokens=250" \
  -F "temperature=0.7" \
  -F "do_sample=true"

⚙️ Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | - | 🤖 Model ID (e.g., "HuggingFaceTB/SmolLM2-135M-Instruct") |
| dtype | string | - | 🔧 Quantization level (e.g., "fp16", "q4") |
| max_length | number | 20 | 📏 Maximum length of the generated sequence, including the input prompt |
| max_new_tokens | number | null | 🆕 Maximum number of tokens to generate, ignoring the prompt length |
| min_length | number | 0 | 📐 Minimum length of the generated sequence, including the input prompt |
| min_new_tokens | number | null | 🔢 Minimum number of tokens to generate, ignoring the prompt length |
| do_sample | boolean | false | 🎲 Whether to use sampling; greedy decoding is used otherwise |
| num_beams | number | 1 | 🔍 Number of beams for beam search; 1 means no beam search |
| temperature | number | 1.0 | 🌡️ Value used to modulate the next-token probabilities |
| top_k | number | 50 | 🔝 Number of highest-probability vocabulary tokens to keep for top-k filtering |
| top_p | number | 1.0 | 📊 If < 1, only the most probable tokens with probabilities adding up to top_p or higher are kept |
| repetition_penalty | number | 1.0 | 🔄 Repetition penalty; 1.0 means no penalty |
| no_repeat_ngram_size | number | 0 | 🚫 If > 0, all n-grams of that size can occur only once |
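These decoding parameters combine as in Transformers-style generation: with do_sample=true, temperature, top_k, and top_p shape the sampling distribution, while repetition_penalty discourages loops. A sketch of a sampling-heavy request (the specific values are illustrative, not tuned):

curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"user","content":"Write a short poem about distributed computing."}]' \
  -F "model=HuggingFaceTB/SmolLM2-360M-Instruct" \
  -F "dtype=q4" \
  -F "max_new_tokens=120" \
  -F "do_sample=true" \
  -F "temperature=0.9" \
  -F "top_k=40" \
  -F "top_p=0.9" \
  -F "repetition_penalty=1.2"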

🤖 WebLLM Provider

🤖 Available Models

| Model | Description |
|---|---|
| DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC | DeepSeek R1 distilled model with reasoning capabilities |
| DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC | DeepSeek R1 distilled Llama-based model |
| SmolLM2-1.7B-Instruct-q4f32_1-MLC | Compact instruction-following model |
| Llama-3.1-8B-Instruct-q4f32_1-MLC | Meta's Llama 3.1 8B instruction model |
| Qwen3-8B-q4f32_1-MLC | Alibaba's Qwen3 8B model |

💡 Example Usage

curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the capital of Brazil?"}]' \
  -F "model=DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC" \
  -F "provider=webllm" \
  -F "temperature=0.7" \
  -F "top_p=0.95"

⚙️ Parameters

| Parameter | Type | Description |
|---|---|---|
| model | string | 🤖 Model ID from MLC (e.g., "DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC") |
| provider | string | 🔧 Must be set to "webllm" when using WebLLM models |
| context_window_size | number | 🪟 Size of the model's context window |
| sliding_window_size | number | 🔄 Size of the sliding window for attention |
| attention_sink_size | number | 🎯 Size of the attention sink |
| repetition_penalty | number | 🔄 Penalty for repeating tokens |
| frequency_penalty | number | 📊 Penalty for token frequency |
| presence_penalty | number | 👁️ Penalty for token presence |
| top_p | number | 📈 If < 1, only the most probable tokens with probabilities adding up to top_p or higher are kept |
| temperature | number | 🌡️ Value used to modulate the next-token probabilities |
| bos_token_id | number | 🏁 Beginning-of-sequence token ID (optional) |

📱 MediaPipe Provider

🤖 Available Models

| Model | Device Type | Description |
|---|---|---|
| https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma2-2b-it-cpu-int8.task | CPU | Gemma2 2B model optimized for CPU inference |
| https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma2-2b-it-gpu-int8.bin | GPU | Gemma2 2B model optimized for GPU inference |
| https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task | CPU/GPU | Gemma3 1B model with INT4 quantization |
| https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-4b-it-int4-web.task | Web | Gemma3 4B model optimized for web deployment |

💡 Example Usage

curl -X POST http://localhost:9002/api/v1/text-generation \
  -F 'input=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Explain quantum computing in simple terms."}]' \
  -F "model=https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task" \
  -F "provider=mediapipe" \
  -F "maxTokens=500" \
  -F "temperature=0.7" \
  -F "topK=40" \
  -F "randomSeed=12345"

⚙️ Parameters

| Parameter | Type | Description |
|---|---|---|
| model | string | 🤖 Model URL for MediaPipe LiteRT models (hosted on DigitalOcean Spaces) |
| provider | string | 🔧 Must be set to "mediapipe" when using MediaPipe models |
| maxTokens | number | 🔢 Maximum number of tokens to generate |
| randomSeed | number | 🎲 Random seed for reproducible results |
| topK | number | 🔝 Number of highest-probability vocabulary tokens to keep for top-k filtering |
| temperature | number | 🌡️ Value used to modulate the next-token probabilities |

🎤 Speech Recognition

Convert audio to text with Whisper models

🤖 Available Models

| Model | Quantization | Description |
|---|---|---|
| onnx-community/whisper-large-v3-turbo_timestamped | q4 | 🎯 High accuracy with timestamps |
| onnx-community/whisper-small | q4 | ⚡ Fast processing |

💡 Example Usage

# 📁 Local file
curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=@/path/to/your/file.mp3" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "language=en" \
  -F "return_timestamps=true" \
  -F "stream=false"

# 🔗 URL
curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=https://example.com/audio.mp3" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "language=en" \
  -F "return_timestamps=true" \
  -F "stream=false"

# 📊 Base64
curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=data:audio/mp3;base64,YOUR_BASE64_ENCODED_AUDIO" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "language=en" \
  -F "return_timestamps=true" \
  -F "stream=false"

⚙️ Parameters

| Parameter | Type | Description |
|---|---|---|
| model | string | 🤖 Model ID from Hugging Face (e.g., "onnx-community/whisper-large-v3-turbo_timestamped") |
| dtype | string | 🔧 Quantization level (e.g., "q4") |
| return_timestamps | boolean \| 'word' | ⏰ Return timestamps ("word" for word-level). Default is false. |
| stream | boolean | 📡 Stream results in real time. Default is false. |
| chunk_length_s | number | 📏 Length of audio chunks to process, in seconds. Default is 0 (no chunking). |
| stride_length_s | number | 🔄 Overlap between consecutive audio chunks, in seconds. Defaults to chunk_length_s / 6 if not provided. |
| force_full_sequences | boolean | 🎯 Whether to force outputting full sequences. Default is false. |
| language | string | 🌍 Source language (auto-detected if null). Set this to potentially improve performance when the source language is known. |
| task | null \| 'transcribe' \| 'translate' | 🎯 Task to perform. Default is null, meaning it is auto-detected. |
| num_frames | number | 🎬 Number of frames in the input audio. |
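For long recordings, chunk_length_s and stride_length_s let the node transcribe in overlapping windows instead of processing the whole file at once. A sketch for long-form audio with word-level timestamps (the 30s/5s values are illustrative assumptions, not recommendations):

curl -X POST http://localhost:9002/api/v1/speech-recognition \
  -F "input=@/path/to/long-recording.mp3" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "dtype=q4" \
  -F "return_timestamps=word" \
  -F "chunk_length_s=30" \
  -F "stride_length_s=5"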

🔊 Text-to-Speech

Generate natural speech from text

🤗 Transformers.js (MMS Models)

🤖 Available Models

| Language | Model | Flag |
|---|---|---|
| English | Xenova/mms-tts-eng | 🇺🇸 |
| Spanish | Xenova/mms-tts-spa | 🇪🇸 |
| French | Xenova/mms-tts-fra | 🇫🇷 |
| German | Xenova/mms-tts-deu | 🇩🇪 |
| Portuguese | Xenova/mms-tts-por | 🇵🇹 |
| Russian | Xenova/mms-tts-rus | 🇷🇺 |
| Arabic | Xenova/mms-tts-ara | 🇸🇦 |
| Korean | Xenova/mms-tts-kor | 🇰🇷 |

💡 Example Usage

# Standard request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test for text to speech." \
  -F "model=Xenova/mms-tts-eng" \
  -F "dtype=q8" \
  -F "stream=false"

# Streaming request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test for streaming text to speech." \
  -F "model=Xenova/mms-tts-eng" \
  -F "dtype=q8" \
  -F "stream=true"

⚙️ Parameters

| Parameter | Type | Description | Required For |
|---|---|---|---|
| model | string | 🤖 Model ID | All providers |
| dtype | string | 🔧 Quantization level (e.g., "q8") | All providers |
| stream | boolean | 📡 Whether to stream the audio response. Default is false. | All providers |

🐱 Kokoro (Premium Voices)

🤖 Available Models

| Model | Quantization | Description |
|---|---|---|
| onnx-community/Kokoro-82M-ONNX | q8 | High-quality English TTS with multiple voices |
| onnx-community/Kokoro-82M-v1.0-ONNX | q8 | Alternative Kokoro model version |

💡 Example Usage

# Standard request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test using Kokoro voices." \
  -F "model=onnx-community/Kokoro-82M-ONNX" \
  -F "voice=af_nova" \
  -F "dtype=q8" \
  -F "stream=false"

# Streaming request
curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Hello, this is a test using Kokoro voices with streaming." \
  -F "model=onnx-community/Kokoro-82M-ONNX" \
  -F "voice=af_nova" \
  -F "dtype=q8" \
  -F "stream=true"

⚙️ Parameters

| Parameter | Type | Description | Required |
|---|---|---|---|
| model | string | 🤖 Model ID | Required |
| dtype | string | 🔧 Quantization level (e.g., "q8") | Required |
| voice | string | 🎭 Voice ID (see the voice options and example below) | Required |
| stream | boolean | 📡 Whether to stream the audio response. Default is false. | Optional |

🎭 Available Voice Options

🇺🇸 American Voices

  • 👩 Female: af_heart, af_alloy, af_aoede, af_bella, af_jessica, af_nova, af_sarah
  • 👨 Male: am_adam, am_echo, am_eric, am_liam, am_michael, am_onyx

🇬🇧 British Voices

  • 👩 Female: bf_emma, bf_isabella, bf_alice, bf_lily
  • 👨 Male: bm_george, bm_lewis, bm_daniel, bm_fable
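Swapping voices is just a matter of changing the voice field. A sketch using a British male voice and writing the response to disk; the -o flag and the .wav extension are assumptions about how you want to store the audio, not a statement about the container format the API returns:

curl -X POST http://localhost:9002/api/v1/text-to-speech \
  -F "input=Good evening, and welcome to the news." \
  -F "model=onnx-community/Kokoro-82M-ONNX" \
  -F "voice=bm_george" \
  -F "dtype=q8" \
  -F "stream=false" \
  -o speech.wav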

🌐 Translation

Translate between 200+ languages

🤖 Available Models

| Model | Quantization | Description |
|---|---|---|
| Xenova/nllb-200-distilled-600M | q8 | 🌍 Multilingual translation model supporting 200+ languages |

💡 Example Usage

curl -X POST http://localhost:9002/api/v1/translation \
  -F "input=Hello, how are you today?" \
  -F "model=Xenova/nllb-200-distilled-600M" \
  -F "dtype=q8" \
  -F "srcLang=eng_Latn" \
  -F "tgtLang=por_Latn"

🌍 Language Support

Uses FLORES200 format - supports 200+ languages!

⚙️ Parameters

| Parameter | Type | Description |
|---|---|---|
| model | string | 🤖 Model ID (e.g., "Xenova/nllb-200-distilled-600M") |
| dtype | string | 🔧 Quantization level (e.g., "q8") |
| srcLang | string | 🌍 Source language code in FLORES200 format (e.g., "eng_Latn") |
| tgtLang | string | 🌍 Target language code in FLORES200 format (e.g., "por_Latn") |
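Reversing a translation only requires swapping the FLORES200 codes. A sketch going from Portuguese back to English:

curl -X POST http://localhost:9002/api/v1/translation \
  -F "input=Olá, como você está hoje?" \
  -F "model=Xenova/nllb-200-distilled-600M" \
  -F "dtype=q8" \
  -F "srcLang=por_Latn" \
  -F "tgtLang=eng_Latn"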

🛠️ Local Development

🐳 Docker Setup (Recommended)

git clone https://github.com/woolball-xyz/woolball-server.git
cd woolball-server && docker compose up --build -d

🌐 Service Endpoints

| 🔧 Service | 🚪 Port | 🔗 URL |
|---|---|---|
| 🔌 WebSocket | 9003 | localhost:9003 |
| 🌐 API Server | 9002 | localhost:9002 |
| 👥 Client Demo | 9000 | localhost:9000 |
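A quick way to confirm the stack came up is to poke each port. This is a sketch assuming plain HTTP on every port; the WebSocket service may answer a bare HTTP request with a non-200 status even when healthy:

# Print the HTTP status for each service port
for port in 9000 9002 9003; do
  printf "port %s: " "$port"
  curl -sS -o /dev/null -w "%{http_code}\n" "http://localhost:$port" || echo "unreachable"
done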

🔄 Network Flow

[Network architecture diagram]


🤝 Contributing

We welcome contributions! Here's how you can help:

  • 🐛 Report bugs via GitHub Issues
  • 💡 Suggest features in our Discord
  • 🔧 Submit PRs for improvements
  • 📖 Improve documentation

📄 License

This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.


Made with ❤️ by the Woolball team

🌟 Star us on GitHub · 💬 Join Discord
