Commit ecf4a00

Realtime: twilio example
1 parent 5346d63 commit ecf4a00

6 files changed, +436 -0 lines changed

docs/scripts/generate_ref_files.py

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,7 @@ def md_target(py_path: Path) -> Path:
     rel = py_path.relative_to(SRC_ROOT).with_suffix(".md")
     return DOCS_ROOT / rel
 
+
 def pretty_title(last_segment: str) -> str:
     """
     Convert a module/file segment like 'tool_context' to 'Tool Context'.
@@ -39,6 +40,7 @@ def pretty_title(last_segment: str) -> str:
     cleaned = last_segment.replace("_", " ").replace("-", " ")
     return capwords(cleaned)
 
+
 # ---- Main ------------------------------------------------------------
 
 

examples/realtime/twilio/README.md

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
# Realtime Twilio Integration

This example demonstrates how to connect the OpenAI Realtime API to a phone call using Twilio's Media Streams. The server handles incoming phone calls and streams audio between Twilio and the OpenAI Realtime API, enabling real-time voice conversations with an AI agent over the phone.

## Prerequisites

- Python 3.9+
- OpenAI API key with Realtime API access
- Twilio account with a phone number
- A tunneling service like ngrok to expose your local server

## Setup

1. **Start the server:**

   ```bash
   uv run server.py
   ```

   The server will start on port 8000 by default.

2. **Expose the server publicly, e.g. via ngrok:**

   ```bash
   ngrok http 8000
   ```

   Note the public URL (e.g., `https://abc123.ngrok.io`).

3. **Configure your Twilio phone number:**
   - Log into your Twilio Console
   - Select your phone number
   - Set the webhook URL for incoming calls to: `https://your-ngrok-url.ngrok.io/incoming-call`
   - Set the HTTP method to POST

## Usage

1. Call your Twilio phone number
2. You'll hear: "Hello! You're now connected to an AI assistant. You can start talking!"
3. Start speaking; the AI will respond in real time
4. The assistant has access to tools like weather information and current time

## How It Works

1. **Incoming Call**: When someone calls your Twilio number, Twilio makes a request to `/incoming-call`
2. **TwiML Response**: The server returns TwiML that:
   - Plays a greeting message
   - Connects the call to a WebSocket stream at `/media-stream`
3. **WebSocket Connection**: Twilio establishes a WebSocket connection for bidirectional audio streaming
4. **Transport Layer**: The `TwilioRealtimeTransportLayer` class owns the WebSocket message handling:
   - Takes ownership of the Twilio WebSocket after initial handshake
   - Runs its own message loop to process all Twilio messages
   - Handles protocol differences between Twilio and OpenAI
   - Automatically sets G.711 μ-law audio format for Twilio compatibility
   - Manages audio chunk tracking for interruption support
   - Wraps the OpenAI realtime model instead of subclassing it
5. **Audio Processing**:
   - Audio from the caller is base64 decoded and sent to OpenAI Realtime API
   - Audio responses from OpenAI are base64 encoded and sent back to Twilio
   - Twilio plays the audio to the caller
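
The base64 handling in step 5 can be sketched roughly as follows. This is an illustration only, not the code in this commit: the helper names `decode_twilio_media` and `encode_twilio_media` are hypothetical, and the message shape (an `event` of `media`, a `streamSid`, and a base64 `media.payload`) is assumed from Twilio's Media Streams message format.

```python
import base64
import json
from typing import Optional


def decode_twilio_media(raw_message: str) -> Optional[bytes]:
    """Return the raw audio bytes from a Twilio "media" event, else None.

    Hypothetical helper; other events (for example "start" or "stop")
    carry no audio and are ignored here.
    """
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None
    return base64.b64decode(message["media"]["payload"])


def encode_twilio_media(stream_sid: str, audio: bytes) -> str:
    """Wrap raw audio bytes in a "media" event to send back to Twilio.

    Hypothetical helper; assumes the audio is already in the format
    Twilio expects for the stream.
    """
    return json.dumps(
        {
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(audio).decode("ascii")},
        }
    )
```

Keeping the two directions as small pure functions makes the conversion easy to test without a live call.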

## Configuration

- **Port**: Set `PORT` environment variable (default: 8000)
- **OpenAI API Key**: Set `OPENAI_API_KEY` environment variable
- **Agent Instructions**: Modify the `RealtimeAgent` configuration in `server.py`
- **Tools**: Add or modify function tools in `server.py`
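
One way to read these two environment variables and fail fast on a missing key is sketched below. The `load_settings` function is hypothetical: in this commit `server.py` reads only `PORT`, and where `OPENAI_API_KEY` is consumed is not shown, so the explicit check is an assumption about what a deployment might want.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the settings described above (hypothetical helper).

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to exercise in tests.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # PORT is optional and falls back to the documented default of 8000.
    port = int(env.get("PORT", "8000"))
    return {"port": port, "openai_api_key": api_key}
```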

## Troubleshooting

- **WebSocket connection issues**: Ensure your ngrok URL is correct and publicly accessible
- **Audio quality**: Twilio streams audio in μ-law format at 8 kHz (telephone quality), which limits how good the audio can sound
- **Latency**: Network latency between Twilio, your server, and OpenAI affects response time
- **Logs**: Check the console output for detailed connection and error logs
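
To put the audio-quality and latency notes in numbers: μ-law stores one 8-bit value per sample, so 8 kHz audio is 8000 bytes per second. The sketch below works through that arithmetic; the function names are hypothetical and the 160-byte chunk is only an example size, not a claim about what Twilio sends.

```python
SAMPLE_RATE_HZ = 8000   # sample rate stated above
BYTES_PER_SAMPLE = 1    # mu-law uses one 8-bit value per sample


def chunk_duration_ms(num_bytes: int) -> float:
    """Playback time, in milliseconds, of a mu-law chunk of `num_bytes`."""
    return num_bytes * 1000 / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def bitrate_kbps() -> float:
    """Audio bitrate in kilobits per second."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * 8 / 1000


print(chunk_duration_ms(160))  # a 160-byte chunk holds 20.0 ms of audio
print(bitrate_kbps())          # 64.0 kbit/s
```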

## Architecture

```
Phone Call → Twilio → WebSocket → TwilioRealtimeTransportLayer → OpenAI Realtime API

RealtimeAgent with Tools

Audio Response → Twilio → Phone Call
```

The `TwilioRealtimeTransportLayer` acts as a bridge between Twilio's Media Streams and OpenAI's Realtime API, handling the protocol differences and audio format conversions. It wraps the OpenAI realtime model to provide a clean interface for Twilio integration.
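
The "wrap instead of subclass" idea can be illustrated with a small stand-alone sketch. Both classes below are hypothetical stand-ins written for this illustration; they are not the `TwilioRealtimeTransportLayer` or the SDK's model class, and the `send_audio` method name is an assumption.

```python
import asyncio
import base64


class FakeRealtimeModel:
    """Stand-in for a realtime model; it only records the audio it receives."""

    def __init__(self) -> None:
        self.received = []

    async def send_audio(self, audio: bytes) -> None:
        self.received.append(audio)


class TwilioBridge:
    """Holds a model by composition rather than inheriting from it."""

    def __init__(self, model: FakeRealtimeModel) -> None:
        self._model = model

    async def on_twilio_payload(self, payload_b64: str) -> None:
        # Twilio-specific work (base64 decoding) stays in the wrapper,
        # so the wrapped model only ever sees raw audio bytes.
        await self._model.send_audio(base64.b64decode(payload_b64))


model = FakeRealtimeModel()
bridge = TwilioBridge(model)
asyncio.run(bridge.on_twilio_payload(base64.b64encode(b"\xff\x7f").decode("ascii")))
```

With composition, the wrapper can be tested against a fake model like this one, and the wrapped model needs no knowledge of Twilio.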

examples/realtime/twilio/__init__.py

Whitespace-only changes.
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
fastapi
uvicorn[standard]
websockets
python-dotenv

examples/realtime/twilio/server.py

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
import os
from typing import TYPE_CHECKING

from fastapi import FastAPI, Request, WebSocket, WebSocketDisconnect
from fastapi.responses import PlainTextResponse

# Import TwilioHandler class - handle both module and package use cases
if TYPE_CHECKING:
    # For type checking, use the relative import
    from .twilio_handler import TwilioHandler
else:
    # At runtime, try both import styles
    try:
        # Try relative import first (when used as a package)
        from .twilio_handler import TwilioHandler
    except ImportError:
        # Fall back to direct import (when run as a script)
        from twilio_handler import TwilioHandler


class TwilioWebSocketManager:
    def __init__(self):
        self.active_handlers: dict[str, TwilioHandler] = {}

    async def new_session(self, websocket: WebSocket) -> TwilioHandler:
        """Create and configure a new session."""
        print("Creating twilio handler")

        handler = TwilioHandler(websocket)
        return handler

    # In a real app, you'd also want to clean up/close the handler when the call ends


manager = TwilioWebSocketManager()
app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Twilio Media Stream Server is running!"}


@app.post("/incoming-call")
@app.get("/incoming-call")
async def incoming_call(request: Request):
    """Handle incoming Twilio phone calls"""
    host = request.headers.get("Host")

    twiml_response = f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say>Hello! You're now connected to an AI assistant. You can start talking!</Say>
    <Connect>
        <Stream url="wss://{host}/media-stream" />
    </Connect>
</Response>"""
    return PlainTextResponse(content=twiml_response, media_type="text/xml")


@app.websocket("/media-stream")
async def media_stream_endpoint(websocket: WebSocket):
    """WebSocket endpoint for Twilio Media Streams"""

    try:
        handler = await manager.new_session(websocket)
        await handler.start()

        await handler.wait_until_done()

    except WebSocketDisconnect:
        print("WebSocket disconnected")
    except Exception as e:
        print(f"WebSocket error: {e}")


if __name__ == "__main__":
    import uvicorn

    port = int(os.getenv("PORT", 8000))
    uvicorn.run(app, host="0.0.0.0", port=port)
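
The `incoming_call` handler above builds its TwiML with an f-string, which inserts the `Host` header into the XML unescaped. A possible alternative, shown here only as a sketch, is to build the same document with the standard library's `xml.etree.ElementTree` so that text and attribute values are escaped automatically. The `build_twiml` function is hypothetical and not part of this commit.

```python
import xml.etree.ElementTree as ET


def build_twiml(host: str, greeting: str) -> str:
    """Build the <Response> document returned to Twilio (hypothetical helper)."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = greeting
    connect = ET.SubElement(response, "Connect")
    # Same stream URL shape as in server.py above.
    ET.SubElement(connect, "Stream", url=f"wss://{host}/media-stream")
    # encoding="unicode" returns a str without an XML declaration,
    # so the declaration is prepended by hand.
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The result could be passed to `PlainTextResponse(content=..., media_type="text/xml")` in the same way as the f-string version.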
