Commit caee157

Realtime: twilio example
1 parent d930dc4 commit caee157

7 files changed

+436
-11
lines changed

docs/scripts/generate_ref_files.py

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,7 @@ def md_target(py_path: Path) -> Path:
     rel = py_path.relative_to(SRC_ROOT).with_suffix(".md")
     return DOCS_ROOT / rel
 
+
 def pretty_title(last_segment: str) -> str:
     """
     Convert a module/file segment like 'tool_context' to 'Tool Context'.
@@ -39,6 +40,7 @@ def pretty_title(last_segment: str) -> str:
     cleaned = last_segment.replace("_", " ").replace("-", " ")
     return capwords(cleaned)
 
+
 # ---- Main ------------------------------------------------------------
 
 

examples/realtime/twilio/README.md

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
# Realtime Twilio Integration

This example demonstrates how to connect the OpenAI Realtime API to a phone call using Twilio's Media Streams. The server handles incoming phone calls and streams audio between Twilio and the OpenAI Realtime API, enabling real-time voice conversations with an AI agent over the phone.

## Prerequisites

- Python 3.9+
- OpenAI API key with Realtime API access
- Twilio account with a phone number
- A tunneling service like ngrok to expose your local server

## Setup

1. **Start the server:**

   ```bash
   uv run server.py
   ```

   The server will start on port 8000 by default.

2. **Expose the server publicly, e.g. via ngrok:**

   ```bash
   ngrok http 8000
   ```

   Note the public URL (e.g., `https://abc123.ngrok.io`).

3. **Configure your Twilio phone number:**
   - Log into your Twilio Console
   - Select your phone number
   - Set the webhook URL for incoming calls to: `https://your-ngrok-url.ngrok.io/incoming-call`
   - Set the HTTP method to POST

## Usage

1. Call your Twilio phone number
2. You'll hear: "Hello! You're now connected to an AI assistant. You can start talking!"
3. Start speaking; the AI will respond in real time
4. The assistant has access to tools like weather information and current time

## How It Works

1. **Incoming Call**: When someone calls your Twilio number, Twilio sends an HTTP request to `/incoming-call`
2. **TwiML Response**: The server returns TwiML that:
   - Plays a greeting message
   - Connects the call to a WebSocket stream at `/media-stream`
3. **WebSocket Connection**: Twilio establishes a WebSocket connection for bidirectional audio streaming
4. **Transport Layer**: The `TwilioRealtimeTransportLayer` class owns the WebSocket message handling:
   - Takes ownership of the Twilio WebSocket after the initial handshake
   - Runs its own message loop to process all Twilio messages
   - Handles protocol differences between Twilio and OpenAI
   - Automatically sets the G.711 μ-law audio format for Twilio compatibility
   - Manages audio chunk tracking for interruption support
   - Wraps the OpenAI realtime model instead of subclassing it
5. **Audio Processing**:
   - Audio from the caller is base64 decoded and sent to the OpenAI Realtime API
   - Audio responses from OpenAI are base64 encoded and sent back to Twilio
   - Twilio plays the audio to the caller
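
As a rough illustration of the audio-processing step, the sketch below decodes one Twilio Media Streams `media` message into raw audio bytes and wraps outgoing audio in the reverse direction. The message fields (`event`, `streamSid`, `media.payload`) follow Twilio's Media Streams message format; the helper names `decode_twilio_media` and `encode_twilio_media` are hypothetical and are not taken from `TwilioRealtimeTransportLayer`.

```python
import base64
import json
from typing import Optional


def decode_twilio_media(raw_message: str) -> Optional[bytes]:
    """Return the audio bytes of a Twilio 'media' event, or None for other events."""
    message = json.loads(raw_message)
    if message.get("event") != "media":
        # Events such as 'start', 'mark', or 'stop' carry no audio payload.
        return None
    return base64.b64decode(message["media"]["payload"])


def encode_twilio_media(stream_sid: str, audio: bytes) -> str:
    """Wrap raw audio bytes in a 'media' message addressed to the given stream."""
    message = {
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(audio).decode("ascii")},
    }
    return json.dumps(message)
```

In this sketch the audio bytes are passed through untouched; only the base64 and JSON framing changes between the two sides.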

## Configuration

- **Port**: Set `PORT` environment variable (default: 8000)
- **OpenAI API Key**: Set `OPENAI_API_KEY` environment variable
- **Agent Instructions**: Modify the `RealtimeAgent` configuration in `server.py`
- **Tools**: Add or modify function tools in `server.py`
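
One way to read these two environment variables at startup is sketched below. The `load_settings` helper and its fail-fast check on `OPENAI_API_KEY` are assumptions for illustration; `server.py` in this example only reads `PORT`.

```python
import os
from typing import Any, Dict, Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read PORT and OPENAI_API_KEY, failing early if the key is missing."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # PORT falls back to 8000, matching the documented default.
    return {"port": int(env.get("PORT", "8000")), "openai_api_key": api_key}
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.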
68+
69+
## Troubleshooting
70+
71+
- **WebSocket connection issues**: Ensure your ngrok URL is correct and publicly accessible
72+
- **Audio quality**: Twilio streams audio in mulaw format at 8kHz, which may affect quality
73+
- **Latency**: Network latency between Twilio, your server, and OpenAI affects response time
74+
- **Logs**: Check the console output for detailed connection and error logs
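
When debugging audio or interruption timing, it can help to convert between byte counts and playback time. G.711 μ-law at 8 kHz uses one byte per sample, so one second of audio is 8000 bytes. The helpers below are a minimal sketch of that arithmetic; their names are hypothetical, and the 20 ms frame is only an example duration.

```python
SAMPLE_RATE_HZ = 8_000   # G.711 telephony sample rate
BYTES_PER_SAMPLE = 1     # mu-law encodes each sample in 8 bits


def mulaw_bytes(duration_ms: int) -> int:
    """Number of mu-law bytes needed for the given playback duration."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * duration_ms // 1000


def mulaw_duration_ms(num_bytes: int) -> float:
    """Playback duration in milliseconds of a mu-law buffer."""
    return num_bytes * 1000 / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


print(mulaw_bytes(20))          # 160 bytes for a 20 ms frame
print(mulaw_duration_ms(8000))  # 1000.0 ms for 8000 bytes
```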

## Architecture

```
Phone Call → Twilio → WebSocket → TwilioRealtimeTransportLayer → OpenAI Realtime API

RealtimeAgent with Tools

Audio Response → Twilio → Phone Call
```

The `TwilioRealtimeTransportLayer` acts as a bridge between Twilio's Media Streams and OpenAI's Realtime API, handling the protocol differences and audio format conversions. It wraps the OpenAI realtime model to provide a clean interface for Twilio integration.

examples/realtime/twilio/__init__.py

Whitespace-only changes.
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
fastapi
uvicorn[standard]
websockets
python-dotenv

examples/realtime/twilio/server.py

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
import os

from fastapi import FastAPI, Request, WebSocket, WebSocketDisconnect
from fastapi.responses import PlainTextResponse
from twilio_handler import TwilioHandler


class TwilioWebSocketManager:
    def __init__(self):
        self.active_handlers: dict[str, TwilioHandler] = {}

    async def new_session(self, websocket: WebSocket) -> TwilioHandler:
        """Create and configure a new session."""
        print("Creating twilio handler")

        handler = TwilioHandler(websocket)
        return handler

    # In a real app, you'd also want to clean up/close the handler when the call ends


manager = TwilioWebSocketManager()
app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Twilio Media Stream Server is running!"}


@app.post("/incoming-call")
@app.get("/incoming-call")
async def incoming_call(request: Request):
    """Handle incoming Twilio phone calls"""
    host = request.headers.get("Host")

    twiml_response = f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say>Hello! You're now connected to an AI assistant. You can start talking!</Say>
    <Connect>
        <Stream url="wss://{host}/media-stream" />
    </Connect>
</Response>"""
    return PlainTextResponse(content=twiml_response, media_type="text/xml")


@app.websocket("/media-stream")
async def media_stream_endpoint(websocket: WebSocket):
    """WebSocket endpoint for Twilio Media Streams"""

    try:
        handler = await manager.new_session(websocket)
        await handler.start()

        await handler.wait_until_done()

    except WebSocketDisconnect:
        print("WebSocket disconnected")
    except Exception as e:
        print(f"WebSocket error: {e}")


if __name__ == "__main__":
    import uvicorn

    port = int(os.getenv("PORT", 8000))
    uvicorn.run(app, host="0.0.0.0", port=port)
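
The comment in `TwilioWebSocketManager` notes that a real app should clean up the handler when the call ends, and `active_handlers` is never populated in this commit. One possible shape for that bookkeeping is sketched below. `SessionRegistry`, `run_session`, and the `serve` callback are hypothetical names, not part of this commit, and the handler is treated as an opaque object.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class SessionRegistry:
    """Hypothetical registry that forgets a handler once its call ends."""

    def __init__(self) -> None:
        self.active_handlers: Dict[str, Any] = {}

    async def run_session(
        self,
        session_id: str,
        handler: Any,
        serve: Callable[[Any], Awaitable[None]],
    ) -> None:
        # Register the handler for the lifetime of the call.
        self.active_handlers[session_id] = handler
        try:
            await serve(handler)
        finally:
            # Runs on normal hang-up, disconnects, and errors alike.
            self.active_handlers.pop(session_id, None)


async def demo() -> Dict[str, Any]:
    registry = SessionRegistry()

    async def fake_call(handler: Any) -> None:
        # Stand-in for `await handler.start(); await handler.wait_until_done()`.
        await asyncio.sleep(0)

    await registry.run_session("call-1", object(), fake_call)
    return registry.active_handlers
```

Putting the removal in a `finally` block means the registry stays accurate even when the WebSocket drops mid-call.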
