| 1 | +# Realtime Twilio Integration |
| 2 | + |
| 3 | +This example demonstrates how to connect the OpenAI Realtime API to a phone call using Twilio's Media Streams. The server handles incoming phone calls and streams audio between Twilio and the OpenAI Realtime API, enabling real-time voice conversations with an AI agent over the phone. |
| 4 | + |
| 5 | +## Prerequisites |
| 6 | + |
| 7 | +- Python 3.9+ |
| 8 | +- OpenAI API key with Realtime API access |
| 9 | +- Twilio account with a phone number |
| 10 | +- A tunneling service like ngrok to expose your local server |
| 11 | + |
| 12 | +## Setup |
| 13 | + |
| 14 | +1. **Start the server:** |
| 15 | + |
| 16 | + ```bash |
| 17 | + uv run server.py |
| 18 | + ``` |
| 19 | + |
| 20 | + The server will start on port 8000 by default. |
| 21 | + |
| 22 | +2. **Expose the server publicly, e.g. via ngrok:** |
| 23 | + |
| 24 | + ```bash |
| 25 | + ngrok http 8000 |
| 26 | + ``` |
| 27 | + |
| 28 | + Note the public URL (e.g., `https://abc123.ngrok.io`); you will need it for the Twilio webhook in the next step. |
| 29 | + |
| 30 | +3. **Configure your Twilio phone number:** |
| 31 | + - Log into your Twilio Console |
| 32 | + - Select your phone number |
| 33 | + - Set the webhook URL for incoming calls to: `https://your-ngrok-url.ngrok.io/incoming-call` |
| 34 | + - Set the HTTP method to POST |
| 35 | + |
| 36 | +## Usage |
| 37 | + |
| 38 | +1. Call your Twilio phone number |
| 39 | +2. You'll hear: "Hello! You're now connected to an AI assistant. You can start talking!" |
| 40 | +3. Start speaking; the AI will respond in real time |
| 41 | +4. The assistant has access to tools like weather information and current time |
| 42 | +
|
| 43 | +## How It Works |
| 44 | +
|
| 45 | +1. **Incoming Call**: When someone calls your Twilio number, Twilio sends an HTTP POST request to `/incoming-call` |
| 46 | +2. **TwiML Response**: The server returns TwiML that: |
| 47 | + - Plays a greeting message |
| 48 | + - Connects the call to a WebSocket stream at `/media-stream` |
| 49 | +3. **WebSocket Connection**: Twilio establishes a WebSocket connection for bidirectional audio streaming |
| 50 | +4. **Transport Layer**: The `TwilioRealtimeTransportLayer` class owns the WebSocket message handling: |
| 51 | + - Takes ownership of the Twilio WebSocket after initial handshake |
| 52 | + - Runs its own message loop to process all Twilio messages |
| 53 | + - Handles protocol differences between Twilio and OpenAI |
| 54 | + - Automatically sets G.711 μ-law audio format for Twilio compatibility |
| 55 | + - Manages audio chunk tracking for interruption support |
| 56 | + - Wraps the OpenAI realtime model instead of subclassing it |
| 57 | +5. **Audio Processing**: |
| 58 | + - Audio from the caller is base64-decoded and sent to the OpenAI Realtime API |
| 59 | + - Audio responses from OpenAI are base64-encoded and sent back to Twilio |
| 60 | + - Twilio plays the audio to the caller |
| 61 | +
|
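Steps 1 and 2 above can be sketched with the standard library alone. This is a hypothetical illustration, not the code in `server.py`: the function name `build_incoming_call_twiml`, its parameters, and the example host are assumptions. The `<Say>`, `<Connect>`, and `<Stream>` elements are standard TwiML.

```python
import xml.etree.ElementTree as ET


def build_incoming_call_twiml(host: str, greeting: str) -> str:
    """Return TwiML that speaks a greeting, then streams call audio to /media-stream.

    `host` is the public hostname of the server (assumed here to be the ngrok host).
    """
    response = ET.Element("Response")

    # <Say> plays the greeting to the caller using text-to-speech.
    say = ET.SubElement(response, "Say")
    say.text = greeting

    # <Connect><Stream> asks Twilio to open a bidirectional WebSocket to the URL.
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=f"wss://{host}/media-stream")

    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


twiml = build_incoming_call_twiml(
    "abc123.ngrok.io",  # example host only
    "Hello! You're now connected to an AI assistant. You can start talking!",
)
print(twiml)
```

In a real handler, the host would typically be taken from the incoming request rather than hard-coded, and the string would be returned with an XML content type.
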
| 62 | +## Configuration |
| 63 | +
|
| 64 | +- **Port**: Set `PORT` environment variable (default: 8000) |
| 65 | +- **OpenAI API Key**: Set `OPENAI_API_KEY` environment variable |
| 66 | +- **Agent Instructions**: Modify the `RealtimeAgent` configuration in `server.py` |
| 67 | +- **Tools**: Add or modify function tools in `server.py` |
| 68 | +
|
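As a rough sketch of how the two environment variables above might be read at startup, the helper below applies the documented default port and fails early when the API key is missing. The name `load_settings` and the returned dictionary shape are assumptions for illustration; `server.py` may read these values differently.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read PORT and OPENAI_API_KEY from the environment (or a mapping, for testing)."""
    env = os.environ if env is None else env

    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # Fail fast instead of discovering the missing key on the first call.
        raise RuntimeError("OPENAI_API_KEY is not set")

    # PORT falls back to 8000, the default stated in this README.
    return {"port": int(env.get("PORT", "8000")), "openai_api_key": api_key}


settings = load_settings({"OPENAI_API_KEY": "sk-example"})  # placeholder key
print(settings["port"])  # prints 8000
```
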
| 69 | +## Troubleshooting |
| 70 | +
|
| 71 | +- **WebSocket connection issues**: Ensure your ngrok URL is correct and publicly accessible |
| 72 | +- **Audio quality**: Twilio streams audio as 8 kHz G.711 μ-law, so expect narrowband, telephone-grade quality |
| 73 | +- **Latency**: Network latency between Twilio, your server, and OpenAI affects response time |
| 74 | +- **Logs**: Check the console output for detailed connection and error logs |
| 75 | +
|
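To put the audio format in numbers: G.711 μ-law uses one byte per sample at 8,000 samples per second, so one second of audio is 8,000 bytes (64 kbit/s). The sketch below works through that arithmetic. The 20 ms frame size is an assumption chosen for illustration, and the helper names are hypothetical.

```python
import base64

SAMPLE_RATE_HZ = 8000   # G.711 narrowband sample rate
BYTES_PER_SAMPLE = 1    # mu-law stores each sample in 8 bits


def ulaw_bytes(duration_ms: int) -> int:
    """Number of mu-law bytes needed for `duration_ms` of audio."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * duration_ms // 1000


def ulaw_duration_ms(num_bytes: int) -> float:
    """Playback duration in milliseconds of `num_bytes` of mu-law audio."""
    return num_bytes * 1000 / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


frame = bytes(ulaw_bytes(20))              # placeholder bytes for one assumed 20 ms frame
print(len(frame))                          # prints 160
print(ulaw_bytes(1000))                    # prints 8000
print(len(base64.b64encode(frame)))        # prints 216 (base64 adds about 33%)
print(ulaw_duration_ms(len(frame)))        # prints 20.0
```

The byte-to-duration conversion is also the kind of bookkeeping that tracking audio chunks for interruptions relies on, since it tells you how much audio has been sent for playback.
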
| 76 | +## Architecture |
| 77 | +
|
| 78 | +``` |
| 79 | +Phone Call → Twilio → WebSocket → TwilioRealtimeTransportLayer → OpenAI Realtime API |
| 80 | + ↓ |
| 81 | + RealtimeAgent with Tools |
| 82 | + ↓ |
| 83 | + Audio Response → Twilio → Phone Call |
| 84 | +``` |
| 85 | +
|
| 86 | +The `TwilioRealtimeTransportLayer` acts as a bridge between Twilio's Media Streams and OpenAI's Realtime API, handling the protocol differences and configuring a compatible audio format (G.711 μ-law) on the OpenAI side. It wraps the OpenAI realtime model to provide a clean interface for Twilio integration. |
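
The bridging described above comes down to translating between Twilio's JSON WebSocket messages and raw audio bytes. The sketch below illustrates that translation. It is not the actual `TwilioRealtimeTransportLayer`, and the function names are hypothetical. The message fields (`event`, `streamSid`, `media.payload`) follow Twilio's Media Streams message format as I understand it, so check them against the Twilio documentation before relying on them.

```python
import base64
import json
from typing import Optional


def decode_twilio_media(raw_message: str) -> Optional[bytes]:
    """Return raw mu-law bytes from a Twilio 'media' event, or None for other events."""
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None  # e.g. 'connected', 'start', 'mark', 'stop'
    return base64.b64decode(message["media"]["payload"])


def encode_twilio_media(stream_sid: str, audio: bytes) -> str:
    """Wrap raw mu-law bytes in a 'media' message that Twilio can play to the caller."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(audio).decode("ascii")},
    })


def clear_twilio_audio(stream_sid: str) -> str:
    """Build a 'clear' message, which asks Twilio to drop audio it has buffered.

    Sending this when the caller interrupts is an assumption about how
    interruption handling could work, not a description of server.py.
    """
    return json.dumps({"event": "clear", "streamSid": stream_sid})


audio = b"\x7f\xff" * 80                       # 160 placeholder bytes
outbound = encode_twilio_media("MZ-example", audio)
assert decode_twilio_media(outbound) == audio  # round trip preserves the bytes
```
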