Setup & Installation
What This Skill Does
Connects a React frontend to a Python FastAPI backend over WebSocket to generate spoken audio from text with Azure OpenAI's GPT Realtime Mini model. Given a text prompt, it streams PCM audio chunks, converts them to WAV, and returns base64-encoded audio for browser playback, with a transcript alongside the audio. The reference implementation covers the full WebSocket session lifecycle, PCM-to-WAV conversion, and base64 transport in one place, so you don't have to piece the integration together from scattered Azure docs.
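The PCM-to-WAV conversion and base64 transport step can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the skill's actual code: the function name `pcm_chunks_to_wav_base64` is hypothetical, and the defaults (24 kHz, mono, 16-bit PCM) are assumptions that should be checked against the audio format your realtime session is configured to return.

```python
import base64
import io
import wave


def pcm_chunks_to_wav_base64(chunks, sample_rate=24000, channels=1, sample_width=2):
    """Join raw PCM chunks, wrap them in a WAV container, and base64-encode the result.

    Hypothetical helper: the default audio parameters are assumptions,
    not values confirmed by this skill.
    """
    pcm = b"".join(chunks)  # chunks arrive as raw bytes, in order
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)  # the wave module writes the RIFF/WAVE header
    # ASCII-safe string that can be sent to the browser inside a JSON message
    return base64.b64encode(buffer.getvalue()).decode("ascii")
```

On the browser side, a string produced this way can be played by assigning `"data:audio/wav;base64," + encoded` to the `src` of an audio element.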
When to Use It
- Working with existing podcast generation code
- Implementing new podcast generation features
- Debugging podcast generation issues, such as WebSocket session or audio conversion problems
