podcast-generation

ai-tools

Setup & Installation

npx skills add https://github.com/microsoft/podcast-generation --skill podcast-generation
Or paste this link and ask your coding assistant to install it:
https://github.com/microsoft/podcast-generation

What This Skill Does

Connects a React frontend to a Python FastAPI backend over WebSocket to generate spoken audio from text using Azure OpenAI's GPT Realtime Mini model. It takes a text prompt, streams PCM audio chunks, converts them to WAV, and returns base64-encoded audio for browser playback, with a transcript alongside the audio. The full WebSocket session lifecycle, PCM-to-WAV conversion, and base64 transport are handled in one reference implementation, which saves the integration work of piecing them together from scattered Azure docs.
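
The PCM-to-WAV and base64 steps described above can be sketched with the Python standard library alone. This is a minimal illustration, not the skill's actual code: the function name `pcm_chunks_to_wav_base64` is hypothetical, and the defaults (24 kHz, mono, 16-bit samples) are assumptions about the audio format that should be checked against the session configuration in use.

```python
import base64
import io
import wave


def pcm_chunks_to_wav_base64(chunks, sample_rate=24000, channels=1, sample_width=2):
    """Join raw PCM chunks, wrap them in a WAV container, and base64-encode the result.

    The defaults (24 kHz, mono, 2 bytes per sample) are assumptions for this sketch.
    """
    pcm = b"".join(chunks)  # streamed chunks arrive in order, so concatenation is enough

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # the wave module writes the RIFF/WAVE header for us

    # base64 text is safe to embed in a JSON message sent back to the browser
    return base64.b64encode(buf.getvalue()).decode("ascii")


if __name__ == "__main__":
    # Two fake chunks of silence stand in for streamed model output.
    fake_chunks = [b"\x00\x00" * 1200, b"\x00\x00" * 1200]
    encoded = pcm_chunks_to_wav_base64(fake_chunks)
    print(len(encoded), "base64 characters")
```

On the browser side, the returned string can be decoded and played through an audio element or a Blob URL; how the reference implementation does this is not shown here.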

When to use it

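  • Building a feature that turns a text prompt into spoken audio with Azure OpenAI's GPT Realtime Mini model
  • Wiring a React frontend to a Python FastAPI backend over WebSocket to stream and play generated audio
  • Debugging WebSocket session handling, PCM-to-WAV conversion, or base64 audio transport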