LLM Experiments (2023-2024)

In late 2023 I put together a small public repo of scripts for experimenting with OpenAI’s APIs, back when streaming for text and audio was still poorly documented. The code lives at github.com/ggoonnzzaallo/llm_experiments: three demos covering TTS streaming, vision-plus-narration on video, and chaining streamed chat output into streamed speech.

button.py: A minimal UI with a text box and a button. Whatever you type is sent to OpenAI’s text-to-speech endpoint. You can toggle streamed playback (audio starts before the full clip is generated) versus waiting for a complete file. I wrote it mainly to quantify how much latency streaming saves; the docs mentioned streaming but did not ship a working example, so this was my reference implementation.
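
The streamed path boils down to something like this (a minimal sketch, assuming the openai Python SDK v1+ and ffplay from FFmpeg on PATH; the model and voice names are illustrative, not necessarily what the repo uses):

```python
# Sketch: pipe TTS bytes into ffplay as they arrive, so playback starts
# before generation finishes. Assumes OPENAI_API_KEY is set in the environment.
import subprocess
from openai import OpenAI

client = OpenAI()

def speak_streamed(text: str) -> None:
    # ffplay reads from stdin ("-"), plays as bytes arrive, exits when done.
    player = subprocess.Popen(
        ["ffplay", "-autoexit", "-nodisp", "-loglevel", "quiet", "-"],
        stdin=subprocess.PIPE,
    )
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",        # illustrative model/voice choices
        voice="alloy",
        input=text,
        response_format="mp3",
    ) as response:
        for chunk in response.iter_bytes(chunk_size=4096):
            player.stdin.write(chunk)
    player.stdin.close()
    player.wait()

speak_streamed("Streaming means you hear this before the file is complete.")
```

The non-streamed branch is just the same request written to a file and played once complete, which is what the latency comparison measures.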

narrator.ipynb: A notebook that takes an MP4 from disk and asks an OpenAI vision model to narrate what it sees, frame by frame. With a tuned system prompt it behaves like a rough sports-style commentator; the demo below is straight model output with no manual edits. OpenAI co-founder Greg Brockman retweeted it.
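
The frame-by-frame loop is roughly the following (a sketch assuming opencv-python; the model name, sampling rate, and prompt are stand-ins rather than the notebook's exact values):

```python
# Sketch: sample frames from a video, send them as base64 JPEGs to a
# vision-capable chat model, and print the narration.
import base64
import cv2
from openai import OpenAI

client = OpenAI()

def sample_frames(path: str, every_n: int = 30) -> list[str]:
    # Grab every Nth frame and return them as base64-encoded JPEGs.
    video = cv2.VideoCapture(path)
    frames, i = [], 0
    while True:
        ok, frame = video.read()
        if not ok:
            break
        if i % every_n == 0:
            _, buf = cv2.imencode(".jpg", frame)
            frames.append(base64.b64encode(buf).decode("utf-8"))
        i += 1
    video.release()
    return frames

frames = sample_frames("clip.mp4")  # hypothetical input file
response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model works here
    messages=[
        {"role": "system", "content": "You are an excitable sports commentator."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Narrate what happens across these frames."},
                *[
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{f}"}}
                    for f in frames[:10]  # cap the frame count to keep the request small
                ],
            ],
        },
    ],
)
print(response.choices[0].message.content)
```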

streamed_text_plus_streamed_audio.py: End-to-end low-latency speech from chat. The reply from GPT-3.5-Turbo is streamed live; I chunk the partial text on sentence boundaries and pipe each chunk to TTS, which is also streamed back, so playback starts before the full utterance exists. That keeps time-to-first-audio around a second even when the reply is long.
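
The chaining logic looks roughly like this (a sketch reusing the speak_streamed() helper from the first example; the regex sentence splitter is a naive stand-in for the script's actual chunking):

```python
# Sketch: stream a chat reply, flush complete sentences to streamed TTS
# as they accumulate. Assumes speak_streamed() from the earlier sketch.
import re
from openai import OpenAI

client = OpenAI()

def stream_reply_as_speech(prompt: str) -> None:
    buffer = ""
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for event in stream:
        if not event.choices:
            continue
        buffer += event.choices[0].delta.content or ""
        # Split on sentence-ending punctuation; the last piece may be partial.
        sentences = re.split(r"(?<=[.!?])\s+", buffer)
        if len(sentences) > 1:
            for sentence in sentences[:-1]:
                # Simplification: calling TTS inline blocks stream consumption.
                # The real script overlaps playback with generation.
                speak_streamed(sentence)
            buffer = sentences[-1]
    if buffer.strip():
        speak_streamed(buffer)

stream_reply_as_speech("Tell me a short story about a lighthouse.")
```

In practice you would hand completed sentences to a playback queue on another thread so text generation never waits on audio; the inline call above just keeps the sketch short.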