Yak shaving adventures
Automating a pptx-to-mp4 workflow while spending time doing everything else
Today, I'm excited to share an adventure that would make even the most seasoned coder's head spin. It's a story about PowerPoint, MP3s, and the Sphinx documentation - a true odyssey in the world of programming and audio processing. So, buckle up and let's dive into this tech tango!
The Challenge: Whisper, GPT-4, and the Elusive `pptx`
It all started with a seemingly simple task: adding MP3 files to PowerPoint presentations. With Whisper for audio processing and GPT-4 for coding, I was all set. Or so I thought. The `pptx` Python package was crucial, but GPT-4, despite its brilliance, was dropping the ball on detailed instructions.
Snippet 1: Basic `pptx` Usage
So what's the issue? GPT-4 doesn't know much about python-pptx and gives bad coding suggestions. Does that mean I should spend multiple hours trying to understand this library? Its documentation would come to about 514 pages of text. Maybe not all of it is relevant, but one still needs to understand the basics. Also, pptx is an XML-based format, with all the quirks and weird behaviour that brings for somebody used to consuming simple JSON.
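To make the XML point concrete: a `.pptx` file is a ZIP archive whose parts are mostly XML documents. The sketch below uses only the standard library to list those parts and peek at one. The helper names `list_xml_parts` and `root_tag` are hypothetical, and the stand-in archive is a simplified placeholder rather than a valid presentation.

```python
import zipfile
from io import BytesIO
from xml.etree import ElementTree


def list_xml_parts(pptx_file) -> list[str]:
    """Return the sorted names of the XML parts inside a .pptx archive."""
    with zipfile.ZipFile(pptx_file) as archive:
        return sorted(n for n in archive.namelist() if n.endswith(".xml"))


def root_tag(pptx_file, part_name: str) -> str:
    """Parse one part and return the tag of its root element."""
    with zipfile.ZipFile(pptx_file) as archive:
        return ElementTree.fromstring(archive.read(part_name)).tag


# Stand-in archive so the sketch runs without a real deck. The part names
# follow the usual .pptx layout; the XML bodies are simplified placeholders.
demo = BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("ppt/presentation.xml", "<presentation/>")
    archive.writestr("ppt/slides/slide1.xml", "<sld><cSld/></sld>")
    archive.writestr("ppt/media/media1.mp3", b"not real audio")

print(list_xml_parts(demo))
```

Pointing `list_xml_parts` at a real deck's path shows many more parts, and their root tags carry long namespace prefixes, which is a good part of what makes the format feel heavier than JSON.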
The Pivot: Crafting a Custom GPT Agent
Not one to back down, I decided to create a custom GPT agent, tailored to the `pptx` package. But first, I needed all its documentation in one place. And oh boy, was it scattered!
The Breakthrough: Sphinx and the Single HTML File
After some Python scripts and Makefile muddles, I stumbled upon a golden nugget from 2010 by George Notaras. The magic words? `sphinx-build -b singlehtml`.
Voilà! A single HTML file at `zzz/index.html`.
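The same build can be driven from Python, which is handy if it has to be repeated. This is a rough sketch under a few assumptions: Sphinx is installed, the documentation sources live in a directory called `docs` (a guess, not stated above), and the root document is named `index`. The output directory `zzz` matches the one mentioned above, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def singlehtml_command(source_dir: str, output_dir: str) -> list[str]:
    """Build the argument list for a single-page Sphinx HTML build."""
    return ["sphinx-build", "-b", "singlehtml", source_dir, output_dir]


def build_single_html(source_dir: str, output_dir: str) -> Path:
    """Run sphinx-build and return the expected path of the generated page."""
    # check=True raises CalledProcessError if sphinx-build exits with an error.
    subprocess.run(singlehtml_command(source_dir, output_dir), check=True)
    # Assumes the project's root document is "index".
    return Path(output_dir) / "index.html"


# "docs" is an assumed source directory; "zzz" is the output directory.
print(" ".join(singlehtml_command("docs", "zzz")))
```

Calling `build_single_html("docs", "zzz")` from the root of the package's repository would then run the build itself, provided the project's documentation dependencies are installed.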
The Victory Lap: The Custom GPT Agent
With this unified document, my custom GPT agent was no longer a pipe dream. It became a reality, offering nuanced, `pptx`-specific guidance.
Imagine complex slide layouts and dynamic MP3 insertions, all simplified by the custom GPT agent.
Below is a screenshot from the documentation showing how to make charts in pptx.
The Moral of the Tech Tale
This journey, my fellow tech enthusiasts, was a classic case of "yak shaving." But it led to a custom GPT agent, a streamlined PowerPoint-MP3 process, and a deep dive into Sphinx.
So, to all you coders and tech lovers out there, remember: sometimes, the path less traveled is paved with the best code!
Stay coding, stay curious.
Actual code for the technological spike:
It doesn’t look very impressive now, but it works and can be refined further.
Yeah, it works :)





