CapForge started as a personal itch and turned into something I ship evenings and weekends. No team, no investors, no roadmap from a product manager. Just a tool I want to exist.
In 2024 I was editing a short video for a talk and realized I'd spent more time writing subtitles than actually filming the thing. I tried the cloud services. They were fine. But they also wanted me to upload my raw footage and pay per minute, and they still mis-transcribed half the technical vocabulary.
Meanwhile, an open-source model from OpenAI called Whisper had quietly become very good. Good enough to run on a laptop and produce word-level timestamps most paid services didn't expose.
So I started building. First a weekend prototype in Python, then a proper app wrapped in Electron so it could ship as a signed installer with the Python backend embedded. The first release, in October 2025, was called subtitle-forge and was rough around every edge. The karaoke preset was the one feature I was actually proud of.
People started using it. People asked for things. Six months and a dozen releases later, here we are: 99+ languages, word-level timing, six motion presets, and transparent MOV export. Still a one-person project. Still free.
If that sounds like something you want to use, download it. If it saves you an hour, buy me a coffee. If it crashes, tell me about it. That's the whole deal.
These are the project's principles, written down so I don't forget them when a tempting venture-backed offer shows up.
No cloud round-trip. No "optional" upload. Your footage stays on your disk. Full stop. This rule is what makes the app worth shipping.
No tiered pricing, no feature gates, no "pro" version. Donations are gifts, not payments, and they stay optional in every release.
The app doesn't phone home. Ever. Not for crash reports, not for "anonymous usage data", not for version checks. If I want data, I'll ask.
The whole thing lives on GitHub. Fork it, audit it, rip out the bits you like. I ship binaries because I like small downloads, not because anything is hidden.
No installer with checkboxes for bundled offers. No auto-updater that runs in the background. You download a file, you open it, it works.
If a feature helps creators but annoys advertisers, it ships. If the reverse, it doesn't. Every tradeoff lands on the side of the person using it.
CapForge exists because of decades of work by other people. Here are the libraries and models that do the real heavy lifting.
The ASR engine. Whisper transcription plus forced alignment for word-level timestamps, with pyannote-audio handling speaker diarization.
The desktop shell. Wraps the React renderer, spawns the Python backend, and handles OS menus, file dialogs, and auto-updates.
The embedded Python 3.11 backend. FastAPI and uvicorn serve REST and WebSocket endpoints for transcription, edits, and live render progress.
The ML runtime. PyTorch 2.6 on CUDA 12.4 drives faster-whisper and CTranslate2 — the reason transcription isn't agonizingly slow.
Decodes every input format, renders the final video, and handles hardware-accelerated H.264 and ProRes 4444 export. Pillow rasterizes the subtitle frames.
The waveform in the editor. Scrubbable, zoomable, and tightly synced to the canvas timeline and video preview.
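To make "word-level timestamps" concrete, here is a minimal Python sketch of how timed words could be grouped into short subtitle cues and written out as SRT. It is an illustration under stated assumptions, not CapForge's actual code: the `Word` class, `group_words`, `to_srt`, and the `max_words` and `max_gap` thresholds are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its timing in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def group_words(words, max_words=4, max_gap=0.6):
    """Split words into cues: start a new cue when the current one is full
    or when the silence before the next word exceeds max_gap seconds."""
    cues, current = [], []
    for w in words:
        too_long = len(current) >= max_words
        paused = bool(current) and w.start - current[-1].end > max_gap
        if current and (too_long or paused):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)
    return cues


def to_srt(words, max_words=4, max_gap=0.6) -> str:
    """Render grouped cues as SRT text, one numbered block per cue."""
    blocks = []
    for i, cue in enumerate(group_words(words, max_words, max_gap), start=1):
        text = " ".join(w.text for w in cue)
        timing = f"{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}"
        blocks.append(f"{i}\n{timing}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        Word("Hello", 0.0, 0.4),
        Word("world", 0.45, 0.9),
        Word("again", 2.0, 2.5),
    ]
    print(to_srt(demo))
```

Because each word carries its own start and end time, the same data could also drive per-word highlighting in a karaoke-style preset; the grouping rule here is only one simple choice.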
Video maker by day, software hobbyist by evening. If you want to chat, my DMs are open on most of these places.