Release Notes: v0.2.0
Release Date: 2026-02-14
Overview
vac v0.2.0 introduces automatic subtitle generation using speech-to-text, OmniVoice integration for unified TTS/STT provider abstraction, and dictionary-based case correction for professional subtitle output.
Highlights
- Automatic Subtitle Generation - Generate SRT/VTT subtitle files from audio using Deepgram STT
- OmniVoice Integration - Unified interface for TTS and STT providers
- Dictionary-based Case Correction - Proper capitalization of tech terms and proper nouns in subtitles
New Features
Subtitle Generation
- New vac subtitle command for generating subtitles from audio
- Supports SRT and WebVTT output formats
- Word-level timing accuracy via Deepgram speech-to-text
- Auto-detects language from audio manifest
# Generate subtitles from audio
vac subtitle --audio audio/en-US/
# Output: subtitles/en-US.srt, subtitles/en-US.vtt
OmniVoice Provider Abstraction
- Unified TTS provider interface via OmniVoice
- Unified STT provider interface for subtitle generation
- Tested with ElevenLabs (TTS) and Deepgram (STT)
- Easy to add additional providers in the future
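The general shape of a provider abstraction can be sketched in Go as follows. This is not OmniVoice's API: the Transcriber interface, the Registry type, and the fakeProvider stand-in are all hypothetical names, chosen only to show how callers can select an STT backend by name without depending on a concrete provider.

```go
package main

import (
	"errors"
	"fmt"
)

// Transcriber is a hypothetical STT provider interface. OmniVoice's real
// interface may differ in names, parameters, and return types.
type Transcriber interface {
	Name() string
	Transcribe(audioPath, lang string) (string, error)
}

// fakeProvider is a stand-in that returns a placeholder transcript instead of
// calling any real speech-to-text service.
type fakeProvider struct{ name string }

func (f fakeProvider) Name() string { return f.name }

func (f fakeProvider) Transcribe(audioPath, lang string) (string, error) {
	return fmt.Sprintf("[%s transcript of %s in %s]", f.name, audioPath, lang), nil
}

// Registry maps provider names (for example, the value of a --provider flag)
// to implementations.
type Registry map[string]Transcriber

// Get looks up a provider by name and reports unknown names as errors.
func (r Registry) Get(name string) (Transcriber, error) {
	p, ok := r[name]
	if !ok {
		return nil, errors.New("unknown STT provider: " + name)
	}
	return p, nil
}

func main() {
	reg := Registry{
		"deepgram":   fakeProvider{name: "deepgram"},
		"elevenlabs": fakeProvider{name: "elevenlabs"},
	}
	p, err := reg.Get("deepgram")
	if err != nil {
		panic(err)
	}
	text, _ := p.Transcribe("audio/en-US/slide-01.mp3", "en-US")
	fmt.Println(text)
}
```

With this pattern, adding a provider means implementing the interface and registering it. The calling code stays unchanged.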
Dictionary-based Case Correction
- Built-in dictionary with 200+ tech terms (AI, API, GitHub, Claude, etc.)
- Custom dictionary support via JSON files
- Ensures proper capitalization in auto-generated subtitles
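As a rough illustration of dictionary-based case correction, the Go sketch below looks up each word in lowercase and substitutes its canonical spelling. The flat {"lowercase": "Canonical"} JSON schema, the four-entry builtin map, and the function names are assumptions. These notes do not specify vac's actual dictionary format or matching rules.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// builtin is a tiny stand-in for a built-in dictionary: lowercase lookup key
// to canonical spelling. The real dictionary's contents are not shown here.
var builtin = map[string]string{
	"ai": "AI", "api": "API", "github": "GitHub", "claude": "Claude",
}

// loadCustom parses a custom dictionary from JSON. The flat key/value schema
// is an assumption made for this example.
func loadCustom(data []byte) (map[string]string, error) {
	var m map[string]string
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return m, nil
}

// correctCase replaces each whitespace-separated word with its canonical form
// when the lowercased word is in dict. Surrounding punctuation is preserved;
// runs of whitespace are collapsed to single spaces.
func correctCase(text string, dict map[string]string) string {
	fields := strings.Fields(text)
	for i, f := range fields {
		core := strings.Trim(f, ".,!?;:\"'()")
		if core == "" {
			continue
		}
		if canon, ok := dict[strings.ToLower(core)]; ok {
			fields[i] = strings.Replace(f, core, canon, 1)
		}
	}
	return strings.Join(fields, " ")
}

func main() {
	custom, err := loadCustom([]byte(`{"omnivoice": "OmniVoice"}`))
	if err != nil {
		panic(err)
	}
	// Custom entries override built-in ones on key collisions.
	dict := map[string]string{}
	for k, v := range builtin {
		dict[k] = v
	}
	for k, v := range custom {
		dict[k] = v
	}
	fmt.Println(correctCase("the github api, powered by omnivoice.", dict))
}
```

Single-word lookup keeps the sketch short. Multi-word terms and context-dependent words (a lowercase "claude" that is not the product name, say) would need more than a map lookup.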
Subtitle Embedding
- Embed soft subtitles (toggleable by viewer)
- Burn hard subtitles (permanent in video)
- Path sanitization for safe ffmpeg execution
New CLI Commands
vac subtitle
vac subtitle [flags]
Flags:
-a, --audio string Audio directory containing manifest.json (required)
-o, --output string Output directory for subtitle files (default "subtitles")
-l, --lang string Language code (auto-detected from manifest if not specified)
--provider string STT provider: deepgram or elevenlabs (default: deepgram)
--individual Also generate individual subtitle files per slide
vac stt
Complete Workflow
With v0.2.0, the complete workflow from Marp presentation to video with subtitles is:
# Set API keys
export ELEVENLABS_API_KEY="your-key"
export DEEPGRAM_API_KEY="your-key"
# 1. Generate audio (TTS)
vac tts --input slides.md --output audio/en-US/
# 2. Generate video
vac video --input slides.md --manifest audio/en-US/manifest.json --output video/presentation.mp4
# 3. Generate subtitles (STT)
vac subtitle --audio audio/en-US/
# 4. Embed subtitles
ffmpeg -i video/presentation.mp4 -i subtitles/en-US.srt \
-c:v copy -c:a copy -c:s mov_text \
-metadata:s:s:0 language=eng \
video/presentation_with_subs.mp4
See the Complete Workflow Guide for detailed instructions.
Dependencies
- Added github.com/plexusone/omnivoice v0.4.1 for unified TTS/STT interface
- Added github.com/plexusone/omnivoice-deepgram v0.3.0 for Deepgram STT support
- Bumped github.com/grokify/mogo to v0.73.2 for SanitizePath support
Installation
New Prerequisites
- Deepgram API key (for subtitle generation): Sign up at Deepgram
Documentation
- Added Complete Workflow Guide - End-to-end tutorial
- Added Subtitle Generation Guide - Detailed subtitle documentation
- Updated README with OmniVoice and Deepgram information
Breaking Changes
None. This release is fully backwards-compatible with v0.1.0.
What's Next
- Burned-in subtitle styling options
- Karaoke-style word highlighting
- Word-by-word reveal captions for social media
- Additional TTS/STT provider support via OmniVoice