Release Notes: v0.2.0

Release Date: 2026-02-14

Overview

vac v0.2.0 adds automatic subtitle generation from speech-to-text, OmniVoice integration that puts TTS and STT providers behind one interface, and dictionary-based case correction so that tech terms and proper nouns are capitalized correctly in subtitles.

Highlights

  • Automatic Subtitle Generation - Generate SRT/VTT subtitle files from audio using Deepgram STT
  • OmniVoice Integration - Unified interface for TTS and STT providers
  • Dictionary-based Case Correction - Proper capitalization of tech terms and proper nouns in subtitles

New Features

Subtitle Generation

  • New vac subtitle command for generating subtitles from audio
  • Supports SRT and WebVTT output formats
  • Word-level timing accuracy via Deepgram speech-to-text
  • Auto-detects language from audio manifest

# Generate subtitles from audio
vac subtitle --audio audio/en-US/

# Output: subtitles/en-US.srt, subtitles/en-US.vtt
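
To illustrate how word-level timings can become SRT and WebVTT cues, here is a minimal Go sketch. It is not vac's implementation: the `Word` and `Cue` types, the millisecond fields, and the 42-byte line limit are assumptions made for the example. The only format facts relied on are that SRT timestamps use a comma before the milliseconds, and that WebVTT uses a period and starts with a `WEBVTT` header.

```go
package main

import (
	"fmt"
	"strings"
)

// Word is a hypothetical word-level timing record (times in milliseconds).
type Word struct {
	Text           string
	StartMs, EndMs int64
}

// Cue is one subtitle entry spanning one or more words.
type Cue struct {
	Text           string
	StartMs, EndMs int64
}

// timestamp formats ms as HH:MM:SS<sep>mmm. SRT uses "," and WebVTT uses "." as sep.
func timestamp(ms int64, sep string) string {
	return fmt.Sprintf("%02d:%02d:%02d%s%03d",
		ms/3600000, ms/60000%60, ms/1000%60, sep, ms%1000)
}

// groupWords packs consecutive words into cues of at most maxLen bytes.
// Counting bytes is a simplification; rune counts suit non-ASCII text better.
func groupWords(words []Word, maxLen int) []Cue {
	var cues []Cue
	for _, w := range words {
		n := len(cues)
		if n > 0 && len(cues[n-1].Text)+1+len(w.Text) <= maxLen {
			cues[n-1].Text += " " + w.Text
			cues[n-1].EndMs = w.EndMs
			continue
		}
		cues = append(cues, Cue{Text: w.Text, StartMs: w.StartMs, EndMs: w.EndMs})
	}
	return cues
}

// toSRT renders cues as SubRip text: index, time range, text, blank line.
func toSRT(cues []Cue) string {
	var b strings.Builder
	for i, c := range cues {
		fmt.Fprintf(&b, "%d\n%s --> %s\n%s\n\n",
			i+1, timestamp(c.StartMs, ","), timestamp(c.EndMs, ","), c.Text)
	}
	return b.String()
}

// toVTT renders cues as WebVTT: a "WEBVTT" header followed by cue blocks.
func toVTT(cues []Cue) string {
	var b strings.Builder
	b.WriteString("WEBVTT\n\n")
	for _, c := range cues {
		fmt.Fprintf(&b, "%s --> %s\n%s\n\n",
			timestamp(c.StartMs, "."), timestamp(c.EndMs, "."), c.Text)
	}
	return b.String()
}

func main() {
	words := []Word{{"Welcome", 0, 400}, {"to", 450, 550}, {"vac", 600, 1500}}
	cues := groupWords(words, 42)
	fmt.Print(toSRT(cues))
	fmt.Print(toVTT(cues))
}
```

Both outputs come from the same cue list, so the two files differ only in header and timestamp separator.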

OmniVoice Provider Abstraction

  • Unified TTS provider interface via OmniVoice
  • Unified STT provider interface for subtitle generation
  • Tested with ElevenLabs (TTS) and Deepgram (STT)
  • Additional providers can be added behind the same interface in future releases

Dictionary-based Case Correction

  • Built-in dictionary with 200+ tech terms (AI, API, GitHub, Claude, etc.)
  • Custom dictionary support via JSON files
  • Ensures proper capitalization in auto-generated subtitles

# Custom dictionary locations
~/.config/vac/dictionaries/*.json
./dictionaries/*.json
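
A minimal Go sketch of dictionary-based case correction follows. The JSON schema used here (a flat array of canonical spellings) is an assumption for illustration and may not match vac's dictionary format; `loadDictionary` and `correctCase` are hypothetical names. The sketch matches whole single words only, so multi-word terms would need extra handling.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// wordRe matches runs of ASCII letters and digits; other text is left untouched.
var wordRe = regexp.MustCompile(`[A-Za-z0-9]+`)

// loadDictionary parses a JSON array of canonical spellings, e.g. ["GitHub", "API"].
// This schema is assumed for the example, not taken from vac's documentation.
func loadDictionary(data []byte) (map[string]string, error) {
	var terms []string
	if err := json.Unmarshal(data, &terms); err != nil {
		return nil, err
	}
	dict := make(map[string]string, len(terms))
	for _, t := range terms {
		dict[strings.ToLower(t)] = t // key by lowercase for case-insensitive lookup
	}
	return dict, nil
}

// correctCase replaces each whole word that has a dictionary entry
// with its canonical spelling and leaves all other words as they are.
func correctCase(text string, dict map[string]string) string {
	return wordRe.ReplaceAllStringFunc(text, func(w string) string {
		if canon, ok := dict[strings.ToLower(w)]; ok {
			return canon
		}
		return w
	})
}

func main() {
	dict, err := loadDictionary([]byte(`["AI", "API", "GitHub", "Claude"]`))
	if err != nil {
		panic(err)
	}
	fmt.Println(correctCase("push the api client to github", dict))
}
```

Matching whole words rather than substrings keeps a term such as "AI" from being applied inside unrelated words like "said".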

Subtitle Embedding

  • Embed soft subtitles (toggleable by viewer)
  • Burn hard subtitles (permanent in video)
  • Path sanitization for safe ffmpeg execution
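
The difference between soft and hard subtitles shows up in the ffmpeg arguments. The Go sketch below builds both argument lists and rejects paths containing characters that are special on the command line or inside an ffmpeg filter string. It is an assumption-laden illustration: `checkPath` is a deliberately strict allowlist, not the mogo `SanitizePath` function, and the hard-subtitle variant assumes an ffmpeg build with the `subtitles` filter (libass) available.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// safePath is a conservative allowlist: letters, digits, and a few separators.
// It rejects quotes, colons, commas, and backslashes, which are special in filters.
var safePath = regexp.MustCompile(`^[A-Za-z0-9_./-]+$`)

// checkPath rejects anything outside the allowlist or starting with "-"
// (which ffmpeg could interpret as an option).
func checkPath(p string) error {
	if !safePath.MatchString(p) || strings.HasPrefix(p, "-") {
		return fmt.Errorf("unsafe path for ffmpeg: %q", p)
	}
	return nil
}

func checkAll(paths ...string) error {
	for _, p := range paths {
		if err := checkPath(p); err != nil {
			return err
		}
	}
	return nil
}

// softSubArgs muxes the subtitles as a separate stream the viewer can toggle.
// Video and audio are copied; MP4 containers store text subtitles as mov_text.
func softSubArgs(video, subs, out string) ([]string, error) {
	if err := checkAll(video, subs, out); err != nil {
		return nil, err
	}
	return []string{"-i", video, "-i", subs,
		"-c:v", "copy", "-c:a", "copy", "-c:s", "mov_text", out}, nil
}

// hardSubArgs draws the subtitles into the frames, which re-encodes the video.
func hardSubArgs(video, subs, out string) ([]string, error) {
	if err := checkAll(video, subs, out); err != nil {
		return nil, err
	}
	return []string{"-i", video, "-vf", "subtitles=" + subs, "-c:a", "copy", out}, nil
}

func main() {
	args, err := hardSubArgs("video/presentation.mp4", "subtitles/en-US.srt", "video/burned.mp4")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ffmpeg " + strings.Join(args, " "))
}
```

Passing the arguments as a slice (for example to `os/exec`) avoids shell interpretation entirely; the allowlist then mainly protects the filter string used for burned-in subtitles.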

New CLI Commands

vac subtitle

vac subtitle [flags]

Flags:
  -a, --audio string        Audio directory containing manifest.json (required)
  -o, --output string       Output directory for subtitle files (default "subtitles")
  -l, --lang string         Language code (auto-detected from manifest if not specified)
      --provider string     STT provider: deepgram or elevenlabs (default "deepgram")
      --individual          Also generate individual subtitle files per slide

vac stt

vac stt [flags]

Speech-to-text transcription for individual audio files.

Complete Workflow

With v0.2.0, the complete workflow from Marp presentation to video with subtitles is:

# Set API keys
export ELEVENLABS_API_KEY="your-key"
export DEEPGRAM_API_KEY="your-key"

# 1. Generate audio (TTS)
vac tts --input slides.md --output audio/en-US/

# 2. Generate video
vac video --input slides.md --manifest audio/en-US/manifest.json --output video/presentation.mp4

# 3. Generate subtitles (STT)
vac subtitle --audio audio/en-US/

# 4. Embed soft subtitles (toggleable by the viewer)
ffmpeg -i video/presentation.mp4 -i subtitles/en-US.srt \
  -c:v copy -c:a copy -c:s mov_text \
  -metadata:s:s:0 language=eng \
  video/presentation_with_subs.mp4

See the Complete Workflow Guide for detailed instructions.

Dependencies

  • Added github.com/plexusone/omnivoice v0.4.1 for unified TTS/STT interface
  • Added github.com/plexusone/omnivoice-deepgram v0.3.0 for Deepgram STT support
  • Bumped github.com/grokify/mogo to v0.73.2 for SanitizePath support

Installation

go install github.com/grokify/videoascode/cmd/vac@v0.2.0

New Prerequisites

  • Deepgram API key (for subtitle generation): Sign up at Deepgram

Breaking Changes

None. This release is fully backwards-compatible with v0.1.0.

What's Next

  • Burned-in subtitle styling options
  • Karaoke-style word highlighting
  • Word-by-word reveal captions for social media
  • Additional TTS/STT provider support via OmniVoice