vac vs marptalk¶

Two tools solving the same problem: converting Marp markdown presentations into narrated videos with AI-generated voiceovers. Both emerged from the same insight—that presentations are more effective when you can see slides and hear narration together—but took different paths to get there.

Origins¶

marptalk¶

Jason Hall created marptalk after observing that short presentations with spoken narration and visual elements are an effective way to learn, but "not every bit of information on the internet is presented the way I prefer." His solution: a Node.js tool that takes Marp presentations with embedded speaker notes, generates audio via Google Cloud TTS, and stitches everything into a video with subtitles using Puppeteer and ffmpeg. The result processes in about a minute at minimal cost. Hall also added browser-based TTS fallback using the Web Speech API, enabling rapid iteration without any API costs during development.

See Jason's original LinkedIn post introducing marptalk.

vac¶

vac grew from production needs at PlexusOne, which publishes many Marp presentations and wanted to turn them into narrated videos. The project also served as a way to exercise the OmniVoice libraries for multi-language and multi-provider TTS/STT workflows. Rather than being locked into a single provider, vac uses OmniVoice as a unified abstraction layer—allowing different providers for different languages or slides. A Chinese slide might use Deepgram while English slides use ElevenLabs, all in the same video. The tool also generates subtitles from actual audio transcription (STT) rather than estimating from word count, producing word-level timestamps and proper capitalization via dictionary-based correction.

Philosophy¶

The projects reflect different design philosophies:

marptalk optimizes for rapid iteration and accessibility. Browser TTS fallback means you can preview presentations instantly without API keys. YouTube chapter markers are auto-generated. LLM-assisted drafting via GitHub Issues lets you generate first drafts from a topic description. It's designed for quick experimentation.

vac optimizes for production workflows and flexibility. The decoupled architecture separates audio generation from video creation. JSON transcripts support per-slide voice overrides and multi-language content in a single file. Individual slide export targets platforms like Udemy. It's designed for complex, repeatable pipelines.

Both tools demonstrate that with modern AI voice services, the gap between "slides with speaker notes" and "polished video content" can be bridged automatically.

Feature Comparison¶

Overview¶

Aspect	vac	marptalk
Language	Go	Node.js
License	MIT	Apache-2.0
CLI Framework	Cobra	Commander
Primary TTS	ElevenLabs (via OmniVoice)	Google Cloud TTS
Primary STT	Deepgram (via OmniVoice)	N/A (duration-based timing)

TTS Provider Support¶

Feature	vac	marptalk
ElevenLabs	Primary
Google Cloud TTS		Primary
Deepgram TTS	Secondary
Browser TTS Fallback		(Web Speech API)
Provider Abstraction	OmniVoice	Single provider
Voice Cloning	(ElevenLabs)

Multi-Language Support¶

Feature	vac	marptalk
Multi-language transcripts	JSON with per-slide locales	Single language per run
Locale codes	BCP-47 (en-US, zh-Hans, etc.)	Language codes (en-US, es-ES)
Per-slide voice override
Mixed TTS providers per video	(e.g., ElevenLabs + Deepgram)

Voiceover Input Format¶

Feature	vac	marptalk
Inline HTML comments	`<!-- voiceover text -->`	`<!-- speaker notes -->`
JSON transcript	Structured, multi-language
Pause directives	`[PAUSE:1000]`
Per-segment voice settings

Subtitle Generation¶

Feature	vac	marptalk
SRT output
VTT output
Generation method	STT transcription (actual audio)	Word count estimation (150 wpm)
Word-level timestamps
Dictionary case correction	(tech terms, custom JSON)
YouTube chapters

Video Generation¶

Feature	vac	marptalk
Method	Image-based (Marp PNG export)	Static slide screenshots
Audio sync	Manifest-based (actual durations)	Audio file duration analysis
Soft subtitles
Hard subtitles (burned-in)
Crossfade transitions	`--transition`
Individual slide export	(Udemy-ready)
Mixed audio sample rates	(filter_complex concat)	N/A (single provider)

Workflow¶

Feature	vac	marptalk
Decoupled TTS/Video	Separate `tts` and `video` commands	`--generate-tts` flag
Audio manifest	JSON with timing info
Resume/skip existing	`--force` flag	`--no-generate-tts`
Debug mode	`MARP2VIDEO_DEBUG=1`	`DEBUG=1`

Unique Features¶

vac only¶

OmniVoice provider abstraction - Swap TTS/STT providers without code changes
Mixed TTS providers - Use ElevenLabs for some slides, Deepgram for others
STT-based subtitles - Word-level timestamps from actual audio transcription
Dictionary case correction - Fix capitalization of tech terms in subtitles
JSON transcripts - Per-slide language and voice overrides
Crossfade transitions - Smooth transitions between slides
Individual slide export - Ready for Udemy course uploads

marptalk only¶

Browser TTS fallback - No API costs for testing (Web Speech API)
YouTube chapter markers - Auto-generated chapter timestamps
LLM-assisted drafting - Generate presentations via GitHub Issues
Self-playing HTML - Presentation with playback controls
Zero API key iteration - Develop without any API credentials

Architecture Comparison¶

vac (Go)marptalk (Node.js)

vac/
├── cmd/vac/
│   ├── tts.go
│   ├── video.go
│   └── subtitle.go
├── pkg/
│   ├── parser/
│   ├── transcript/
│   ├── omnivoice/
│   ├── video/
│   └── orchestrator/
└── go.mod

marptalk/
├── src/
│   ├── generate.js (main)
│   ├── extract-notes.js
│   ├── generate-audio.js
│   ├── generate-html.js
│   ├── generate-subtitles.js
│   └── generate-video.js
└── package.json

When to Use Which¶

Use Case	Recommended
Quick prototyping without API costs	marptalk (browser TTS fallback)
High-quality production voices	vac (ElevenLabs)
Multi-language presentations	vac (JSON transcripts)
YouTube publishing with chapters	marptalk
Udemy course creation	vac (individual slide export)
Mixed TTS providers	vac (OmniVoice)
Google Cloud ecosystem	marptalk
Voice cloning	vac (ElevenLabs)
LLM-assisted content creation	marptalk (GitHub Issues workflow)

Summary¶

vac is more feature-rich for production workflows with its provider abstraction layer (OmniVoice), multi-language support, and STT-based subtitle generation. It handles complex scenarios like mixed audio sample rates from different TTS providers.

marptalk excels at rapid iteration with its browser TTS fallback and YouTube-focused features (chapter markers). Its LLM-assisted drafting workflow via GitHub Issues is innovative for content creation.