
vac vs marptalk

Two tools solving the same problem: converting Marp markdown presentations into narrated videos with AI-generated voiceovers. Both emerged from the same insight—that presentations are more effective when you can see slides and hear narration together—but took different paths to get there.

Origins

marptalk

Jason Hall created marptalk after observing that short presentations with spoken narration and visual elements are an effective way to learn, but "not every bit of information on the internet is presented the way I prefer." His solution: a Node.js tool that takes Marp presentations with embedded speaker notes, generates audio via Google Cloud TTS, and stitches everything into a subtitled video using Puppeteer and ffmpeg. A presentation processes in about a minute at minimal cost. Hall also added a browser-based TTS fallback using the Web Speech API, enabling rapid iteration without any API costs during development.

See Jason's original LinkedIn post introducing marptalk.

vac

vac grew from production needs at PlexusOne, which publishes many Marp presentations and wanted to turn them into narrated videos. The project also served as a way to exercise the OmniVoice libraries for multi-language and multi-provider TTS/STT workflows. Rather than being locked into a single provider, vac uses OmniVoice as a unified abstraction layer, allowing different providers for different languages or slides: a Chinese slide might use Deepgram while English slides use ElevenLabs, all in the same video. The tool also generates subtitles by transcribing the actual audio (STT) rather than estimating timing from word count, which yields word-level timestamps; a dictionary-based correction pass then fixes capitalization.
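
The dictionary-based case correction can be pictured as a lookup from a lowercased term to its canonical spelling, applied token by token to the transcribed text. The following is a minimal Python sketch under that assumption; the `TERMS` entries and the `correct_case` name are illustrative and do not reflect vac's actual dictionary format or code.

```python
import re

# Hypothetical dictionary: lowercase term -> canonical capitalization.
TERMS = {
    "elevenlabs": "ElevenLabs",
    "deepgram": "Deepgram",
    "omnivoice": "OmniVoice",
    "json": "JSON",
    "api": "API",
}

# A "word" here is a run of letters, digits, or apostrophes; punctuation is kept as-is.
WORD = re.compile(r"[A-Za-z0-9']+")


def correct_case(text: str, terms: dict[str, str]) -> str:
    """Replace each word whose lowercase form is in `terms` with its canonical casing."""
    return WORD.sub(lambda m: terms.get(m.group(0).lower(), m.group(0)), text)


print(correct_case("we call the elevenlabs api, then deepgram.", TERMS))
# -> we call the ElevenLabs API, then Deepgram.
```

Multi-word terms would need a phrase-level pass; this sketch only corrects single tokens.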

Philosophy

The projects reflect different design philosophies:

marptalk optimizes for rapid iteration and accessibility. Browser TTS fallback means you can preview presentations instantly without API keys. YouTube chapter markers are auto-generated. LLM-assisted drafting via GitHub Issues lets you generate first drafts from a topic description. It's designed for quick experimentation.
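
One plausible way to auto-generate chapter markers is to accumulate per-slide audio durations and emit a timestamp at each slide boundary. The sketch below assumes that approach; the slide titles, durations, and function names are hypothetical, and marptalk's own logic may differ.

```python
def fmt(seconds: float) -> str:
    """Format seconds as M:SS, or H:MM:SS past the hour, as used in YouTube descriptions."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"


def chapters(slides: list[tuple[str, float]]) -> list[str]:
    """Turn (title, duration_seconds) pairs into chapter lines starting at 0:00."""
    lines, start = [], 0.0
    for title, duration in slides:
        lines.append(f"{fmt(start)} {title}")
        start += duration  # the next slide begins where this one's audio ends
    return lines


for line in chapters([("Intro", 12.4), ("Architecture", 65.0), ("Summary", 20.0)]):
    print(line)
# 0:00 Intro
# 0:12 Architecture
# 1:17 Summary
```

YouTube requires the first chapter timestamp to be 0:00, which the cumulative start of 0.0 guarantees.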

vac optimizes for production workflows and flexibility. The decoupled architecture separates audio generation from video creation. JSON transcripts support per-slide voice overrides and multi-language content in a single file. Individual slide export targets platforms like Udemy. It's designed for complex, repeatable pipelines.
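
A per-slide voice override in a JSON transcript amounts to merging each slide's optional settings on top of transcript-wide defaults. The schema below (`defaults`, `slides`, `locale`, `voice`, `provider`) is a hypothetical layout chosen for illustration, not vac's documented format.

```python
import json

# Hypothetical transcript layout; vac's actual JSON schema may use different keys.
TRANSCRIPT = """
{
  "defaults": {"provider": "elevenlabs", "voice": "narrator-en"},
  "slides": [
    {"locale": "en-US", "text": "Welcome to the course."},
    {"locale": "zh-Hans", "text": "欢迎。", "voice": {"provider": "deepgram", "voice": "narrator-zh"}},
    {"locale": "en-US", "text": "Let's begin.", "voice": {"voice": "narrator-en-2"}}
  ]
}
"""


def resolve_voices(doc: dict) -> list[dict]:
    """Merge each slide's optional voice override on top of the transcript defaults."""
    resolved = []
    for i, slide in enumerate(doc["slides"]):
        # Later keys win: an override replaces only the fields it sets.
        voice = {**doc["defaults"], **slide.get("voice", {})}
        resolved.append({"slide": i, "locale": slide["locale"], **voice})
    return resolved


for row in resolve_voices(json.loads(TRANSCRIPT)):
    print(row)
```

Because the merge is field by field, the third slide keeps the default provider while swapping only the voice.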

Both tools demonstrate that with modern AI voice services, the gap between "slides with speaker notes" and "polished video content" can be bridged automatically.


Feature Comparison

Overview

| Aspect | vac | marptalk |
|---|---|---|
| Language | Go | Node.js |
| License | MIT | Apache-2.0 |
| CLI Framework | Cobra | Commander |
| Primary TTS | ElevenLabs (via OmniVoice) | Google Cloud TTS |
| Primary STT | Deepgram (via OmniVoice) | N/A (duration-based timing) |

TTS Provider Support

| Feature | vac | marptalk |
|---|---|---|
| ElevenLabs | ✅ Primary | ❌ |
| Google Cloud TTS | ❌ | ✅ Primary |
| Deepgram TTS | ✅ Secondary | ❌ |
| Browser TTS Fallback | ❌ | ✅ (Web Speech API) |
| Provider Abstraction | ✅ OmniVoice | ❌ Single provider |
| Voice Cloning | ✅ (ElevenLabs) | ❌ |
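
Provider abstraction means callers code against one interface and pick an implementation by name. This Python sketch illustrates the pattern with stand-in classes; `TTSProvider`, `synthesize`, and the registry are assumptions for illustration, not OmniVoice's actual Go API, and no real ElevenLabs or Deepgram calls are made.

```python
from typing import Protocol


class TTSProvider(Protocol):
    """Minimal provider interface; names are illustrative, not OmniVoice's API."""

    def synthesize(self, text: str, voice: str) -> bytes: ...


class FakeElevenLabs:
    def synthesize(self, text: str, voice: str) -> bytes:
        # Stand-in for a real API call; returns tagged bytes so the routing is visible.
        return f"elevenlabs:{voice}:{text}".encode()


class FakeDeepgram:
    def synthesize(self, text: str, voice: str) -> bytes:
        return f"deepgram:{voice}:{text}".encode()


REGISTRY: dict[str, TTSProvider] = {
    "elevenlabs": FakeElevenLabs(),
    "deepgram": FakeDeepgram(),
}


def speak(provider: str, text: str, voice: str) -> bytes:
    """Callers depend only on the interface, so swapping providers is a config change."""
    return REGISTRY[provider].synthesize(text, voice)


print(speak("deepgram", "Hello", "voice-1"))  # b'deepgram:voice-1:Hello'
```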

Multi-Language Support

| Feature | vac | marptalk |
|---|---|---|
| Multi-language transcripts | ✅ JSON with per-slide locales | ❌ Single language per run |
| Locale codes | BCP-47 (en-US, zh-Hans, etc.) | Language codes (en-US, es-ES) |
| Per-slide voice override | ✅ | ❌ |
| Mixed TTS providers per video | ✅ (e.g., ElevenLabs + Deepgram) | ❌ |
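
Mixing providers in one video requires choosing a provider per slide locale. The sketch below uses a hypothetical routing table and a simplified form of the RFC 4647 "lookup" scheme, trimming subtags from the right (`zh-Hans-CN` → `zh-Hans` → `zh`) until a route matches. The assignments follow the Chinese/English example from the vac origins section; they are not vac defaults.

```python
# Hypothetical routing table: BCP-47 tag -> TTS provider name.
ROUTES = {
    "en": "elevenlabs",
    "zh-Hans": "deepgram",
}
DEFAULT_PROVIDER = "elevenlabs"


def route(locale: str, routes: dict[str, str] = ROUTES) -> str:
    """Return the provider for the most specific matching tag, trimming subtags from the right."""
    parts = locale.split("-")
    while parts:
        tag = "-".join(parts)
        if tag in routes:
            return routes[tag]
        parts.pop()
    return DEFAULT_PROVIDER


for loc in ["en-US", "zh-Hans-CN", "fr-FR"]:
    print(loc, "->", route(loc))
# en-US -> elevenlabs
# zh-Hans-CN -> deepgram
# fr-FR -> elevenlabs
```

BCP-47 tags are case-insensitive; a production version would normalize case before matching.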

Voiceover Input Format

| Feature | vac | marptalk |
|---|---|---|
| Inline HTML comments | ✅ `<!-- voiceover text -->` | ✅ `<!-- speaker notes -->` |
| JSON transcript | ✅ Structured, multi-language | ❌ |
| Pause directives | ✅ `[PAUSE:1000]` | ❌ |
| Per-segment voice settings | ✅ | ❌ |
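
Inline voiceover comments with pause directives can be parsed into an ordered list of speech and silence segments. This sketch assumes slides are separated by `---` lines and that every HTML comment on a slide is voiceover text; a real parser must also skip Marp front matter and Marp's own comment directives (such as `<!-- _class: lead -->`). The function names are hypothetical.

```python
import re

COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
PAUSE = re.compile(r"\[PAUSE:(\d+)\]")
SLIDE_BREAK = re.compile(r"^---[ \t]*$", re.MULTILINE)


def segments(note: str) -> list[tuple[str, object]]:
    """Split voiceover text into ("say", text) and ("pause", milliseconds) segments."""
    out: list[tuple[str, object]] = []
    pos = 0
    for m in PAUSE.finditer(note):
        text = note[pos:m.start()].strip()
        if text:
            out.append(("say", text))
        out.append(("pause", int(m.group(1))))
        pos = m.end()
    tail = note[pos:].strip()
    if tail:
        out.append(("say", tail))
    return out


def slide_voiceovers(markdown: str) -> list[list[tuple[str, object]]]:
    """Split a deck on '---' lines, join each slide's comments, and segment them."""
    slides = SLIDE_BREAK.split(markdown)
    return [segments(" ".join(c.strip() for c in COMMENT.findall(s))) for s in slides]


deck = "# Intro\n<!-- Welcome. [PAUSE:1000] Let's begin. -->\n---\n# Next\n<!-- Slide two. -->\n"
for i, segs in enumerate(slide_voiceovers(deck)):
    print(i, segs)
```

A TTS step could then synthesize each "say" segment and insert silence of the given length for each "pause".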

Subtitle Generation

| Feature | vac | marptalk |
|---|---|---|
| SRT output | ✅ | ✅ |
| VTT output | ✅ | ❌ |
| Generation method | STT transcription (actual audio) | Word count estimation (150 wpm) |
| Word-level timestamps | ✅ | ❌ |
| Dictionary case correction | ✅ (tech terms, custom JSON) | ❌ |
| YouTube chapters | ❌ | ✅ |
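
The word-count approach converts each note's length into a duration: at 150 wpm, every word accounts for 60 / 150 = 0.4 seconds, so a 10-word note is estimated at 4 seconds. The sketch below lays out back-to-back SRT cues on that basis; the function names are hypothetical, and marptalk's implementation may distribute time differently. vac avoids the estimate by taking word timings from STT on the generated audio.

```python
WPM = 150  # speaking rate from the comparison table


def estimate_seconds(text: str, wpm: int = WPM) -> float:
    """Estimate narration length from word count: words / wpm minutes."""
    return len(text.split()) * 60 / wpm


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def srt(cues: list[str]) -> str:
    """Emit one numbered SRT cue per text block, back to back, using estimated durations."""
    blocks, start = [], 0.0
    for i, text in enumerate(cues, 1):
        end = start + estimate_seconds(text)
        blocks.append(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
        start = end
    return "\n".join(blocks)


print(srt(["This narration is split into two short subtitle cues.", "Next."]))
```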

Video Generation

| Feature | vac | marptalk |
|---|---|---|
| Method | Image-based (Marp PNG export) | Static slide screenshots |
| Audio sync | Manifest-based (actual durations) | Audio file duration analysis |
| Soft subtitles | ✅ | ✅ |
| Hard subtitles (burned-in) | ✅ | ✅ |
| Crossfade transitions | ✅ `--transition` | ❌ |
| Individual slide export | ✅ (Udemy-ready) | ❌ |
| Mixed audio sample rates | ✅ (`filter_complex` concat) | N/A (single provider) |
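
Crossfading slide clips with ffmpeg's `xfade` filter means each transition overlaps the previous clip, so the k-th transition starts on the output timeline at the sum of the first k clip durations minus k × the fade length. This sketch only builds the `filter_complex` string; the function name and default fade length are assumptions, and vac's `--transition` implementation may differ.

```python
def xfade_chain(durations: list[float], fade: float = 0.5) -> str:
    """Build a filter_complex string that crossfades video inputs 0..n-1 in order.

    Transition k (1-based) starts at sum(durations[:k]) - k * fade seconds on the
    output timeline, because each earlier transition shortened the output by `fade`.
    The final video stream is labeled [v{n-1}].
    """
    if len(durations) < 2:
        return ""  # nothing to crossfade; a single clip can be mapped directly
    parts, prev, elapsed = [], "[0:v]", durations[0]
    for k in range(1, len(durations)):
        offset = elapsed - k * fade
        label = f"[v{k}]"
        parts.append(
            f"{prev}[{k}:v]xfade=transition=fade:duration={fade}:offset={offset:.3f}{label}"
        )
        prev, elapsed = label, elapsed + durations[k]
    return ";".join(parts)


print(xfade_chain([5.0, 4.0, 6.0]))
# [0:v][1:v]xfade=transition=fade:duration=0.5:offset=4.500[v1];[v1][2:v]xfade=transition=fade:duration=0.5:offset=8.000[v2]
```

Each slide image would typically be an input such as `-loop 1 -t 5 -i slide1.png`. `xfade` generally expects inputs with matching resolution, frame rate, and timebase, and the result is (n − 1) × fade shorter than the sum of the clips, so the audio must be crossfaded or trimmed to match (ffmpeg's `acrossfade` filter is one option).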

Workflow

| Feature | vac | marptalk |
|---|---|---|
| Decoupled TTS/Video | ✅ Separate `tts` and `video` commands | ✅ `--generate-tts` flag |
| Audio manifest | ✅ JSON with timing info | ❌ |
| Resume/skip existing | ✅ `--force` flag | ✅ `--no-generate-tts` |
| Debug mode | ✅ `MARP2VIDEO_DEBUG=1` | ✅ `DEBUG=1` |
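
The decoupled workflow hinges on the TTS step recording what it produced, so the video step can read durations instead of re-measuring. This sketch shows that shape: skip slides whose audio already exists unless `force` is set (mirroring the `--force` flag), and write a manifest with cumulative start times. The `slide_NNN.mp3` naming and manifest keys are hypothetical, not vac's actual layout.

```python
import json
from pathlib import Path


def audio_name(i: int) -> str:
    """Hypothetical per-slide audio file name."""
    return f"slide_{i:03d}.mp3"


def slides_to_generate(count: int, out_dir: Path, force: bool = False) -> list[int]:
    """Return indices whose audio must be (re)generated; existing files are reused unless force."""
    return [i for i in range(count) if force or not (out_dir / audio_name(i)).exists()]


def write_manifest(out_dir: Path, durations: list[float]) -> Path:
    """Write per-slide file names, start times, and durations for the video step.

    In practice the durations would be measured from the generated audio files.
    """
    entries, start = [], 0.0
    for i, duration in enumerate(durations):
        entries.append({"slide": i, "file": audio_name(i), "start": start, "duration": duration})
        start += duration
    path = out_dir / "manifest.json"
    path.write_text(json.dumps({"slides": entries}, indent=2))
    return path
```

Keeping timing in the manifest lets the video step be rerun (for example, with different subtitle or transition settings) without paying for TTS again.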

Unique Features

vac only

  • OmniVoice provider abstraction - Swap TTS/STT providers without code changes
  • Mixed TTS providers - Use ElevenLabs for some slides, Deepgram for others
  • STT-based subtitles - Word-level timestamps from actual audio transcription
  • Dictionary case correction - Fix capitalization of tech terms in subtitles
  • JSON transcripts - Per-slide language and voice overrides
  • Crossfade transitions - Smooth transitions between slides
  • Individual slide export - Ready for Udemy course uploads

marptalk only

  • Browser TTS fallback - No API costs for testing (Web Speech API)
  • YouTube chapter markers - Auto-generated chapter timestamps
  • LLM-assisted drafting - Generate presentations via GitHub Issues
  • Self-playing HTML - Presentation with playback controls
  • Zero API key iteration - Develop without any API credentials

Architecture Comparison

vac/
├── cmd/vac/
│   ├── tts.go
│   ├── video.go
│   └── subtitle.go
├── pkg/
│   ├── parser/
│   ├── transcript/
│   ├── omnivoice/
│   ├── video/
│   └── orchestrator/
└── go.mod

marptalk/
├── src/
│   ├── generate.js (main)
│   ├── extract-notes.js
│   ├── generate-audio.js
│   ├── generate-html.js
│   ├── generate-subtitles.js
│   └── generate-video.js
└── package.json

When to Use Which

| Use Case | Recommended |
|---|---|
| Quick prototyping without API costs | marptalk (browser TTS fallback) |
| High-quality production voices | vac (ElevenLabs) |
| Multi-language presentations | vac (JSON transcripts) |
| YouTube publishing with chapters | marptalk |
| Udemy course creation | vac (individual slide export) |
| Mixed TTS providers | vac (OmniVoice) |
| Google Cloud ecosystem | marptalk |
| Voice cloning | vac (ElevenLabs) |
| LLM-assisted content creation | marptalk (GitHub Issues workflow) |

Summary

vac is more feature-rich for production workflows with its provider abstraction layer (OmniVoice), multi-language support, and STT-based subtitle generation. It handles complex scenarios like mixed audio sample rates from different TTS providers.

marptalk excels at rapid iteration with its browser TTS fallback and YouTube-focused features (chapter markers). Its LLM-assisted drafting workflow via GitHub Issues turns a topic description into a first draft, a distinctive starting point for content creation.