Voiceover Formats

vac supports two formats for defining voiceovers: inline HTML comments in the Marp markdown file, and a separate JSON transcript.

Inline Comments (Simple)

Add voiceover text directly in your Marp markdown using HTML comments:

---
marp: true
---

# My Slide

Content here.

<!-- This text will be converted to speech
     and played while this slide is shown. -->

---

# Next Slide

<!-- Voiceover for the second slide. -->
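
To make the slide-to-comment mapping concrete, here is a minimal Python sketch that collects one voiceover string per slide from a Marp file. It is not vac's actual parser: the function name `extract_voiceovers` is hypothetical, and the sketch assumes that slides are separated by lines containing only `---`, that the first `---` block is front matter, and that every HTML comment is voiceover text. Marp also uses HTML comments for directives such as `<!-- _class: lead -->`, which this sketch does not filter out.

```python
from __future__ import annotations

import re

# Non-greedy match so that each comment is captured separately.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def extract_voiceovers(markdown: str) -> list[str]:
    """Return one voiceover string per slide ("" if a slide has no comment)."""
    lines = markdown.splitlines()

    # Skip the front matter: a leading '---' line up to the next '---' line.
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                lines = lines[i + 1:]
                break

    # Split the remaining lines into slides on '---' separator lines.
    slides, current = [], []
    for line in lines:
        if line.strip() == "---":
            slides.append("\n".join(current))
            current = []
        else:
            current.append(line)
    slides.append("\n".join(current))

    voiceovers = []
    for slide in slides:
        # Collapse line breaks and indentation inside each comment.
        parts = [" ".join(body.split()) for body in COMMENT_RE.findall(slide)]
        voiceovers.append(" ".join(p for p in parts if p))
    return voiceovers
```

Applied to the two-slide example above, the sketch yields one string per slide, with the multi-line comment joined into a single sentence.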

Pause Directives

Control timing with [PAUSE:milliseconds] directives:

<!-- Welcome to this presentation.
     [PAUSE:1000]
     Let's explore the first topic.
     [PAUSE:500]
     Here we go! -->

Directive	Duration
`[PAUSE:500]`	0.5 seconds
`[PAUSE:1000]`	1 second
`[PAUSE:2000]`	2 seconds

Pause directives are removed from the spoken text automatically.
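
As an illustration of how the directives can be separated from the spoken text, the following Python sketch splits a voiceover string into segments. It is an assumption-laden sketch, not vac's implementation: the name `split_pauses` is hypothetical, and it assumes each pause belongs to the text that precedes it.

```python
from __future__ import annotations

import re

# One capture group, so re.split() keeps the millisecond values in its result.
PAUSE_RE = re.compile(r"\[PAUSE:(\d+)\]")


def split_pauses(text: str) -> list[dict]:
    """Split voiceover text into {"text": ..., "pause": ms} segments."""
    # parts alternates: text, ms, text, ms, ..., text
    parts = PAUSE_RE.split(text)
    segments = []
    for i in range(0, len(parts), 2):
        spoken = " ".join(parts[i].split())  # normalize whitespace
        pause_ms = int(parts[i + 1]) if i + 1 < len(parts) else 0
        if not spoken and not pause_ms:
            continue  # skip empty trailing chunks
        segment = {"text": spoken}
        if pause_ms:
            segment["pause"] = pause_ms
        segments.append(segment)
    return segments
```

For the example comment above, this produces three segments: the first two carry pauses of 1000 ms and 500 ms, and the last has no pause. None of the segment texts contain a `[PAUSE:...]` marker.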

Pros and Cons

Pros	Cons
Simple, all-in-one file	Single language only
Easy to edit	Limited TTS control
No extra files	No voice customization per slide

JSON Transcripts (Advanced)

Use a separate transcript.json file for advanced features:

{
  "version": "1.0",
  "metadata": {
    "title": "My Presentation",
    "defaultLanguage": "en-US",
    "defaultVoice": {
      "provider": "elevenlabs",
      "voiceId": "pNInz6obpgDQGcFmaJgB",
      "voiceName": "Adam"
    }
  },
  "slides": [
    {
      "index": 0,
      "transcripts": {
        "en-US": {
          "segments": [
            { "text": "Welcome to this presentation.", "pause": 1000 },
            { "text": "Let's explore the first topic." }
          ]
        },
        "es-ES": {
          "segments": [
            { "text": "Bienvenido a esta presentación.", "pause": 1000 },
            { "text": "Exploremos el primer tema." }
          ]
        }
      }
    }
  ]
}
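
To show how this structure can be read, here is a short Python sketch that looks up the segments for one slide and language. The helper names `segments_for` and `total_pause_ms` are hypothetical, and falling back to `metadata.defaultLanguage` when no language is given is an assumption about reasonable behavior, not a documented vac rule.

```python
from __future__ import annotations

import json

# A trimmed copy of the transcript above, embedded so the sketch is self-contained.
EXAMPLE = json.loads("""
{
  "version": "1.0",
  "metadata": {"title": "My Presentation", "defaultLanguage": "en-US"},
  "slides": [
    {
      "index": 0,
      "transcripts": {
        "en-US": {"segments": [
          {"text": "Welcome to this presentation.", "pause": 1000},
          {"text": "Let's explore the first topic."}
        ]},
        "es-ES": {"segments": [
          {"text": "Bienvenido a esta presentación.", "pause": 1000},
          {"text": "Exploremos el primer tema."}
        ]}
      }
    }
  ]
}
""")


def segments_for(transcript: dict, slide_index: int, lang: str | None = None) -> list[dict]:
    """Return the segments of one slide in one language."""
    # Assumption: use the transcript's default language when none is requested.
    lang = lang or transcript["metadata"]["defaultLanguage"]
    for slide in transcript["slides"]:
        if slide["index"] == slide_index:
            # Raises KeyError if the slide has no transcript in that language.
            return slide["transcripts"][lang]["segments"]
    raise KeyError(f"no slide with index {slide_index}")


def total_pause_ms(segments: list[dict]) -> int:
    """Sum the optional per-segment pauses, treating a missing pause as 0."""
    return sum(segment.get("pause", 0) for segment in segments)
```

Calling `segments_for(EXAMPLE, 0, "es-ES")` returns the two Spanish segments, and `total_pause_ms` on either language gives 1000.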

Usage

vac \
  --input slides.md \
  --transcript transcript.json \
  --lang es-ES \
  --output video_spanish.mp4
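
When a transcript holds several languages, the same invocation can be repeated once per language. The Python sketch below only builds the argument lists, using the four flags shown above. The function name `build_commands` and the `video_<lang>.mp4` output naming are illustrative assumptions.

```python
from __future__ import annotations


def build_commands(slides: str, transcript: str, languages: list[str]) -> list[list[str]]:
    """Build one vac argument list per language, without running anything."""
    return [
        [
            "vac",
            "--input", slides,
            "--transcript", transcript,
            "--lang", lang,
            # Hypothetical naming scheme: one output file per language code.
            "--output", f"video_{lang}.mp4",
        ]
        for lang in languages
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to render the videos one after another.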

Features

Feature	Description
Multi-language	Multiple languages per slide
Voice per language	Different voices for each language
Segments	Fine-grained control over text chunks
Pause per segment	Precise timing control
SSML hints	Emphasis, prosody, pronunciation
Venue settings	Platform-specific voice tuning

Pros and Cons

Pros	Cons
Multi-language support	Separate file to maintain
Full TTS control	More complex structure
Voice per slide/segment	Requires JSON knowledge
SSML support	Must keep in sync with slides
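
Because the transcript lives in a separate file, a small consistency check can catch drift between slides and transcript entries. The Python sketch below assumes that `index` is the zero-based position of a slide in the deck, which matches the example transcript but is not confirmed here. The function name `check_sync` is hypothetical.

```python
from __future__ import annotations


def check_sync(slide_count: int, transcript: dict) -> dict:
    """Compare transcript slide indexes with the number of slides in the deck."""
    indexes = [slide["index"] for slide in transcript["slides"]]
    expected = set(range(slide_count))  # assumption: zero-based indexes
    return {
        # Slides in the deck that have no transcript entry.
        "missing": sorted(expected - set(indexes)),
        # Transcript entries that point past the end of the deck.
        "unknown": sorted(set(indexes) - expected),
        # Indexes that appear more than once in the transcript.
        "duplicated": sorted({i for i in indexes if indexes.count(i) > 1}),
    }
```

An empty list under all three keys means every slide has exactly one transcript entry.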

Comparison

Feature	Inline Comments	JSON Transcript
Multi-language	No	Yes
Pause control	Yes	Yes
Voice per slide	No	Yes
SSML hints	No	Yes
Segment-level control	No	Yes
Single file workflow	Yes	No

When to Use Which

Inline Comments: Quick prototypes, single-language videos
JSON Transcript: Production content, courses, multi-language