Browser Video Recording
Record browser-driven demos with AI-generated voiceover. This feature automates browser interactions (navigation, clicks, scrolling) while generating synchronized narration in multiple languages.
Overview
The vac browser video command creates product demos, tutorials, and walkthroughs that showcase web applications. Unlike slide-based videos, browser videos capture live browser interactions with voiceover narration.
flowchart LR
    A[Config YAML] --> B[Load Steps]
    B --> C[Generate TTS]
    C --> D[Record Browser]
    D --> E[Combine A/V]
    E --> F[Video.mp4]

Quick Start
1. Create a Config File
# demo.yaml
metadata:
  title: "Product Demo"
  defaultLanguage: "en-US"
  defaultVoice:
    provider: "elevenlabs"
    voiceId: "pNInz6obpgDQGcFmaJgB"  # Adam voice
segments:
  - id: "segment_000"
    type: "browser"
    browser:
      url: "https://example.com"
      steps:
        - action: "wait"
          duration: 1000
          voiceover:
            en-US: "Welcome to our product demo."
        - action: "click"
          selector: "#login-button"
          voiceover:
            en-US: "Click the login button to get started."
2. Set API Keys
3. Generate Video
Config File Format
The config file defines browser segments with steps and voiceovers.
Full Schema
metadata:
  title: "Demo Title"
  defaultLanguage: "en-US"
  defaultVoice:
    provider: "elevenlabs"  # or "deepgram"
    voiceId: "voice-id"
    model: "eleven_multilingual_v2"  # ElevenLabs only
segments:
  - id: "segment_001"
    type: "browser"
    browser:
      url: "https://example.com"
      viewport:
        width: 1920
        height: 1080
      steps:
        - action: "wait"
          duration: 2000
          voiceover:
            en-US: "English narration"
            fr-FR: "Narration française"
            zh-Hans: "中文旁白"
        - action: "click"
          selector: "#button"
          voiceover:
            en-US: "Clicking the button"
        - action: "scroll"
          scrollY: 500
          voiceover:
            en-US: "Scrolling down"
        - action: "type"
          selector: "#input"
          text: "Hello world"
          voiceover:
            en-US: "Typing in the input field"
Supported Actions
| Action | Parameters | Description |
|---|---|---|
| wait | duration (ms) | Wait for specified duration |
| click | selector | Click an element |
| scroll | scrollX, scrollY (pixels) | Scroll horizontally and/or vertically |
| input | selector, value | Type text into an element |
| navigate | url | Navigate to a URL |
| screenshot | - | Capture current state |
| evaluate | script | Execute JavaScript |
| hover | selector | Hover over an element |
| keypress | key | Send keyboard input |
Scroll Options
| Parameter | Values | Description |
|---|---|---|
| scrollX | integer | Horizontal scroll amount (pixels) |
| scrollY | integer | Vertical scroll amount (pixels) |
| scrollMode | relative (default), absolute | Relative scrolls by delta; absolute scrolls to position |
| scrollBehavior | auto (default), smooth | Auto is instant; smooth animates the scroll |
Scroll Animation Limitation
Browser recordings capture 1 frame per step. Smooth scroll animations will appear as jump cuts in the final video. For smoother results, use multiple smaller scroll steps instead of one large scroll.
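One way to prepare several smaller scroll steps is to generate them with a short script and paste the result into the config. The sketch below is only an illustration: `split_scroll` is a hypothetical helper, not part of the tool, and it emits relative scroll steps without voiceover fields, which you would still need to add yourself.

```python
def split_scroll(total_y: int, chunks: int) -> list[dict]:
    """Split one large relative scroll into several smaller scroll steps.

    Hypothetical helper: the returned dicts mirror the step fields shown in
    the config schema (action, scrollY, scrollMode) and nothing else.
    """
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    base, remainder = divmod(total_y, chunks)
    steps = []
    for i in range(chunks):
        # Spread any remainder over the first steps so the deltas sum to total_y.
        delta = base + (1 if i < remainder else 0)
        steps.append({"action": "scroll", "scrollY": delta, "scrollMode": "relative"})
    return steps


# Example: replace a single 1000-pixel scroll with three smaller ones.
for step in split_scroll(1000, 3):
    print(step)
```

Each generated step is recorded as its own frame, so more chunks give a smoother result at the cost of a longer config.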
Multi-Language Support
Generate videos in multiple languages with a single command:
How It Works
- TTS Generation: Audio is generated for each language
- Timing Calculation: Per-voiceover durations are compared across languages
- Pace to Longest: Each step uses the maximum duration (e.g., French is often longer than English)
- Video Recording: Browser actions are timed to match the longest audio
- Audio Swap: Additional language versions swap in different audio tracks
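The "pace to longest" step can be pictured as taking, for each step, the maximum voiceover duration across all languages. The following is a minimal sketch of that idea, not the tool's actual implementation; the function name and the sample durations are made up for illustration.

```python
def pace_to_longest(durations_ms: dict[str, list[int]]) -> list[int]:
    """Return, for each step, the longest voiceover duration across languages.

    durations_ms maps a language code to its per-step audio durations in
    milliseconds. This is an illustrative model, not the tool's own code.
    """
    per_language = list(durations_ms.values())
    if not per_language:
        return []
    step_count = len(per_language[0])
    if any(len(d) != step_count for d in per_language):
        raise ValueError("every language needs one duration per step")
    # zip(*...) groups the durations of the same step across languages.
    return [max(step) for step in zip(*per_language)]


# Sample values (invented): French is longer on steps 1 and 3.
durations = {
    "en-US": [1800, 2400, 3100],
    "fr-FR": [2100, 2300, 3600],
}
print(pace_to_longest(durations))
```

Because the video is recorded once against these maxima, shorter languages simply have a little silence at the end of a step.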
Output Files
demo.mp4 # Primary language (first in --lang list)
demo_fr-FR.mp4 # French version (same video, different audio)
demo_zh-Hans.mp4 # Chinese version
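If a script needs to locate the per-language files afterwards, the naming pattern shown above can be reproduced as follows. This is a sketch that assumes the pattern holds exactly as listed (primary language keeps the output name, other languages get a `_<lang>` suffix); `output_paths` is a hypothetical helper.

```python
from pathlib import Path


def output_paths(output: str, languages: list[str]) -> dict[str, str]:
    """Map each language to its expected output file, following the listing above."""
    base = Path(output)
    paths = {}
    for i, lang in enumerate(languages):
        if i == 0:
            # The first language is the primary one and keeps the plain name.
            paths[lang] = str(base)
        else:
            paths[lang] = str(base.with_name(f"{base.stem}_{lang}{base.suffix}"))
    return paths


print(output_paths("demo.mp4", ["en-US", "fr-FR", "zh-Hans"]))
```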
Audio Caching
Use --audio-dir to cache TTS audio and avoid repeated API calls:
Cache Structure
audio/
├── en-US/
│   ├── segment_000.mp3          # Combined audio for segment
│   ├── segment_000.json         # Timing metadata
│   └── segment_000/
│       ├── voiceover_000.mp3    # Individual voiceover audio
│       ├── voiceover_001.mp3
│       └── ...
├── fr-FR/
│   └── ...
└── zh-Hans/
    └── ...
How Caching Works
- On first run, TTS audio is generated and saved to --audio-dir
- A JSON metadata file stores per-voiceover timing information
- On subsequent runs, existing audio is reused
- If you modify voiceover text, delete the corresponding audio file to regenerate
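When voiceover text changes, clearing the cache by hand can be error-prone. The sketch below removes every cached file for one segment in one language, based on the cache layout shown above. It is deliberately conservative (it deletes the combined audio, the timing metadata, and all individual voiceover files), and both function names are hypothetical; whether deleting a single voiceover file is enough depends on the tool and is not assumed here.

```python
from pathlib import Path


def cached_files(audio_dir: str, language: str, segment_id: str) -> list[Path]:
    """List existing cache files for one segment, following the documented layout."""
    lang_dir = Path(audio_dir) / language
    files = [lang_dir / f"{segment_id}.mp3", lang_dir / f"{segment_id}.json"]
    voiceover_dir = lang_dir / segment_id
    if voiceover_dir.is_dir():
        files += sorted(voiceover_dir.glob("voiceover_*.mp3"))
    return [f for f in files if f.exists()]


def invalidate_segment(audio_dir: str, language: str, segment_id: str) -> int:
    """Delete the cached files for a segment and return how many were removed."""
    removed = 0
    for path in cached_files(audio_dir, language, segment_id):
        path.unlink()
        removed += 1
    return removed


# Example: force regeneration of the French audio for segment_000.
# invalidate_segment("./audio", "fr-FR", "segment_000")
```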
Subtitle Generation
Add subtitles to your videos:
# Simple subtitles from voiceover timing (no STT required)
vac browser video --config demo.yaml --output demo.mp4 \
--subtitles
# Word-level subtitles using speech-to-text
vac browser video --config demo.yaml --output demo.mp4 \
--subtitles-stt
# Burn subtitles into video (permanent, requires FFmpeg with libass)
vac browser video --config demo.yaml --output demo.mp4 \
--subtitles --subtitles-burn
# Silent video with burned subtitles (no audio track)
vac browser video --config demo.yaml --output demo.mp4 \
--subtitles --subtitles-burn --no-audio
FFmpeg libass Requirement
The --subtitles-burn flag requires FFmpeg compiled with libass support. See Troubleshooting for installation instructions.
Subtitle Formats
| Format | Output | Use Case |
|---|---|---|
| SRT | demo.srt | Most video players, YouTube |
| VTT | demo.vtt | Web browsers, HTML5 video |
Subtitle Options Comparison
| Option | Method | Accuracy | API Cost |
|---|---|---|---|
| --subtitles | Voiceover timing | Sentence-level | None |
| --subtitles-stt | Speech-to-text | Word-level | Deepgram API |
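To see why timing-based subtitles need no speech-to-text, consider how SRT cues can be derived from voiceover durations alone. The sketch below assumes one cue per voiceover, played back to back with no gaps; the function names and sample durations are illustrative and do not describe the tool's internals.

```python
def format_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(voiceovers: list[tuple[str, int]]) -> str:
    """Build SRT text from (text, duration_ms) pairs played one after another."""
    cues = []
    start = 0
    for index, (text, duration_ms) in enumerate(voiceovers, start=1):
        end = start + duration_ms
        cues.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
        start = end
    # Joining with a newline leaves the blank line SRT expects between cues.
    return "\n".join(cues)


print(build_srt([
    ("Welcome to our product demo.", 2100),
    ("Click the login button to get started.", 2900),
]))
```

Each cue spans a whole voiceover, which is why this method is sentence-level rather than word-level.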
TTS Providers
ElevenLabs (Default)
High-quality AI voices with emotional range.
vac browser video --config demo.yaml --output demo.mp4 \
--provider elevenlabs \
--voice pNInz6obpgDQGcFmaJgB
Popular voice IDs:
| Voice | ID |
|---|---|
| Adam | pNInz6obpgDQGcFmaJgB |
| Rachel | 21m00Tcm4TlvDq8ikWAM |
| Domi | AZnzlk1XvdvUeBnXmlld |
Deepgram
Fast and cost-effective TTS.
vac browser video --config demo.yaml --output demo.mp4 \
--provider deepgram \
--voice aura-asteria-en
Advanced Usage
Headless Mode
Run without displaying the browser (useful for CI/CD):
Custom Resolution
Transitions Between Segments
Hardware-Accelerated Encoding
Use --fast for hardware-accelerated video encoding (VideoToolbox on macOS):
This significantly reduces encoding time for long videos.
Testing and Debugging
When iterating on demos, use --limit and --limit-steps to test partial content:
# Test only the first 2 segments
vac browser video --config demo.yaml --output demo.mp4 --limit 2
# Test only the first 3 browser steps
vac browser video --config demo.yaml --output demo.mp4 --limit-steps 3
# Combine both for fastest iteration
vac browser video --config demo.yaml --output demo.mp4 \
--limit 1 --limit-steps 3
This is useful for:
- Verifying subtitle timing
- Testing TTS voice settings
- Debugging browser automation steps
Step Duration Guidelines
When manually setting minDuration for steps, use these guidelines:
| Content Type | Recommended Duration |
|---|---|
| Short phrase (3-5 words) | 2000-3000ms |
| Medium sentence (8-12 words) | 3000-5000ms |
| Long sentence (15+ words) | 5000-8000ms |
| Complex explanation | 8000-12000ms |
Rule of thumb: ~150 words per minute = ~2.5 words per second
For a 10-word sentence:
- Base duration: 10 / 2.5 = 4 seconds = 4000ms
- Add 20% for French: 4800ms
- Add 500ms buffer: 5300ms
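The same arithmetic can be wrapped in a small helper when estimating many steps. This is a sketch of the rule of thumb above, not a function provided by the tool; the name `estimate_min_duration_ms` and its parameters are assumptions, and words are counted by splitting on whitespace.

```python
WORDS_PER_SECOND = 2.5  # ~150 words per minute


def estimate_min_duration_ms(
    text: str, language_factor: float = 1.0, buffer_ms: int = 500
) -> int:
    """Estimate a minDuration value in milliseconds from the narration text.

    language_factor scales the base duration (for example 1.2 to add 20%
    for French); buffer_ms is added afterwards.
    """
    word_count = len(text.split())
    base_ms = word_count / WORDS_PER_SECOND * 1000
    return round(base_ms * language_factor) + buffer_ms


# A 10-word sentence with the 20% French allowance and a 500ms buffer.
sentence = "Click the login button in the top right to start."
print(estimate_min_duration_ms(sentence, language_factor=1.2))
```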
Let TTS Drive Timing
In most cases, you don't need to set minDuration manually. The tool automatically calculates timing from TTS audio duration and uses the longest language when generating multi-language videos.
Troubleshooting
Browser not opening
Ensure Chrome/Chromium is installed. The tool uses Rod for browser automation.
Video timing mismatch
If the video finishes before the audio:
- Check that each step has a voiceover with text
- Check that audio files are being generated correctly
- Try deleting cached audio and regenerating
API errors
- Verify that API keys are set correctly
- Check that your account has sufficient credits
- Ensure the voice ID is valid for the provider
Long TTS generation time
Use --audio-dir to cache audio:
# First run generates audio
vac browser video --config demo.yaml --output demo.mp4 \
--audio-dir ./audio
# Subsequent runs reuse cached audio
vac browser video --config demo.yaml --output demo.mp4 \
--audio-dir ./audio
Subtitle burning fails
The --subtitles-burn flag requires FFmpeg compiled with libass support.
Check if your FFmpeg has subtitle support:
If nothing is returned, install FFmpeg with libass:
Verify installation:
Alternative: Use --subtitles without --subtitles-burn to generate a separate .srt file that video players can load.
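A script can also check for libass before attempting to burn subtitles. The sketch below is one possible check, under the assumption that the FFmpeg build reports its configure flags in the output of `ffmpeg -version` and that a libass-enabled build lists `--enable-libass` there. The function names are hypothetical.

```python
import subprocess


def has_libass(version_output: str) -> bool:
    """Return True if FFmpeg's reported build configuration enables libass."""
    return "--enable-libass" in version_output


def ffmpeg_supports_burn(ffmpeg: str = "ffmpeg") -> bool:
    """Run `ffmpeg -version` and look for libass in the build flags."""
    try:
        result = subprocess.run(
            [ffmpeg, "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # FFmpeg is not installed or not on PATH.
        return False
    # Check both streams in case a build prints its banner to stderr.
    return has_libass(result.stdout + result.stderr)


if __name__ == "__main__":
    if ffmpeg_supports_burn():
        print("FFmpeg reports libass support")
    else:
        print("libass not detected; use --subtitles without --subtitles-burn")
```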
Subtitle text doesn't cycle properly
If subtitle text stays static for long periods, it may be a VFR (variable frame rate) issue. The tool automatically converts VFR to CFR (30fps) before burning subtitles. If you encounter issues:
- Check that your FFmpeg version is up to date
- Try regenerating the video with cached audio
- Use --limit-steps 5 to test a small portion first