How to Build an AI Video Production Workflow with Claude Code and HeyGen
Learn how to use Claude Code and HeyGen to automate AI video production from script to final render, including voice cloning and motion graphics.
What This Workflow Actually Does (and Why It’s Worth Building)
Producing a polished video used to mean juggling a scriptwriter, a voice actor, a motion graphics designer, and an editor — often across different software and timelines. AI has changed that math significantly. Today, a single engineer or content team can build an AI video production workflow that takes a brief and outputs a finished video with a realistic avatar, professional voiceover, and branded visuals.
This guide walks through exactly how to do that using Claude Code as your reasoning and orchestration layer, and HeyGen as your avatar and video rendering engine. By the end, you’ll have a working pipeline that goes from raw input — a topic, a URL, a product description — to a rendered video file, with minimal human intervention in between.
This isn’t a surface-level overview. We’ll cover the actual API calls, the workflow logic, where things break, and how to harden the pipeline for production use.
Understanding the Two Core Tools
Claude Code
Claude Code is Anthropic’s agentic coding environment. Unlike the standard Claude interface, Claude Code is designed to take on multi-step tasks — writing code, running it, reading file outputs, and iterating — in a loop. You give it a goal, and it figures out the steps.
For video production, Claude Code is useful as the “brain” of your pipeline. It can:
- Write and refine scripts based on a brief or source material
- Generate structured data (scene breakdowns, slide copy, voiceover text) in whatever format downstream tools expect
- Call external APIs programmatically and handle responses
- Make decisions when something fails or needs adjustment
HeyGen
HeyGen is an AI video platform built around avatar-based video generation. You pick an avatar (or clone your own), provide a script, and HeyGen renders a video with the avatar speaking your text in a realistic voice.
The key features relevant to this workflow:
- Avatars — HeyGen has 100+ stock avatars, or you can create a custom avatar from video footage
- Voice cloning — Upload voice samples to generate a synthetic version of a specific voice
- Video API — A REST API that accepts JSON payloads and returns rendered video files
- Templates — Pre-built layouts with motion graphics, lower thirds, and branded elements
HeyGen’s API is well-documented and straightforward to call from Claude Code, which makes it a natural pairing.
Prerequisites Before You Start
Before writing a single line of code, get these pieces in place:
Accounts and API keys:
- A HeyGen account with API access (available on the Creator plan or higher)
- Claude Code installed and configured locally
- An Anthropic API key if you’re running Claude Code in non-interactive scripting mode
Assets:
- Your avatar ID from HeyGen (found in the Avatar Library)
- A voice ID — either from HeyGen’s voice library or a custom cloned voice
- Brand assets if you’re using HeyGen’s template system (logos, hex codes, font choices)
Local setup:
- Node.js or Python, depending on your preference for the scripting layer
- curl or an HTTP client library for making API calls
- A working directory with a predictable folder structure (more on this below)
A clean folder structure matters more than it sounds for automated workflows. When Claude Code is running autonomously, it needs to know exactly where to read inputs and write outputs. Use something like:
/project
  /inputs    ← briefs, source URLs, product docs
  /scripts   ← generated scripts (.txt or .json)
  /renders   ← downloaded video files
  /logs      ← API responses and error logs
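If you would rather create that layout from a script than by hand, a minimal Python sketch could look like the following. The folder names come from the tree above; the function name ensure_layout and passing the project root as an argument are illustrative assumptions, not something Claude Code or HeyGen requires.

```python
from pathlib import Path

# Folder names taken from the layout above.
SUBDIRS = ("inputs", "scripts", "renders", "logs")


def ensure_layout(root: str) -> dict[str, Path]:
    """Create the working folders if they are missing and return their paths."""
    base = Path(root)
    paths = {name: base / name for name in SUBDIRS}
    for path in paths.values():
        # parents=True also creates the project root; exist_ok avoids errors on reruns.
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

Calling ensure_layout("project") at the start of every run is cheap, and it means later stages can assume the folders exist.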
Step 1: Generate the Script with Claude Code
The first stage of the workflow is producing a script from raw input. This is where Claude Code earns its place.
Defining the Script Brief
Create a brief file in /inputs with the key parameters for your video:
{
  "topic": "Product update: new dashboard analytics features",
  "audience": "existing B2B customers",
  "tone": "professional but conversational",
  "duration_target": "90 seconds",
  "key_points": [
    "New funnel visualization tool",
    "Custom date range comparisons",
    "Export to PDF one-click"
  ],
  "cta": "Log in and explore the new dashboard"
}
Prompting Claude Code to Write the Script
In Claude Code, give it a direct instruction:
Read the brief in /inputs/brief.json. Write a video script for a 90-second avatar video.
Format the output as JSON with the following fields:
- "scenes": array of objects, each with "scene_number", "visual_note", and "voiceover_text"
- "total_word_count": estimated word count
- "estimated_duration": in seconds, assuming ~150 words per minute
Write in a clear, direct tone. No filler phrases. Each scene should be 15–25 seconds of spoken content.
Save the output to /scripts/script_v1.json.
Claude Code will read the brief, generate the script in the exact format you specified, and write the file. If the first pass isn’t quite right, you can ask it to revise specific scenes or adjust the tone — this is faster than hand-editing and keeps the format consistent.
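Because later stages depend on that exact format, it can help to check the generated file before moving on. Here is a rough Python sketch of such a check. It assumes the field names from the prompt above and the same 150 words per minute speaking rate; validate_script is a hypothetical helper name.

```python
REQUIRED_SCENE_KEYS = {"scene_number", "visual_note", "voiceover_text"}
WORDS_PER_MINUTE = 150  # speaking rate assumed in the prompt above


def validate_script(script: dict) -> list[str]:
    """Return a list of problems; an empty list means the script looks usable."""
    scenes = script.get("scenes")
    if not isinstance(scenes, list) or not scenes:
        return ["'scenes' must be a non-empty list"]

    problems = []
    for index, scene in enumerate(scenes, start=1):
        if not isinstance(scene, dict):
            problems.append(f"scene {index}: not an object")
            continue
        missing = REQUIRED_SCENE_KEYS - scene.keys()
        if missing:
            problems.append(f"scene {index}: missing {sorted(missing)}")
            continue
        # Estimate spoken length from the word count.
        seconds = len(scene["voiceover_text"].split()) / WORDS_PER_MINUTE * 60
        if not 15 <= seconds <= 25:
            problems.append(
                f"scene {index}: about {seconds:.0f}s of speech, outside 15-25s"
            )
    return problems
```

If the list comes back non-empty, you can paste the problems straight into a follow-up instruction and ask Claude Code to revise only the affected scenes.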
Why JSON Matters Here
Structured JSON output is important because HeyGen’s API expects structured input. By having Claude Code generate a scene-by-scene JSON object, you avoid any parsing step between script generation and video creation. The data flows directly from one stage to the next.
Step 2: Prepare the HeyGen API Payload
Once you have your script JSON, Claude Code can use it to build the API request payload for HeyGen.
Basic Video Generation Payload
HeyGen’s video generation endpoint accepts a payload that looks roughly like this:
{
  "video_inputs": [
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "normal"
      },
      "voice": {
        "type": "text",
        "input_text": "Welcome back. Today we're walking through three new features...",
        "voice_id": "YOUR_VOICE_ID",
        "speed": 1.0
      },
      "background": {
        "type": "color",
        "value": "#1A1A2E"
      }
    }
  ],
  "dimension": {
    "width": 1280,
    "height": 720
  }
}
For a multi-scene video, each scene in your script becomes its own entry in the video_inputs array.
Having Claude Code Build the Payload
Ask Claude Code to do this automatically:
Read /scripts/script_v1.json.
For each scene in the "scenes" array, build a HeyGen video_inputs entry using:
- avatar_id: "YOUR_AVATAR_ID"
- voice_id: "YOUR_VOICE_ID"
- background color: "#1A1A2E" for even scenes, "#0F3460" for odd scenes
Combine all entries into a valid HeyGen API payload.
Save to /scripts/heygen_payload.json.
This step is where having a consistent scene format pays off. Claude Code can iterate over the array predictably because the structure is uniform.
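For reference, the code Claude Code writes for this step will probably resemble the sketch below. It mirrors the payload shape shown earlier and the even/odd background rule from the prompt. The function name build_payload is an assumption, and the payload fields should be checked against HeyGen's current API reference before you rely on them.

```python
def build_payload(script: dict, avatar_id: str, voice_id: str) -> dict:
    """Turn the scene list into a HeyGen-style payload, one entry per scene."""
    video_inputs = []
    for scene in script["scenes"]:
        is_even = scene["scene_number"] % 2 == 0
        video_inputs.append(
            {
                "character": {
                    "type": "avatar",
                    "avatar_id": avatar_id,
                    "avatar_style": "normal",
                },
                "voice": {
                    "type": "text",
                    "input_text": scene["voiceover_text"],
                    "voice_id": voice_id,
                    "speed": 1.0,
                },
                # Alternate background colors as described in the prompt above.
                "background": {
                    "type": "color",
                    "value": "#1A1A2E" if is_even else "#0F3460",
                },
            }
        )
    return {
        "video_inputs": video_inputs,
        "dimension": {"width": 1280, "height": 720},
    }
```

Keeping this as a plain function means you can rebuild the payload after every script revision without re-prompting.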
Step 3: Submit the Job and Poll for Status
HeyGen video rendering is asynchronous. You submit a job, get back a video_id, and then poll a status endpoint until the video is ready.
Submitting the Video Generation Request
Claude Code can run this shell command directly:
curl -X POST https://api.heygen.com/v2/video/generate \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/scripts/heygen_payload.json
The response will include a video_id. Claude Code should save this to a log file so you don’t lose it if something interrupts the process.
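If you prefer Python over curl for this step, a sketch using only the standard library might look like this. The endpoint and headers are taken from the curl command above. The response shape, a "data" object containing "video_id", is an assumption to verify against HeyGen's documentation, and the function names are hypothetical.

```python
import json
import urllib.request
from pathlib import Path

GENERATE_URL = "https://api.heygen.com/v2/video/generate"  # from the curl command above


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract_video_id(response: dict) -> str:
    """Pull the job ID out of the response body."""
    # Assumed shape: {"data": {"video_id": "..."}}; check the API docs.
    video_id = (response.get("data") or {}).get("video_id")
    if not video_id:
        raise ValueError(f"no video_id in response: {response}")
    return video_id


def submit_job(payload: dict, api_key: str, log_path: str) -> str:
    """Send the job and persist the video_id so an interrupted run can resume."""
    with urllib.request.urlopen(build_request(payload, api_key), timeout=30) as resp:
        body = json.load(resp)
    video_id = extract_video_id(body)
    Path(log_path).write_text(json.dumps({"video_id": video_id}, indent=2))
    return video_id
```

Writing the ID to disk immediately after the response arrives is the important part: a render that is already running can still be picked up after a crash.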
Polling for Completion
Videos typically take 2–10 minutes to render depending on length and complexity. Claude Code can handle the polling loop:
Poll the HeyGen video status endpoint every 30 seconds using the video_id saved in /logs/job.json.
When the status returns "completed", extract the download URL from the response and save it to /logs/render_complete.json.
If the status returns "failed", log the error details and stop.
HeyGen’s status endpoint returns one of: pending, processing, completed, or failed. The completed response includes a direct download URL for the rendered video file.
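The polling logic itself is small. The sketch below keeps the HTTP call out of the loop by taking a fetch_status callable, which is a hypothetical function you supply that returns the status object for your video_id. The "video_url" and "error" field names are assumptions; confirm them against a real response.

```python
import time
from typing import Callable


def wait_for_render(
    fetch_status: Callable[[], dict],
    interval: float = 30.0,
    max_polls: int = 40,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the render completes and return the download URL."""
    for _ in range(max_polls):
        data = fetch_status()
        status = data.get("status")
        if status == "completed":
            return data["video_url"]  # assumed field name
        if status == "failed":
            raise RuntimeError(f"render failed: {data.get('error')}")
        # "pending" and "processing" both mean: wait and ask again.
        sleep(interval)
    raise TimeoutError(f"render not finished after {max_polls} polls")
```

With the defaults this gives up after roughly 20 minutes, which leaves headroom over the typical render times mentioned above. Passing sleep as a parameter makes the loop easy to test without waiting.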
Downloading the Rendered File
Once you have the download URL, Claude Code can fetch the file:
curl -L "DOWNLOAD_URL" -o /renders/video_final.mp4
At this point you have a rendered MP4 with your avatar speaking your script.
Step 4: Add Motion Graphics and Post-Production
A plain avatar video is functional, but branded motion graphics and text overlays make it feel finished. There are a few approaches here depending on your toolset.
Option A: Use HeyGen Templates
HeyGen has a template system that lets you pre-configure layouts with branded elements — lower thirds, intro/outro sequences, logo placement, and background graphics. If you set up a template in the HeyGen dashboard, you can reference it in your API payload using the template_id field.
This is the lowest-friction option if your branding needs are consistent across videos. Set it up once, reference it in every job.
Option B: Post-Process with FFmpeg
For more control, use FFmpeg as a post-processing step after downloading the video. Claude Code can generate and run FFmpeg commands:
Using FFmpeg, do the following to /renders/video_final.mp4:
1. Add a text overlay "New Dashboard Features" in the lower third for the first 5 seconds
2. Overlay the logo watermark from /assets/logo.png at 5% opacity in the top right
3. Add a 1-second fade in at the start and a 1-second fade out at the end
Save the result to /renders/video_branded.mp4
Claude Code handles FFmpeg surprisingly well — it knows the filter syntax, can chain operations, and will debug errors if a command fails.
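To give a sense of what that looks like, here is one way the three operations could be expressed as a single FFmpeg invocation, built as an argument list in Python. Treat it as a sketch under a few assumptions: your FFmpeg build includes the drawtext filter with font lookup available, the title contains no quotes or colons (those need escaping), and you already know the video duration. The positions, font size, and function name are illustrative choices.

```python
def build_ffmpeg_command(
    src: str, logo: str, dst: str, duration: float, title: str
) -> list[str]:
    """Build an ffmpeg argument list for title text, logo watermark, and fades."""
    fade_out_start = max(duration - 1.0, 0.0)
    filter_graph = (
        # 1. Title in the lower third, visible for the first 5 seconds.
        f"[0:v]drawtext=text='{title}':fontcolor=white:fontsize=36:"
        "x=40:y=h-th-60:enable='between(t,0,5)'[titled];"
        # 2. Logo scaled to 5% opacity, as in the prompt above.
        "[1:v]format=rgba,colorchannelmixer=aa=0.05[logo];"
        # 3. Logo in the top right, then 1-second video fades in and out.
        "[titled][logo]overlay=x=W-w-20:y=20,"
        f"fade=t=in:st=0:d=1,fade=t=out:st={fade_out_start:.2f}:d=1[v]"
    )
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-i", logo,
        "-filter_complex", filter_graph,
        "-map", "[v]",
        "-map", "0:a?",   # keep the audio track if there is one
        "-c:a", "copy",
        dst,
    ]
```

You would pass the result to subprocess.run(cmd, check=True). Building a list instead of a shell string avoids quoting problems with paths that contain spaces.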
Option C: Subtitle Generation
HeyGen can generate auto-subtitles on render. Enable this in your payload with:
"caption": {
  "enabled": true,
  "style": "default"
}
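Or generate an SRT file from your script JSON and burn the subtitles in during FFmpeg post-processing. Claude Code can do this directly, but note that the script carries no timestamps of its own, so cue timings have to be estimated from each scene's word count or taken from the rendered audio.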
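As a rough illustration of the SRT route, the sketch below emits one subtitle cue per scene and estimates timings from word counts at 150 words per minute. Both the one-cue-per-scene granularity and the timing estimate are simplifying assumptions; real subtitles usually need shorter cues aligned to the actual audio.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def script_to_srt(scenes: list[dict], words_per_minute: int = 150) -> str:
    """Build SRT text with one cue per scene and estimated timings."""
    entries = []
    start = 0.0
    for index, scene in enumerate(scenes, start=1):
        text = scene["voiceover_text"].strip()
        duration = len(text.split()) / words_per_minute * 60
        end = start + duration
        entries.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
        start = end
    # Cues are separated by a blank line.
    return "\n".join(entries)
```

If the estimated timings drift from the rendered audio, the words_per_minute value is the first thing to adjust.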
Step 5: Automate the Full Pipeline
Running each step manually defeats the purpose. The real goal is a single-command pipeline that takes a brief file and outputs a finished, branded video.
Wrapping Everything in a Script
Have Claude Code write a shell script or Python script that runs all stages in sequence:
- Read brief from /inputs/brief.json
- Generate script and save to /scripts/
- Build HeyGen payload
- Submit rendering job
- Poll until complete
- Download rendered video
- Apply FFmpeg post-processing
- Move finished file to /renders/final/
Claude Code is particularly useful here because it can write the orchestration script itself — you describe what you want, it writes the code, and you can run it end-to-end.
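The skeleton of such an orchestration script can be very small. One possible shape, sketched in Python, treats every stage as a function that takes a context dictionary and returns an updated one. The run_pipeline name, the context convention, and the "completed" key are assumptions made for illustration.

```python
from typing import Callable

Stage = tuple[str, Callable[[dict], dict]]


def run_pipeline(stages: list[Stage], context: dict) -> dict:
    """Run stages in order, stopping at the first failure."""
    for name, stage in stages:
        try:
            context = stage(context)
        except Exception as exc:
            # Name the failing stage so the log says where the run stopped.
            raise RuntimeError(f"stage '{name}' failed: {exc}") from exc
        # Record progress so a later run can see how far this one got.
        context.setdefault("completed", []).append(name)
    return context
```

Each bullet above then becomes one entry in the stages list, for example ("submit_job", submit) and ("poll", poll), where submit and poll are your own functions.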
Adding Error Handling
Production pipelines need to handle failures gracefully. Common failure points in this workflow:
- HeyGen API rate limits — Add exponential backoff on polling and submission
- Script quality issues — Add a Claude review step that evaluates the script before submission
- Render failures — Log the full error response and retry with a simplified payload
- Network interruptions during download — Use curl --retry 3 to handle transient issues
Claude Code can add handling for these failure modes to the orchestration script if you ask it to: “Add error handling for HeyGen rate limits and download retries.”
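For the rate-limit case, the usual pattern is a retry wrapper with exponential backoff. The following is a generic sketch, not HeyGen-specific code: the delays, the attempt count, and the retry_with_backoff name are assumptions you would tune to the limits on your plan.

```python
import random
import time
from typing import Callable


def retry_with_backoff(
    operation: Callable[[], object],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: tuple = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
):
    """Call operation, doubling the wait after each failure (1s, 2s, 4s, ...)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            # A little random jitter keeps parallel jobs from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

In practice you would narrow retry_on to the errors that are worth retrying, such as connection errors or HTTP 429 responses, so that a malformed payload fails immediately instead of being resubmitted.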
How MindStudio Fits Into This Workflow
If you want to run this pipeline without managing infrastructure, or if you need non-technical team members to trigger video generation without touching code, MindStudio’s AI Media Workbench is worth looking at.
MindStudio is a no-code platform for building AI agents and automated workflows. Its media workbench gives you access to video and image generation tools in a visual interface — including the ability to chain steps like script generation, video rendering, and post-processing into a single automated workflow that anyone can run.
More relevant to developers: MindStudio’s Agent Skills Plugin (available as an npm package, @mindstudio-ai/agent) lets external agents — including Claude Code — call MindStudio capabilities as simple method calls. So if you’ve already built the Claude Code pipeline described above, you can offload specific steps (like subtitle generation, clip merging, or workflow execution) to MindStudio without rebuilding anything.
For teams that want the full pipeline to run on a schedule or be triggered by external events (a new product release, a form submission, a Slack message), MindStudio’s visual workflow builder handles that orchestration layer without requiring a server to manage.
You can start for free at mindstudio.ai and explore the media tools without needing to set up API keys or accounts for individual services.
Common Mistakes and How to Avoid Them
Sending Too Much Text per Scene
HeyGen limits how much text each video_inputs entry can carry. If the voiceover text for a single scene exceeds that limit, the API returns an error. Keep individual scene scripts under 400 words; Claude Code can split longer content automatically if you instruct it to.
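If you want the split to be deterministic instead of delegated to a prompt, a small helper can do it on sentence boundaries. This sketch uses the 400-word guideline above as its default. The sentence detection is a simple regular expression, so abbreviations such as "e.g." will occasionally cause an early break, and a single sentence longer than the limit is left whole.

```python
import re


def split_voiceover(text: str, max_words: int = 400) -> list[str]:
    """Split text into chunks of at most max_words, breaking between sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the limit.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk can then become its own scene entry in the payload.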
Not Validating Script Timing
A 90-second brief doesn’t guarantee a 90-second video. Word count is a rough proxy, but avatar speaking speed, pauses, and sentence structure all affect duration. After the first render, measure the actual duration and ask Claude Code to trim or expand the script accordingly.
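The adjustment itself is simple proportional arithmetic. As a sketch, assuming the avatar's speaking rate stays roughly constant between renders, the word count to aim for in the revision is the current count scaled by target duration over measured duration. The helper name is hypothetical.

```python
def adjusted_word_count(
    current_words: int, actual_seconds: float, target_seconds: float
) -> int:
    """Estimate the word count needed to hit the target duration."""
    if current_words <= 0 or actual_seconds <= 0:
        raise ValueError("word count and measured duration must be positive")
    # Words scale linearly with duration at a constant speaking rate.
    return round(current_words * target_seconds / actual_seconds)
```

For example, a 240-word script that rendered at 100 seconds against a 90-second target gives adjusted_word_count(240, 100, 90), which is 216, so the revision should cut about 24 words.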
Hardcoding Asset IDs
Avatar IDs and voice IDs change if you update your HeyGen account assets. Store these in a config file rather than baking them into scripts directly. Claude Code will respect a config file if you tell it to read from one.
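A config loader for this can be a few lines. The sketch below assumes a JSON file with "avatar_id" and "voice_id" keys; the file location, key names, and function name are illustrative choices, not a HeyGen convention.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("avatar_id", "voice_id")


def load_asset_config(path: str) -> dict:
    """Load asset IDs from a JSON file and fail early if any are missing."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"config is missing: {', '.join(missing)}")
    return config
```

Failing at load time is deliberate: it is cheaper to catch an empty voice_id here than to find out from a rejected render job.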
Skipping the Review Step
Fully autonomous pipelines are tempting, but skipping a human review step (even a quick one) before submitting a render job is risky — especially early on. Add a pause in the pipeline after script generation that shows the output and waits for a y/n confirmation before proceeding.
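The confirmation gate can be as small as the sketch below. The confirm name is hypothetical, and the ask parameter exists only so the prompt can be swapped out in tests or replaced later with something like a Slack approval.

```python
from typing import Callable


def confirm(prompt: str, ask: Callable[[str], str] = input) -> bool:
    """Ask a y/n question and keep asking until the answer is clear."""
    while True:
        answer = ask(f"{prompt} [y/n] ").strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
```

In the pipeline, print the generated script first, then stop the run unless confirm("Submit this script for rendering?") returns True.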
Frequently Asked Questions
Can Claude Code call the HeyGen API directly without a separate script?
Yes. Claude Code can make HTTP requests using curl or Python’s requests library directly from its agentic loop. You don’t need a separate wrapper script — though writing one makes the pipeline reusable and easier to audit.
Does HeyGen support custom voice cloning via API?
HeyGen supports voice cloning, but the cloning setup (uploading samples and training the voice model) is done through the HeyGen dashboard, not the API. Once a voice clone is created, you reference its voice_id in API calls like any other voice.
How long does HeyGen take to render a 90-second video?
Typically 3–7 minutes for a standard-resolution video. Higher-resolution outputs (4K) or complex template-based videos may take longer. The polling approach described in this guide handles variable render times without requiring a fixed wait.
What’s the cost of running this pipeline at scale?
HeyGen pricing is credit-based, with roughly 1 credit per minute of rendered video. At current pricing, a 90-second video costs around 1.5 credits. Claude Code usage is billed per token through the Anthropic API. A full script generation plus payload-building run uses approximately 2,000–5,000 tokens, which typically costs a few cents at most, depending on the model. For high-volume production, Claude API costs are negligible compared to HeyGen rendering costs.
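For budgeting a batch, the rendering side reduces to simple arithmetic. This sketch uses the rough figure of 1 credit per minute quoted above as its default; your plan's actual rate may differ, so treat the parameter as something to confirm on your HeyGen billing page.

```python
def render_credits(duration_seconds: float, credits_per_minute: float = 1.0) -> float:
    """Estimate the credits consumed by one rendered video."""
    return duration_seconds / 60 * credits_per_minute


def batch_credits(durations_seconds: list[float], credits_per_minute: float = 1.0) -> float:
    """Estimate the credits consumed by a batch of videos."""
    return sum(render_credits(d, credits_per_minute) for d in durations_seconds)
```

A batch of twenty 90-second videos, for instance, comes out at batch_credits([90] * 20), or 30 credits at the assumed rate.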
Can this workflow produce videos in multiple languages?
Yes. HeyGen supports multilingual voiceovers, and its avatar lip-sync adjusts to match the language. You’d add a translation step to the Claude Code pipeline — translate the script to each target language, generate separate payloads, and submit parallel rendering jobs. HeyGen’s voice library includes voices in over 40 languages.
Is Claude Code the only way to orchestrate this, or can other agents do it?
Any agent or scripting environment that can make HTTP calls can orchestrate this workflow. Claude Code is convenient because of its ability to reason about errors, revise scripts, and write the orchestration code itself. But you could use LangChain, CrewAI, a simple Python cron job, or — as mentioned above — MindStudio’s visual workflow builder if you prefer a no-code approach.
Key Takeaways
- Claude Code handles the thinking: Script generation, payload construction, error reasoning, and orchestration logic all benefit from a language model in the loop.
- HeyGen handles the rendering: Its API is clean, well-documented, and supports avatars, voice cloning, and branded templates.
- Async rendering requires polling: Build the wait-and-retry logic early — it’s where most pipelines fail.
- FFmpeg fills the post-production gap: Subtitles, watermarks, fades, and clip edits are all scriptable without a GUI.
- Structure your data from the start: Consistent JSON between stages is what makes full automation possible without brittle parsing logic.
If you want to run this kind of pipeline without managing code or infrastructure, MindStudio’s AI Media Workbench gives you the same capabilities in a visual builder — and integrates with 1,000+ other tools if you want to connect video production to your broader content or marketing stack.

