How to Build an AI Video Production Workflow with Claude Code and HeyGen

What This Workflow Actually Does (and Why It’s Worth Building)

Producing a polished video used to mean juggling a scriptwriter, a voice actor, a motion graphics designer, and an editor — often across different software and timelines. AI has changed that math significantly. Today, a single engineer or content team can build an AI video production workflow that takes a brief and outputs a finished video with a realistic avatar, professional voiceover, and branded visuals.

This guide walks through exactly how to do that using Claude Code as your reasoning and orchestration layer, and HeyGen as your avatar and video rendering engine. By the end, you’ll have a working pipeline that goes from raw input — a topic, a URL, a product description — to a rendered video file, with minimal human intervention in between.

This isn’t a surface-level overview. We’ll cover the actual API calls, the workflow logic, where things break, and how to harden the pipeline for production use.

Understanding the Two Core Tools

Claude Code

Claude Code is Anthropic’s agentic coding environment. Unlike the standard Claude interface, Claude Code is designed to take on multi-step tasks — writing code, running it, reading file outputs, and iterating — in a loop. You give it a goal, and it figures out the steps.

For video production, Claude Code is useful as the “brain” of your pipeline. It can:

Write and refine scripts based on a brief or source material
Generate structured data (scene breakdowns, slide copy, voiceover text) in whatever format downstream tools expect
Call external APIs programmatically and handle responses
Make decisions when something fails or needs adjustment

HeyGen

HeyGen is an AI video platform built around avatar-based video generation. You pick an avatar (or clone your own), provide a script, and HeyGen renders a video with the avatar speaking your text in a realistic voice.

The key features relevant to this workflow:

Avatars — HeyGen has 100+ stock avatars, or you can create a custom avatar from video footage
Voice cloning — Upload voice samples to generate a synthetic version of a specific voice
Video API — A REST API that accepts JSON payloads and returns rendered video files
Templates — Pre-built layouts with motion graphics, lower thirds, and branded elements

HeyGen’s API is well-documented and straightforward to call from Claude Code, which makes it a natural pairing.

Prerequisites Before You Start

Before writing a single line of code, get these pieces in place:

Accounts and API keys:

A HeyGen account with API access (available on the Creator plan or higher)
Claude Code installed and configured locally
An Anthropic API key if you’re running Claude Code in non-interactive scripting mode

Assets:

Your avatar ID from HeyGen (found in the Avatar Library)
A voice ID — either from HeyGen’s voice library or a custom cloned voice
Brand assets if you’re using HeyGen’s template system (logos, hex codes, font choices)

Local setup:

Node.js or Python, depending on your preference for the scripting layer
curl or an HTTP client library for making API calls
A working directory with a predictable folder structure (more on this below)

A clean folder structure matters more than it might seem in automated workflows. When Claude Code is running autonomously, it needs to know exactly where to read inputs and write outputs. Use something like:

/project
  /inputs        ← briefs, source URLs, product docs
  /scripts       ← generated scripts (.txt or .json)
  /renders       ← downloaded video files
  /logs          ← API responses and error logs
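
If you would rather create this layout from code than by hand, a minimal Python sketch could look like the following. The function name and the idea of passing in a root path are assumptions for illustration; only the four folder names come from the layout above.

```python
from pathlib import Path

# Folder names taken from the layout above; the root is whatever you choose.
PIPELINE_DIRS = ("inputs", "scripts", "renders", "logs")


def ensure_project_dirs(root: str) -> dict:
    """Create the pipeline folders under `root` if missing and return their paths."""
    base = Path(root)
    paths = {}
    for name in PIPELINE_DIRS:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        paths[name] = path
    return paths
```

Calling ensure_project_dirs("project") at the start of every run keeps the pipeline from failing on a missing folder.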

Step 1: Generate the Script with Claude Code

The first stage of the workflow is producing a script from raw input. This is where Claude Code earns its place.

Defining the Script Brief

Create a brief file in /inputs with the key parameters for your video:

{
  "topic": "Product update: new dashboard analytics features",
  "audience": "existing B2B customers",
  "tone": "professional but conversational",
  "duration_target": "90 seconds",
  "key_points": [
    "New funnel visualization tool",
    "Custom date range comparisons",
    "Export to PDF one-click"
  ],
  "cta": "Log in and explore the new dashboard"
}

Prompting Claude Code to Write the Script

In Claude Code, give it a direct instruction:

Read the brief in /inputs/brief.json. Write a video script for a 90-second avatar video. 
Format the output as JSON with the following fields:
- "scenes": array of objects, each with "scene_number", "visual_note", and "voiceover_text"
- "total_word_count": estimated word count
- "estimated_duration": in seconds, assuming ~150 words per minute

Write in a clear, direct tone. No filler phrases. Each scene should be 15–25 seconds of spoken content.
Save the output to /scripts/script_v1.json.

Claude Code will read the brief, generate the script in the exact format you specified, and write the file. If the first pass isn’t quite right, you can ask it to revise specific scenes or adjust the tone — this is faster than hand-editing and keeps the format consistent.
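
Because the model reports its own word count and duration, it can help to recompute them before moving on. The sketch below is one possible check, assuming the script file follows the field names requested in the prompt above; the function name and the returned "scene_durations" key are hypothetical.

```python
WORDS_PER_MINUTE = 150  # speaking rate assumed in the prompt above


def summarize_script(script: dict) -> dict:
    """Check required scene fields and recompute word count and duration."""
    required = {"scene_number", "visual_note", "voiceover_text"}
    scenes = script.get("scenes")
    if not isinstance(scenes, list) or not scenes:
        raise ValueError("script must contain a non-empty 'scenes' array")

    total_words = 0
    scene_durations = []
    for scene in scenes:
        missing = required - scene.keys()
        if missing:
            raise ValueError(f"scene is missing fields: {sorted(missing)}")
        words = len(scene["voiceover_text"].split())
        total_words += words
        scene_durations.append(round(words / WORDS_PER_MINUTE * 60, 1))

    return {
        "total_word_count": total_words,
        "estimated_duration": round(total_words / WORDS_PER_MINUTE * 60, 1),
        "scene_durations": scene_durations,  # seconds per scene, estimated
    }
```

Scenes whose estimated duration falls outside the 15–25 second range are good candidates for a targeted revision request.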

Why JSON Matters Here

Structured JSON output is important because HeyGen’s API expects structured input. By having Claude Code generate a scene-by-scene JSON object, you avoid parsing free-form text between script generation and video creation: each scene maps cleanly onto one entry in the API payload.

Step 2: Prepare the HeyGen API Payload

Once you have your script JSON, Claude Code can use it to build the API request payload for HeyGen.

Basic Video Generation Payload

HeyGen’s video generation endpoint accepts a payload that looks roughly like this:

{
  "video_inputs": [
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "normal"
      },
      "voice": {
        "type": "text",
        "input_text": "Welcome back. Today we're walking through three new features...",
        "voice_id": "YOUR_VOICE_ID",
        "speed": 1.0
      },
      "background": {
        "type": "color",
        "value": "#1A1A2E"
      }
    }
  ],
  "dimension": {
    "width": 1280,
    "height": 720
  }
}

For a multi-scene video, each scene in your script becomes its own entry in the video_inputs array.

Having Claude Code Build the Payload

Ask Claude Code to do this automatically:

Read /scripts/script_v1.json. 
For each scene in the "scenes" array, build a HeyGen video_inputs entry using:
- avatar_id: "YOUR_AVATAR_ID"
- voice_id: "YOUR_VOICE_ID"
- background color: "#1A1A2E" for even scenes, "#0F3460" for odd scenes

Combine all entries into a valid HeyGen API payload.
Save to /scripts/heygen_payload.json.

This step is where having a consistent scene format pays off. Claude Code can iterate over the array predictably because the structure is uniform.
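
For reference, the transformation Claude Code is being asked to perform is roughly the following. This is a sketch that mirrors the payload shape and colors shown above; the function name is hypothetical, and the IDs are passed in rather than hardcoded.

```python
def build_payload(script: dict, avatar_id: str, voice_id: str) -> dict:
    """Turn each scene into one video_inputs entry, alternating backgrounds."""
    video_inputs = []
    for scene in script["scenes"]:
        is_even = scene["scene_number"] % 2 == 0
        video_inputs.append({
            "character": {
                "type": "avatar",
                "avatar_id": avatar_id,
                "avatar_style": "normal",
            },
            "voice": {
                "type": "text",
                "input_text": scene["voiceover_text"],
                "voice_id": voice_id,
                "speed": 1.0,
            },
            # Colors from the prompt above: even scenes dark navy, odd scenes blue.
            "background": {
                "type": "color",
                "value": "#1A1A2E" if is_even else "#0F3460",
            },
        })
    return {
        "video_inputs": video_inputs,
        "dimension": {"width": 1280, "height": 720},
    }
```

Having a deterministic version of this step also gives you something to diff against if a generated payload looks wrong.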

Step 3: Submit the Job and Poll for Status

HeyGen video rendering is asynchronous. You submit a job, get back a video_id, and then poll a status endpoint until the video is ready.

Submitting the Video Generation Request

Claude Code can run this shell command directly:

curl -X POST https://api.heygen.com/v2/video/generate \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/scripts/heygen_payload.json

The response will include a video_id. Claude Code should save this to a log file (the polling step reads it from /logs/job.json) so you don’t lose it if something interrupts the process.
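
The same request can be made from Python using only the standard library. The sketch below reuses the endpoint and headers from the curl command; the function name, the injectable send parameter (there to make testing easier), and the assumption that video_id sits under a "data" key in the response are all illustrative and should be checked against the response you actually receive.

```python
import json
import urllib.request
from pathlib import Path

GENERATE_URL = "https://api.heygen.com/v2/video/generate"  # from the curl command above


def submit_job(payload: dict, api_key: str, log_path: str,
               send=urllib.request.urlopen) -> str:
    """POST the payload, persist the response, and return the video_id."""
    request = urllib.request.Request(
        GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with send(request) as response:
        body = json.loads(response.read().decode("utf-8"))

    # ASSUMPTION: video_id is nested under "data"; fall back to the top level.
    video_id = (body.get("data") or {}).get("video_id") or body.get("video_id")
    if not video_id:
        raise RuntimeError(f"no video_id in response: {body}")

    # Write the ID to disk immediately so an interrupted run can resume polling.
    log_file = Path(log_path)
    log_file.parent.mkdir(parents=True, exist_ok=True)
    log_file.write_text(json.dumps({"video_id": video_id, "response": body}, indent=2))
    return video_id
```

Reading the API key from an environment variable rather than a literal keeps it out of scripts and logs.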

Polling for Completion

Videos typically take 2–10 minutes to render depending on length and complexity. Claude Code can handle the polling loop:

Poll the HeyGen video status endpoint every 30 seconds using the video_id saved in /logs/job.json.
When the status returns "completed", extract the download URL from the response and save it to /logs/render_complete.json.
If the status returns "failed", log the error details and stop.

HeyGen’s status endpoint returns one of: pending, processing, completed, or failed. The completed response includes a direct download URL for the rendered video file.
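
The polling logic itself is simple enough to sketch independently of the HTTP details. In the version below, fetch_status is a hypothetical callable you supply that wraps the status endpoint and returns a dict with a "status" key; the timeout value is also an assumption, chosen to be well above the render times quoted above.

```python
import time


def poll_until_done(video_id, fetch_status, interval=30, timeout=1800,
                    sleep=time.sleep):
    """Call fetch_status(video_id) until it reports completed or failed.

    Returns the final status dict on success. Raises RuntimeError on a
    failed render and TimeoutError if the job outlives `timeout` seconds.
    """
    waited = 0
    while True:
        result = fetch_status(video_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"render failed: {result.get('error')}")
        # "pending" and "processing" both mean: keep waiting.
        if waited >= timeout:
            raise TimeoutError(f"video {video_id} not ready after {waited}s")
        sleep(interval)
        waited += interval
```

A hard timeout matters because a job stuck in processing would otherwise keep an unattended pipeline polling forever.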

Downloading the Rendered File

Once you have the download URL, Claude Code can fetch the file:

curl -L "DOWNLOAD_URL" -o /renders/video_final.mp4

At this point you have a rendered MP4 with your avatar speaking your script.

Step 4: Add Motion Graphics and Post-Production

A plain avatar video is functional, but branded motion graphics and text overlays make it feel finished. There are a few approaches here depending on your toolset.

Option A: Use HeyGen Templates

HeyGen has a template system that lets you pre-configure layouts with branded elements: lower thirds, intro/outro sequences, logo placement, and background graphics. If you set up a template in the HeyGen dashboard, you can reference it by its template_id when you submit the job through the API.

This is the lowest-friction option if your branding needs are consistent across videos. Set it up once, reference it in every job.

Option B: Post-Process with FFmpeg

For more control, use FFmpeg as a post-processing step after downloading the video. Claude Code can generate and run FFmpeg commands:

Using FFmpeg, do the following to /renders/video_final.mp4:
1. Add a text overlay "New Dashboard Features" in the lower third for the first 5 seconds
2. Overlay the logo watermark from /assets/logo.png at 5% opacity in the top right
3. Add a 1-second fade in at the start and a 1-second fade out at the end
Save the result to /renders/video_branded.mp4

Claude Code handles FFmpeg surprisingly well — it knows the filter syntax, can chain operations, and will debug errors if a command fails.
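
As a rough idea of what such a command looks like, the sketch below builds an FFmpeg argument list for the three operations in the prompt. The function name, pixel offsets, and font size are assumptions; it uses the overlay, colorchannelmixer, drawtext, and fade filters, and drawtext requires an FFmpeg build with font support (you may need to add a fontfile option). It fades video only and copies the audio stream unchanged.

```python
def build_ffmpeg_command(video, logo, output, title, duration,
                         title_seconds=5, logo_opacity=0.05, fade=1.0):
    """Return an ffmpeg argument list: watermark, lower-third title, fades.

    `duration` is the video length in seconds, needed to place the fade-out.
    """
    # Keep the sketch simple: reject characters that need drawtext escaping.
    if any(ch in title for ch in "':\\%"):
        raise ValueError("title contains characters that need drawtext escaping")

    fade_out_start = max(duration - fade, 0)
    graph = (
        # Make the logo mostly transparent, then pin it to the top right.
        f"[1:v]format=rgba,colorchannelmixer=aa={logo_opacity:g}[logo];"
        f"[0:v][logo]overlay=W-w-20:20[marked];"
        # Show the title near the bottom left for the first few seconds.
        f"[marked]drawtext=text='{title}':fontcolor=white:fontsize=36:"
        f"x=40:y=h-80:enable='between(t,0,{title_seconds:g})',"
        f"fade=t=in:st=0:d={fade:g},"
        f"fade=t=out:st={fade_out_start:g}:d={fade:g}[v]"
    )
    return [
        "ffmpeg", "-y", "-i", video, "-i", logo,
        "-filter_complex", graph,
        "-map", "[v]", "-map", "0:a?", "-c:a", "copy",
        output,
    ]
```

Passing the returned list to subprocess.run(cmd, check=True) avoids shell quoting problems with the filter graph.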

Option C: Subtitle Generation

HeyGen can generate auto-subtitles on render. Enable this in your payload with:

"caption": {
  "enabled": true,
  "style": "default"
}

Or generate an SRT file from your script JSON and burn the subtitles in during FFmpeg post-processing. The script JSON carries voiceover text per scene but no timestamps, so cue timings have to be estimated from word count or measured from the rendered audio.
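
A minimal sketch of the SRT route is shown below. It assumes the script format from Step 1 and estimates timings from the same 150 words-per-minute rate, so cues will drift from the real audio; it also emits one cue per scene, which is too long for comfortable reading and would normally be split into shorter cues. Function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def script_to_srt(script: dict, words_per_minute: int = 150) -> str:
    """Build SRT text with one cue per scene, timed by estimated speech rate."""
    cues = []
    start = 0.0
    for index, scene in enumerate(script["scenes"], start=1):
        text = scene["voiceover_text"].strip()
        duration = len(text.split()) / words_per_minute * 60
        end = start + duration
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
        start = end  # next cue begins where this one ends
    return "\n".join(cues)
```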

Step 5: Automate the Full Pipeline

Running each step manually defeats the purpose. The real goal is a single-command pipeline that takes a brief file and outputs a finished, branded video.

Wrapping Everything in a Script

Have Claude Code write a shell script or Python script that runs all stages in sequence:

Read brief from /inputs/brief.json
Generate script and save to /scripts/
Build HeyGen payload
Submit rendering job
Poll until complete
Download rendered video
Apply FFmpeg post-processing
Move finished file to /renders/final/

Claude Code is particularly useful here because it can write the orchestration script itself — you describe what you want, it writes the code, and you can run it end-to-end.
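
One way to structure the orchestration script is as a list of named stages that each take and return a shared state dictionary. The runner below is a sketch of that pattern only; the stage functions themselves (script generation, payload building, submission, and so on) are left for you to supply, and all names here are hypothetical.

```python
def run_pipeline(stages, context=None, log=print):
    """Run (name, function) stages in order, passing a state dict along.

    Each stage receives the current state and returns the updated state.
    A failing stage is logged and its exception re-raised, which stops the run.
    """
    state = dict(context or {})
    for name, stage in stages:
        log(f"[start] {name}")
        try:
            state = stage(state)
        except Exception as exc:
            log(f"[failed] {name}: {exc}")
            raise
        log(f"[done] {name}")
    return state
```

Keeping stages as plain functions makes it easy to rerun a single stage, for example polling and download, after an interruption.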

Adding Error Handling

Production pipelines need to handle failures gracefully. Common failure points in this workflow:

HeyGen API rate limits — Add exponential backoff on polling and submission
Script quality issues — Add a Claude review step that evaluates the script before submission
Render failures — Log the full error response and retry with a simplified payload
Network interruptions during download — Use curl --retry 3 to handle transient issues

Claude Code can add handling for these failure modes to the orchestration script if you ask it to: “Add error handling for HeyGen rate limits and download retries.”
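
Exponential backoff is the piece most worth understanding before delegating it. A minimal sketch, assuming you wrap each API call in a zero-argument function, might look like this; the helper name and default delays are illustrative rather than anything HeyGen prescribes.

```python
import time


def with_backoff(action, retries=3, base_delay=1.0, sleep=time.sleep,
                 retry_on=(Exception,)):
    """Call action(); on failure wait base_delay * 2**attempt and try again.

    Makes at most retries + 1 attempts and re-raises the last error.
    """
    for attempt in range(retries + 1):
        try:
            return action()
        except retry_on:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

In practice you would narrow retry_on to the errors that are actually transient, such as rate-limit or network errors, so that a malformed payload fails immediately.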

How MindStudio Fits Into This Workflow

If you want to run this pipeline without managing infrastructure, or if you need non-technical team members to trigger video generation without touching code, MindStudio’s AI Media Workbench is worth looking at.

MindStudio is a no-code platform for building AI agents and automated workflows. Its media workbench gives you access to video and image generation tools in a visual interface — including the ability to chain steps like script generation, video rendering, and post-processing into a single automated workflow that anyone can run.

More relevant to developers: MindStudio’s Agent Skills Plugin (available as an npm package, @mindstudio-ai/agent) lets external agents — including Claude Code — call MindStudio capabilities as simple method calls. So if you’ve already built the Claude Code pipeline described above, you can offload specific steps (like subtitle generation, clip merging, or workflow execution) to MindStudio without rebuilding anything.

For teams that want the full pipeline to run on a schedule or be triggered by external events (a new product release, a form submission, a Slack message), MindStudio’s visual workflow builder handles that orchestration layer without requiring a server to manage.

You can start for free at mindstudio.ai and explore the media tools without needing to set up API keys or accounts for individual services.

Common Mistakes and How to Avoid Them

Sending Too Much Text per Scene

HeyGen limits how much text a single video_inputs entry can carry. If your voiceover text is too long for one scene, the API will return an error. Keeping individual scene scripts under 400 words is a safe working limit, and Claude Code can split longer content automatically if you instruct it to.
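
If you prefer the split to be deterministic, a small helper can do it at sentence boundaries. The sketch below uses the 400-word figure from above as its default; the function name is hypothetical, the sentence detection is a simple punctuation-based heuristic, and a single sentence longer than the limit is left whole rather than cut mid-sentence.

```python
import re


def split_voiceover(text: str, max_words: int = 400) -> list:
    """Split text into chunks of at most max_words, breaking between sentences."""
    if not text.strip():
        return []
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the limit.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk can then become its own scene in the script JSON.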

Not Validating Script Timing

A 90-second brief doesn’t guarantee a 90-second video. Word count is a rough proxy, but avatar speaking speed, pauses, and sentence structure all affect duration. After the first render, measure the actual duration and ask Claude Code to trim or expand the script accordingly.
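
The adjustment is simple arithmetic: divide the words spoken by the measured duration to get the avatar's real speaking rate, then scale to the target length. The helper below is a sketch of that calculation, with a hypothetical name; the measured duration is something you would read from the rendered file yourself.

```python
def retarget_word_count(word_count: int, actual_seconds: float,
                        target_seconds: float) -> dict:
    """Derive the measured speaking rate and the word count that fits the target."""
    measured_wpm = word_count / actual_seconds * 60
    target_words = int(measured_wpm * target_seconds / 60)  # rounded down
    return {
        "measured_wpm": round(measured_wpm, 1),
        "target_word_count": target_words,
        "words_to_cut": max(word_count - target_words, 0),
    }
```

For example, a 225-word script that renders at 100 seconds implies 135 words per minute, so a 90-second video needs roughly 202 words, about 23 fewer.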

Hardcoding Asset IDs

Avatar IDs and voice IDs can change when you update your HeyGen account assets. Store them in a config file rather than baking them into scripts directly. Claude Code will respect a config file if you tell it to read from one.
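
A config loader can be as small as the sketch below. The file format (JSON with avatar_id and voice_id keys) and the function name are assumptions for illustration; the point is to fail early and clearly when an ID is missing.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("avatar_id", "voice_id")


def load_asset_config(path: str) -> dict:
    """Load asset IDs from a JSON file and fail fast if any are missing or empty."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"config is missing: {', '.join(missing)}")
    return config
```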

Skipping the Review Step

Fully autonomous pipelines are tempting, but skipping a human review step (even a quick one) before submitting a render job is risky — especially early on. Add a pause in the pipeline after script generation that shows the output and waits for a y/n confirmation before proceeding.
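
The confirmation gate needs only a few lines. This sketch takes the input function as a parameter (an assumption made so the gate can be tested or swapped for a chat prompt later) and keeps asking until it gets a clear answer.

```python
def confirm(prompt: str, ask=input) -> bool:
    """Ask a y/n question and repeat until the answer is recognizable."""
    while True:
        answer = ask(f"{prompt} [y/n] ").strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
```

In the orchestration script, the render submission would run only when confirm("Submit this script for rendering?") returns True.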

Frequently Asked Questions

Can Claude Code call the HeyGen API directly without a separate script?

Yes. Claude Code can make HTTP requests using curl or Python’s requests library directly from its agentic loop. You don’t need a separate wrapper script — though writing one makes the pipeline reusable and easier to audit.

Does HeyGen support custom voice cloning via API?

HeyGen supports voice cloning, but the cloning setup (uploading samples and training the voice model) is done through the HeyGen dashboard, not the API. Once a voice clone is created, you reference its voice_id in API calls like any other voice.

How long does HeyGen take to render a 90-second video?

Typically 3–7 minutes for a standard-resolution video. Higher-resolution outputs (4K) or complex template-based videos may take longer. The polling approach described in this guide handles variable render times without requiring a fixed wait.

What’s the cost of running this pipeline at scale?

HeyGen pricing is credit-based, with roughly 1 credit per minute of rendered video. At current pricing, a 90-second video costs around 1.5 credits. Claude Code usage is billed per token through the Anthropic API. A full script generation plus payload-building run uses approximately 2,000–5,000 tokens, which usually costs cents rather than dollars, depending on the model. For high-volume production, Claude API costs are small compared to HeyGen rendering costs.
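
The credit arithmetic is easy to script when budgeting a batch. The sketch below uses the rough rate of 1 credit per minute quoted above as its default; treat that rate as an assumption and substitute the figure from your own plan.

```python
def render_credits(durations_seconds, credits_per_minute=1.0):
    """Estimate total credits for a batch of videos given their lengths in seconds."""
    return sum(seconds / 60 * credits_per_minute for seconds in durations_seconds)
```

A single 90-second video works out to 1.5 credits, and twenty of them to 30.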

Can this workflow produce videos in multiple languages?

Yes. HeyGen supports multilingual voiceovers, and its avatar lip-sync adjusts to match the language. You’d add a translation step to the Claude Code pipeline — translate the script to each target language, generate separate payloads, and submit parallel rendering jobs. HeyGen’s voice library includes voices in over 40 languages.

Is Claude Code the only way to orchestrate this, or can other agents do it?

Any agent or scripting environment that can make HTTP calls can orchestrate this workflow. Claude Code is convenient because of its ability to reason about errors, revise scripts, and write the orchestration code itself. But you could use LangChain, CrewAI, a simple Python cron job, or — as mentioned above — MindStudio’s visual workflow builder if you prefer a no-code approach.

Key Takeaways

Claude Code handles the thinking: Script generation, payload construction, error reasoning, and orchestration logic all benefit from a language model in the loop.
HeyGen handles the rendering: Its API is clean, well-documented, and supports avatars, voice cloning, and branded templates.
Async rendering requires polling: Build the wait-and-retry logic early — it’s where most pipelines fail.
FFmpeg fills the post-production gap: Subtitles, watermarks, fades, and clip edits are all scriptable without a GUI.
Structure your data from the start: Consistent JSON between stages is what makes full automation possible without brittle parsing logic.

If you want to run this kind of pipeline without managing code or infrastructure, MindStudio’s AI Media Workbench gives you the same capabilities in a visual builder — and integrates with 1,000+ other tools if you want to connect video production to your broader content or marketing stack.