# How to Use the AutoResearch Loop for Cold Email Optimization with GitHub Actions
Connect your cold email platform API, define a reply rate metric, and run an autonomous challenger-baseline loop on a schedule using GitHub Actions.
## Why Most Cold Email Testing Never Gets Past a Single A/B Test
Cold email reply rates hover between 1% and 5% for most outbound teams. The campaigns consistently hitting double digits have one thing in common: they test continuously, not occasionally.
The problem is that manual A/B testing is slow. You set up a variant, wait a week, check the results, decide whether to update your template, and then start over. By the time you’ve run a few cycles, the market has shifted or your prospect list has changed. The AutoResearch Loop for cold email optimization solves this by automating the entire cycle — from generating challenger variants to promoting winners — and running it on a schedule using GitHub Actions.
This guide walks you through the full setup: connecting your cold email platform API, defining a reply rate metric, building the challenger-baseline loop logic, and scheduling it with GitHub Actions so it runs without you.
## What the AutoResearch Loop Actually Does
The AutoResearch Loop is a self-improving testing framework. It’s built around a simple idea: your best-performing email becomes the baseline, a new AI-generated variant challenges it, the winner becomes the new baseline, and the cycle repeats.
Here’s the loop in plain terms:
- Pull performance data — Fetch reply rates for your active email sequences via your cold email platform’s API.
- Identify the baseline — The email variant with the highest reply rate becomes the reference point.
- Generate a challenger — An AI model rewrites the baseline using a different angle, subject line, or call-to-action.
- Deploy the challenger — Push the new variant to a test segment of your active campaign.
- Evaluate the result — After a defined window, compare reply rates. Promote the winner.
- Repeat — GitHub Actions triggers the loop on a cron schedule (daily, weekly, or whatever cadence fits your send volume).
The “AutoResearch” part refers to the AI autonomously analyzing what’s working in the current baseline, researching why it works, and using that analysis to generate a challenger that targets a specific improvement — not just a random rewrite.
## Prerequisites
Before building the loop, make sure you have the following in place.
A cold email platform with an API. Most modern platforms support this — Instantly, Lemlist, Smartlead, Apollo, and Outreach all expose endpoints for retrieving campaign stats and updating sequence steps. You’ll need an API key and access to at least two endpoints: one for pulling analytics (sent, opened, replied) and one for updating or creating sequence variants.
A GitHub repository. This is where your loop script and GitHub Actions workflow file live. A private repo works fine.
An AI model API key. OpenAI, Anthropic (Claude), or any model capable of following structured prompts. You’ll use this to generate challenger variants.
Enough send volume to get statistical signal. If you’re sending fewer than 50 emails per variant per week, your reply rate data will be too noisy to act on. The loop still works, but your evaluation window needs to be longer.
## Step 1 — Connect Your Cold Email Platform API
Start by writing a lightweight API client for your platform. The goal is two functions: one to fetch campaign stats and one to update a sequence step.
### Fetching Campaign Analytics

Most cold email APIs return analytics at the campaign or sequence level. Here’s the structure you want to pull for each sequence step (email in the sequence):

```json
{
  "step_id": "step_abc123",
  "subject": "Quick question about {company}",
  "body": "...",
  "stats": {
    "sent": 120,
    "opened": 44,
    "replied": 6
  }
}
```
Calculate reply rate as `replied / sent`. Open rate is useful context, but reply rate is the metric that matters for this loop.
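The metric itself is one line; a small helper (`reply_rate` is an illustrative function, not part of any platform SDK) keeps the divide-by-zero case explicit:

```python
def reply_rate(stats):
    """Reply rate as a fraction of sends; guards against divide-by-zero."""
    if stats["sent"] == 0:
        return 0.0
    return stats["replied"] / stats["sent"]

# For the example step above: 6 replies out of 120 sends.
print(reply_rate({"sent": 120, "opened": 44, "replied": 6}))  # 0.05
```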
Store your API key as a GitHub Actions secret. Never hardcode credentials in your repository.
```python
import os

import requests

API_KEY = os.environ["COLD_EMAIL_API_KEY"]
BASE_URL = "https://api.yourplatform.com/v1"

def get_sequence_steps(campaign_id):
    """Fetch all sequence steps (with stats) for a campaign."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    response = requests.get(
        f"{BASE_URL}/campaigns/{campaign_id}/steps",
        headers=headers,
        timeout=30,  # don't let a hung request stall the scheduled job
    )
    response.raise_for_status()
    return response.json()["steps"]
```
### Updating a Sequence Step

When the loop promotes a challenger, it needs to update the active sequence step. Check your platform’s API docs for the correct endpoint — most use a `PATCH` or `PUT` on the step resource.

```python
def update_step_body(step_id, new_subject, new_body):
    """Overwrite a step's subject and body with a new variant."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    payload = {"subject": new_subject, "body": new_body}
    response = requests.patch(
        f"{BASE_URL}/steps/{step_id}",
        json=payload,  # requests sets the JSON Content-Type header automatically
        headers=headers,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```
## Step 2 — Define Your Reply Rate Metric
A clean metric definition prevents the loop from making bad decisions. There are a few things to nail down before the loop runs.
### Minimum Sample Size

Set a floor on how many sends are required before a step’s reply rate is considered reliable. A common starting point is 50 sends. If a step hasn’t reached that threshold, skip it during the evaluation phase — don’t promote or demote based on insufficient data.

```python
MIN_SENDS = 50

def is_statistically_eligible(step):
    return step["stats"]["sent"] >= MIN_SENDS
```
### Evaluation Window
Define how long a challenger gets to run before comparison. For most outbound campaigns, 7 days is a reasonable window. For high-volume senders, 3 days may be enough.
Store the challenger’s deployment timestamp so the evaluation logic can check whether the window has passed.
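One way to sketch that check (the `window_has_passed` helper and the seven-day constant are illustrative assumptions, matching the window used in the main loop below):

```python
from datetime import datetime, timedelta, timezone

EVALUATION_WINDOW = timedelta(days=7)

def window_has_passed(deployed_at_iso):
    """True once a challenger has run for the full evaluation window."""
    deployed_at = datetime.fromisoformat(deployed_at_iso)
    return datetime.now(timezone.utc) >= deployed_at + EVALUATION_WINDOW
```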
### Winning Threshold

Avoid promoting a challenger based on noise. Require a meaningful improvement over the baseline — at least a 20% relative increase in reply rate (e.g., 3% → 3.6%) before promoting.

```python
MIN_RELATIVE_IMPROVEMENT = 0.20

def challenger_wins(baseline_rate, challenger_rate):
    if baseline_rate == 0:
        return challenger_rate > 0
    return (challenger_rate - baseline_rate) / baseline_rate >= MIN_RELATIVE_IMPROVEMENT
```
## Step 3 — Build the Challenger Generation Logic
This is where the AI model comes in. The goal is to give the model a structured prompt that produces a challenger variant targeting a specific weakness in the baseline.
### Analyzing the Baseline

Before generating a challenger, have the AI analyze what’s working and what might be holding the baseline back. Feed it the current email and its stats, and ask for a critique.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def analyze_baseline(subject, body, reply_rate):
    prompt = f"""
You are analyzing a cold email for a B2B outbound campaign.

Subject: {subject}

Body:
{body}

Current reply rate: {reply_rate:.1%}

Identify the single biggest weakness in this email that is likely reducing replies.
Be specific. Do not rewrite the email yet — just diagnose.
"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```
### Generating the Challenger

Pass the diagnosis back to the model and ask it to generate a challenger that addresses only that weakness. Keeping the scope narrow makes it easier to attribute a performance difference to a specific change.

```python
import json

def generate_challenger(subject, body, diagnosis):
    prompt = f"""
You are rewriting a cold email to fix a specific problem.

Original subject: {subject}

Original body:
{body}

Problem to fix: {diagnosis}

Rewrite the email to address this specific problem. Keep everything else the same.
Return your response as JSON with keys "subject" and "body".
"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```
## Step 4 — Write the Main Loop Script
Now put the pieces together in a single script that the GitHub Actions workflow will call.
```python
import json
import os
from datetime import datetime, timedelta, timezone

EVALUATION_WINDOW = timedelta(days=7)

# Load state from a JSON file (persisted between runs via GitHub Actions
# cache or a simple file commit).
def load_state():
    try:
        with open("loop_state.json", "r") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def save_state(state):
    with open("loop_state.json", "w") as f:
        json.dump(state, f, indent=2)

def run_loop(campaign_id):
    state = load_state()
    steps = get_sequence_steps(campaign_id)

    for step in steps:
        step_id = step["step_id"]
        stats = step["stats"]

        if not is_statistically_eligible(step):
            print(f"Step {step_id}: not enough data, skipping.")
            continue

        reply_rate = stats["replied"] / stats["sent"]
        step_state = state.get(step_id, {})

        # Check whether a challenger is currently running on this step.
        if "challenger" in step_state:
            challenger = step_state["challenger"]
            deployed_at = datetime.fromisoformat(challenger["deployed_at"])

            if datetime.now(timezone.utc) < deployed_at + EVALUATION_WINDOW:
                print(f"Step {step_id}: challenger still in evaluation window.")
                continue

            # The step has been running the challenger copy, so the live
            # reply rate is the challenger's. Compare it against the
            # baseline rate recorded at deployment time.
            baseline_rate = step_state.get("baseline_reply_rate", 0)
            if challenger_wins(baseline_rate, reply_rate):
                print(f"Step {step_id}: challenger wins! Promoting.")
                state[step_id] = {"baseline_reply_rate": reply_rate}
            else:
                print(f"Step {step_id}: baseline holds. Reverting to baseline copy.")
                update_step_body(step_id, step_state["baseline_subject"],
                                 step_state["baseline_body"])
                state[step_id] = {"baseline_reply_rate": baseline_rate}
        else:
            # Generate and deploy a new challenger.
            print(f"Step {step_id}: generating challenger.")
            diagnosis = analyze_baseline(step["subject"], step["body"], reply_rate)
            challenger = generate_challenger(step["subject"], step["body"], diagnosis)
            challenger["deployed_at"] = datetime.now(timezone.utc).isoformat()

            # In a real setup, you'd push the challenger to a test segment here.
            # For simplicity, we replace the live step and keep the baseline
            # copy in state so it can be restored if the challenger loses.
            update_step_body(step_id, challenger["subject"], challenger["body"])
            state[step_id] = {
                "baseline_subject": step["subject"],
                "baseline_body": step["body"],
                "baseline_reply_rate": reply_rate,
                "challenger": challenger,
            }

    save_state(state)

if __name__ == "__main__":
    CAMPAIGN_ID = os.environ["CAMPAIGN_ID"]
    run_loop(CAMPAIGN_ID)
```

One caveat with this simplified setup: most platforms report cumulative stats, so after a swap the step's reply rate still includes pre-challenger sends. If your platform supports per-variant stats or date-range filters, use those for a cleaner comparison.
## Step 5 — Schedule the Loop with GitHub Actions

GitHub Actions handles the scheduling. Create a workflow file at `.github/workflows/autoresearch-loop.yml`.
```yaml
name: AutoResearch Cold Email Loop

on:
  schedule:
    - cron: '0 8 * * 1'  # Every Monday at 8am UTC
  workflow_dispatch:      # Allow manual trigger

jobs:
  run-loop:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install openai requests

      - name: Run AutoResearch Loop
        env:
          COLD_EMAIL_API_KEY: ${{ secrets.COLD_EMAIL_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          CAMPAIGN_ID: ${{ secrets.CAMPAIGN_ID }}
        run: python loop.py

      - name: Commit updated state
        run: |
          git config user.name "autoresearch-bot"
          git config user.email "bot@yourcompany.com"
          git add loop_state.json
          git diff --staged --quiet || git commit -m "Update loop state [skip ci]"
          git push
```
A few things to note about this setup:

- The `[skip ci]` tag in the commit message prevents the state commit from triggering another workflow run.
- Secrets (`COLD_EMAIL_API_KEY`, `OPENAI_API_KEY`, `CAMPAIGN_ID`) are stored in your repository’s Settings → Secrets and variables → Actions.
- The `workflow_dispatch` trigger lets you run the loop manually from the GitHub UI anytime you want to test.
- State is persisted by committing `loop_state.json` back to the repository. For more robust state management, you could use a database or a cloud storage bucket instead.
### Adjusting the Schedule

The cron expression `0 8 * * 1` runs every Monday at 8:00 AM UTC. Adjust based on your send volume:

- High volume (500+ sends/week per step): `0 8 * * *` — daily
- Medium volume (100–500 sends/week): `0 8 * * 1` — weekly
- Low volume (<100 sends/week): `0 8 1 * *` — monthly
## Step 6 — Add a Notification Step
A loop that runs silently is hard to trust. Add a Slack or email notification so you know when a challenger was promoted or rejected.
```yaml
- name: Send Slack notification
  if: always()
  uses: slackapi/slack-github-action@v1.27.0
  with:
    payload: |
      {
        "text": "AutoResearch Loop completed for campaign ${{ secrets.CAMPAIGN_ID }}. Check loop_state.json for results."
      }
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
```
You can make this richer by having the loop script write a summary to a file, then read it in the notification step.
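A minimal sketch of that idea (the `run_summary.txt` filename and `build_summary` helper are assumptions, not part of the workflow above): the loop collects one line per decision, writes them to a file, and the notification step reads the file into the Slack payload.

```python
def build_summary(decisions):
    """Join per-step decision lines into one notification message."""
    if not decisions:
        return "AutoResearch Loop ran, but no steps were eligible for evaluation."
    return "AutoResearch Loop results:\n" + "\n".join(f"- {d}" for d in decisions)

def write_summary(decisions, path="run_summary.txt"):
    # The Slack step can then read this file into its payload.
    with open(path, "w") as f:
        f.write(build_summary(decisions))
```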
## Where MindStudio Fits Into This Workflow
The setup above requires writing and maintaining Python scripts plus a GitHub Actions YAML file. That’s manageable for an engineer, but it creates a bottleneck if the marketing team wants to adjust the challenger generation prompt, change the evaluation window, or run the loop against a different campaign without filing a ticket.
MindStudio solves this. You can build the entire AutoResearch Loop as a visual AI agent with a no-code workflow — connecting your cold email platform via one of 1,000+ integrations, calling an AI model to generate challenger variants, and scheduling it to run automatically as a background agent. The marketing team can update the generation prompt directly in the UI without touching code.
If you want to keep the GitHub Actions orchestration but augment it with smarter AI reasoning, MindStudio’s webhook and API endpoint agents can handle the analysis and generation step. Your GitHub Actions workflow calls the MindStudio endpoint, passes the current baseline email and stats, and gets back a structured challenger variant — no OpenAI API key setup required, and you can swap models (GPT-4o, Claude, Gemini) without changing any code.
The Agent Skills Plugin (@mindstudio-ai/agent) also lets you call MindStudio capabilities directly from a custom agent, so if you’re running a more complex outbound automation stack with LangChain or CrewAI, you can plug in MindStudio’s email and workflow capabilities as typed method calls.
You can try MindStudio free at mindstudio.ai.
## Common Mistakes and How to Avoid Them
### Acting on Too Little Data
The most common mistake is running the evaluation too early. If you promote a challenger after 20 sends with a 10% reply rate, you might just be looking at an outlier. Enforce the minimum sample size strictly — don’t let impatience override the logic.
### Testing Multiple Variables at Once
If you change the subject line, the opening line, and the call-to-action simultaneously, you can’t know which change drove the improvement. The loop’s analysis-and-diagnosis step is designed to isolate one variable per iteration. Respect that — if the model tries to rewrite everything, tighten the prompt.
### Forgetting About Sequence Position
An email that works well as Step 1 (first touch) may perform differently as Step 3 (follow-up). The loop should track step position in its state and generate challengers that account for the context — a follow-up that references no prior contact will feel off.
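One lightweight way to carry that context into generation, sketched here (the `position_context` helper and its wording are illustrative assumptions): record each step's position in state and prepend a position note to the challenger prompt.

```python
def position_context(position):
    """Describe the step's place in the sequence for the generation prompt."""
    if position == 1:
        return "This is the first touch; the prospect has no prior context."
    return f"This is follow-up #{position - 1}; reference the earlier outreach lightly."

print(position_context(1))
print(position_context(3))
```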
### Not Auditing the AI Output

The loop runs autonomously, but the generated challengers should be reviewed periodically. An AI model can occasionally produce email copy that’s off-brand, too aggressive, or just weird. Add a manual review flag to the workflow — hold each new challenger for human approval before deployment, or at minimum spot-check recent variants every few cycles.
### Over-Scheduling
Running the loop daily when you only have 50 sends per week creates noise, not signal. Match the schedule to your send volume. The loop’s value comes from consistent, reliable iteration — not speed.
## Frequently Asked Questions
### What is an AutoResearch Loop in the context of cold email?
An AutoResearch Loop is an automated testing cycle that continuously improves your cold email copy. It works by defining a baseline (your current best-performing email), generating an AI-powered challenger variant, testing the challenger against the baseline, and promoting the winner. The loop runs on a schedule — typically via a tool like GitHub Actions — so the optimization happens without manual intervention.
### How do I choose the right evaluation window for cold email A/B tests?
The right evaluation window depends on your send volume. A general rule: wait until each variant has at least 50 sends, then compare. In time terms, this usually means 3–7 days for high-volume senders and 2–4 weeks for lower-volume campaigns. Cutting the window short leads to false positives — challengers that appeared to win but were just lucky.
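The arithmetic behind that rule of thumb is easy to sketch (`days_to_signal` is an illustrative helper, assuming the 50-send minimum from Step 2):

```python
import math

MIN_SENDS = 50

def days_to_signal(daily_sends_per_variant):
    """Whole days until a variant reaches the minimum sample size."""
    return math.ceil(MIN_SENDS / daily_sends_per_variant)

print(days_to_signal(20))  # 3 days at 20 sends/day
print(days_to_signal(4))   # 13 days at 4 sends/day
```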
### Can I use this approach with any cold email platform?
Yes, as long as your platform exposes an API for fetching campaign analytics and updating sequence content. Most modern platforms — Instantly, Lemlist, Smartlead, Apollo, Outreach — support this. Check that your API tier includes write access to sequence steps, not just read access, since the loop needs to push challenger variants back to the platform.
### What AI model works best for generating cold email challengers?
GPT-4o and Claude 3.5 Sonnet both perform well for structured copywriting tasks like this. The quality of the output depends more on prompt design than model choice. A well-structured prompt that diagnoses a specific weakness and constrains the rewrite to one variable will outperform a generic “rewrite this email” prompt regardless of model.
### How do I handle personalization variables in the AI-generated challengers?

Treat personalization tokens (like `{first_name}` or `{company}`) as literals when passing the email to the AI model. Include an instruction in your prompt to preserve any text wrapped in curly braces. After the model returns the challenger, validate that all original personalization tokens are still present before deploying.
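A sketch of that validation step, assuming tokens are always single `{word}` placeholders (adjust the regex to your platform's token syntax):

```python
import re

TOKEN_RE = re.compile(r"\{[A-Za-z_]+\}")

def tokens_preserved(original_text, challenger_text):
    """True if every token in the original also appears in the challenger."""
    return set(TOKEN_RE.findall(original_text)) <= set(TOKEN_RE.findall(challenger_text))
```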
### Is GitHub Actions the best scheduler for this kind of automation?
GitHub Actions works well if you already use GitHub and want a zero-infrastructure solution. For more complex setups — multiple campaigns, multi-step workflows, or cross-platform integrations — dedicated automation platforms like MindStudio or n8n give you more control and visibility. GitHub Actions is a solid starting point, but it’s not the only option.
## Key Takeaways
- The AutoResearch Loop automates the full cold email optimization cycle: pull data, analyze the baseline, generate a challenger, test it, and promote the winner.
- Define a clean reply rate metric with a minimum sample size and a winning threshold before running any evaluations — otherwise the loop will act on noise.
- GitHub Actions handles the scheduling with a simple cron expression and stores state by committing a JSON file back to the repository.
- Isolate one variable per challenger iteration. Changing multiple elements at once makes it impossible to understand what drove a change in performance.
- Add a Slack or email notification so the loop doesn’t run silently — visibility builds trust in the automation over time.
- For teams that want to manage prompts, campaigns, and scheduling without code, MindStudio can run the same loop as a visual AI agent with no infrastructure to maintain. Start free at mindstudio.ai.