How to Build a ChatGPT Workspace Agent: Step-by-Step Tutorial
ChatGPT Workspace Agents let you automate multi-step tasks across Slack, Gmail, and Linear. Here's how to build, test, and deploy one from scratch.
What a ChatGPT Workspace Agent Actually Does
Most people use ChatGPT as a chat window. You type, it responds, you copy the output somewhere. That works for one-off tasks, but it doesn’t scale — and it doesn’t touch your actual tools.
A ChatGPT workspace agent is different. Instead of treating GPT as a chatbot, you wire it up as an autonomous decision-maker that can read your Gmail inbox, post to Slack, create Linear tickets, and move between those systems based on what it finds. It acts, not just responds.
This tutorial walks through building one from scratch. By the end, you’ll have a working agent that monitors a Gmail label, reasons about incoming messages, routes them to Slack or Linear based on content, and handles the whole sequence without you touching it.
If you’re new to how agents differ from simpler automations, the overview of agentic workflows vs traditional automation is worth a quick read before you start.
Prerequisites: What You Need Before You Build
Before writing any logic, get these in place:
Accounts and API access:
- OpenAI account with API access (GPT-4o or GPT-4-turbo recommended)
- Google Cloud project with Gmail API enabled
- Slack app with appropriate OAuth scopes
- Linear API key (found in Linear → Settings → API)
Technical requirements:
- Python 3.10+ or Node.js 18+ (this guide uses Python)
- Basic familiarity with REST APIs and JSON
- A place to run the agent — a server, cloud function, or local machine to start
Scopes and permissions:
- Gmail: gmail.readonly, gmail.labels, gmail.modify
- Slack: chat:write, channels:read, users:read
- Linear: Read/write access to your chosen team and project
If you’ve never built an AI agent before, the complete beginner’s guide to AI agents covers the core concepts without assuming a technical background.
Step 1: Define the Agent’s Job
The most common mistake when building workspace agents is starting with the code instead of the behavior. Before you touch an API, write down what your agent needs to do in plain English.
Here’s the job definition for the agent we’re building:
“When a new email arrives in the ‘Support’ Gmail label, read the subject and body. If it describes a bug or technical issue, create a Linear ticket in the Engineering team’s backlog. If it’s a general question or sales inquiry, post a summary to the #inbox Slack channel. In either case, mark the Gmail message as processed by applying a ‘Handled’ label.”
That’s the full spec. Three inputs, two decision paths, one cleanup action. Simple enough to build in an afternoon, but realistic enough to be genuinely useful.
Write yours with the same structure:
- What triggers the agent?
- What information does it need to read?
- What decisions does it make?
- What actions does it take?
- How does it clean up after itself?
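Those five answers can live directly in code as a small spec dict that the rest of the build then implements. This is just a suggested convention, not a required format; the field names here are illustrative:

```python
# Illustrative spec for the agent built in this tutorial.
AGENT_SPEC = {
    "trigger": "new email in the 'Support' Gmail label",
    "reads": ["subject", "body"],
    "decisions": ["bug or technical issue -> Linear", "question or sales -> Slack"],
    "actions": ["create_linear_ticket", "post_to_slack"],
    "cleanup": "apply 'Handled' label in Gmail",
}
```

Keeping the spec in one place makes it easy to check later code against the intended behavior.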
Step 2: Set Up Your Gmail Watcher
The agent needs to poll Gmail (or respond to push notifications) to check for new messages. We’ll use polling for simplicity — a cron job that runs every few minutes.
Authenticate with Google
First, set up credentials:
```python
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow  # used on first run to create token.json
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/gmail.modify']

def get_gmail_service():
    creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    return build('gmail', 'v1', credentials=creds)
```
On first run, you’ll go through an OAuth flow to generate token.json. Store this securely — it’s effectively a password.
Fetch Unprocessed Emails
```python
def get_unprocessed_emails(service, label_id):
    results = service.users().messages().list(
        userId='me',
        labelIds=[label_id],
        q='-label:handled'
    ).execute()
    return results.get('messages', [])

def get_email_content(service, message_id):
    message = service.users().messages().get(
        userId='me',
        id=message_id,
        format='full'
    ).execute()
    # Extract subject from headers; the default guards against a missing Subject header
    headers = message['payload']['headers']
    subject = next((h['value'] for h in headers if h['name'] == 'Subject'), '(no subject)')
    # Body extraction varies by email structure
    return subject, extract_body(message['payload'])
```
For AI email parsing that copes with complex email structures, including multipart messages, you'll want a more robust extract_body function that covers both plain-text and HTML parts.
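As a starting point, here's a minimal extract_body sketch. It recursively walks the Gmail payload and returns the first decoded text part it finds; a production version would prefer text/plain over text/html and strip markup:

```python
import base64

def extract_body(payload):
    """Minimal sketch: return the first decoded text part in a Gmail payload.
    A production version should prefer text/plain and strip HTML from text/html parts."""
    if payload.get('mimeType', '').startswith('text/'):
        data = payload.get('body', {}).get('data')
        if data:
            # Gmail encodes part bodies as URL-safe base64
            return base64.urlsafe_b64decode(data).decode('utf-8', errors='replace')
    for part in payload.get('parts', []):
        body = extract_body(part)
        if body:
            return body
    return ''
```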
Step 3: Add the GPT Decision Layer
This is where the agent’s intelligence lives. You’re not writing if/else rules — you’re asking GPT to read the email and classify it.
Set Up the OpenAI Client
```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key")
```
Write the Classification Prompt
The system prompt defines how the agent reasons. This is worth spending time on — a poorly written prompt produces inconsistent classifications. For guidance on structuring agent prompts effectively, see how to write effective prompts for AI agents.
```python
import json

SYSTEM_PROMPT = """
You are an email triage agent for a software company.
Your job is to classify incoming support emails and decide how to route them.

Given an email subject and body, respond with a JSON object containing:
- "category": either "bug_report", "technical_issue", "sales_inquiry", or "general_question"
- "priority": "high", "medium", or "low"
- "summary": a one-sentence summary of the email (max 100 characters)
- "action": either "create_linear_ticket" or "post_to_slack"

Rules:
- Bug reports and technical issues → create_linear_ticket
- Sales inquiries and general questions → post_to_slack
- Mark anything mentioning system outages or data loss as high priority
- Be concise. The summary will appear in Slack or as a ticket title.

Respond with valid JSON only. No explanation.
"""

def classify_email(subject, body):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Subject: {subject}\n\nBody: {body}"}
        ],
        response_format={"type": "json_object"}
    )
    return json.loads(response.choices[0].message.content)
```
Using response_format: json_object enforces structured output — the model won’t return prose when you need a parseable response.
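JSON mode guarantees syntactically valid JSON, not your schema, so it's still worth validating the parsed result before acting on it. A small guard like this (the key and action sets mirror the prompt above) catches schema drift early:

```python
REQUIRED_KEYS = {"category", "priority", "summary", "action"}
VALID_ACTIONS = {"create_linear_ticket", "post_to_slack"}

def validate_classification(result):
    """Reject model output that is missing keys or names an unknown action."""
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        raise ValueError(f"classification missing keys: {missing}")
    if result["action"] not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {result['action']}")
    return result
```

Call it on the dict returned by classify_email; a raised ValueError falls into the main loop's except branch and the email is retried next cycle.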
Step 4: Wire Up the Action Handlers
Once the agent classifies an email, it needs to actually do something. Here are the two action handlers.
Create a Linear Ticket
```python
import requests

LINEAR_API_URL = "https://api.linear.app/graphql"
LINEAR_API_KEY = "your-linear-api-key"
TEAM_ID = "your-team-id"

def create_linear_ticket(summary, priority, email_body):
    priority_map = {"high": 1, "medium": 2, "low": 3}
    mutation = """
    mutation CreateIssue($title: String!, $description: String!, $teamId: String!, $priority: Int!) {
      issueCreate(input: {
        title: $title,
        description: $description,
        teamId: $teamId,
        priority: $priority
      }) {
        success
        issue {
          id
          url
        }
      }
    }
    """
    response = requests.post(
        LINEAR_API_URL,
        headers={"Authorization": LINEAR_API_KEY},
        json={
            "query": mutation,
            "variables": {
                "title": summary,
                "description": email_body,
                "teamId": TEAM_ID,
                "priority": priority_map.get(priority, 2)
            }
        }
    )
    return response.json()['data']['issueCreate']['issue']
```
Post to Slack
```python
from slack_sdk import WebClient

slack_client = WebClient(token="your-slack-bot-token")

def post_to_slack(summary, category, priority, channel="#inbox"):
    priority_emoji = {"high": "🔴", "medium": "🟡", "low": "🟢"}
    slack_client.chat_postMessage(
        channel=channel,
        text=f"{priority_emoji.get(priority, '⚪')} *New {category.replace('_', ' ').title()}*\n{summary}"
    )
```
For a deeper look at building agents that live natively in Slack rather than just posting to it, see AI agents for Slack and Teams.
Step 5: Add the Cleanup Step and Loop Logic
After the agent acts, it needs to mark the email so it isn’t processed again on the next run.
```python
def mark_as_handled(service, message_id, handled_label_id):
    service.users().messages().modify(
        userId='me',
        id=message_id,
        body={
            'addLabelIds': [handled_label_id],
            'removeLabelIds': []
        }
    ).execute()
```
The Main Loop
Now connect everything:
```python
def run_agent():
    service = get_gmail_service()
    emails = get_unprocessed_emails(service, label_id="Label_Support")

    for email_ref in emails:
        try:
            # Read
            subject, body = get_email_content(service, email_ref['id'])

            # Decide
            classification = classify_email(subject, body)

            # Act
            if classification['action'] == 'create_linear_ticket':
                ticket = create_linear_ticket(
                    classification['summary'],
                    classification['priority'],
                    body
                )
                print(f"Created ticket: {ticket['url']}")
            else:
                post_to_slack(
                    classification['summary'],
                    classification['category'],
                    classification['priority']
                )

            # Clean up
            mark_as_handled(service, email_ref['id'], handled_label_id="Label_Handled")

        except Exception as e:
            print(f"Failed to process {email_ref['id']}: {e}")
            # Don't mark as handled — retry next cycle

if __name__ == "__main__":
    run_agent()
```
The try/except pattern matters. If the agent fails mid-process, you don’t want to lose the email — you want to retry it. Only mark something handled if the full sequence succeeded.
This kind of conditional logic and branching in agentic workflows becomes more important as your agent handles more edge cases.
Step 6: Add Multi-Step Reasoning for Complex Cases
The basic agent works for clean-cut emails. But real inboxes are messy. An email might describe both a bug and a billing question. A customer might send five follow-ups in a thread. A vague subject line might not reflect the actual content.
This is where multi-step reasoning in AI agents becomes useful. Instead of one classification call, you add a reasoning step before the action.
```python
REASONING_PROMPT = """
You are analyzing an email to decide how to handle it.

Think through these questions before classifying:
1. Is there a clear primary issue in this email, or multiple issues?
2. Is there urgency indicated (words like "urgent", "broken", "can't access", "outage")?
3. Does the sender seem to be an existing customer or a new prospect?
4. Is there enough information to take action, or does this need a human?

After reasoning, provide your final classification as JSON.
"""
```
You can also add a “needs_human” category for emails the agent isn’t confident about — then route those to a Slack channel where a person reviews them. The agent handles what it can and escalates what it can’t. That’s a much more reliable system than an agent that guesses on everything.
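One way to wire that in, assuming you extend the classification prompt to also return a "confidence" score between 0 and 1 (not part of the prompt shown above), is a small routing helper:

```python
def route_action(classification, threshold=0.7):
    """Escalate needs_human or low-confidence classifications instead of acting.
    Assumes the prompt is extended to return a "confidence" float (hypothetical)."""
    if classification.get("category") == "needs_human":
        return "escalate_to_human"
    if classification.get("confidence", 1.0) < threshold:
        return "escalate_to_human"
    return classification["action"]
```

The main loop then switches on route_action's result rather than reading classification['action'] directly, with "escalate_to_human" posting to a review channel.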
Step 7: Test Before You Deploy
Testing an agent that touches real inboxes, Slack channels, and issue trackers is harder than testing a function. Side effects are real. Here’s a safe approach.
Use Dry Run Mode
Add a flag that prevents actions from firing:
```python
def run_agent(dry_run=False):
    # ... fetch and classify ...
    if dry_run:
        print(f"[DRY RUN] Would {classification['action']}: {classification['summary']}")
    else:
        ...  # actual actions
```
Run with dry_run=True first. Inspect the classifications. Check that the GPT responses make sense against real emails you’ve collected.
Test Against Known Cases
Build a small test set: 10–15 emails you know the correct classification for. Run the agent against them and check the output before pointing it at a live inbox.
```python
test_emails = [
    {
        "subject": "App crashing on login",
        "body": "Your app crashes every time I try to log in. iOS 17, iPhone 15.",
        "expected_action": "create_linear_ticket"
    },
    {
        "subject": "Pricing question",
        "body": "What does your enterprise plan include?",
        "expected_action": "post_to_slack"
    }
]
```
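A tiny harness can run the classifier over these cases and report mismatches. Here classify_fn is whatever classification function you're exercising, classify_email itself or a stub while you iterate on the prompt:

```python
def evaluate(cases, classify_fn):
    """Run a classifier over labeled test emails; return the cases it got wrong."""
    failures = []
    for case in cases:
        result = classify_fn(case["subject"], case["body"])
        if result["action"] != case["expected_action"]:
            failures.append({"subject": case["subject"], "got": result["action"]})
    return failures
```

An empty return value means every test email was routed as expected; anything else tells you which subjects to investigate before going live.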
Common Issues to Watch For
- Token limits: Very long email threads may exceed context limits. Truncate body to 2,000–3,000 characters before passing to GPT.
- Malformed JSON: Even with json_object mode, validate the response structure before acting on it.
- Rate limits: The Gmail API has quota limits. Add delays between requests if you're processing a batch.
- Duplicate actions: If your polling runs overlap, you might process the same email twice. A database flag or idempotency check prevents this.
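For the duplicate-actions case, a lightweight SQLite table works as an idempotency check. This is a sketch (the table and function names are illustrative); claim_message returns True only the first time a message ID is seen:

```python
import sqlite3

def get_db(path="processed.db"):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (message_id TEXT PRIMARY KEY)")
    return conn

def claim_message(conn, message_id):
    """True the first time message_id is seen; False on any later attempt."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO processed VALUES (?)", (message_id,))
        return True
    except sqlite3.IntegrityError:  # primary key violation: already claimed
        return False
```

In the main loop, skip any email where claim_message returns False; overlapping polling runs then can't act on the same message twice.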
For more on safe agent design, these five rules for preventing data loss in agents are worth reviewing before you go live.
Step 8: Deploy and Schedule the Agent
Option 1: Cron Job on a Server
The simplest option. Add a cron entry to run the agent every 5 minutes:
```
*/5 * * * * /usr/bin/python3 /home/ubuntu/workspace_agent/agent.py >> /var/log/agent.log 2>&1
```
This works fine for low-volume inboxes. It’s not real-time, but for most support workflows a 5-minute delay is acceptable.
Option 2: Cloud Functions
For serverless deployment, package the agent as an AWS Lambda or Google Cloud Function and trigger it on a schedule via EventBridge or Cloud Scheduler. This removes the need for a persistent server.
Option 3: Gmail Push Notifications
For near real-time response, use Gmail’s push notification API (Pub/Sub). Google sends a webhook to your endpoint when new mail arrives in a watched label. More complex to set up, but useful if your use case requires immediate response.
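The notification itself arrives as a Pub/Sub envelope whose data field is base64-encoded JSON containing the emailAddress and historyId; note it does not include the message content, so your handler still calls users.history.list to find what changed. Decoding the envelope is straightforward:

```python
import base64
import json

def decode_pubsub_envelope(envelope):
    """Decode a Gmail push notification from its Pub/Sub envelope.
    The payload holds emailAddress and historyId, not the message itself."""
    data = envelope["message"]["data"]
    return json.loads(base64.urlsafe_b64decode(data))
```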
For guidance on deploying agents across multiple channels once the core logic is working, see how to deploy AI agents across Slack and Microsoft Teams.
Environment Variables
Never hardcode credentials. Use environment variables:
```python
import os

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
LINEAR_API_KEY = os.environ["LINEAR_API_KEY"]
SLACK_BOT_TOKEN = os.environ["SLACK_BOT_TOKEN"]
```
Store these in .env locally (and add .env to .gitignore). In production, use your platform’s secrets manager.
Step 9: Monitor and Iterate
An agent running unsupervised will eventually hit edge cases you didn’t anticipate. Build in observability from day one.
Log Everything
Log the input, the classification, the action taken, and the outcome:
```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
    handlers=[
        logging.FileHandler('agent.log'),
        logging.StreamHandler()
    ]
)

# In the main loop:
logging.info(f"Email {message_id}: classified as {classification['category']}, action: {classification['action']}")
```
Track Classification Accuracy
After a week of running, review a sample of processed emails. Did the agent route them correctly? If you’re seeing consistent misclassifications in a particular category, refine the system prompt.
Set Up Alerting
If the agent errors out three times in a row, you want to know. A simple counter that sends a Slack message when errors exceed a threshold is enough:
```python
consecutive_errors = 0
MAX_ERRORS = 3

# Inside the main loop's error handling:
if error:
    consecutive_errors += 1
    if consecutive_errors >= MAX_ERRORS:
        slack_client.chat_postMessage(
            channel="#alerts",
            text=f"⚠️ Workspace agent has failed {consecutive_errors} times. Check logs."
        )
else:
    consecutive_errors = 0
```
Where Remy Fits In
The tutorial above gives you a working agent, but it also gives you a lot to manage: OAuth flows, API clients, error handling, deployment scripts, environment variables. You’re effectively building and maintaining infrastructure alongside the intelligence.
Remy takes a different approach. Instead of writing all of that glue code yourself, you describe the application in a spec — what it should do, what data it works with, what the business rules are — and Remy compiles that into a full-stack app with the backend, database, auth, and deployment already handled.
For a workspace agent, that means you could describe the routing logic and integrations in a structured spec document, and Remy generates the implementation that runs it. When you need to change a rule or add a new integration, you update the spec and recompile. You’re not hunting through code files to find where the Slack logic lives.
This is especially useful if you’re building multiple agents — one for support triage, one for scheduling, one for weekly reporting — because the spec format gives you a consistent, readable source of truth for all of them.
You can explore this approach at mindstudio.ai/remy.
Frequently Asked Questions
What is a ChatGPT workspace agent?
A ChatGPT workspace agent is an automated system that uses GPT models to perform multi-step tasks across workplace tools like Gmail, Slack, Linear, or Google Drive. Unlike a chatbot that responds to prompts, a workspace agent reads from real systems, makes decisions, and takes actions — writing tickets, sending messages, updating records — based on what it finds.
Do I need to know how to code to build a workspace agent?
The tutorial above requires basic Python knowledge. But if you’d prefer a more accessible starting point, no-code AI agent builders can handle many workspace automation use cases without writing code. The tradeoff is flexibility — custom integrations and complex decision logic are easier to implement in code.
How do I handle emails the agent can’t confidently classify?
Add a fallback category — "needs_human" — and route those emails to a Slack channel where a person reviews them. Set a confidence threshold: if the model’s reasoning suggests uncertainty, escalate rather than guess. This is a common pattern in agentic workflows with conditional logic and makes agents much more reliable in production.
How much does it cost to run a GPT-powered workspace agent?
At current OpenAI pricing, a GPT-4o call processing a typical email (subject + body, maybe 500 tokens total) costs roughly $0.001–$0.003 per email. For a team receiving 200 support emails per day, that’s under $1/day in inference costs. The bigger costs are usually engineering time and any third-party API subscriptions.
Is it safe to give an agent access to Gmail and Slack?
It depends on your setup. Use read-only scopes where possible, and only grant write access to the specific resources the agent needs. Store credentials in a secrets manager, not in code. Limit the agent to specific Gmail labels and Slack channels rather than giving it full account access. For production deployments, review AI agent compliance considerations to understand what’s required under your regulatory context.
What’s the difference between a workspace agent and a Zapier automation?
Zapier automations are rule-based: if X, then Y. They don’t read content and reason about it — they just react to triggers according to fixed conditions. A workspace agent uses a language model to interpret unstructured content and make judgment calls. If you’re routing emails based on their content (not just which folder they’re in), you need reasoning — which is why AI-native workflows outperform simple Zapier + GPT combinations for this kind of task.
Key Takeaways
- Start with behavior, not code. Write out what the agent should do in plain English before touching an API.
- GPT handles classification; your code handles actions. Don’t let the model write to Slack directly — separate reasoning from execution.
- Test with dry run mode before going live. Real inboxes, real Slack channels, real tickets — side effects matter.
- Build in a human escalation path. Agents should handle what they’re confident about and escalate what they’re not.
- Observe before you trust. Log inputs and outputs from day one. Review a sample after the first week to validate classification quality.
- The integration code is a solved problem. If maintaining OAuth flows and deployment scripts isn’t where you want to spend time, try Remy as a way to describe the application and let the infrastructure be handled for you.