
Deploying AI Apps: The Hidden Infrastructure Costs Nobody Warns You About

An $800 Vercel bill from two weeks of AI-assisted shipping. Here's what default platform settings cost you and how to configure deployments correctly.

MindStudio Team

The $800 Vercel Bill Nobody Warned You About

Two weeks of building with an AI coding agent. Dozens of rapid iterations. A product that actually works. Then the invoice arrives.

$800. For a side project that’s barely launched.

This isn’t a hypothetical. It’s a scenario playing out constantly among technical founders and indie hackers who’ve embraced AI-assisted development. The tools let you ship faster than ever — but the deployment infrastructure costs don’t scale the same way your productivity does. And the defaults on most platforms are quietly optimized for spending more, not less.

This article breaks down exactly where that money goes, which platform settings are responsible, and how to configure your deployment correctly before you push to production. Whether you’re on Vercel, Railway, Fly.io, Render, or anything else — these patterns apply.


Why AI-Assisted Development Creates a New Cost Problem

AI coding tools have genuinely changed how fast you can go from idea to deployed app. What used to take weeks now takes days or hours. But speed without cost awareness is expensive in a very specific way.

The problem isn’t that the platforms are scamming you. It’s that:

  1. You’re deploying far more often. AI agents iterate fast, and continuous deployment means every git push triggers a build. Thirty iterations in a day means thirty builds.
  2. Default settings assume you know what you’re doing. Most platforms configure themselves for scale, not frugality.
  3. AI-generated apps often have architectural inefficiencies. Functions that should be cached make repeated calls. Database queries that should be indexed aren’t. These aren’t bugs in your logic — they’re patterns that an AI didn’t optimize for your cost profile.

The result: you get a working app and a surprising infrastructure bill.


The Specific Line Items That Add Up

Let’s get concrete. Here are the main cost drivers on typical deployment platforms, and why they catch people off guard.

Serverless Function Invocations

Platforms like Vercel charge per function invocation after your free tier. On the Hobby plan, you get 100,000 invocations per month. Sounds like a lot. But if your AI-assisted app is making API calls for every interaction, and you have even a modest number of test users poking at it, you can burn through that fast.

The specific trap: AI-generated frontends often make more API calls than necessary. Requests that could be batched aren’t. Data that could be fetched once and cached gets fetched on every render. This is a known pattern — it’s one of the reasons AI-generated apps struggle in production.

Build Minutes

Every deployment is a build. Every build uses compute. Free tiers on most platforms give you between 100 and 500 build minutes per month. If you’re iterating aggressively — which is the whole point of using an AI coding agent — those minutes evaporate quickly.

A single Next.js build can take 3–8 minutes depending on project size. Twenty deploys a day is easily 60–160 build minutes per day. Do the math: you can blow through a monthly free tier in three to five days of active development.

The fix is simple: don’t auto-deploy every branch and PR. Configure your deployment settings so only main triggers production builds. Use preview deployments sparingly or disable them entirely until you need them.
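On Vercel, one concrete way to do this is per-branch deploy toggles in vercel.json. A minimal sketch, assuming the `git.deploymentEnabled` field (the branch name `staging` here is just an example; verify the key against Vercel's current docs):

```json
{
  "git": {
    "deploymentEnabled": {
      "main": true,
      "staging": false
    }
  }
}
```

With this in place, pushes to `staging` no longer trigger builds, so your iteration branches stop consuming build minutes.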

Preview Deployments (and the Bandwidth They Use)

Speaking of preview deployments — they’re on by default on Vercel and most similar platforms. Every pull request, every branch push, gets its own live URL. This is useful for team review. It’s expensive if you’re a solo founder pushing twenty branches a day.

Preview deployments also count toward your bandwidth and storage limits. If you’re testing with any amount of media or file uploads, this adds up.

Data Transfer and Bandwidth

Bandwidth is the cost that hides in plain sight. Most platforms offer 100GB/month on free tiers, which sounds enormous. But if your app serves images, large JSON payloads, or does a lot of frontend hydration (common with AI-generated React apps that aren’t optimized), you can hit limits faster than expected.

The deeper issue: AI-generated apps often don’t implement proper caching headers. Every asset gets re-fetched instead of served from a CDN cache. This multiplies bandwidth costs significantly.

Database Connection Limits and Compute

If you’re using Supabase or a similar managed database on the free tier, you have connection pool limits. Serverless functions, by design, spin up many short-lived instances. Each instance wants its own database connection. Without connection pooling configured (PgBouncer on Supabase, for example), you’ll hit connection limits and either get errors or be forced to upgrade.

On the free Supabase tier, you get 60 concurrent connections. A moderately active serverless app can exhaust those fast.


Platform-Specific Cost Traps

Vercel

Vercel’s pricing has gotten more complex. The main cost traps:

  • Edge middleware running on every request, including requests for static files, counts toward invocations.
  • Image optimization — Vercel’s next/image component does on-demand image optimization, which consumes function invocations and has a separate quota.
  • Fluid compute — Vercel’s newer execution model keeps function instances warm between invocations. Check how active CPU time and provisioned memory are billed on your plan before assuming it’s cheaper.

The configuration fix: in your vercel.json, explicitly set function regions to a single region (not all regions by default), and be deliberate about what routes use edge functions vs. standard serverless.
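A minimal vercel.json sketch of that advice (the region ID and path glob are placeholders; check Vercel's docs for the exact schema on your plan):

```json
{
  "regions": ["iad1"],
  "functions": {
    "api/**/*.ts": {
      "maxDuration": 10
    }
  }
}
```

Pinning one region avoids paying for multi-region execution you don't need, and a conservative `maxDuration` stops a runaway function from billing for minutes at a time.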

Railway

Railway’s billing model is usage-based, which is refreshingly transparent but can surprise people who assume it’s cheap. You pay for CPU and memory even when your app is doing nothing, because Railway keeps services running by default.

The main trap: running multiple services (web server, database, Redis) without sleep/idle settings. A PostgreSQL instance on Railway costs roughly $0.000463/CPU hour and $0.000231/GB/hour for memory. Small individually. Significant if you forget to shut down a staging environment.

Fly.io

Fly.io charges for machine-hours — the time your VMs are running. Apps don’t auto-sleep unless you configure it. The default fly.toml scales up but doesn’t necessarily scale down to zero.

If you deploy with the defaults and forget about it, you’ll pay for idle compute around the clock.
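A fly.toml sketch that scales to zero when idle (key names follow the `[http_service]` section of current Fly.io configs; verify against their docs, and note `internal_port` must match your app):

```toml
[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
```

With `min_machines_running = 0`, idle machines stop entirely and restart on the next request, trading a cold start for zero idle machine-hours.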

Render

Render’s free tier spins down inactive services after 15 minutes of inactivity, which is good for cost. The trap: cold starts on the free tier can take 30–60 seconds, which pushes some people to upgrade to a paid plan with always-on compute, jumping from $0 to $7/month per service.

For AI apps that need a real backend, you often end up paying for at least two services: the web app and the database. Small amounts individually, but worth factoring in.


The Deeper Issue: Architecture, Not Just Settings

Tweaking platform settings gets you partway there. But the bigger cost driver is often the architecture of the app itself.

AI coding agents generate code that works. They don’t always generate code that’s efficient at scale or cost-optimized for your platform. This is worth understanding clearly before you deploy. As explored in the hidden costs of AI-assisted development, the real bills often come from code patterns, not platform choices.

The N+1 Query Problem

This is a classic backend inefficiency that AI-generated code often introduces. For every item in a list, the app makes a separate database query instead of fetching everything in one query. Ten items = ten database round trips. A hundred items = a hundred.

At low traffic this is invisible. As usage grows, it creates both performance problems and cost problems (more database compute, more serverless function duration).

Over-Fetching Data

AI-generated API routes often return entire database rows when only a few fields are needed. This means:

  • Larger payloads = more bandwidth
  • More data to deserialize = more function CPU time
  • More data sent over the network = slower app

The fix is selecting only the columns you need. Simple, but AI agents don’t always do this by default.

Missing Caching

A well-configured deployment should aggressively cache static assets and set appropriate Cache-Control headers on API responses where the data doesn’t change often. AI-generated apps often skip this entirely, meaning every user request hits your serverless function and database instead of a cached response.

Even adding stale-while-revalidate headers can cut your function invocation count in half on read-heavy routes.


A Configuration Checklist Before You Deploy

Before you push to production, go through these settings. Most of them are free to implement and can cut your infrastructure costs significantly.

Vercel-Specific

  • Set vercel.json to limit function deployment to a single region unless you genuinely need multi-region
  • Disable automatic preview deployments for branches you don’t need reviewed
  • Check if image optimization is enabled and whether you actually need it
  • Set maxDuration on serverless functions to a conservative value (10–15 seconds) to prevent runaway executions
  • Audit which routes use Edge Runtime vs. Node.js runtime — Edge is cheaper per invocation but not always appropriate

Database

  • Configure connection pooling (PgBouncer on Supabase, or the built-in connection pooler)
  • Add indexes on columns you query or filter by frequently
  • Review your ORM’s query output — check whether it’s doing N+1 queries

General Platform Settings

  • Set up spending alerts/limits at your cloud provider level
  • Disable auto-deployment for non-production branches
  • Review all running services and shut down anything you’re not actively using
  • Configure auto-sleep or scale-to-zero on staging environments

Application-Level

  • Add Cache-Control headers to static assets (1 year for hashed filenames)
  • Add Cache-Control: s-maxage=60, stale-while-revalidate on API responses where appropriate
  • Audit API calls in your frontend — look for waterfalls and redundant fetches
  • Ensure environment variables aren’t being fetched at runtime when they could be inlined at build time

If you want a more comprehensive view of what needs to be in place before launch, the technical founder’s checklist before launching a web app covers this well.


The AI Agent Cost Layer on Top

There’s another cost dimension that’s easy to overlook: the AI model API costs from the coding agent itself.

When you’re using Claude Code, Cursor, or a similar agent to iterate quickly, you’re making a lot of API calls. Long context windows, large codebases, repeated back-and-forth — these add up faster than you’d expect. This is separate from your deployment costs but compounds the overall financial picture.

Understanding how to manage token costs when building with AI agents is worth doing before you start iterating heavily. The short version: keep your context focused, break large tasks into smaller ones, and watch your session length.


What Production-Ready Actually Requires

There’s a meaningful gap between “this app works in my testing” and “this app won’t surprise me with a bill.” Understanding what production-ready actually means goes beyond code quality.

A production-ready deployment has:

  • Spending limits configured at the platform level, not just monitored after the fact
  • Logging and alerting that tells you when something is behaving unexpectedly (unexpected spike in function calls, high error rates)
  • Caching implemented at the appropriate layers
  • Database queries audited for efficiency
  • Environment parity — your local environment should behave close enough to production that you catch issues before they cost you money

The guide to deploying AI agents to production with budget guardrails is specifically useful if your app is doing AI inference as part of its core functionality — that’s yet another cost layer that needs explicit limits.


How Remy Handles This

The infrastructure cost problem is fundamentally a configuration and architecture problem. Most developers encounter it because they’re assembling pieces — hosting here, database there, auth somewhere else — and the defaults don’t talk to each other intelligently.

Remy takes a different approach. You describe your application in a spec — annotated markdown that captures what the app does, the data it works with, and the rules it follows. Remy compiles that into a full-stack app with a real backend, a SQL database, auth, and deployment already wired together. The infrastructure is integrated, not assembled.

This matters for costs because the infrastructure decisions are made coherently rather than as separate choices. You’re not picking a Vercel plan, then a Supabase tier, then figuring out how to get them to talk to each other with appropriate connection pooling. The stack is built as a unit.

There are no platform fees during the alpha — you pay for inference costs (the raw model usage) at cost, without markup. Apps deploy on push to main and are live on a real URL. The architectural patterns Remy generates are designed to work correctly at deployment — proper backend methods, typed queries, auth with real sessions — not just in a sandbox.

If you’re tired of debugging infrastructure bills instead of building features, try Remy at mindstudio.ai/remy.


The Broader Pattern: Fast Shipping Needs Cost Awareness

AI tools have genuinely changed the speed at which a technical founder can ship. But speed creates new categories of risk that didn’t exist when iteration was slow.

When you deployed once a week, overspending on infrastructure was hard. When you deploy twenty times a day, it’s easy. The tools for building fast need to be paired with the discipline to configure correctly.

The indie hacker’s guide to shipping full-stack apps with AI puts this well: speed is only valuable if the thing you shipped can keep running without draining your runway.

A few habits that prevent the $800 bill:

  1. Set a billing alert on day one. Every major cloud platform lets you set an email alert at a dollar threshold. Set yours at $50. Then $100. You want to know before you’re surprised.
  2. Review your active services weekly. It takes five minutes to audit what’s running and what’s idle.
  3. Treat defaults as suspect. Every platform default is configured for a general use case, not your specific app. Review them before you push.
  4. Add connection pooling before you get database errors. This one bites people the first time they have more than a handful of concurrent users.

Frequently Asked Questions

Why is my Vercel bill so high if I’m on a small project?

The most common culprits are preview deployments (every branch push creates a new deployment), serverless function invocations from API routes that get called too frequently, and image optimization consuming its own separate quota. Check your Vercel usage dashboard — it breaks down by category. Also look at whether you have middleware running on all routes, including static files.

Do I need to pay for Vercel to deploy a real app?

The Hobby plan is free and sufficient for many small apps. The issues usually arise when you hit the free tier limits — 100GB bandwidth, 100K function invocations, 100 build minutes. For a solo project with moderate traffic, you can stay on Hobby for a while if you configure things correctly. The jump to Pro ($20/month) is warranted when you need team features, more build minutes, or higher invocation limits consistently.

What’s the cheapest way to deploy a full-stack app?

For a full-stack app with a backend and database, a common cost-effective setup is Fly.io or Railway for the backend (with scale-to-zero configured), Supabase on the free tier for the database (PostgreSQL with connection pooling), and Vercel or Netlify for the frontend. Done carefully, this stack can run at essentially $0 until you have meaningful traffic — though you need to configure sleep/idle settings explicitly.

How do I set spending limits on AWS, GCP, or Azure?

All three major cloud providers support billing alerts, but only GCP and some AWS services support hard spending caps that actually stop service. On AWS, you set up a billing alert via CloudWatch and a budget in AWS Budgets — these send notifications but don’t automatically shut down services. GCP lets you set budget alerts and can cap certain API services. Azure has spending limits for some subscription types. The safest approach: set an alert at 50% of your expected monthly budget, another at 100%, and a final one at 150%.

Why do AI-generated apps sometimes use more infrastructure than hand-written code?

AI agents optimize for correctness and working functionality, not cost efficiency. They’ll generate patterns that work — database calls, API routes, frontend fetches — but won’t necessarily batch queries, implement caching, or minimize round-trips. The code is correct but not optimal for your cost profile. This is one of the hidden costs of AI-assisted development that doesn’t show up in the code review but does show up in the bill.

What should I check before deploying an AI app to production?

At minimum: configure spending alerts, set up connection pooling for your database, audit your frontend for redundant API calls, disable preview deployments you don’t need, set function timeout limits, and add basic caching headers. For a more complete list, review the 7 things you must set up before deploying an AI agent to production.


Key Takeaways

  • The biggest cost drivers are usually configuration, not code — preview deployments, missing connection pooling, and no caching are free to fix.
  • AI-assisted development means more deploys, which means more build minutes and more infrastructure events. Default settings weren’t designed for this pace.
  • Platform defaults favor capability over frugality. Review every default before you push to production.
  • Set billing alerts before you deploy, not after you get the invoice. Every platform has them. Use them.
  • Architecture matters as much as settings. N+1 queries, over-fetching, and missing caches create cost problems that no platform setting can fully fix.

If you want to build without the infrastructure assembly — and without the surprise bills that come from misconfigured platforms — try Remy at mindstudio.ai/remy. The stack is integrated, the deployment is managed, and you’re not debugging Vercel settings instead of building your product.

Presented by MindStudio
