AI Cybersecurity in 2025: How Agents Are Finding Zero-Day Exploits
AI is now discovering zero-day vulnerabilities faster than humans ever could. Learn what this means for security, open source, and your AI stack.
The Machines Are Finding Holes Humans Missed
A team of university researchers published a paper in early 2024 that quietly alarmed the security community. Their finding: GPT-4, given the right scaffolding, could autonomously identify and exploit one-day vulnerabilities in real-world software with an 87% success rate. No human in the loop. No manual analysis. Just an AI agent reading CVE descriptions and writing working exploits.
That was one-day vulnerabilities — flaws that were already disclosed but unpatched. Zero-day vulnerabilities, the ones no one knows about yet, were supposed to be different. Harder. Requiring years of human expertise, intuition, and time.
That assumption is now outdated.
AI cybersecurity has entered a new phase in 2025. Agents aren’t just assisting analysts — they’re independently discovering zero-day vulnerabilities in production software, operating systems, and open-source libraries. This changes threat modeling, security hiring, vulnerability disclosure, and enterprise risk in ways most organizations aren’t prepared for.
Here’s what’s actually happening, why it matters, and what security teams should do about it.
What Zero-Day Vulnerabilities Are (and Why They’re So Dangerous)
A zero-day vulnerability is a security flaw that the software vendor doesn’t know exists yet. The name comes from the idea that developers have had “zero days” to fix it — because they don’t know it needs fixing.
These are the most valuable bugs in existence. Nation-state intelligence agencies pay millions for them. Criminal organizations exploit them silently for months before anyone notices. When a zero-day gets weaponized before a patch exists, defenders are completely blind.
The traditional discovery process looked like this:
- A highly skilled security researcher spends weeks or months manually auditing code
- They identify a suspicious pattern, then spend more time crafting a proof-of-concept exploit
- If they’re responsible, they disclose it to the vendor and wait for a patch
- If they’re not, they sell it — zero-day brokers like Zerodium have published price lists offering up to $2.5 million for full-chain mobile exploits
The bottleneck has always been human labor. There’s a finite number of people skilled enough to do this work, and there’s an infinite amount of code to audit.
AI agents are attacking that bottleneck directly.
How AI Agents Are Finding Zero-Days
The Architecture Behind Autonomous Vulnerability Discovery
Modern AI agents finding zero-days aren’t just running a language model over source code and hoping for the best. They combine several capabilities:
Static analysis at scale. LLMs can read and reason about large codebases, identifying patterns that match known vulnerability classes — buffer overflows, use-after-free bugs, SQL injection vectors, race conditions — faster than any human team.
Dynamic fuzzing with intelligent guidance. Traditional fuzzers generate random inputs and hope something breaks. AI-guided fuzzers use model reasoning to generate targeted inputs that are more likely to trigger edge cases in specific code paths.
Exploit chain reasoning. Finding a bug is one thing. Turning it into a working exploit requires understanding memory layout, bypassing mitigations like ASLR and stack canaries, and chaining multiple smaller flaws together. AI agents are now demonstrating the ability to reason through these multi-step chains autonomously.
Feedback loop learning. Agents can observe the results of their probes — crashes, error codes, timing differences — and adjust their approach. This iterative loop dramatically compresses the time from “suspicious code pattern” to “working proof of concept.”
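To make that loop concrete, here is a minimal, illustrative sketch in Python. The `propose_input` model hook, the target binary path, and the crash heuristic are all assumptions for this example; it is not a description of how Big Sleep or any other specific system is built.

```python
# Illustrative sketch of an AI-guided fuzzing feedback loop (not any vendor's implementation).
# `propose_input` is a placeholder for the model call; the target path and crash
# heuristic are assumptions for this example.
import subprocess

def propose_input(history: list[dict]) -> bytes:
    """Placeholder: an LLM reasons over prior results and returns the next candidate input."""
    raise NotImplementedError("wire this to the model of your choice")

def run_target(candidate: bytes, target: str = "./target_binary") -> dict:
    """Run the target once and record how it behaved."""
    proc = subprocess.run([target], input=candidate, capture_output=True, timeout=5)
    return {
        "input": candidate,
        "returncode": proc.returncode,
        "stderr": proc.stderr[:500],        # truncated so it fits in the model's context
        "crashed": proc.returncode < 0,     # negative return code = killed by a signal (e.g. SIGSEGV)
    }

def fuzz_loop(max_iterations: int = 100) -> list[dict]:
    """Propose, run, observe, repeat until something crashes or we give up."""
    history: list[dict] = []
    for _ in range(max_iterations):
        candidate = propose_input(history)  # the model sees every prior result
        result = run_target(candidate)
        history.append(result)
        if result["crashed"]:
            break                           # a crash is a lead worth triaging, not yet an exploit
    return history
```

The point of the sketch is the shape of the loop: every probe's outcome goes back into the context the model reasons over, which is what separates this from blind random fuzzing.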
Google’s Big Sleep and Project Naptime
Google’s Project Zero team, working with Google DeepMind, has been at the frontier of this work. Their “Project Naptime” framework (which later evolved into “Big Sleep”) gave an AI agent the ability to use debuggers, run code, examine memory, and iterate on exploit attempts — the same workflow a human vulnerability researcher would use.
In late 2024, Big Sleep found a real, previously unknown vulnerability in SQLite — a zero-day in production software used by billions of devices. It was the first publicly documented case of an AI agent finding an exploitable memory safety bug in widely-deployed software that human researchers had missed.
The SQLite finding wasn’t a lab demo. It was real code, real impact, and real proof that the capability exists at scale.
DARPA’s AI Cyber Challenge
DARPA’s AI Cyber Challenge (AIxCC) ran its semifinal round in 2024, with roughly $20 million in prizes at stake across the competition. Teams built autonomous systems that could find and patch vulnerabilities in open-source software. The results were striking: top competitors demonstrated systems that could identify security flaws in Linux kernel code and other critical infrastructure software.
DARPA’s stated goal was to accelerate the defensive side of this equation: if AI can find vulnerabilities, AI should be able to patch them too. The best teams’ systems did both.
Academic Research: The 87% Number
The University of Illinois Urbana-Champaign study that made waves in 2024 showed GPT-4 agents successfully exploiting known vulnerabilities when given CVE descriptions. The same agents, when given no description at all, still found exploits about 7% of the time.
That 7% number sounds small. But agents are cheap to run and trivial to parallelize: pointed at thousands of targets at once, even a 7% hit rate yields a meaningful stream of working exploits.
Follow-up research has pushed these numbers higher with better scaffolding, more targeted prompting, and specialized fine-tuning on security-relevant data.
The Double-Edged Problem
Defensive AI: The Case for Speed
The optimistic case for AI-powered vulnerability discovery is straightforward: there’s far more code in the world than security researchers can audit, and the gap is growing. Open-source software underpins virtually every enterprise stack, and most of it gets minimal security review.
AI agents offer a path to continuous, automated security auditing. Instead of running a pentest once a year, organizations could run AI-powered vulnerability scanning continuously — catching flaws before attackers do.
Major cloud providers are already moving this direction. Microsoft’s Security Copilot, Google’s security AI investments, and a wave of startups are all building tools that use AI to assist — and increasingly automate — vulnerability discovery and triage.
For defenders, speed matters enormously. The average time between vulnerability disclosure and active exploitation has shrunk to days, sometimes hours. An AI that can identify and help patch a zero-day in hours rather than weeks is genuinely valuable.
Offensive AI: The Proliferation Risk
The pessimistic case is equally straightforward: the same capability that helps defenders find bugs also helps attackers find them.
Historically, creating a zero-day exploit required elite skill — the kind you find at nation-state intelligence agencies or among a small community of independent researchers. AI agents lower that bar. A motivated attacker with access to capable AI models and basic security knowledge can now conduct vulnerability research that previously required years of expertise.
The Cybersecurity and Infrastructure Security Agency (CISA) and its counterparts in other countries have flagged AI-assisted exploitation as an escalating threat. Nation-state actors — already sophisticated — are among the most likely early adopters of these capabilities.
The asymmetry is uncomfortable: defenders need to protect everything. Attackers need to find one hole.
The Open-Source Surface Area Problem
Open-source software creates a specific challenge. The code is public, which means anyone — human or AI — can audit it. Security researchers can find and responsibly disclose vulnerabilities. But so can malicious actors who have no intention of telling anyone.
The Log4j vulnerability in 2021 demonstrated what happens when a critical flaw exists in a library used by millions of applications. Finding that class of bug before attackers do — at scale, continuously — is exactly what AI agents are now positioned to do.
The question is who runs those agents and what they do with the results.
What This Means for Enterprise Security Teams
Your Attack Surface Just Got Harder to Defend
If AI agents can find zero-days faster and cheaper than human researchers, your threat model needs updating. The list of organizations with the capability to discover and exploit unknown vulnerabilities in your software just got longer.
That means:
- Patch cycles need to accelerate. The window between vulnerability disclosure and exploitation is shrinking. Security teams that take weeks to test and deploy patches are exposed.
- Assume more unknown vulnerabilities exist. Runtime protection, network segmentation, and anomaly detection matter more when you can’t assume your code is clean.
- Third-party and open-source risk increases. Your direct code gets regular review. The open-source libraries you depend on may not.
AI for Defense: What Actually Works Now
Several AI-powered defensive capabilities are mature enough to deploy today:
Automated code scanning. Tools like GitHub’s Copilot Autofix, Snyk’s AI features, and dedicated security platforms use AI to flag vulnerable code patterns during development — before they ship.
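As a concrete illustration of the kind of pattern these scanners flag, compare a query built with string formatting to a parameterized one (the schema here is invented for the example):

```python
# The classic pattern scanners flag: user input interpolated into a SQL string.
# Table and column names are made up for illustration.
import sqlite3

def find_user_vulnerable(conn: sqlite3.Connection, username: str):
    # Flagged: a crafted username like "x' OR '1'='1" rewrites the query (SQL injection).
    query = f"SELECT id, email FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Parameterized query: the driver treats the input strictly as data, not SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE name = ?", (username,)
    ).fetchall()
```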
Threat intelligence synthesis. AI agents can monitor threat feeds, CVE databases, vendor advisories, and dark web sources simultaneously, surfacing relevant intelligence faster than any human team.
Incident response triage. When alerts fire, AI agents can contextualize them against your environment, assess severity, and recommend responses — compressing the time between detection and containment.
Red team augmentation. Security teams are using AI to run continuous, automated penetration testing against their own infrastructure — the kind of ongoing pressure testing that previously required expensive external consultants.
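Taking the triage item above as an example, the core of such an agent is unglamorous: enrich each alert with business context, rank, and only then spend a model call drafting a summary for the analyst. A minimal sketch, with the alert fields, asset inventory, and model hook all assumed for illustration:

```python
# Illustrative alert-triage sketch; fields, inventory, and the model hook are assumptions.
from dataclasses import dataclass

@dataclass
class Alert:
    rule: str       # e.g. "Possible credential stuffing"
    asset: str      # hostname or service the alert fired on
    severity: int   # raw SIEM severity, 1 (low) to 5 (critical)

# Assumed internal inventory: how much each asset matters to the business.
ASSET_CRITICALITY = {"payments-api": 3, "staging-web": 1}

def priority(alert: Alert) -> int:
    """Combine raw severity with business context before an analyst sees anything."""
    return alert.severity * ASSET_CRITICALITY.get(alert.asset, 2)

def summarize_for_analyst(alert: Alert) -> str:
    """Placeholder for an LLM call that drafts a plain-language summary and next step."""
    raise NotImplementedError("wire this to the model of your choice")

def triage(alerts: list[Alert], top_n: int = 5) -> list[tuple[Alert, str]]:
    """Rank everything, then spend model calls only on the alerts worth a human's time."""
    ranked = sorted(alerts, key=priority, reverse=True)
    return [(alert, summarize_for_analyst(alert)) for alert in ranked[:top_n]]
```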
The Talent and Tooling Gap
One persistent challenge: the security professionals who can effectively deploy and oversee AI agents are rare. You need people who understand both security tradecraft and how to work with AI systems — how to evaluate their outputs, catch their errors, and build guardrails.
This is creating a new category of security role that most organizations don’t have a hiring pipeline for yet.
The Vulnerability Disclosure Landscape Is Changing
Responsible Disclosure Gets Complicated
The security community has a long-standing norm around responsible disclosure: find a bug, notify the vendor, give them time to patch, then publish. This norm mostly works when the discoverer is a human researcher with professional incentives to behave responsibly.
AI agents complicate this. If an agent autonomously discovers a zero-day, who is the “discoverer”? Who decides to disclose? What’s the right timeline when discovery happens at machine speed?
The organizations running these agents need clear policies before their systems find something significant. Right now, most don’t have them.
Bug Bounty Programs and AI Submissions
Several major bug bounty programs have updated their rules to address AI-assisted submissions. The concern is that AI agents could flood programs with low-quality, machine-generated vulnerability reports — or that someone could deploy an agent to systematically mine bounties at scale.
HackerOne and Bugcrowd have both updated their terms to require disclosure of AI assistance and to address quality thresholds. This is an evolving area, and the norms aren’t settled.
Building Security Workflows with AI Agents
Where AI Agents Fit in a Security Stack
Most security teams don’t need to build their own vulnerability discovery AI — that’s a specialized, resource-intensive capability. But there’s a large adjacent space where AI agents provide immediate value without requiring research-level sophistication:
- Automated CVE monitoring and alerting — agents that watch for new vulnerabilities affecting your specific software inventory (a minimal sketch follows this list)
- Security questionnaire processing — automating the intake and response to vendor security assessments
- Log and alert triage — reducing the noise from SIEM systems by having agents pre-analyze and prioritize
- Compliance documentation — maintaining and updating security documentation for SOC 2, ISO 27001, and similar frameworks
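To make the first item concrete, here is a minimal sketch of the cross-referencing step in Python, assuming NIST's public NVD 2.0 REST API and a toy software inventory; a platform-built agent would wrap the same logic with ticketing and notifications:

```python
# Illustrative CVE monitoring sketch. Assumes NIST's public NVD 2.0 API;
# the inventory format and keyword matching are simplifications for the example.
import requests

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

# Assumed software inventory: package keyword -> owning team.
INVENTORY = {"openssl": "platform-team", "log4j": "java-services"}

def recent_cves_for(keyword: str) -> list[dict]:
    """Fetch CVEs whose descriptions mention the keyword."""
    resp = requests.get(
        NVD_URL,
        params={"keywordSearch": keyword, "resultsPerPage": 20},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("vulnerabilities", [])

def check_inventory() -> list[dict]:
    """Cross-reference fresh CVEs against the software we actually run."""
    findings = []
    for package, owner in INVENTORY.items():
        for item in recent_cves_for(package):
            cve = item.get("cve", {})
            findings.append({
                "cve_id": cve.get("id"),
                "package": package,
                "owner": owner,  # who gets the remediation ticket
            })
    return findings

if __name__ == "__main__":
    for finding in check_inventory():
        print(finding)  # a real agent would open a ticket and notify the owner instead
```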
Where MindStudio Fits
This is where platforms like MindStudio become relevant for security-adjacent workflows. Building a custom AI agent used to mean standing up infrastructure, managing API keys, handling rate limiting, and writing significant amounts of glue code.
MindStudio’s visual builder lets security teams create agents that connect to their existing tools — Slack, Jira, Google Workspace, and hundreds of others — without that overhead. A team could build an agent that monitors a CVE feed, cross-references against their software inventory, drafts prioritized remediation tickets, and notifies the relevant engineer, all without writing code.
The same agent architecture works for security questionnaire automation, incident report generation, or compliance workflow management — the kind of operational security work that’s important but doesn’t require a $500-an-hour consultant.
For teams that do want to get into more technical territory, MindStudio supports custom Python and JavaScript functions, so agents can call external security APIs, run scripts, or integrate with specialized tooling.
You can start building for free at MindStudio — most simple agents take under an hour to build and deploy.
If you’re interested in what kinds of agents security teams are building, exploring the MindStudio use case library gives a concrete sense of what’s practical today.
FAQ: AI and Zero-Day Vulnerability Discovery
Can AI really find zero-day vulnerabilities on its own?
Yes, in demonstrated research settings. Google’s Big Sleep agent found a real zero-day in SQLite in 2024. Academic research has shown LLM-powered agents successfully exploiting vulnerabilities with minimal human assistance. The capability exists — the questions now are about scale, reliability, and who has access to it.
How is AI finding zero-days different from traditional fuzzing?
Traditional fuzzing generates random or mutated inputs and monitors for crashes. It’s effective but inefficient — most inputs don’t trigger interesting behavior. AI-guided fuzzing uses model reasoning to generate targeted inputs, focusing on code paths that pattern-match to known vulnerability classes. This dramatically improves efficiency. AI agents also go further by reasoning about exploit chains and mitigation bypasses, not just finding crashes.
Should enterprises be worried about AI-powered attacks targeting them?
Yes, with appropriate calibration. Nation-state actors and sophisticated criminal groups are the most likely early adopters of AI-assisted exploitation. For most enterprises, the more immediate risk is the compression of time between public vulnerability disclosure and active exploitation — AI accelerates attacker weaponization of known CVEs, not just zero-days. Patching speed and attack surface reduction matter more than ever.
How are AI models being used in defensive security right now?
Practically deployed defensive AI capabilities include: automated code scanning during development, threat intelligence aggregation and summarization, SIEM alert triage and prioritization, automated incident response playbooks, and security documentation maintenance. The most mature tools are integrated into existing developer and security workflows rather than deployed as standalone systems.
What are the risks of using AI for security research?
The main risks are: AI agents producing false positives (wasting researcher time on non-vulnerabilities), agents making mistakes that could cause unintended damage during active testing, responsible disclosure complications when AI-assisted discovery happens at scale, and the same tools being used offensively. Organizations using AI for security research need clear governance policies, human review processes, and scoped testing environments.
Will AI replace human security researchers?
Not in the near term, and probably not entirely. What AI agents excel at — systematic pattern matching, high-volume analysis, exploit chain reasoning on known vulnerability classes — is a subset of what skilled human researchers do. Human intuition, creativity, and contextual judgment still matter for novel attack categories and complex system analysis. The more likely near-term outcome is that AI dramatically amplifies the productivity of skilled researchers, with a smaller team capable of covering much more ground.
Key Takeaways
- AI agents are no longer just assisting security researchers — they’re autonomously discovering zero-day vulnerabilities in real-world software, as demonstrated by Google’s Big Sleep and academic research in 2024.
- The same capability that helps defenders find bugs faster also lowers the barrier for malicious actors to conduct sophisticated vulnerability research.
- Enterprise security teams need to update their threat models: faster patch cycles, stronger runtime protections, and clearer policies on AI-assisted vulnerability disclosure.
- Practical AI security applications available today include CVE monitoring agents, alert triage automation, compliance workflow management, and security documentation generation.
- The vulnerability disclosure norms and bug bounty rules built around human researchers are actively being rewritten to account for AI-assisted discovery.
- Building operational security workflows with AI agents doesn’t require a research budget — platforms like MindStudio make it practical to deploy capable agents connected to your existing security tooling without engineering overhead.