Jul 5, 2026

We Let Autonomous AI Attackers Hack Our Own App

The most useful test of a pentest tool is running it against yourself. Here's what we learned pointing Deep Scan at our own production app.

Why test your own product this way

A scanner vendor telling you their scanner is good is not evidence. Running the actual tool against the actual product, with the same authorization and scope constraints a real customer would use, is a much more honest test. So we did.

What the agent actually did

Given a verified target and consent, the agent worked through the same categories a human pentester would: authorization boundaries on every API route, IDOR checks across resource IDs, webhook signature verification, rate-limit probing on AI endpoints, and workflow-order checks on multi-step flows like checkout and onboarding. Each attempt that succeeded produced a reproducible proof-of-concept — request, response, and the exact steps to reproduce it.
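To make one of these categories concrete, here is a minimal sketch of an IDOR check that emits the kind of proof-of-concept record described above. It is illustrative only and not how Deep Scan is implemented: the in-memory `INVOICES` table, the deliberately vulnerable `get_invoice` handler, the `ProofOfConcept` record, and `probe_idor` are all hypothetical names invented for this example, and no network calls are made.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-in for an API: invoice id -> owning user.
INVOICES = {101: "alice", 102: "bob", 103: "alice"}


def get_invoice(user: str, invoice_id: int) -> tuple[int, dict]:
    """Deliberately vulnerable handler: it never checks who owns the invoice."""
    if invoice_id not in INVOICES:
        return 404, {}
    return 200, {"id": invoice_id, "owner": INVOICES[invoice_id]}


@dataclass
class ProofOfConcept:
    request: str      # what was sent
    status: int       # what came back
    response: dict    # response body
    steps: list[str]  # how to reproduce it


def probe_idor(handler, user: str, candidate_ids) -> list[ProofOfConcept]:
    """Request each ID as `user`; record a PoC when another user's data is returned."""
    findings = []
    for rid in candidate_ids:
        status, body = handler(user, rid)
        if status == 200 and body.get("owner") != user:
            findings.append(ProofOfConcept(
                request=f"GET /invoices/{rid} as {user}",
                status=status,
                response=body,
                steps=[
                    f"Authenticate as {user}",
                    f"Request invoice {rid}",
                    "Observe another user's record in the response",
                ],
            ))
    return findings


findings = probe_idor(get_invoice, "alice", range(100, 105))
for f in findings:
    print(f.request, "->", f.status, f.response)
```

Against a real target the handler call would be an authenticated HTTP request, and the ownership test would compare the response with what the requesting account is entitled to see.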

What surprised us

The agent didn't just re-run known patterns. It chained a low-severity information leak with a separate endpoint to escalate the impact, the kind of connection static scanning doesn't make.
Findings that looked cosmetic in isolation were flagged as higher severity once the agent demonstrated a working exploit chain.
The free re-test caught a case where a fix closed the original vector but left an adjacent one open. That is exactly the scenario a one-time manual pentest wouldn't catch until the next engagement, months later.
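The re-test scenario can be sketched as replaying the original proof-of-concept together with a few nearby variants. This is a simplified illustration under stated assumptions, not the actual re-test logic: `patched_app`, its routes, the `OWNERS` table, and the list of adjacent suffixes are all hypothetical.

```python
# Hypothetical data: invoice 102 belongs to bob.
OWNERS = {102: "bob"}


def patched_app(user: str, path: str) -> int:
    """Hypothetical patched app: the ownership check was added to
    /invoices/<id> but not to the sibling /invoices/<id>/pdf route."""
    parts = path.strip("/").split("/")
    invoice_id = int(parts[1])
    is_owner = OWNERS.get(invoice_id) == user
    if len(parts) == 2:
        return 200 if is_owner else 403  # fixed route
    if parts[2] == "pdf":
        return 200  # unfixed sibling: no ownership check
    return 404  # any other sub-route does not exist


def retest(app, user: str, original_path: str, adjacent_suffixes) -> dict[str, int]:
    """Replay the original PoC path plus nearby variants and collect status codes."""
    paths = [original_path] + [original_path + s for s in adjacent_suffixes]
    return {p: app(user, p) for p in paths}


results = retest(patched_app, "alice", "/invoices/102", ["/pdf", "/export"])
still_open = [p for p, status in results.items() if status == 200]
print(still_open)
```

In this toy case the original path now returns 403, so a re-test that replayed only the original request would report the issue as fixed, while the variant list surfaces the sibling route that still serves the data.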

What this means for "can AI really pentest"

The honest answer from our own test: for web application logic and configuration-driven vulnerabilities, yes. The agent found them reliably and backed each finding with evidence, not guesses. See "Can AI Perform Penetration Testing?" for where the current limits are.

Run the same AI pentest against your own app.

