We Let Autonomous AI Attackers Hack Our Own App

The most useful test of a pentest tool is running it against yourself. Here's what we learned from pointing Deep Scan at our own production app.

Why test your own product this way

A scanner vendor telling you their scanner is good is not evidence. Running the actual tool against the actual product, with the same authorization and scope constraints a real customer would use, is a much more honest test. So we did.

What the agent actually did

Given a verified target and consent, the agent worked through the same categories a human pentester would: authorization boundaries on every API route, IDOR checks across resource IDs, webhook signature verification, rate-limit probing on AI endpoints, and workflow-order checks on multi-step flows like checkout and onboarding. Each successful attempt produced a reproducible proof of concept: the request, the response, and the exact steps to reproduce it.
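To make the IDOR category concrete, here is a minimal Python sketch. It is an illustration only, not Deep Scan's actual implementation. It requests a range of resource IDs as one user against a hypothetical `get_invoice` handler and records a proof of concept (request, response, steps) whenever a record owned by someone else comes back. The handler, the `/api/invoices/{id}` route, and the in-memory data are all assumptions made so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Response:
    status: int
    body: dict


@dataclass
class ProofOfConcept:
    request: str
    response: Response
    steps: List[str] = field(default_factory=list)


# Hypothetical data: invoice id -> record with an owner.
INVOICES = {1: {"owner": "alice", "total": 120}, 2: {"owner": "bob", "total": 75}}


def get_invoice(user: str, invoice_id: int) -> Response:
    """Hypothetical handler for GET /api/invoices/{id}.

    It treats the caller as authenticated but never checks ownership,
    which is exactly the IDOR bug the probe should find.
    """
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        return Response(404, {})
    return Response(200, invoice)


Handler = Callable[[str, int], Response]


def probe_idor(handler: Handler, attacker: str, resource_ids: Iterable[int]) -> List[ProofOfConcept]:
    """Request each id as `attacker`; record a PoC when another user's record is returned."""
    findings = []
    for rid in resource_ids:
        resp = handler(attacker, rid)
        owner = resp.body.get("owner")
        if resp.status == 200 and owner != attacker:
            findings.append(ProofOfConcept(
                request=f"GET /api/invoices/{rid} as {attacker}",
                response=resp,
                steps=[
                    f"Authenticate as {attacker}",
                    f"Send GET /api/invoices/{rid}",
                    f"Observe a 200 response containing a record owned by {owner}",
                ],
            ))
    return findings


findings = probe_idor(get_invoice, "alice", [1, 2, 3])
for poc in findings:
    print(poc.request)  # GET /api/invoices/2 as alice
```

Only a successful exploit produces a finding here, which mirrors the point above: the output is evidence (a replayable request and response), not a guess based on a code pattern.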

What surprised us

  • The agent didn't just re-run known patterns; it chained a low-severity information leak with a separate endpoint to escalate the impact, the kind of connection static scanning doesn't make
  • Findings that looked cosmetic in isolation were flagged as higher severity once the agent demonstrated a working exploit chain
  • The free re-test caught a case where a fix closed the original vector but left an adjacent one open, exactly the scenario a one-time manual pentest wouldn't catch until the next engagement, months later
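The re-test scenario can be sketched the same way. In this hypothetical Python example (route names and handlers are invented for illustration, not taken from our app), a fix added an ownership check to the JSON invoice route but not to a sibling PDF export route. Replaying only the original vector would report the issue as fixed; replaying the adjacent vector as well shows the hole that remains.

```python
from typing import Callable, Dict, List

# Hypothetical probe: given a resource id, returns True if the attacker can read it.
Probe = Callable[[int], bool]

# Hypothetical app state: invoice 2 belongs to bob; alice is the attacker.
OWNERS = {2: "bob"}
ATTACKER = "alice"


def get_invoice_json(rid: int) -> bool:
    # Patched route: now enforces ownership, so alice is denied.
    return OWNERS.get(rid) == ATTACKER


def get_invoice_pdf(rid: int) -> bool:
    # Sibling route the fix missed: still serves any existing invoice.
    return rid in OWNERS


def retest(vectors: Dict[str, Probe], resource_id: int) -> Dict[str, bool]:
    """Replay every vector against the same resource; True means still exploitable."""
    return {name: probe(resource_id) for name, probe in vectors.items()}


def still_open(results: Dict[str, bool]) -> List[str]:
    """Names of vectors that remain exploitable after the fix."""
    return [name for name, exploitable in results.items() if exploitable]


results = retest(
    {
        "GET /api/invoices/{id}": get_invoice_json,     # original vector
        "GET /api/invoices/{id}/pdf": get_invoice_pdf,  # adjacent vector
    },
    resource_id=2,
)
print(still_open(results))  # ['GET /api/invoices/{id}/pdf']
```

The design choice worth noting is that the re-test set is broader than the original proof of concept: it includes routes that share the vulnerable logic, since that is where partial fixes tend to leave gaps.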

What this means for "can AI really pentest"

The honest answer from our own test: for web application logic and configuration-driven vulnerabilities, yes, reliably and with evidence rather than guesses. See "Can AI Perform Penetration Testing?" for where the current limits are.

Run the same AI pentest against your own app.