We Let Autonomous AI Attackers Hack Our Own App
The most useful test of a pentest tool is running it against yourself. Here's what we learned pointing Deep Scan at our own production app.
Why test your own product this way
A scanner vendor telling you their scanner is good is not evidence. Running the actual tool against the actual product, with the same authorization and scope constraints a real customer would use, is a much more honest test. So we did.
What the agent actually did
Given a verified target and consent, the agent worked through the same categories a human pentester would: authorization boundaries on every API route, IDOR checks across resource IDs, webhook signature verification, rate-limit probing on AI endpoints, and workflow-order checks on multi-step flows like checkout and onboarding. Each attempt that succeeded produced a reproducible proof-of-concept — request, response, and the exact steps to reproduce it.
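To make one of those categories concrete, here is a minimal sketch of an IDOR check that emits a proof-of-concept record. It is not Deep Scan's implementation. The toy `get_invoice` handler, the `INVOICES` data, and the `Finding` structure are all hypothetical, and the handler is deliberately vulnerable so the check has something to find.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    body: dict


@dataclass
class Finding:
    category: str
    request: dict
    response: Response
    steps: list


# Hypothetical data: two invoices, each owned by a different user.
INVOICES = {
    101: {"owner": "alice", "total": 40},
    102: {"owner": "bob", "total": 75},
}


def get_invoice(user: str, invoice_id: int) -> Response:
    """Toy handler with a deliberate bug: it never checks ownership."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        return Response(404, {})
    return Response(200, dict(invoice))


def check_idor(handler, attacker: str, candidate_ids) -> list:
    """Request each ID as `attacker`; record any hit on someone else's data."""
    findings = []
    for invoice_id in candidate_ids:
        response = handler(attacker, invoice_id)
        leaked = response.status == 200 and response.body.get("owner") != attacker
        if leaked:
            findings.append(
                Finding(
                    category="IDOR",
                    request={"as_user": attacker, "invoice_id": invoice_id},
                    response=response,
                    steps=[
                        f"Authenticate as {attacker}",
                        f"Request invoice {invoice_id}",
                        "Observe a 200 response containing another user's invoice",
                    ],
                )
            )
    return findings


findings = check_idor(get_invoice, "alice", [101, 102, 103])
for finding in findings:
    print(finding.category, finding.request, finding.response.status)
```

Each record carries the request, the response, and the reproduction steps, so someone else can replay the finding without knowing how it was discovered.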
What surprised us
- The agent didn't just re-run known patterns. It chained a low-severity information leak with a separate endpoint to escalate the impact, a connection that static scanning doesn't make.
- Findings that looked cosmetic in isolation were rated higher severity once the agent demonstrated a working exploit chain.
- The free re-test caught a case where a fix closed the original vector but left an adjacent one open. A one-time manual pentest wouldn't catch that until the next engagement, months later.
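The re-test scenario in the last point can be sketched as follows, under stated assumptions. The two routes, the handlers, and the `retest` helper are hypothetical and only illustrate the idea of replaying the original cross-user request against neighbouring routes that read the same resource. They do not describe our actual endpoints or the agent's internals.

```python
# Hypothetical data: one invoice owned by bob.
INVOICES = {102: {"owner": "bob", "total": 75}}


def get_invoice(user, invoice_id):
    """Patched route: the fix added an ownership check here."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        return 404, {}
    if invoice["owner"] != user:
        return 403, {}
    return 200, dict(invoice)


def export_invoice(user, invoice_id):
    """Adjacent route reading the same resource; the fix never reached it."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        return 404, {}
    return 200, dict(invoice)


def retest(routes, attacker, victim_invoice_id):
    """Replay the cross-user request on each route; True means still exploitable."""
    results = {}
    for name, handler in routes.items():
        status, body = handler(attacker, victim_invoice_id)
        results[name] = status == 200 and body.get("owner") != attacker
    return results


routes = {
    "GET /invoices/{id}": get_invoice,            # original vector
    "GET /invoices/{id}/export": export_invoice,  # adjacent vector
}
results = retest(routes, "alice", 102)
print(results)
```

In this toy setup the original route reports as closed while the export route still returns bob's invoice to alice, which is the shape of the gap described above.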
What this means for "can AI really pentest"
The honest answer from our own test: for web application logic and configuration-driven vulnerabilities, yes. The agent found them reliably and backed each finding with evidence, not guesses. See Can AI Perform Penetration Testing? for where the current limits are.
Run the same AI pentest against your own app.