The Agents pipeline runs after a dork search when --analyze is set; when the output file is .json, DorkEye prompts to run it. Apart from the optional LLM analysis step, it requires no external AI: every other step uses regex, heuristics, and structural analysis only.
# Automatic (output .json triggers a prompt)
python dorkeye.py -d "site:example.com" -o results.json
# Explicit
python dorkeye.py -d "site:example.com" -o results.json --analyze
# With page content download
python dorkeye.py -d "site:example.com" -o results.json --analyze --analyze-fetch
# Standalone on saved results
python dorkeye_agents.py Dump/results.json --analyze-fetch --analyze-fmt html
| Step | Agent | Input | Output |
|---|---|---|---|
| 1 | TriageAgent | All results | triage_score, triage_label, triage_reason per result |
| 2 | PageFetchAgent | HIGH / CRITICAL results | page_content, response_headers, fetch_status |
| 3 | SecurityAgent | page_content + response_headers + URL | security_verdict dict with threat_level, threat_score, indicators |
| 4 | HeaderIntelAgent | response_headers | header_intel (info leaks, missing headers, outdated versions) |
| 5 | TechFingerprintAgent | page_content + headers + URL | tech_fingerprint (techs, versions, CVE dorks) |
| 6 | SecretsAgent | page_content + snippet | secrets list with type, value, severity, context |
| 7 | PiiDetectorAgent | page_content + snippet | pii_found list with type, censored value, context |
| 8 | EmailHarvesterAgent | page_content + snippet | emails_found list, global dedup |
| 9 | SubdomainHarvesterAgent | All text fields | subdomains list per result, global map |
| 10 | LLM Analysis | All triaged results | analysis dict (optional — requires dorkeye_llm_plugin.py) |
| 11 | ReportAgent | Everything above | HTML / MD / JSON / TXT report |
| 12 | DBScanAgent | Hosts extracted from results | db_scan per-host findings, DBScanReport |
| 13 | DorkCrawlerAgent | CVE dorks + subdomain seeds | Follow-up dork results merged into pipeline |
TriageAgent assigns a score (0–100) and a label (CRITICAL / HIGH / MEDIUM / LOW / SKIP) to every result.
Scoring — two phases:
Phase 1 — regex pattern matching (cap: 60 points):
| Pattern matched | Points |
|---|---|
| .env, .git, .sql, backup files | 38 |
| Private key (BEGIN PRIVATE KEY) | 45 |
| AWS key ID (AKIA...) | 42 |
| JWT token in text | 36 |
| phpMyAdmin / Adminer / pgAdmin | 35 |
| API key / secret token | 28 |
| Config file (config.php, settings.py) | 28 |
| SQLi candidate URL pattern | 26 |
| Credentials / password in text | 24 |
| DevOps panels (Jenkins, Kibana, Grafana) | 18–22 |
| Cloud storage URLs | 18 |
| Directory listing | 22 |
| Server info exposed | 20 |
| Log files | 20 |
| Error / debug / traceback | 14 |
| … 8 more rules | varies |
Phase 2 — runtime bonuses from existing result data:
| Condition | Bonus |
|---|---|
| SQLi confirmed — confidence critical | +30 |
| SQLi confirmed — confidence high | +22 |
| SQLi confirmed — confidence medium or low | +12 |
| accessible == True and status_code == 200 | +8 |
| URL has ≥ 5 GET parameters | +10 |
| URL has 2–4 GET parameters | +5 |
Labels:
| Score | Label |
|---|---|
| ≥ 90 | CRITICAL |
| ≥ 70 | HIGH |
| ≥ 50 | MEDIUM |
| ≥ 20 | LOW |
| < 20 | SKIP |
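The two scoring phases and the label thresholds above can be sketched roughly as follows. This is a minimal illustration, not DorkEye's actual code: it assumes Phase 1 sums the points of every matching rule up to the 60-point cap and that the final score is clamped to 100 (the source states the cap and the 0–100 range, not the combination rule). The regexes, the four-rule subset, and the `sqli_confidence` field name are hypothetical.

```python
import re
from urllib.parse import urlparse, parse_qsl

# Hypothetical subset of the Phase 1 rules; points are taken from the table above.
PATTERN_RULES = [
    (re.compile(r"BEGIN (?:[A-Z]+ )?PRIVATE KEY"), 45),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), 42),
    (re.compile(r"\.(?:env|git|sql)\b", re.I), 38),
    (re.compile(r"phpmyadmin|adminer|pgadmin", re.I), 35),
]
PHASE1_CAP = 60
SQLI_BONUS = {"critical": 30, "high": 22, "medium": 12, "low": 12}
LABELS = [(90, "CRITICAL"), (70, "HIGH"), (50, "MEDIUM"), (20, "LOW")]


def label_for(score):
    """Map a 0-100 score to its triage label."""
    for threshold, label in LABELS:
        if score >= threshold:
            return label
    return "SKIP"


def triage(result):
    """Return (score, label) for one result dict."""
    text = " ".join(str(result.get(k, "")) for k in ("url", "title", "snippet"))

    # Phase 1 (assumption): sum the points of all matching rules, capped at 60.
    score = min(PHASE1_CAP, sum(pts for rx, pts in PATTERN_RULES if rx.search(text)))

    # Phase 2: runtime bonuses. "sqli_confidence" is a hypothetical field name.
    score += SQLI_BONUS.get(result.get("sqli_confidence"), 0)
    if result.get("accessible") is True and result.get("status_code") == 200:
        score += 8
    query = urlparse(result.get("url", "")).query
    n_params = len(parse_qsl(query, keep_blank_values=True))
    if n_params >= 5:
        score += 10
    elif n_params >= 2:
        score += 5

    score = min(score, 100)  # assumption: clamp to the documented 0-100 range
    return score, label_for(score)
```

With these assumptions, a reachable `.env` URL with two GET parameters whose snippet contains an AWS key ID hits the Phase 1 cap (60) and collects +8 and +5 in Phase 2, which puts it in the HIGH band.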
SecurityAgent is threat-detection middleware that operates in two modes:
- Passive (default; --analyze pipeline): analyses and tags every result with a security_verdict, no blocking.
- Active: blocks (security_blocked = True) results classified as DANGEROUS or CRITICAL.
It also hooks into the live scanning flow via security_scan_hook(url, content, headers) so threats can be intercepted before results are saved.
Detection categories:
| Category | Description |
|---|---|
| phishing | Brand impersonation, credential harvesting pages, suspicious redirects |
| malware | JS code execution, obfuscated payloads, file droppers |
| exploit | Reverse shells, SQLi, XXE, SSTI, deserialization payloads |
| obfuscation | Hex/Unicode escaping, base64 chains, high-entropy strings |
| suspicious_pattern | Hidden iframes, missing security headers, executable downloads |
Threat scoring — weighted model:
The final threat_score (0–100) is computed as:
| Score range | Threat level |
|---|---|
| ≤ 15 | CLEAN |
| 16 – 40 | LOW |
| 41 – 70 | SUSPICIOUS |
| 71 – 90 | DANGEROUS |
| > 90 | CRITICAL |
DANGEROUS and CRITICAL results are automatically blocked in active mode.
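The score-to-level mapping and the blocking rule can be expressed in a few lines. This sketch covers only the threshold table and the active-mode rule; it does not reproduce how threat_score itself is weighted. The function names are hypothetical.

```python
BLOCKING_LEVELS = {"DANGEROUS", "CRITICAL"}


def threat_level(score):
    """Map a 0-100 threat_score to its threat level, per the table above."""
    if score <= 15:
        return "CLEAN"
    if score <= 40:
        return "LOW"
    if score <= 70:
        return "SUSPICIOUS"
    if score <= 90:
        return "DANGEROUS"
    return "CRITICAL"


def is_blocked(score, mode="passive"):
    """Only active mode blocks; passive mode reports without blocking."""
    return mode == "active" and threat_level(score) in BLOCKING_LEVELS
```

A score of 78 is DANGEROUS in both modes, but it is blocked only when the mode is active.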
CLI flags:
--no-security # Disable SecurityAgent entirely
--security-mode active # active: block DANGEROUS/CRITICAL | passive: report only (default)
--security-quarantine # Save blocked content to dorkeye_quarantine/
Output per result:
{
  "security_verdict": {
    "url": "https://target.com/shell.php",
    "threat_level": "DANGEROUS",
    "threat_score": 78,
    "badge": "🔴 DANGEROUS",
    "blocked": false,
    "summary": "Reverse shell pattern detected in page content",
    "scan_duration_ms": 12.4,
    "timestamp": "2025-01-01T12:00:00+00:00",
    "indicators": [
      {
        "category": "exploit",
        "description": "Reverse shell pattern",
        "severity": 60,
        "evidence": "bash -i >& /dev/tcp/..."
      }
    ]
  }
}
Pipeline-level output keys (added to the top-level report):
| Key | Content |
|---|---|
| security_stats | Counters: clean, low, suspicious, dangerous, critical, blocked, mode |
| security_threats | List of all verdicts with threat_level ≥ LOW |
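One way to derive these two keys from per-result verdicts is sketched below. It assumes each result carries the security_verdict dict shown earlier; the `summarize` function and its exact output layout are hypothetical and only mirror the key names listed in the table.

```python
from collections import Counter

# Threat levels in ascending order of severity.
LEVELS = ["CLEAN", "LOW", "SUSPICIOUS", "DANGEROUS", "CRITICAL"]


def summarize(results, mode="passive"):
    """Build security_stats and security_threats from per-result verdicts."""
    verdicts = [r["security_verdict"] for r in results if "security_verdict" in r]
    counts = Counter(v["threat_level"] for v in verdicts)

    stats = {level.lower(): counts.get(level, 0) for level in LEVELS}
    stats["blocked"] = sum(1 for v in verdicts if v.get("blocked"))
    stats["mode"] = mode

    # Keep every verdict at LOW or above.
    low = LEVELS.index("LOW")
    threats = [v for v in verdicts if LEVELS.index(v["threat_level"]) >= low]
    return {"security_stats": stats, "security_threats": threats}
```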
Inline usage (scanning flow):
from dorkeye_agents import security_scan_hook, get_security_agent
# Quick hook (uses global singleton); intended to run inside the per-result scanning loop
verdict = security_scan_hook(url, response_text, resp_headers)
if verdict.blocked:
    continue  # skip malicious result
# Fine-grained control
agent = get_security_agent(mode="active", quarantine_dir="dorkeye_quarantine")
verdict = agent.scan_single(url, content, headers)
PageFetchAgent downloads the actual HTML content of HIGH and CRITICAL results for deeper analysis.
Features:
- Saves the response_headers dict and fetch_status code into the result, where HeaderIntelAgent consumes them at zero extra HTTP cost
CLI flags:
--analyze-fetch # enable download
--analyze-fetch-max 50 # download up to 50 pages (default: 20)
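The selection of pages to download can be pictured as a filter plus a cap. In this sketch the ordering (highest triage_score first) is an assumption; the source only states that HIGH and CRITICAL results are fetched, up to the --analyze-fetch-max limit. `select_fetch_targets` is a hypothetical helper name.

```python
def select_fetch_targets(results, max_pages=20):
    """Pick HIGH / CRITICAL results, capped at max_pages (default 20)."""
    candidates = [r for r in results if r.get("triage_label") in ("HIGH", "CRITICAL")]
    # Assumption: when the cap applies, prefer the highest-scoring results.
    candidates.sort(key=lambda r: r.get("triage_score", 0), reverse=True)
    return candidates[:max_pages]
```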
HeaderIntelAgent analyzes the response_headers saved by PageFetchAgent. It makes no additional HTTP requests.
Info leak detection — scans these headers:
server · x-powered-by · x-aspnet-version · x-aspnetmvc-version
x-generator · x-drupal-cache · x-wordpress-cache
x-runtime · x-rack-cache · via · x-debug · x-cache-debug
Outdated version detection — extracts version strings for: Apache, Nginx, PHP, OpenSSL, IIS, Tomcat, Jetty, Lighttpd.
Missing security headers — flags absence of:
| Header | Risk |
|---|---|
| Strict-Transport-Security | HSTS absent — MITM risk |
| Content-Security-Policy | CSP absent — XSS risk |
| X-Frame-Options | Clickjacking protection absent |
| X-Content-Type-Options | MIME sniffing protection absent |
| Referrer-Policy | Referrer-Policy absent |
| Permissions-Policy | Permissions-Policy absent |
Output per result:
{
  "header_intel": {
    "info_leaks": [{"header": "x-powered-by", "value": "PHP/5.6.40", "version": "PHP/5.6"}],
    "missing_security": [{"header": "content-security-policy", "reason": "CSP absent — XSS risk"}],
    "outdated": [{"header": "server", "value": "Apache/2.2.34", "version": "Apache/2.2"}]
  }
}
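The info-leak and missing-header checks amount to dictionary lookups on the saved headers. The sketch below covers those two checks only; version extraction and the outdated list are left out. The `header_intel` function name is hypothetical, and header names are lowercased before comparison because HTTP header names are case-insensitive.

```python
# Header names and reasons are taken from the lists above.
INFO_LEAK_HEADERS = [
    "server", "x-powered-by", "x-aspnet-version", "x-aspnetmvc-version",
    "x-generator", "x-drupal-cache", "x-wordpress-cache",
    "x-runtime", "x-rack-cache", "via", "x-debug", "x-cache-debug",
]
SECURITY_HEADERS = {
    "strict-transport-security": "HSTS absent — MITM risk",
    "content-security-policy": "CSP absent — XSS risk",
    "x-frame-options": "Clickjacking protection absent",
    "x-content-type-options": "MIME sniffing protection absent",
    "referrer-policy": "Referrer-Policy absent",
    "permissions-policy": "Permissions-Policy absent",
}


def header_intel(response_headers):
    """Flag leaking headers that are present and security headers that are absent."""
    headers = {name.lower(): value for name, value in response_headers.items()}
    info_leaks = [
        {"header": name, "value": headers[name]}
        for name in INFO_LEAK_HEADERS if name in headers
    ]
    missing = [
        {"header": name, "reason": reason}
        for name, reason in SECURITY_HEADERS.items() if name not in headers
    ]
    return {"info_leaks": info_leaks, "missing_security": missing}
```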
TechFingerprintAgent identifies technologies from page_content, response_headers, snippet, URL, and title. It attempts version extraction where possible.
35 signatures in 7 categories:
| Category | Technologies |
|---|---|
| CMS | WordPress, Joomla, Drupal, Magento, PrestaShop, TYPO3, Shopify, Wix |
| Framework | Laravel, Django, Rails, Flask, Express.js, Next.js, Nuxt.js |
| JS libraries | jQuery (versioned), React, Vue.js, Angular, Bootstrap (versioned) |
| Server | Apache, Nginx, IIS, OpenSSL (all versioned) |
| Language | PHP, Python, Node.js (all versioned) |
| DevOps | Jenkins, GitLab, Kibana, Grafana, Docker, Kubernetes, Elasticsearch |
| DB panels | phpMyAdmin, Adminer, pgAdmin |
CVE dork generation — for 10 tech families, targeted dorks are generated and fed to DorkCrawlerAgent:
site:target.com inurl:wp-login.php
site:target.com inurl:xmlrpc.php
site:target.com inurl:app/kibana
Output per result:
{
  "tech_fingerprint": {
    "techs": [
      {"name": "WordPress", "category": "cms"},
      {"name": "jQuery", "category": "js_lib", "version": "3.6.0"},
      {"name": "PHP", "category": "lang", "version": "7.4"}
    ],
    "cve_dorks": ["site:target.com inurl:wp-login.php", "..."]
  }
}
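Turning detected technologies into follow-up dorks can be sketched as a template lookup. Only the three dork patterns shown above are included here; the `DORK_TEMPLATES` table and the `cve_dorks` helper are hypothetical, and the real mapping for the 10 tech families is not reproduced.

```python
# Hypothetical template table, limited to the dork examples shown above.
DORK_TEMPLATES = {
    "WordPress": ["inurl:wp-login.php", "inurl:xmlrpc.php"],
    "Kibana": ["inurl:app/kibana"],
}


def cve_dorks(domain, tech_fingerprint):
    """Build site-scoped dorks for every detected tech that has templates."""
    dorks = []
    for tech in tech_fingerprint.get("techs", []):
        for template in DORK_TEMPLATES.get(tech["name"], []):
            dork = f"site:{domain} {template}"
            if dork not in dorks:  # keep order, drop duplicates
                dorks.append(dork)
    return dorks
```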
SecretsAgent scans page_content and snippet for 50+ credential and secret patterns.
Secret categories with severity:
| Severity | Types |
|---|---|
| CRITICAL | Private keys, AWS keys, bcrypt hashes, NTLM hashes, Stripe keys |
| HIGH | DB connections, JWTs, GCP keys, Azure keys, GitHub PATs, passwords, SendGrid, Twilio, GitLab PAT, Docker PAT, NPM token |
| MEDIUM | Generic API keys, tokens, Slack keys, webhooks, SSH credentials, .env variables, Mailgun, Heroku |
| LOW | MD5 / SHA1 / SHA256 / SHA512 hashes, internal IPs |
Features:
- severity field on every finding
- Hash formats: $2y$ (bcrypt), MD5 (32 hex), SHA1 (40 hex), SHA256 (64 hex), SHA512 (128 hex), NTLM pairs
Output per result:
{
  "secrets": [
    {
      "type": "AWS_KEY",
      "detection": "REGEX",
      "value": "AKIA…0A2",
      "confidence": "HIGH",
      "severity": "CRITICAL",
      "context": "...aws_access_key_id = AKIA...",
      "source": "https://target.com/config.php",
      "desc": "AWS Access Key ID"
    }
  ]
}
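A single-pattern version of this detection is sketched below for the AWS key case. The regex reflects the well-known AKIA key ID shape (AKIA followed by 16 uppercase alphanumerics). The masking rule (first 4 and last 3 characters kept) is inferred from the sample value above and is an assumption, as are the `mask` and `find_secrets` names and the 30-character context window.

```python
import re

AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def mask(value, head=4, tail=3):
    """Keep only the first `head` and last `tail` characters (assumed rule)."""
    if len(value) <= head + tail:
        return "\u2026"
    return value[:head] + "\u2026" + value[-tail:]


def find_secrets(text, source):
    """Return findings shaped like the JSON above, for AWS key IDs only."""
    findings = []
    for match in AWS_KEY_RE.finditer(text):
        masked = mask(match.group(0))
        before = text[max(0, match.start() - 30):match.start()]
        findings.append({
            "type": "AWS_KEY",
            "detection": "REGEX",
            "value": masked,
            "severity": "CRITICAL",
            "context": "..." + before + masked,  # never echo the full secret
            "source": source,
            "desc": "AWS Access Key ID",
        })
    return findings
```

Masking the value inside the context string as well keeps the full key out of the report, which matters when reports are shared.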
PiiDetectorAgent detects personally identifiable information. It is separated from SecretsAgent by design, because PII requires different handling than technical credentials. Patterns are organised by geographic area.
Detected types:
| Type | Coverage |
|---|---|
| EMAIL | Standard email format — global |
| PHONE_US | US/Canada — NANP format with optional +1 |
| PHONE_EU | EU + UK + CH + NO — 22 country codes (+30 to +421) |
| PHONE_ME | Middle East — EG, TR, AF, IR, LB, JO, SY, IQ, KW, SA, YE, OM, PS, AE, IL, BH, QA |
| PHONE_AS | Asia-Pacific — MY, AU, ID, PH, NZ, SG, TH, JP, KR, VN, CN, HK, MO, KH, LA, BD, TW, IN, PK, LK, MM |
| IBAN | Generic IBAN — covers EU, UK, and Middle East banking formats |
| TAX_ID_US | SSN (NNN-NN-NNNN) and EIN (NN-NNNNNNN) |
| TAX_ID_EU | EU VAT number with ISO country prefix (DE, FR, IT, ES, PL, and 18 more) |
| TAX_ID_ME | Keyword-anchored: SA VAT (15 digits), AE TRN, EG, TR, IR |
| TAX_ID_AS | IN PAN card, CN USCC (18 chars), JP My Number, KR TRN, SG UEN, AU ABN |
| NIN_EU | EU national identity numbers — BSN, PESEL, personnummer, SVNR, NIR |
| NID_ME | Emirates ID (784-format), SA national ID, keyword-anchored |
| NID_AS | SG NRIC, KR RRN, IN Aadhaar (XXXX XXXX XXXX), keyword-anchored |
| CREDIT_CARD | Visa, Mastercard, Discover, Amex — Luhn-validated |
| SSN_US | US SSN with exclusion of invalid blocks (000, 666, 9xx) |
| DOB | Date of birth — keyword-anchored, multilingual labels (EN/ES/DE/AR/ZH/KO) |
| PASSPORT | Generic machine-readable passport format — global |
| PUBLIC_IP | Non-RFC-1918, non-loopback IPv4 — global |
Credit card numbers are validated with the Luhn algorithm, which rejects most random numeric strings (about one in ten random digit strings still passes the checksum). Values are censored so that only 4 characters remain visible at each end.
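The Luhn check and the censoring rule are both short enough to show in full. The Luhn implementation below is the standard algorithm; the minimum length of 13 digits and the `censor` helper (with `*` as the fill character) are assumptions made for this sketch.

```python
def luhn_valid(number):
    """Standard Luhn checksum over the digits in `number` (separators ignored)."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:  # assumption: shorter strings are not card numbers
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def censor(value, visible=4):
    """Leave `visible` characters at each end and mask the middle."""
    if len(value) <= 2 * visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - 2 * visible) + value[-visible:]
```

The well-known test number 4111111111111111 passes the check, while changing its last digit to 2 makes it fail.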
EmailHarvesterAgent collects email addresses from snippet and page content, deduplicates them globally across all results, and categorizes them by prefix.
| Category | Prefix patterns |
|---|---|
| admin | admin, administrator, root, sysadmin, webmaster, hostmaster, postmaster |
| security | security, abuse, vuln, pentest, csirt, cert, soc, noc, infosec |
| info | info, contact, hello, support, help, service, sales, marketing |
| noreply | noreply, no-reply, donotreply, mailer-daemon, bounce |
| personal | everything else |
Global dedup: same address found in 10 pages = counted once. Results sorted by category priority (admin first, noreply last).
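Categorization, global deduplication, and priority sorting can be sketched together. Two points are assumptions here: matching is done with a simple starts-with test on the local part, and the full priority order is admin, security, info, personal, noreply (the source only fixes admin first and noreply last). The `categorize` and `harvest` names are hypothetical.

```python
# Prefix lists are taken from the table above.
CATEGORY_PREFIXES = {
    "admin": ("admin", "administrator", "root", "sysadmin", "webmaster",
              "hostmaster", "postmaster"),
    "security": ("security", "abuse", "vuln", "pentest", "csirt", "cert",
                 "soc", "noc", "infosec"),
    "info": ("info", "contact", "hello", "support", "help", "service",
             "sales", "marketing"),
    "noreply": ("noreply", "no-reply", "donotreply", "mailer-daemon", "bounce"),
}
# Assumed order; the source only states admin first and noreply last.
PRIORITY = ["admin", "security", "info", "personal", "noreply"]


def categorize(address):
    """Classify an address by the prefix of its local part."""
    local = address.split("@", 1)[0].lower()
    for category, prefixes in CATEGORY_PREFIXES.items():
        if local.startswith(prefixes):
            return category
    return "personal"


def harvest(addresses):
    """Deduplicate case-insensitively, then sort by category priority."""
    seen = {}
    for address in addresses:
        key = address.strip().lower()
        seen.setdefault(key, categorize(key))
    return sorted(seen.items(), key=lambda item: (PRIORITY.index(item[1]), item[0]))
```

A starts-with test is deliberately loose and will misfile some personal addresses (for example a local part beginning with "soc"); a stricter matcher would require a separator after the prefix.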
SubdomainHarvesterAgent extracts subdomains from all text fields (URL, snippet, page_content, title) and deduplicates them globally per base domain.
Base domain extraction: takes the last two labels — api.v2.target.com → target.com.
Follow-up dork generation — 3 dork variants per subdomain:
site:api.target.com
site:api.target.com inurl:admin
site:api.target.com inurl:.env OR inurl:.git
These are merged with TechFingerprintAgent’s CVE dorks and passed to DorkCrawlerAgent as seeds for the next round.
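The extraction, grouping, and dork generation steps can be sketched as follows. The hostname regex is a hypothetical simplification, and the function names are illustrative; the base-domain rule and the three dork variants follow the description above.

```python
import re
from collections import defaultdict

# Simplified hostname pattern (assumption): dot-separated labels plus an alphabetic TLD.
HOST_RE = re.compile(r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.I)


def base_domain(host):
    """Last two labels, e.g. api.v2.target.com -> target.com."""
    return ".".join(host.lower().split(".")[-2:])


def harvest_subdomains(texts):
    """Map each base domain to its sorted, deduplicated subdomains."""
    found = defaultdict(set)
    for text in texts:
        for host in HOST_RE.findall(text):
            host = host.lower()
            base = base_domain(host)
            if host != base:  # skip the bare base domain itself
                found[base].add(host)
    return {base: sorted(hosts) for base, hosts in found.items()}


def followup_dorks(subdomain):
    """The three dork variants generated per subdomain."""
    return [
        f"site:{subdomain}",
        f"site:{subdomain} inurl:admin",
        f"site:{subdomain} inurl:.env OR inurl:.git",
    ]
```

Two limitations are worth keeping in mind: the last-two-labels rule maps hosts under multi-label suffixes such as co.uk to the suffix itself, and a loose hostname regex will also pick up dotted file names unless they are filtered out.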
DBScanAgent scans for exposed database ports on all unique hosts extracted from dork results. It runs after the main analysis pipeline and produces a dedicated DBScanReport saved alongside the main output file.
Location: DorkEye/Tools/db_portscan.py
Detection coverage:
| Service | Port(s) | Probe type |
|---|---|---|
| MySQL | 3306 | TCP banner |
| PostgreSQL | 5432 | TCP banner |
| MongoDB | 27017, 27018*, 27019* | OP_MSG isMaster handshake |
| Redis | 6379 | PING → +PONG |
| Elasticsearch | 9200, 9300 | HTTP GET / — checks cluster_name, version |
| CouchDB | 5984 | HTTP GET / — checks couchdb, Welcome |
| InfluxDB | 8086 | HTTP GET /ping (204 = alive) |
| Neo4j | 7474 | HTTP GET / — checks neo4j, bolt |
| Memcached | 11211 | stats\r\n → STAT |
| MSSQL | 1433 | TCP banner |
| Oracle | 1521 | TCP banner |
| Cassandra | 9042 | TCP banner |
| RethinkDB | 28015, 5000* | TCP banner |
| DB2 | 50000* | TCP banner |
| Riak | 8098 | HTTP GET / |
* non-default — included only when --ports is set explicitly.
Severity model:
| Outcome | Severity | Meaning |
|---|---|---|
| Port open + no-auth confirmed | CRITICAL | Data directly accessible without credentials |
| Port open + service banner confirmed | HIGH | Auth likely required but service is exposed |
| Port open, service unconfirmed | MEDIUM | Port responding, service unclear from banner |
| Port closed / filtered / timeout | INFO | Not reported in findings |
No-auth probe logic per service:
| Service | No-auth trigger |
|---|---|
| Redis | +PONG received after PING |
| Elasticsearch | HTTP 200 with cluster_name + version in body |
| CouchDB | HTTP 200 with couchdb + Welcome in body |
| InfluxDB | HTTP 204 on /ping |
| Neo4j | HTTP 200 with neo4j + bolt in body |
| MongoDB | isMaster / isWritablePrimary in OP_MSG reply |
| Memcached | STAT lines returned on stats command |
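The severity model and a few of the no-auth triggers reduce to small pure functions once the probe reply is in hand. The sketch below deliberately leaves out the socket and HTTP code; the function names are hypothetical and only three of the seven triggers are shown.

```python
def redis_no_auth(reply):
    """Redis answers PING with +PONG only when no authentication is required."""
    return reply.startswith(b"+PONG")


def memcached_no_auth(reply):
    """Memcached returns STAT lines for the stats command."""
    return reply.startswith(b"STAT ")


def influxdb_no_auth(status_code):
    """InfluxDB answers GET /ping with HTTP 204."""
    return status_code == 204


def severity(port_open, service_confirmed=False, no_auth=False):
    """Map a probe outcome to the severity table above."""
    if not port_open:
        return "INFO"      # closed / filtered / timeout: not reported
    if no_auth:
        return "CRITICAL"  # data reachable without credentials
    if service_confirmed:
        return "HIGH"      # service exposed, auth likely required
    return "MEDIUM"        # port responds, service unclear
```

For example, a Redis instance that requires a password replies to PING with an error line starting with `-NOAUTH`, so the same open port is rated HIGH instead of CRITICAL.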
Dork-to-port hints — if a result’s URL, title, or snippet matches a known DB keyword, those ports are promoted to the front of the scan queue for that host:
| Keyword pattern | Hinted ports |
|---|---|
| phpmyadmin, mysqladmin | 3306 |
| pgadmin, postgresql | 5432 |
| mongodb, robo3t | 27017, 27018 |
| redis, redisinsight | 6379 |
| elasticsearch, kibana | 9200, 9300 |
| couchdb, fauxton | 5984 |
| influx | 8086 |
| neo4j | 7474 |
| mssql, sqlserver | 1433 |
| oracle, tns listener | 1521 |
| cassandra | 9042 |
| memcache | 11211 |
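Port promotion can be sketched as a reordering of the scan queue. The keyword table is copied from above; the `order_ports` helper is hypothetical, uses plain substring matching, and assumes a hinted port is queued even when it is not in the default list.

```python
# Keyword -> hinted ports, from the table above.
PORT_HINTS = [
    (("phpmyadmin", "mysqladmin"), [3306]),
    (("pgadmin", "postgresql"), [5432]),
    (("mongodb", "robo3t"), [27017, 27018]),
    (("redis", "redisinsight"), [6379]),
    (("elasticsearch", "kibana"), [9200, 9300]),
    (("couchdb", "fauxton"), [5984]),
    (("influx",), [8086]),
    (("neo4j",), [7474]),
    (("mssql", "sqlserver"), [1433]),
    (("oracle", "tns listener"), [1521]),
    (("cassandra",), [9042]),
    (("memcache",), [11211]),
]


def order_ports(default_ports, result):
    """Move ports hinted by the result's URL, title, or snippet to the front."""
    text = " ".join(str(result.get(k, "")) for k in ("url", "title", "snippet")).lower()
    hinted = []
    for keywords, ports in PORT_HINTS:
        if any(keyword in text for keyword in keywords):
            hinted.extend(p for p in ports if p not in hinted)
    # Assumption: hinted ports are scanned even if absent from default_ports.
    return hinted + [p for p in default_ports if p not in hinted]
```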
CLI flags:
--dbscan # Enable DBScanAgent in the pipeline
--dbscan-timeout 2.5 # TCP connect timeout in seconds (default: 2.5)
--dbscan-threads 60 # Worker threads per host (default: 60)
--dbscan-ports 3306 5432 6379 # Override default port list
--dbscan-max-hosts 200 # Max hosts to scan (default: 200)
--dbscan-stealth # Add 1.5–3.5s random delay between hosts
Standalone usage:
# Scan all hosts in a results file (default ports)
python db_portscan.py results.json
# Custom timeout and thread count
python db_portscan.py results.json --timeout 3 --threads 80
# Target specific ports only
python db_portscan.py results.json --ports 3306 5432 27017 6379
# Stealth mode with host cap
python db_portscan.py results.json --stealth --max-hosts 50
# Custom output path
python db_portscan.py results.json --out Dump/custom_scan
Output files:
Dump/<stem>_dbscan_<ts>.json # Full structured report
Dump/<stem>_dbscan_<ts>.txt # Human-readable summary, usable as reference list
Output structure (JSON):
{
  "generated_at": "2025-01-01 12:00:00",
  "stats": {
    "hosts_scanned": 12,
    "ports_scanned": 192,
    "open_ports": 7,
    "critical": 2,
    "high": 3,
    "medium": 2
  },
  "hosts": [
    {
      "host": "target.com",
      "scanned": 16,
      "duration": 4.12,
      "critical": 1,
      "high": 1,
      "open_ports": [6379, 9200],
      "findings": [
        {
          "host": "target.com",
          "port": 6379,
          "service": "Redis",
          "status": "open",
          "severity": "CRITICAL",
          "probe": "redis",
          "no_auth": true,
          "banner": "",
          "detail": "Unauthenticated PING/PONG — data directly accessible",
          "source_url": "https://target.com/redisinsight/",
          "timestamp": "2025-01-01 12:00:01"
        }
      ]
    }
  ]
}
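A report with this structure is easy to post-process. The sketch below pulls out the CRITICAL findings; it relies only on the keys shown in the JSON above, and the `load_report` and `critical_findings` helpers are hypothetical names.

```python
import json


def load_report(path):
    """Read a DB scan JSON report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def critical_findings(report):
    """Return (host, port, service) for every CRITICAL finding."""
    return [
        (finding["host"], finding["port"], finding["service"])
        for host in report.get("hosts", [])
        for finding in host.get("findings", [])
        if finding.get("severity") == "CRITICAL"
    ]
```

Typical use would be `critical_findings(load_report(path))`, with `path` pointing at one of the `_dbscan_` JSON files listed under Output files.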
Python integration:
from Tools.db_portscan import DBPortScanAgent, save_dbscan_report
agent = DBPortScanAgent(
    timeout=2.5,
    threads=60,
    stealth=False,
    max_hosts=200,
)
report = agent.run(results) # results: list[dict] from DorkEye pipeline
report.print_summary() # terminal summary with CRITICAL highlights
save_dbscan_report(report, out_path) # writes .json + .txt
ReportAgent produces the final analysis report. Accepted formats: html, md, json, txt.
HTML report sections:
JSON report top-level keys:
{
  "meta": { "generated_at": "...", "target": "...", "engine": "DorkEye + Agents" },
  "metrics": { "total": N, "by_label": {...}, "secrets": N, "pii": N, "emails": N, "subdomains": N },
  "analysis": {},
  "secrets": [...],
  "pii": [...],
  "emails": [...],
  "subdomains": { "target.com": ["api.target.com", "..."] },
  "cve_dorks": [...],
  "db_scan": { "stats": {...}, "hosts": [...] },
  "results": [...]
}
dorkeye_agents.py can run directly on any existing DorkEye result file:
# Basic analysis
python dorkeye_agents.py Dump/results.json
# With page fetch
python dorkeye_agents.py Dump/results.json --analyze-fetch --analyze-fetch-max 50
# HTML report to specific path
python dorkeye_agents.py Dump/results.json \
    --analyze-fetch --analyze-fmt html --analyze-out report.html
# With target label for the report title
python dorkeye_agents.py Dump/results.json --target "example.com" --analyze-fetch
# Skip LLM triage (regex only, even if LLM plugin available)
python dorkeye_agents.py Dump/results.json --analyze-no-llm-triage
# Full pipeline including DB port scan
python dorkeye_agents.py Dump/results.json --analyze-fetch --dbscan --dbscan-stealth