DorkEye

AI Agents | DorkEye Project

The Agents pipeline runs automatically after a dork search when --analyze is active; when the output file is .json, DorkEye prompts to run it. The core pipeline requires no external AI: every step except the optional LLM Analysis step uses regex, heuristics, and structural analysis only.


Activate

# Automatic (output .json triggers a prompt)
python dorkeye.py -d "site:example.com" -o results.json

# Explicit
python dorkeye.py -d "site:example.com" -o results.json --analyze

# With page content download
python dorkeye.py -d "site:example.com" -o results.json --analyze --analyze-fetch

# Standalone on saved results
python dorkeye_agents.py Dump/results.json --analyze-fetch --analyze-fmt html

Pipeline — 13 Steps

| Step | Agent | Input | Output |
|---|---|---|---|
| 1 | TriageAgent | All results | triage_score, triage_label, triage_reason per result |
| 2 | PageFetchAgent | HIGH / CRITICAL results | page_content, response_headers, fetch_status |
| 3 | SecurityAgent | page_content + response_headers + URL | security_verdict dict with threat_level, threat_score, indicators |
| 4 | HeaderIntelAgent | response_headers | header_intel (info leaks, missing headers, outdated versions) |
| 5 | TechFingerprintAgent | page_content + headers + URL | tech_fingerprint (techs, versions, CVE dorks) |
| 6 | SecretsAgent | page_content + snippet | secrets list with type, value, severity, context |
| 7 | PiiDetectorAgent | page_content + snippet | pii_found list with type, censored value, context |
| 8 | EmailHarvesterAgent | page_content + snippet | emails_found list, global dedup |
| 9 | SubdomainHarvesterAgent | All text fields | subdomains list per result, global map |
| 10 | LLM Analysis | All triaged results | analysis dict (optional; requires dorkeye_llm_plugin.py) |
| 11 | ReportAgent | Everything above | HTML / MD / JSON / TXT report |
| 12 | DBScanAgent | Hosts extracted from results | db_scan per-host findings, DBScanReport |
| 13 | DorkCrawlerAgent | CVE dorks + subdomain seeds | Follow-up dork results merged into pipeline |

TriageAgent

Assigns a score (0–100) and label (CRITICAL / HIGH / MEDIUM / LOW / SKIP) to every result.

Scoring — two phases:

Phase 1 — regex pattern matching (cap: 60 points):

| Pattern matched | Points |
|---|---|
| .env, .git, .sql, backup files | 38 |
| Private key (BEGIN PRIVATE KEY) | 45 |
| AWS key ID (AKIA...) | 42 |
| JWT token in text | 36 |
| phpMyAdmin / Adminer / pgAdmin | 35 |
| API key / secret token | 28 |
| Config file (config.php, settings.py) | 28 |
| SQLi candidate URL pattern | 26 |
| Credentials / password in text | 24 |
| DevOps panels (Jenkins, Kibana, Grafana) | 18–22 |
| Cloud storage URLs | 18 |
| Directory listing | 22 |
| Server info exposed | 20 |
| Log files | 20 |
| Error / debug / traceback | 14 |
| … 8 more rules | varies |

Phase 2 — runtime bonuses from existing result data:

| Condition | Bonus |
|---|---|
| SQLi confirmed, confidence critical | +30 |
| SQLi confirmed, confidence high | +22 |
| SQLi confirmed, confidence medium or low | +12 |
| accessible == True and status_code == 200 | +8 |
| URL has ≥ 5 GET parameters | +10 |
| URL has 2–4 GET parameters | +5 |

Labels:

| Score | Label |
|---|---|
| ≥ 90 | CRITICAL |
| ≥ 70 | HIGH |
| ≥ 50 | MEDIUM |
| ≥ 20 | LOW |
| < 20 | SKIP |
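
The two phases and the label thresholds can be combined roughly as follows. This is a minimal sketch, not DorkEye's actual code: the function names, the `sqli` field layout, and the clamp to 100 are assumptions, and the Phase 1 points are taken as an input because the source does not say how multiple pattern hits are combined.

```python
from urllib.parse import urlparse, parse_qsl

PHASE1_CAP = 60  # cap on regex points, as stated for Phase 1

# Label thresholds from the Labels table, highest first.
LABELS = [(90, "CRITICAL"), (70, "HIGH"), (50, "MEDIUM"), (20, "LOW")]

# Phase 2 bonus per SQLi confidence level.
SQLI_BONUS = {"critical": 30, "high": 22, "medium": 12, "low": 12}


def runtime_bonus(result: dict) -> int:
    """Phase 2: bonuses derived from data already attached to the result."""
    bonus = 0
    sqli = result.get("sqli") or {}  # hypothetical field layout
    if sqli.get("confirmed"):
        bonus += SQLI_BONUS.get(sqli.get("confidence", ""), 0)
    if result.get("accessible") is True and result.get("status_code") == 200:
        bonus += 8
    query = urlparse(result.get("url", "")).query
    n_params = len(parse_qsl(query, keep_blank_values=True))
    if n_params >= 5:
        bonus += 10
    elif n_params >= 2:
        bonus += 5
    return bonus


def triage(pattern_points: int, result: dict) -> tuple[int, str]:
    """Return (score, label) for one result."""
    score = min(pattern_points, PHASE1_CAP) + runtime_bonus(result)
    score = min(score, 100)  # assumption: the final score is clamped to 100
    for threshold, label in LABELS:
        if score >= threshold:
            return score, label
    return score, "SKIP"
```

Because Phase 1 is capped at 60 points, a result can only reach HIGH or CRITICAL if at least one runtime bonus applies.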

SecurityAgent

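Threat-detection middleware that operates in two modes: passive (the default), which only reports threats, and active, which blocks DANGEROUS and CRITICAL results.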

It also hooks into the live scanning flow via security_scan_hook(url, content, headers) so threats can be intercepted before results are saved.

Detection categories:

| Category | Description |
|---|---|
| phishing | Brand impersonation, credential harvesting pages, suspicious redirects |
| malware | JS code execution, obfuscated payloads, file droppers |
| exploit | Reverse shells, SQLi, XXE, SSTI, deserialization payloads |
| obfuscation | Hex/Unicode escaping, base64 chains, high-entropy strings |
| suspicious_pattern | Hidden iframes, missing security headers, executable downloads |

Threat scoring — weighted model:

The final threat_score (0–100) maps to a threat level as follows:

| Score range | Threat level |
|---|---|
| ≤ 15 | CLEAN |
| 16 – 40 | LOW |
| 41 – 70 | SUSPICIOUS |
| 71 – 90 | DANGEROUS |
| > 90 | CRITICAL |

DANGEROUS and CRITICAL results are automatically blocked in active mode.
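
The mapping from score to threat level, and the blocking rule, can be sketched as below. The function names are hypothetical; only the thresholds and the active/passive behaviour come from this page.

```python
def threat_level(score: int) -> str:
    """Map a 0-100 threat_score to the threat level ranges listed above."""
    if score <= 15:
        return "CLEAN"
    if score <= 40:
        return "LOW"
    if score <= 70:
        return "SUSPICIOUS"
    if score <= 90:
        return "DANGEROUS"
    return "CRITICAL"


def is_blocked(score: int, mode: str = "passive") -> bool:
    """Only active mode blocks; passive mode (the default) just reports."""
    return mode == "active" and threat_level(score) in {"DANGEROUS", "CRITICAL"}
```

Under this reading, a DANGEROUS result with a score of 78 is reported with `blocked` set to false in passive mode and is blocked only when `--security-mode active` is set.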

CLI flags:

--no-security                  # Disable SecurityAgent entirely
--security-mode active         # active: block DANGEROUS/CRITICAL | passive: report only (default)
--security-quarantine          # Save blocked content to dorkeye_quarantine/

Output per result:

{
  "security_verdict": {
    "url":              "https://target.com/shell.php",
    "threat_level":     "DANGEROUS",
    "threat_score":     78,
    "badge":            "🔴 DANGEROUS",
    "blocked":          false,
    "summary":          "Reverse shell pattern detected in page content",
    "scan_duration_ms": 12.4,
    "timestamp":        "2025-01-01T12:00:00+00:00",
    "indicators": [
      {
        "category":    "exploit",
        "description": "Reverse shell pattern",
        "severity":    60,
        "evidence":    "bash -i >& /dev/tcp/..."
      }
    ]
  }
}

Pipeline-level output keys (added to the top-level report):

| Key | Content |
|---|---|
| security_stats | Counters: clean, low, suspicious, dangerous, critical, blocked, mode |
| security_threats | List of all verdicts with threat_level ≥ LOW |

Inline usage (scanning flow):

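from dorkeye_agents import security_scan_hook, get_security_agent

# Quick hook (uses global singleton), called once per fetched page.
# `fetched_pages` is a placeholder for your own iterable of (url, text, headers).
for url, response_text, resp_headers in fetched_pages:
    verdict = security_scan_hook(url, response_text, resp_headers)
    if verdict.blocked:
        continue  # skip malicious result
    # ... keep processing the clean result here ...

# Fine-grained control
agent = get_security_agent(mode="active", quarantine_dir="dorkeye_quarantine")
verdict = agent.scan_single(url, content, headers)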

PageFetchAgent

Downloads the actual HTML content of HIGH and CRITICAL results for deeper analysis.

CLI flags:

--analyze-fetch               # enable download
--analyze-fetch-max 50        # download up to 50 pages (default: 20)

HeaderIntelAgent

Analyzes response_headers saved by PageFetchAgent. Zero additional HTTP requests.

Info leak detection — scans these headers:

server · x-powered-by · x-aspnet-version · x-aspnetmvc-version
x-generator · x-drupal-cache · x-wordpress-cache
x-runtime · x-rack-cache · via · x-debug · x-cache-debug

Outdated version detection — extracts version strings for: Apache, Nginx, PHP, OpenSSL, IIS, Tomcat, Jetty, Lighttpd.

Missing security headers — flags absence of:

Header Risk
Strict-Transport-Security HSTS absent — MITM risk
Content-Security-Policy CSP absent — XSS risk
X-Frame-Options Clickjacking protection absent
X-Content-Type-Options MIME sniffing protection absent
Referrer-Policy Referrer-Policy absent
Permissions-Policy Permissions-Policy absent

Output per result:

{
  "header_intel": {
    "info_leaks":       [{"header": "x-powered-by", "value": "PHP/5.6.40", "version": "PHP/5.6"}],
    "missing_security": [{"header": "content-security-policy", "reason": "CSP absent — XSS risk"}],
    "outdated":         [{"header": "server", "value": "Apache/2.2.34", "version": "Apache/2.2"}]
  }
}
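
A minimal sketch of the two header checks described above, assuming response headers arrive as a plain dict. The function names and the version regex are illustrative (the regex covers only four of the listed products), not taken from DorkEye's source.

```python
import re

# Security headers whose absence is flagged (from the table above).
REQUIRED = [
    "strict-transport-security",
    "content-security-policy",
    "x-frame-options",
    "x-content-type-options",
    "referrer-policy",
    "permissions-policy",
]

# Illustrative subset: product name followed by a major.minor version.
VERSION_RE = re.compile(r"\b(Apache|nginx|PHP|OpenSSL)/(\d+\.\d+)", re.IGNORECASE)


def missing_security_headers(headers: dict) -> list[str]:
    """Header names are case-insensitive, so compare in lowercase."""
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED if h not in present]


def extract_versions(headers: dict) -> list[dict]:
    """Collect product/version strings leaked by any header value."""
    found = []
    for name, value in headers.items():
        for m in VERSION_RE.finditer(value):
            found.append({
                "header": name.lower(),
                "value": value,
                "version": f"{m.group(1)}/{m.group(2)}",
            })
    return found
```

Both functions work purely on stored headers, matching the agent's "zero additional HTTP requests" property.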

TechFingerprintAgent

Identifies technologies from page_content, response_headers, snippet, URL, and title. Attempts version extraction where possible.

35 signatures in 7 categories:

| Category | Technologies |
|---|---|
| CMS | WordPress, Joomla, Drupal, Magento, PrestaShop, TYPO3, Shopify, Wix |
| Framework | Laravel, Django, Rails, Flask, Express.js, Next.js, Nuxt.js |
| JS libraries | jQuery (versioned), React, Vue.js, Angular, Bootstrap (versioned) |
| Server | Apache, Nginx, IIS, OpenSSL (all versioned) |
| Language | PHP, Python, Node.js (all versioned) |
| DevOps | Jenkins, GitLab, Kibana, Grafana, Docker, Kubernetes, Elasticsearch |
| DB panels | phpMyAdmin, Adminer, pgAdmin |

CVE dork generation — for 10 tech families, targeted dorks are generated and fed to DorkCrawlerAgent:

site:target.com inurl:wp-login.php
site:target.com inurl:xmlrpc.php
site:target.com inurl:app/kibana

Output per result:

{
  "tech_fingerprint": {
    "techs": [
      {"name": "WordPress", "category": "cms"},
      {"name": "jQuery", "category": "js_lib", "version": "3.6.0"},
      {"name": "PHP", "category": "lang", "version": "7.4"}
    ],
    "cve_dorks": ["site:target.com inurl:wp-login.php", "..."]
  }
}

SecretsAgent

Scans page_content and snippet for 50+ credential and secret patterns.

Secret categories with severity:

| Severity | Types |
|---|---|
| CRITICAL | Private keys, AWS keys, bcrypt hashes, NTLM hashes, Stripe keys |
| HIGH | DB connections, JWTs, GCP keys, Azure keys, GitHub PATs, passwords, SendGrid, Twilio, GitLab PAT, Docker PAT, NPM token |
| MEDIUM | Generic API keys, tokens, Slack keys, webhooks, SSH credentials, .env variables, Mailgun, Heroku |
| LOW | MD5 / SHA1 / SHA256 / SHA512 hashes, internal IPs |

Output per result:

{
  "secrets": [
    {
      "type":       "AWS_KEY",
      "detection":  "REGEX",
      "value":      "AKIA…0A2",
      "confidence": "HIGH",
      "severity":   "CRITICAL",
      "context":    "...aws_access_key_id = AKIA...",
      "source":     "https://target.com/config.php",
      "desc":       "AWS Access Key ID"
    }
  ]
}
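
As an illustration of one such pattern, the sketch below finds AWS access key IDs with the widely used `AKIA` + 16 uppercase alphanumerics regex and emits a record shaped like the output above. The masking rule (first 4 and last 3 characters) is inferred from the sample value and is an assumption, as are the function names.

```python
import re

# Common pattern for AWS access key IDs: "AKIA" followed by 16 characters.
AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def mask(value: str) -> str:
    """Assumed masking rule: keep the first 4 and last 3 characters."""
    return f"{value[:4]}…{value[-3:]}"


def find_aws_keys(text: str, source: str = "") -> list[dict]:
    """Return one finding per AWS key ID found in the text."""
    return [
        {
            "type": "AWS_KEY",
            "detection": "REGEX",
            "value": mask(m.group()),
            "severity": "CRITICAL",
            "source": source,
        }
        for m in AWS_KEY_RE.finditer(text)
    ]
```

The raw key never appears in the finding, so the report can be shared without leaking the secret a second time.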

PiiDetectorAgent

Detects personally identifiable information, separated from SecretsAgent by design — PII requires different handling than technical credentials. Patterns are organised by geographic area.

Detected types:

| Type | Coverage |
|---|---|
| EMAIL | Standard email format, global |
| PHONE_US | US/Canada: NANP format with optional +1 |
| PHONE_EU | EU + UK + CH + NO: 22 country codes (+30 to +421) |
| PHONE_ME | Middle East: EG, TR, AF, IR, LB, JO, SY, IQ, KW, SA, YE, OM, PS, AE, IL, BH, QA |
| PHONE_AS | Asia-Pacific: MY, AU, ID, PH, NZ, SG, TH, JP, KR, VN, CN, HK, MO, KH, LA, BD, TW, IN, PK, LK, MM |
| IBAN | Generic IBAN: covers EU, UK, and Middle East banking formats |
| TAX_ID_US | SSN (NNN-NN-NNNN) and EIN (NN-NNNNNNN) |
| TAX_ID_EU | EU VAT number with ISO country prefix (DE, FR, IT, ES, PL, and 18 more) |
| TAX_ID_ME | Keyword-anchored: SA VAT (15 digits), AE TRN, EG, TR, IR |
| TAX_ID_AS | IN PAN card, CN USCC (18 chars), JP My Number, KR TRN, SG UEN, AU ABN |
| NIN_EU | EU national identity numbers: BSN, PESEL, personnummer, SVNR, NIR |
| NID_ME | Emirates ID (784-format), SA national ID, keyword-anchored |
| NID_AS | SG NRIC, KR RRN, IN Aadhaar (XXXX XXXX XXXX), keyword-anchored |
| CREDIT_CARD | Visa, Mastercard, Discover, Amex, Luhn-validated |
| SSN_US | US SSN with exclusion of invalid blocks (000, 666, 9xx) |
| DOB | Date of birth: keyword-anchored, multilingual labels (EN/ES/DE/AR/ZH/KO) |
| PASSPORT | Generic machine-readable passport format, global |
| PUBLIC_IP | Non-RFC-1918, non-loopback IPv4, global |

Credit card numbers are validated with the Luhn algorithm, which filters out most false positives from random numeric strings. Detected values are censored so that only 4 characters remain visible at each end.
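
The Luhn check and the censoring rule can be sketched as follows. The helper names, the 13-digit minimum length, and the choice of `*` as the mask character are assumptions made for illustration.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:  # assumption: shortest card length considered
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def censor(value: str, visible: int = 4) -> str:
    """Keep `visible` characters at each end and mask the rest."""
    if len(value) <= 2 * visible:
        return "*" * len(value)  # too short to reveal anything safely
    hidden = len(value) - 2 * visible
    return value[:visible] + "*" * hidden + value[-visible:]
```

A random 16-digit string passes the checksum about one time in ten, which is why Luhn reduces false positives but cannot remove them entirely.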


EmailHarvesterAgent

Collects email addresses from snippet and page content, deduplicates globally across all results, and categorizes by prefix.

| Category | Prefix patterns |
|---|---|
| admin | admin, administrator, root, sysadmin, webmaster, hostmaster, postmaster |
| security | security, abuse, vuln, pentest, csirt, cert, soc, noc, infosec |
| info | info, contact, hello, support, help, service, sales, marketing |
| noreply | noreply, no-reply, donotreply, mailer-daemon, bounce |
| personal | everything else |

Global dedup: an address found on 10 pages is counted once. Results are sorted by category priority (admin first, noreply last).
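
A minimal sketch of the categorization and global deduplication, assuming that "prefix" means the local part before the @ and that it must match a listed pattern exactly. The real matching rule may be looser, and the names here are hypothetical.

```python
# Category -> local parts, in the order of the table above.
CATEGORIES = [
    ("admin", {"admin", "administrator", "root", "sysadmin",
               "webmaster", "hostmaster", "postmaster"}),
    ("security", {"security", "abuse", "vuln", "pentest", "csirt",
                  "cert", "soc", "noc", "infosec"}),
    ("info", {"info", "contact", "hello", "support", "help",
              "service", "sales", "marketing"}),
    ("noreply", {"noreply", "no-reply", "donotreply",
                 "mailer-daemon", "bounce"}),
]


def categorize(email: str) -> str:
    """Classify an address by its local part; unknown ones are 'personal'."""
    local = email.split("@", 1)[0].lower()
    for category, prefixes in CATEGORIES:
        if local in prefixes:
            return category
    return "personal"


def harvest(per_page_emails: list[list[str]]) -> dict[str, list[str]]:
    """Deduplicate across all pages (case-insensitive), then group."""
    seen: set[str] = set()
    grouped: dict[str, list[str]] = {}
    for emails in per_page_emails:
        for email in emails:
            key = email.strip().lower()
            if key in seen:
                continue
            seen.add(key)
            grouped.setdefault(categorize(key), []).append(key)
    return grouped
```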


SubdomainHarvesterAgent

Extracts subdomains from all text fields (URL, snippet, page_content, title). Deduplicates globally per base domain.

Base domain extraction: takes the last two labels, e.g. api.v2.target.com → target.com.

Follow-up dork generation — 3 dork variants per subdomain:

site:api.target.com
site:api.target.com inurl:admin
site:api.target.com inurl:.env OR inurl:.git

These are merged with TechFingerprintAgent’s CVE dorks and passed to DorkCrawlerAgent as seeds for the next round.
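
The base-domain rule and the three dork variants can be sketched like this; the function names are hypothetical.

```python
def base_domain(host: str) -> str:
    """Return the last two dot-separated labels of a hostname."""
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def follow_up_dorks(subdomain: str) -> list[str]:
    """Build the three follow-up dork variants for one subdomain."""
    return [
        f"site:{subdomain}",
        f"site:{subdomain} inurl:admin",
        f"site:{subdomain} inurl:.env OR inurl:.git",
    ]
```

One consequence of the last-two-labels rule is that hosts under multi-part suffixes are grouped by the suffix itself: `shop.example.co.uk` yields `co.uk`, so such targets may need manual grouping.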


DBScanAgent

Scans exposed database ports on all unique hosts extracted from dork results. Runs after the main analysis pipeline and produces a dedicated DBScanReport saved alongside the main output file.

Location: DorkEye/Tools/db_portscan.py

Detection coverage:

| Service | Port(s) | Probe type |
|---|---|---|
| MySQL | 3306 | TCP banner |
| PostgreSQL | 5432 | TCP banner |
| MongoDB | 27017, 27018*, 27019* | OP_MSG isMaster handshake |
| Redis | 6379 | PING → +PONG |
| Elasticsearch | 9200, 9300 | HTTP GET / (checks cluster_name, version) |
| CouchDB | 5984 | HTTP GET / (checks couchdb, Welcome) |
| InfluxDB | 8086 | HTTP GET /ping (204 = alive) |
| Neo4j | 7474 | HTTP GET / (checks neo4j, bolt) |
| Memcached | 11211 | stats\r\n → STAT |
| MSSQL | 1433 | TCP banner |
| Oracle | 1521 | TCP banner |
| Cassandra | 9042 | TCP banner |
| RethinkDB | 28015, 5000* | TCP banner |
| DB2 | 50000* | TCP banner |
| Riak | 8098 | HTTP GET / |

* Non-default: included only when the port list is set explicitly (--ports in standalone mode, --dbscan-ports in the pipeline).

Severity model:

| Outcome | Severity | Meaning |
|---|---|---|
| Port open + no-auth confirmed | CRITICAL | Data directly accessible without credentials |
| Port open + service banner confirmed | HIGH | Auth likely required but service is exposed |
| Port open, service unconfirmed | MEDIUM | Port responding, service unclear from banner |
| Port closed / filtered / timeout | INFO | Not reported in findings |
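
Read as a decision rule, the severity model reduces to a few ordered checks. The function below is a sketch with hypothetical parameter names, not the code in db_portscan.py.

```python
def classify_severity(port_open: bool, service_confirmed: bool, no_auth: bool) -> str:
    """Apply the severity model: the strongest confirmed outcome wins."""
    if not port_open:
        return "INFO"      # closed / filtered / timeout: not reported
    if no_auth:
        return "CRITICAL"  # data reachable without credentials
    if service_confirmed:
        return "HIGH"      # service identified, auth likely required
    return "MEDIUM"        # port answers but the service is unclear
```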

No-auth probe logic per service:

| Service | No-auth trigger |
|---|---|
| Redis | +PONG received after PING |
| Elasticsearch | HTTP 200 with cluster_name + version in body |
| CouchDB | HTTP 200 with couchdb + Welcome in body |
| InfluxDB | HTTP 204 on /ping |
| Neo4j | HTTP 200 with neo4j + bolt in body |
| MongoDB | isMaster / isWritablePrimary in OP_MSG reply |
| Memcached | STAT lines returned on stats command |
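
As an example of one of these triggers, a Redis no-auth probe could look like the sketch below. It sends an inline `PING` over a plain TCP socket and treats a `+PONG` reply as unauthenticated access. The function names and the 64-byte read are assumptions; the default timeout mirrors the `--dbscan-timeout` default.

```python
import socket


def is_redis_no_auth(reply: bytes) -> bool:
    """An unauthenticated Redis answers PING with the simple string +PONG."""
    return reply.startswith(b"+PONG")


def probe_redis(host: str, port: int = 6379, timeout: float = 2.5) -> bool:
    """Return True only if the server answers PING without credentials."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")  # Redis accepts inline commands
            return is_redis_no_auth(sock.recv(64))
    except OSError:  # refused, unreachable, or timed out
        return False
```

A password-protected instance typically replies with an error line such as `-NOAUTH Authentication required.`, which this check treats as not open.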

Dork-to-port hints — if a result’s URL, title, or snippet matches a known DB keyword, those ports are promoted to the front of the scan queue for that host:

| Keyword pattern | Hinted ports |
|---|---|
| phpmyadmin, mysqladmin | 3306 |
| pgadmin, postgresql | 5432 |
| mongodb, robo3t | 27017, 27018 |
| redis, redisinsight | 6379 |
| elasticsearch, kibana | 9200, 9300 |
| couchdb, fauxton | 5984 |
| influx | 8086 |
| neo4j | 7474 |
| mssql, sqlserver | 1433 |
| oracle, tns listener | 1521 |
| cassandra | 9042 |
| memcache | 11211 |
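
Port promotion can be sketched as a stable reordering of the scan queue. Only a subset of the hint table is reproduced here, and the function name and the case-insensitive substring matching are assumptions.

```python
import re

# Subset of the keyword -> port hints from the table above.
HINTS = [
    (re.compile(r"phpmyadmin|mysqladmin", re.IGNORECASE), [3306]),
    (re.compile(r"pgadmin|postgresql", re.IGNORECASE), [5432]),
    (re.compile(r"mongodb|robo3t", re.IGNORECASE), [27017, 27018]),
    (re.compile(r"redis", re.IGNORECASE), [6379]),
    (re.compile(r"elasticsearch|kibana", re.IGNORECASE), [9200, 9300]),
]


def prioritize_ports(text: str, ports: list[int]) -> list[int]:
    """Move hinted ports to the front; keep the rest in original order."""
    hinted: list[int] = []
    for pattern, hint_ports in HINTS:
        if pattern.search(text):
            for port in hint_ports:
                # Only promote ports that are already in the scan list.
                if port in ports and port not in hinted:
                    hinted.append(port)
    return hinted + [p for p in ports if p not in hinted]
```

Hints only change the scan order; they never add ports that are missing from the configured list.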

CLI flags:

--dbscan                       # Enable DBScanAgent in the pipeline
--dbscan-timeout 2.5           # TCP connect timeout in seconds (default: 2.5)
--dbscan-threads 60            # Worker threads per host (default: 60)
--dbscan-ports 3306 5432 6379  # Override default port list
--dbscan-max-hosts 200         # Max hosts to scan (default: 200)
--dbscan-stealth               # Add 1.5–3.5s random delay between hosts

Standalone usage:

# Scan all hosts in a results file (default ports)
python db_portscan.py results.json

# Custom timeout and thread count
python db_portscan.py results.json --timeout 3 --threads 80

# Target specific ports only
python db_portscan.py results.json --ports 3306 5432 27017 6379

# Stealth mode with host cap
python db_portscan.py results.json --stealth --max-hosts 50

# Custom output path
python db_portscan.py results.json --out Dump/custom_scan

Output files:

Dump/<stem>_dbscan_<ts>.json   # Full structured report
Dump/<stem>_dbscan_<ts>.txt    # Human-readable summary, usable as reference list

Output structure (JSON):

{
  "generated_at": "2025-01-01 12:00:00",
  "stats": {
    "hosts_scanned": 12,
    "ports_scanned": 192,
    "open_ports":    7,
    "critical":      2,
    "high":          3,
    "medium":        2
  },
  "hosts": [
    {
      "host":      "target.com",
      "scanned":   16,
      "duration":  4.12,
      "critical":  1,
      "high":      1,
      "open_ports": [6379, 9200],
      "findings": [
        {
          "host":       "target.com",
          "port":       6379,
          "service":    "Redis",
          "status":     "open",
          "severity":   "CRITICAL",
          "probe":      "redis",
          "no_auth":    true,
          "banner":     "",
          "detail":     "Unauthenticated PING/PONG — data directly accessible",
          "source_url": "https://target.com/redisinsight/",
          "timestamp":  "2025-01-01 12:00:01"
        }
      ]
    }
  ]
}
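
A report with this structure is easy to post-process. The sketch below lists every CRITICAL finding as `host:port service`. It relies only on the keys shown above, and the function name is hypothetical.

```python
def critical_findings(report: dict) -> list[str]:
    """Collect 'host:port service' strings for all CRITICAL findings."""
    lines = []
    for host in report.get("hosts", []):
        for finding in host.get("findings", []):
            if finding.get("severity") == "CRITICAL":
                lines.append(
                    f"{finding['host']}:{finding['port']} {finding['service']}"
                )
    return lines
```

Typical use is to load the `_dbscan_` JSON file with `json.load` and pass the resulting dict to this function.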

Python integration:

from Tools.db_portscan import DBPortScanAgent, save_dbscan_report

agent = DBPortScanAgent(
    timeout   = 2.5,
    threads   = 60,
    stealth   = False,
    max_hosts = 200,
)
report = agent.run(results)          # results: list[dict] from DorkEye pipeline
report.print_summary()               # terminal summary with CRITICAL highlights
save_dbscan_report(report, out_path) # writes .json + .txt

ReportAgent

Produces the final analysis report. Accepts html, md, json, txt.

JSON report top-level keys:

{
  "meta":       { "generated_at": "...", "target": "...", "engine": "DorkEye + Agents" },
  "metrics":    { "total": N, "by_label": {...}, "secrets": N, "pii": N, "emails": N, "subdomains": N },
  "analysis":   {},
  "secrets":    [...],
  "pii":        [...],
  "emails":     [...],
  "subdomains": { "target.com": ["api.target.com", "..."] },
  "cve_dorks":  [...],
  "db_scan":    { "stats": {...}, "hosts": [...] },
  "results":    [...]
}

Standalone Usage

dorkeye_agents.py can run directly on any existing DorkEye result file:

# Basic analysis
python dorkeye_agents.py Dump/results.json

# With page fetch
python dorkeye_agents.py Dump/results.json --analyze-fetch --analyze-fetch-max 50

# HTML report to specific path
python dorkeye_agents.py Dump/results.json \
  --analyze-fetch --analyze-fmt html --analyze-out report.html

# With target label for the report title
python dorkeye_agents.py Dump/results.json --target "example.com" --analyze-fetch

# Skip LLM triage (regex only, even if LLM plugin available)
python dorkeye_agents.py Dump/results.json --analyze-no-llm-triage

# Full pipeline including DB port scan
python dorkeye_agents.py Dump/results.json --analyze-fetch --dbscan --dbscan-stealth