robots.txt

Search Engine Discovery (OSINT)

Search Engine Discovery is the practice of using public search engines to gather information about websites, organizations, or individuals. It’s a form of Open Source Intelligence (OSINT) that leverages advanced search techniques to extract valuable data from publicly accessible sources.

Why It Matters

  • Open Source: Relies on publicly indexed data, which is generally legal to query, though how the results are used still matters.
  • Extensive Reach: Accesses a wide range of indexed online content.
  • User-Friendly: No deep technical expertise required.
  • Free Tool: Cost-effective for researchers, analysts, and security professionals.

Applications

  • Security Assessment: Detect exposed endpoints, sensitive documents, and credentials.
  • Competitive Intelligence: Analyze competitors’ offerings and strategies.
  • Investigative Journalism: Trace hidden connections and transactions.
  • Threat Intelligence: Identify and monitor emerging cyber threats.

Search Operators Cheat Sheet

Operator    | Description                          | Example                           | Use Case
site:       | Restrict search to a specific domain | site:example.com                  | View all indexed pages on a site
inurl:      | Search term within URL               | inurl:login                       | Find login or admin pages
filetype:   | Search for a specific file type      | filetype:pdf                      | Locate PDFs, docs, etc.
intitle:    | Term in page title                   | intitle:"confidential report"     | Discover sensitive or titled content
intext:     | Term in body text                    | intext:"reset password"           | Find mentions in site content
cache:      | View cached version                  | cache:example.com                 | See previous content of a site
link:       | Find sites linking to a URL          | link:example.com                  | Analyze backlinks
related:    | Show similar sites                   | related:example.com               | Discover competitors or alternative domains
info:       | Show page summary                    | info:example.com                  | Basic metadata and indexing
define:     | Get definitions                      | define:OSINT                      | Clarify terms quickly
numrange:   | Filter within a number range         | numrange:1000-2000                | Find pages mentioning numbers in a range
allintext:  | All terms must be in body            | allintext:reset password          | Match multiple body terms
allinurl:   | All terms must be in URL             | allinurl:admin panel              | Search for specific structures
allintitle: | All terms must be in title           | allintitle:report 2023            | Find precisely titled pages
OR          | Either of the terms                  | "ubuntu" OR "debian"              | Broaden your search scope
AND         | Both terms required                  | site:example.com AND inurl:login  | Narrow to more specific queries
NOT or -    | Exclude terms                        | site:example.com -inurl:login     | Remove unwanted results
*           | Wildcard                             | filetype:pdf user* guide          | Match anything between or after words
..          | Numerical range                      | "price" 100..500                  | Find products within price range
" "         | Exact phrase                         | "security best practices"         | Avoid variations, get precise matches

Operator support varies by search engine and changes over time. Google, for example, has retired cache:, link:, and info:, so test an operator before relying on it.
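Operators from the cheat sheet are combined by placing them side by side in one query. The following is a minimal Python sketch of that idea, not a definitive tool: the helper names (`operator`, `build_query`, `search_url`) are hypothetical, and the Google search endpoint used as the default base URL is an assumption you can swap for another engine.

```python
from urllib.parse import urlencode


def operator(name: str, value: str) -> str:
    """Format one operator term, quoting values that contain spaces."""
    if " " in value:
        value = f'"{value}"'
    return f"{name}:{value}"


def build_query(*parts: str) -> str:
    """Join terms with spaces (commonly treated as an implicit AND)."""
    return " ".join(part for part in parts if part)


def search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """URL-encode the query; the default base URL is an assumed endpoint."""
    return f"{base}?{urlencode({'q': query})}"


# Illustrative values taken from the cheat sheet examples.
query = build_query(
    operator("site", "example.com"),
    operator("filetype", "pdf"),
    operator("intitle", "confidential report"),
)
print(query)  # site:example.com filetype:pdf intitle:"confidential report"
print(search_url(query))
```

Building the query as plain text first keeps it readable and easy to paste into a browser; encoding happens only at the last step.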

Google Dorking (aka Google Hacking)

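Google Dorking combines the search operators above to surface sensitive data that websites have exposed unintentionally and that search engines have indexed. Security researchers and ethical hackers rely on it, but the same queries serve attackers, so use it only within an authorized scope.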

Common Google Dork Examples

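Purpose              | Dork Example
Find Login Pages     | site:example.com inurl:login OR inurl:admin
Exposed Files        | site:example.com filetype:pdf OR filetype:doc OR filetype:<ext>
Config Files         | inurl:<name>.php OR ext:<ext> OR ext:<ext>
Database Backups     | inurl:<term> OR filetype:<ext>
Sensitive Info Leaks | intitle:index.of passwd OR intext:<term>

Replace <name>, <ext>, and <term> with the file name, extension, or keyword you are looking for.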

For a full collection, refer to the Google Hacking Database (GHDB).
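Dorks like those above follow a repeatable pattern: a `site:` scope plus one operator repeated with OR across several values. Here is a small Python sketch of that pattern. The function names and every keyword or extension in it (`login`, `admin`, `pdf`, `doc`, `backup`, `sql`) are illustrative assumptions, not entries taken from the GHDB.

```python
def or_group(name: str, values: list[str]) -> str:
    """Repeat one operator across several values, joined with OR."""
    return " OR ".join(f"{name}:{value}" for value in values)


def dorks_for(domain: str) -> dict[str, str]:
    """Build a few example dorks scoped to one domain.

    All keywords and extensions below are hypothetical placeholders.
    """
    scope = f"site:{domain}"
    return {
        "login pages": f"{scope} {or_group('inurl', ['login', 'admin'])}",
        "exposed files": f"{scope} {or_group('filetype', ['pdf', 'doc'])}",
        "database backups": f"{scope} inurl:backup OR filetype:sql",
    }


for purpose, dork in dorks_for("example.com").items():
    print(f"{purpose}: {dork}")
```

Scoping every dork with `site:` keeps results limited to a domain you are permitted to assess.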

Limitations

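  • Search engines don’t index everything: deep-web pages behind logins and dynamically generated content are often missing.
  • Some data is deliberately kept out of results by authentication, firewalls, or robots.txt rules. robots.txt is only advisory: it asks compliant crawlers to skip paths but does not block access.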
  • Always ensure you’re operating within legal and ethical boundaries.
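To see how robots.txt rules affect what a compliant crawler will fetch, the sketch below parses a sample file with Python's standard `urllib.robotparser`. The robots.txt content and the paths are hypothetical, made up purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /backups/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS.splitlines())

for path in ("/admin/login", "/blog/post-1"):
    url = f"https://example.com{path}"
    allowed = parser.can_fetch("*", url)
    status = "allowed" if allowed else "disallowed"
    print(f"{path}: {status} for compliant crawlers")
```

A disallowed path is a request to crawlers, not an access control, and because robots.txt is itself public, the paths listed in it are visible to anyone who reads the file.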
https://fuwari.vercel.app/posts/search-engine-discovery/
Author
Ranjung Yeshi Norbu
Published at
2025-04-21