Methodology

How we ingest, process, and analyze declassified records.

1. Document Ingestion

Our automated pipeline crawls four official government sources at least once a day, checking for new documents: the National Archives (NARA), the CIA FOIA Reading Room, the FBI Vault, and NSA FOIA releases.

Ingestion Steps

  1. Discover new document listings from source indexes
  2. Download PDF/media files to our secure storage
  3. Generate SHA-256 hash for integrity verification
  4. Record canonical source URL and mirrored location
  5. Store first-seen and last-checked timestamps
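
Steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration, not our production code: the `IngestRecord` layout, the `record_document` helper, and the choice to treat a hash mismatch on re-check as an error are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class IngestRecord:
    # Hypothetical record layout; field names are illustrative only.
    sha256: str
    source_url: str
    mirror_path: str
    first_seen: datetime
    last_checked: datetime


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the file contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def record_document(data, source_url, mirror_path, existing=None, now=None):
    """Create a record on first sight, or refresh last_checked on a re-check."""
    now = now or datetime.now(timezone.utc)
    digest = sha256_hex(data)
    if existing is None:
        # First time this document is seen: both timestamps start equal.
        return IngestRecord(digest, source_url, mirror_path, now, now)
    if existing.sha256 != digest:
        # Assumption: a mirrored copy that no longer matches its hash is an error.
        raise ValueError("mirrored copy does not match recorded SHA-256")
    existing.last_checked = now
    return existing
```

On a later crawl, passing the earlier record back in as `existing` leaves `first_seen` untouched and only advances `last_checked`.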

2. Dual-Layer Model

We maintain strict separation between original documents and our analysis layer:

Immutable Layer

  • Original PDFs/media files
  • Never altered or modified
  • Cryptographic hash verification
  • Full provenance chain

Intelligence Layer

  • Extracted text and metadata
  • Entity tags and classifications
  • Correlation links
  • Fully rebuildable from originals
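
One way to picture the "rebuildable from originals" property is a function that regenerates the derived layer from verified originals alone. The sketch below assumes originals are keyed by their SHA-256 digest and that extraction is a pure function of the file bytes; `rebuild_intelligence` and `extract` are hypothetical names used only for this example.

```python
import hashlib


def rebuild_intelligence(originals, extract):
    """Regenerate the derived layer from the immutable originals.

    originals: dict mapping a recorded SHA-256 hex digest to the file bytes.
    extract:   any pure function from bytes to derived data (hypothetical).
    """
    derived = {}
    for expected_hash, data in originals.items():
        # Verify each original before deriving anything from it.
        actual = hashlib.sha256(data).hexdigest()
        if actual != expected_hash:
            raise ValueError(f"original {expected_hash[:12]} failed hash verification")
        derived[expected_hash] = extract(data)
    return derived
```

Because the function reads only the originals and never writes to them, the derived layer can be discarded and rebuilt at any time with the same result.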

3. Data Extraction

Our extraction process identifies factual elements only. We do not interpret or draw conclusions.

Extracted Elements

  • Dates and timestamps
  • Geographic locations
  • Agency/organization names
  • Personnel names (as stated)
  • Document classification markings
  • Evidence type (memo, report, photo)
  • Object descriptions (as quoted)
  • Witness statements (verbatim)
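
As a rough illustration of extracting elements without interpreting them, the sketch below pulls two of the element types (dates and classification markings) out of text and keeps the exact matched string and its character offsets, so every element can be traced back to the source wording. The patterns and the `extract_elements` name are assumptions for this example; real documents use many more date formats and markings than these two patterns cover.

```python
import re

# Illustrative patterns only; they cover a small subset of real formats.
DATE_RE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)
# "TOP SECRET" is listed first so it is not reported as plain "SECRET".
MARKING_RE = re.compile(r"\b(?:TOP SECRET|SECRET|CONFIDENTIAL|UNCLASSIFIED)\b")


def extract_elements(text):
    """Return matched elements with their verbatim text and offsets."""
    elements = []
    for kind, pattern in (("date", DATE_RE), ("classification_marking", MARKING_RE)):
        for match in pattern.finditer(text):
            elements.append(
                {
                    "kind": kind,
                    "text": match.group(0),  # exactly as it appears in the document
                    "start": match.start(),
                    "end": match.end(),
                }
            )
    # Order elements by where they occur in the document.
    return sorted(elements, key=lambda e: e["start"])
```

Storing offsets rather than normalized values keeps this step purely descriptive; any normalization would happen in a separate, rebuildable pass.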

4. Correlation Logic

Cases are linked based on explicit, explainable attributes:

  • Temporal proximity: Events occurring within the same time window
  • Geographic proximity: Events in nearby locations
  • Shared entities: Same agencies, personnel, or organizations mentioned
  • Similar descriptors: Matching object descriptions or characteristics
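
The four attributes above could be checked along these lines. This is a sketch under stated assumptions: the `Case` fields, the 7-day window, the 50 km radius, and exact-match comparison of entities and descriptors are all illustrative choices, not the thresholds or matching rules we actually use. Distance uses the standard haversine formula.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Case:
    # Hypothetical case layout for illustration.
    case_id: str
    when: datetime
    lat: float
    lon: float
    entities: frozenset = frozenset()
    descriptors: frozenset = frozenset()


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def link_reasons(a, b, window=timedelta(days=7), max_km=50.0):
    """Return the explicit reasons two cases would be linked (empty if none)."""
    reasons = []
    if abs(a.when - b.when) <= window:
        reasons.append("temporal_proximity")
    if haversine_km(a.lat, a.lon, b.lat, b.lon) <= max_km:
        reasons.append("geographic_proximity")
    shared = a.entities & b.entities
    if shared:
        reasons.append("shared_entities: " + ", ".join(sorted(shared)))
    common = a.descriptors & b.descriptors
    if common:
        reasons.append("similar_descriptors: " + ", ".join(sorted(common)))
    return reasons
```

Returning a list of named reasons, rather than a single score, keeps each link explainable: a reader can see exactly which attributes matched.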

5. Limitations

We acknowledge the following limitations:

  • OCR extraction from scanned documents may contain errors
  • Some documents are heavily redacted, limiting extractable data
  • Location normalization may be imprecise for historical place names
  • Correlation strength is algorithmic, not a measure of significance
  • We cannot verify the accuracy of original government records