Skip to content

Storage Layer (MongoDB + PostgreSQL)

This layer stores both raw and processed news data.


Components

MongoDB (Raw Storage)

  • Stores unprocessed HTML and metadata.
  • Schema-less design allows flexible storage of raw articles.
  • Useful for auditing and reprocessing pipelines.

PostgreSQL (Clean Storage)

  • Stores structured, analytics-ready data.
  • Supports indexes and optimized queries for downstream analytics.
  • Versioned datasets for reproducibility.

Data Flow

flowchart LR
    Consumer --> MongoDB[Raw Storage]
    Consumer --> Postgres[Clean Storage]

Key Highlights

  • Separation of raw vs cleaned data ensures reproducibility.
  • Optimized for analytics and reporting in PostgreSQL.
  • Centralized backups for reliability and disaster recovery.