# Docker Deployment
This guide describes how to run the BBC News ETL pipeline locally using Docker Compose. Each component is containerized for portability and reproducibility.
## Prerequisites
- Docker >= 24.x
- Docker Compose >= 2.x
- Python 3.11 (needed only for local development; optional when using pre-built images)
## Services Overview
The Docker Compose stack includes:
- Primary Producer → generates work queue
- Producers → scrape articles and push tasks
- RabbitMQ → message queue (Task Queue & DLQ)
- Consumers (ETL Workers) → consume messages, transform articles, and load them into storage
- MongoDB → raw data storage
- PostgreSQL → cleaned data storage
- Prometheus → metrics collection
- Grafana → dashboards
- Promtail → Loki → centralized logging
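
For orientation, a heavily condensed `docker-compose.yml` for a stack like this might look roughly as follows. This is a sketch only: the service names, build paths, images, ports, and volumes are assumptions for illustration, not copied from the repository's actual file.

```yaml
# Illustrative sketch -- names, images, and ports are assumptions, not the project's real file.
services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"     # AMQP
      - "15672:15672"   # management UI

  mongodb:
    image: mongo:7
    volumes:
      - mongo_data:/data/db

  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
    volumes:
      - pg_data:/var/lib/postgresql/data

  primary-producer:
    build: ./primary_producer    # seeds the work queue
    depends_on: [rabbitmq]

  producer:
    build: ./producer            # scrapes articles and pushes tasks
    depends_on: [rabbitmq]

  consumer:
    build: ./consumer            # ETL worker: consume, transform, load
    depends_on: [rabbitmq, mongodb, postgres]

  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]

  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]

  loki:
    image: grafana/loki:latest
    ports: ["3100:3100"]

  promtail:
    image: grafana/promtail:latest
    volumes:
      - /var/lib/docker/containers:/var/lib/docker/containers:ro

volumes:
  mongo_data:
  pg_data:
```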
## Getting Started
- Clone the repository:

  ```bash
  git clone https://github.com/Rahul-404/bbc_news_etl_pipeline.git
  cd bbc_news_etl_pipeline
  ```

- Build the Docker images (or pull pre-built images from the registry):

  ```bash
  docker-compose build
  ```

- Start the stack:

  ```bash
  docker-compose up -d
  ```

- Verify that the services are running:

  ```bash
  docker-compose ps
  ```

- Access the dashboards:

    - Grafana: http://localhost:3000
    - Prometheus: http://localhost:9090 (a sample scrape configuration is sketched after this list)
    - Loki: http://localhost:3100
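
Prometheus only displays data for the targets it is configured to scrape. A minimal `prometheus.yml` for a stack like this might look like the sketch below; the job names, container hostnames, and metrics ports are assumptions and must match whatever the services actually expose.

```yaml
# Illustrative scrape configuration -- target names and ports are assumptions.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: rabbitmq
    static_configs:
      - targets: ["rabbitmq:15692"]   # rabbitmq_prometheus plugin endpoint

  - job_name: etl-consumers
    static_configs:
      - targets: ["consumer:8000"]    # hypothetical /metrics port exposed by the workers
```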
## Notes
- Each producer requires its own Selenium driver instance; Docker Compose runs each producer in its own container with a separate driver (see the sketch after this list).
- Work-queue initialization is handled by the primary producer.
- DLQ messages can be inspected via the RabbitMQ management UI or with custom ETL scripts.
- Logs are forwarded via Promtail to Loki for centralized querying.
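
One way to give every producer its own Selenium driver in Compose is to pair each producer container with a dedicated standalone browser container. The snippet below is only a sketch: the service names, build path, and the `SELENIUM_REMOTE_URL` environment variable are hypothetical, not taken from the repository.

```yaml
# Sketch: one dedicated Selenium container per producer (names and env vars are hypothetical).
  selenium-1:
    image: selenium/standalone-chrome:latest
    shm_size: "2g"                 # Chrome needs extra shared memory in containers

  producer-1:
    build: ./producer
    environment:
      SELENIUM_REMOTE_URL: http://selenium-1:4444/wd/hub   # remote WebDriver endpoint
    depends_on:
      - selenium-1
      - rabbitmq
```

Scaling out then amounts to adding more producer/driver pairs, so no two producers ever share a browser session.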