Kubernetes Deployment¶
This guide describes deploying the BBC News ETL pipeline in production-like Kubernetes environments, including KEDA autoscaling.
Prerequisites¶
- Kubernetes cluster (Minikube, Kind, or cloud provider)
- Helm >= 3.x
- kubectl CLI
- Optional: KEDA for autoscaling
Architecture¶
- Primary Producer → initializes work queue
- Multiple Producers → scrape articles, deduplicate, publish tasks
- RabbitMQ → message broker for Task Queue & DLQ
- Consumers (ETL Workers) → fetch and process tasks
- MongoDB / PostgreSQL → data storage
- Prometheus / Grafana / Loki + Promtail → observability
- KEDA → scales producers/consumers based on queue length
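To make the wiring between these components concrete, the sketch below shows how a Consumer Deployment might point at the RabbitMQ, MongoDB, and PostgreSQL services. Everything in it (the bbc-consumer name, image, and environment variable names) is an illustrative assumption, not the repository's actual manifest; see k8s/consumers/ for the real definitions.

```yaml
# Illustrative sketch only -- names, image, and env vars are assumptions,
# not the repository's actual manifests (see k8s/consumers/ for those).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bbc-consumer
spec:
  replicas: 2
  selector:
    matchLabels:
      app: bbc-consumer
  template:
    metadata:
      labels:
        app: bbc-consumer
    spec:
      containers:
        - name: consumer
          image: bbc-news-etl-consumer:latest   # assumed image name
          env:
            - name: RABBITMQ_HOST
              value: rabbitmq                    # Helm release service name from the install step
            - name: MONGO_URI
              value: mongodb://mongo:27017/bbc_news
            - name: POSTGRES_HOST
              value: postgres
```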
Deployment Steps¶
- Clone the repository:

        git clone https://github.com/Rahul-404/bbc_news_etl_pipeline.git
        cd bbc_news_etl_pipeline
- Install RabbitMQ, MongoDB, PostgreSQL, Prometheus, Grafana, and Loki from the bundled Helm charts:

        helm install rabbitmq ./helm/rabbitmq
        helm install mongo ./helm/mongodb
        helm install postgres ./helm/postgres
        helm install prometheus ./helm/prometheus
        helm install grafana ./helm/grafana
        helm install loki ./helm/loki
- Deploy the Primary Producer, Producers, and Consumers:

        kubectl apply -f k8s/producers/
        kubectl apply -f k8s/consumers/
- Configure KEDA for horizontal scaling:
    - Producers scale based on Work Queue length.
    - Consumers scale based on Task Queue depth.
    - Example ScaledObject YAML is included in k8s/keda/; a minimal sketch follows this list.
- Verify Pods and Services:

        kubectl get pods
        kubectl get svc
- Access the dashboards (replace `<service-ip>` with each service's external IP, or port-forward as shown after this list):
    - Grafana: `http://<service-ip>:3000`
    - Prometheus: `http://<service-ip>:9090`
    - Loki: `http://<service-ip>:3100`
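The ScaledObject referenced in the KEDA step follows the usual KEDA pattern. The sketch below is illustrative only: the deployment name (bbc-consumer), queue name (task_queue), threshold, and the RABBITMQ_URL environment variable are assumptions, so consult the manifests in k8s/keda/ for the repository's actual values.

```yaml
# Minimal sketch, assuming a consumer Deployment named bbc-consumer and a task
# queue named task_queue; the real manifests live in k8s/keda/.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: bbc-consumer-scaler
spec:
  scaleTargetRef:
    name: bbc-consumer            # Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        queueName: task_queue     # assumed queue name
        mode: QueueLength
        value: "20"               # target messages per replica
        hostFromEnv: RABBITMQ_URL # AMQP URL, e.g. amqp://user:pass@rabbitmq:5672/
```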
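When the cluster does not expose the observability services externally (for example on Minikube or Kind), kubectl port-forward is a quick way to reach the dashboards. The service names below assume the Helm release names used in the install step; adjust them if your releases or charts expose different service names.

```bash
# Assumes the Helm release/service names from the install step; adjust if yours differ.
kubectl port-forward svc/grafana 3000:3000 &
kubectl port-forward svc/prometheus 9090:9090 &
kubectl port-forward svc/loki 3100:3100 &
# Then open http://localhost:3000, http://localhost:9090, and http://localhost:3100.
```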
Notes¶
- Each Producer pod contains its own Selenium driver for scraping.
- DLQ handling is automatic: failed ETL messages remain in RabbitMQ's dead-letter queue for manual inspection (see the inspection sketch after these notes).
- Logging (Loki + Promtail) and metrics (Prometheus) are integrated across all components, so the stack can be monitored without additional setup.
- Helm charts allow environment-specific configuration via values.yaml overrides (see the override sketch after these notes).
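As a quick way to inspect dead-lettered messages, queue depths can be listed from inside the broker pod. The pod name rabbitmq-0 is an assumption based on a standard RabbitMQ StatefulSet; check kubectl get pods for the actual name.

```bash
# Assumed pod name (rabbitmq-0) -- check `kubectl get pods` for the real one.
# Lists every queue, including the DLQ, with its current message count.
kubectl exec -it rabbitmq-0 -- rabbitmqctl list_queues name messages
```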
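For example, an environment-specific values file can be supplied at install or upgrade time. The file name values-prod.yaml and the replicaCount key below are purely illustrative; each chart's values.yaml defines the keys that are actually available.

```bash
# Illustrative only: values-prod.yaml and replicaCount are assumed, not keys the
# bundled charts are known to define -- see each chart's values.yaml.
helm upgrade --install rabbitmq ./helm/rabbitmq \
  -f ./helm/rabbitmq/values-prod.yaml \
  --set replicaCount=3
```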