Metrics (Prometheus)

The BBC News ETL Pipeline exposes Prometheus metrics that give real-time insight into pipeline performance, queue health, processing rates, and failures.


1. Overview

Prometheus collects metrics from producers, consumers, and RabbitMQ. Metrics allow:

  • Monitoring throughput and latency.
  • Tracking queue depths and backlogs.
  • Alerting on failures, DLQ accumulation, or resource saturation.
  • Observing scaling behavior when KEDA adjusts the number of producers/consumers.

2. Metrics Sources

Component | Metrics Type | Description
Producers | Counter / Gauge | Number of article links scraped, success/failure counts, active tasks, work queue size
Consumers (ETL Workers) | Counter / Gauge | Messages consumed, ETL success/failure counts, retry attempts, DLQ counts
RabbitMQ | Queue metrics | Task queue depth, DLQ depth, message publish/consume rate, queue latency
Kubernetes / KEDA | Custom metrics | Pod replicas, CPU/Memory usage, scaling events

3. Example Metrics

Producer Metrics

bbc_producer_articles_scraped_total{section="world"} 1234
bbc_producer_scrape_failures_total{section="tech"} 12
bbc_producer_workqueue_length 15
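The producer samples above could be emitted by code along the following lines. This is a minimal sketch, not the pipeline's actual implementation: the metric names come from the samples, while the help strings, the `record_scrape` helper, and the use of a dedicated registry are assumptions made for illustration.

```python
from prometheus_client import CollectorRegistry, Counter, Gauge, generate_latest

# A dedicated registry keeps this example isolated from the process-wide default.
registry = CollectorRegistry()

# Counters only ever increase; the "section" label splits them per BBC section.
articles_scraped = Counter(
    "bbc_producer_articles_scraped_total",
    "Total article links scraped",
    ["section"],
    registry=registry,
)
scrape_failures = Counter(
    "bbc_producer_scrape_failures_total",
    "Total failed scrape attempts",
    ["section"],
    registry=registry,
)

# A gauge can go up and down, which suits a queue length.
workqueue_length = Gauge(
    "bbc_producer_workqueue_length",
    "Items currently waiting in the producer work queue",
    registry=registry,
)


def record_scrape(section: str, ok: bool) -> None:
    """Count one scrape attempt for a section (hypothetical helper)."""
    if ok:
        articles_scraped.labels(section=section).inc()
    else:
        scrape_failures.labels(section=section).inc()


record_scrape("world", ok=True)
record_scrape("tech", ok=False)
workqueue_length.set(15)

# Render the registry in the text format that Prometheus scrapes.
exposition = generate_latest(registry).decode()
```

Printing `exposition` shows the same line-per-sample layout as the examples above, plus `# HELP` and `# TYPE` comment lines.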

Consumer Metrics

bbc_consumer_messages_processed_total 4567
bbc_consumer_etl_failures_total 34
bbc_consumer_dlq_messages_total 5
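One way a consumer might keep these three counters in step with its retry logic is sketched below. The retry budget of three attempts, the `handle` function, and its `etl` and `send_to_dlq` callables are all hypothetical. The sketch also assumes that `bbc_consumer_etl_failures_total` counts every failed attempt rather than every failed message.

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()

processed = Counter(
    "bbc_consumer_messages_processed_total",
    "Messages processed successfully",
    registry=registry,
)
etl_failures = Counter(
    "bbc_consumer_etl_failures_total",
    "Failed ETL attempts",
    registry=registry,
)
dlq_messages = Counter(
    "bbc_consumer_dlq_messages_total",
    "Messages routed to the dead-letter queue",
    registry=registry,
)

MAX_ATTEMPTS = 3  # assumed retry budget, not taken from the pipeline


def handle(message, etl, send_to_dlq) -> bool:
    """Run etl(message) with retries; dead-letter the message if all attempts fail."""
    for _ in range(MAX_ATTEMPTS):
        try:
            etl(message)
        except Exception:
            etl_failures.inc()  # one increment per failed attempt
        else:
            processed.inc()
            return True
    dlq_messages.inc()
    send_to_dlq(message)
    return False
```

Counting failures per attempt means the failure counter can grow faster than the DLQ counter, which is useful for spotting messages that only succeed after retries.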

RabbitMQ Metrics

rabbitmq_queue_messages{queue="task_queue"} 120
rabbitmq_queue_messages{queue="dlq_queue"} 3
rabbitmq_queue_consumers 4
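For ad hoc checks, samples like these can be read back with the parser that ships with `prometheus_client`. The sketch below reuses the three sample lines above as input. The `queue_depths` helper is hypothetical; a real check would fetch the text from the broker's /metrics endpoint instead.

```python
from prometheus_client.parser import text_string_to_metric_families

# The sample lines from this section, used here as stand-in scrape output.
SAMPLE = """\
rabbitmq_queue_messages{queue="task_queue"} 120
rabbitmq_queue_messages{queue="dlq_queue"} 3
rabbitmq_queue_consumers 4
"""


def queue_depths(text: str) -> dict:
    """Map each queue name to its message count (hypothetical helper)."""
    depths = {}
    for family in text_string_to_metric_families(text):
        for sample in family.samples:
            if sample.name == "rabbitmq_queue_messages":
                depths[sample.labels["queue"]] = sample.value
    return depths


depths = queue_depths(SAMPLE)
```

Sample values are parsed as floats, so the task queue depth comes back as `120.0`.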

4. Integration

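  • Producers and consumers embed a Prometheus exporter (Python prometheus_client).
  • Producers, consumers, and RabbitMQ each expose a /metrics endpoint; RabbitMQ does so through its own exporter (for example, the rabbitmq_prometheus plugin) rather than the Python client.
  • The Prometheus server scrapes these endpoints automatically.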

Example:

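from prometheus_client import Counter, start_http_server

# Counter labelled by BBC section, matching the producer samples in section 3
articles_scraped = Counter('bbc_producer_articles_scraped_total', 'Total articles scraped', ['section'])

start_http_server(8000)  # serves /metrics on port 8000 from a background thread
articles_scraped.labels(section='world').inc()  # call once per scraped article
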
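  • Prometheus pulls metrics at a configurable interval (scrape_interval). The pipeline's default is 15s; Prometheus's own built-in default is 1m, so the 15s value has to be set explicitly in the scrape configuration.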

5. Best Practices

  • Use meaningful labels (e.g., section, status, queue) for filtering in Grafana.
  • Set alerts for:
      • Work queue backlog > threshold
      • DLQ accumulation > threshold
      • ETL failure rate > threshold
  • Combine Prometheus metrics with Loki logs for full observability.
  • Ensure metrics endpoints are secured in production deployments.
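In a real deployment the three alerts listed above would normally be written as Prometheus alerting rules. The Python below only illustrates the threshold logic. The limits in `Thresholds` are hypothetical placeholders. The failure rate is assumed to mean failures divided by all ETL attempts, computed here from raw totals rather than over a time window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits; tune them to the pipeline's real traffic.
    max_backlog: int = 500
    max_dlq: int = 10
    max_failure_rate: float = 0.05


def failure_rate(failures: float, successes: float) -> float:
    """Share of ETL attempts that failed; 0.0 when nothing has run yet."""
    total = failures + successes
    return failures / total if total else 0.0


def firing_alerts(backlog, dlq, failures, successes, limits=Thresholds()):
    """Return the names of the alerts whose threshold is exceeded."""
    alerts = []
    if backlog > limits.max_backlog:
        alerts.append("work_queue_backlog")
    if dlq > limits.max_dlq:
        alerts.append("dlq_accumulation")
    if failure_rate(failures, successes) > limits.max_failure_rate:
        alerts.append("etl_failure_rate")
    return alerts
```

With the sample values from section 3 (backlog 120, DLQ depth 3, 34 failures against 4567 processed messages), no alert fires under these placeholder limits.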