fetch_ml/monitoring/README.md
Jeremie Fraeys 7948639b1e
2026-03-04 13:22:29 -05:00

Monitoring Stack

Directory Structure (Canonical)

All monitoring configuration lives under monitoring/.

monitoring/
  prometheus/
    prometheus.yml                # Prometheus scrape configuration
  grafana/
    dashboards/                   # Grafana dashboards (JSON)
    provisioning/
      datasources/                # Grafana data sources (Prometheus/Loki)
      dashboards/                 # Grafana dashboard provider (points at dashboards/)
  loki-config.yml                 # Loki configuration
  promtail-config.yml             # Promtail configuration

What is "Grafana provisioning"?

Grafana provisioning is how Grafana auto-configures itself on startup (no clicking in the UI):

  • grafana/provisioning/datasources/*.yml
    • Defines where Grafana reads data from (e.g. Prometheus at http://prometheus:9090, Loki at http://loki:3100).
  • grafana/provisioning/dashboards/*.yml
    • Tells Grafana to load dashboard JSON files from /var/lib/grafana/dashboards.
  • grafana/dashboards/*.json
    • The dashboards themselves.
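As an illustration of what a datasource provisioning file contains, the sketch below renders one as YAML text. The URLs come from the description above; the datasource names, the `access: proxy` setting, and the choice of Prometheus as the default are assumptions, not values taken from this repository.

```python
# Sketch: render a Grafana datasource provisioning file as YAML text.
# Names, "access: proxy", and the isDefault choice are assumptions.

DATASOURCES = [
    {"name": "Prometheus", "type": "prometheus",
     "url": "http://prometheus:9090", "isDefault": True},
    {"name": "Loki", "type": "loki",
     "url": "http://loki:3100", "isDefault": False},
]


def render_datasources(datasources):
    """Return the provisioning YAML for the given datasource dicts."""
    lines = ["apiVersion: 1", "datasources:"]
    for ds in datasources:
        lines.append(f"  - name: {ds['name']}")
        lines.append(f"    type: {ds['type']}")
        lines.append("    access: proxy")
        lines.append(f"    url: {ds['url']}")
        # YAML booleans are lowercase, Python's are capitalised.
        lines.append(f"    isDefault: {str(ds['isDefault']).lower()}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_datasources(DATASOURCES), end="")
```

The printed text is the kind of content expected in a file under grafana/provisioning/datasources/.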

Source of truth

  • Dashboards: edit/add JSON in monitoring/grafana/dashboards/.
  • Grafana provisioning: edit files in monitoring/grafana/provisioning/.
  • Prometheus scrape config: edit monitoring/prometheus/prometheus.yml.

scripts/setup_monitoring.py is intentionally provisioning-only:

  • It (re)writes Grafana datasources and the dashboard provider.
  • It does not create or overwrite any dashboard JSON files.
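The provisioning-only behaviour can be sketched roughly as follows. This is not the actual content of scripts/setup_monitoring.py; the function name `write_dashboard_provider`, the provider name `default`, and the exact YAML are hypothetical. The point is that only the provider file is written and nothing under grafana/dashboards/ is touched.

```python
from pathlib import Path

# Hypothetical provider definition; the container path matches the one
# mentioned in the provisioning section above.
PROVIDER_YAML = """\
apiVersion: 1
providers:
  - name: default
    type: file
    options:
      path: /var/lib/grafana/dashboards
"""


def write_dashboard_provider(monitoring_root):
    """(Re)write the dashboard provider file and return its path.

    Only grafana/provisioning/dashboards/dashboards.yml is written;
    dashboard JSON files are never created or modified.
    """
    target = (Path(monitoring_root) / "grafana" / "provisioning"
              / "dashboards" / "dashboards.yml")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(PROVIDER_YAML)
    return target
```

Calling `write_dashboard_provider("monitoring")` from the repository root would rewrite the provider file in place, which makes the step safe to repeat.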

Quick Start

# Start deployment
make dev-up

# Access services
open http://localhost:3000  # Grafana (admin/admin123)
open http://localhost:9090  # Prometheus

Services

Grafana (Port 3000)

Main monitoring dashboard

Prometheus (Port 9090)

Metrics collection and storage

Loki (Port 3100)

Log aggregation

Dashboards

Available dashboard configurations in grafana/dashboards/:

  • load-test-performance.json - Load test metrics
  • websocket-performance.json - WebSocket performance
  • system-health.json - System health monitoring
  • rsync-performance.json - Rsync performance metrics
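Before committing a new or edited dashboard, it can help to confirm that every file in the directory is still valid JSON. The helper below is a minimal sketch of such a check; `summarize_dashboards` is a hypothetical name, and it only relies on the top-level `title` field that Grafana dashboard JSON normally carries.

```python
import json
from pathlib import Path


def summarize_dashboards(dashboards_dir):
    """Map each *.json file name to its dashboard title.

    The value is None when the file cannot be read or parsed as a JSON
    object, and "" when the object has no "title" key.
    """
    summary = {}
    for path in sorted(Path(dashboards_dir).glob("*.json")):
        try:
            doc = json.loads(path.read_text())
        except (ValueError, OSError):
            summary[path.name] = None
            continue
        summary[path.name] = doc.get("title", "") if isinstance(doc, dict) else None
    return summary
```

For example, `summarize_dashboards("monitoring/grafana/dashboards")` returns a dict in which any `None` value points at a file that needs fixing.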

Importing Dashboards

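Dashboards in grafana/dashboards/ should be loaded automatically by the provisioning provider described above. To import one manually instead:

  1. In Grafana, go to "+" → "Import".
  2. Upload a JSON file from the grafana/dashboards/ directory.
  3. Select the Prometheus data source.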

Configuration Files

  • prometheus/prometheus.yml - Prometheus configuration
  • loki-config.yml - Loki configuration
  • promtail-config.yml - Promtail configuration
  • security_rules.yml - Security rules

Usage

  1. Start monitoring stack: make dev-up
  2. Access Grafana: http://localhost:3000 (admin/admin123)
  3. Open the provisioned dashboards, or import additional ones from the grafana/dashboards/ directory
  4. View metrics and test results in real time

Health Endpoints

The API server provides health check endpoints for monitoring:

  • /health - Overall service health (for Docker healthcheck)
  • /health/live - Liveness probe (is the service running?)
  • /health/ready - Readiness probe (can the service accept traffic?)

Testing Health Endpoints

# Basic health check (-k skips TLS certificate verification)
curl -k https://localhost:9101/health

# Liveness check (for K8s or monitoring)
curl -k https://localhost:9101/health/live

# Readiness check (verifies dependencies)
curl -k https://localhost:9101/health/ready

See health-testing.md for detailed testing procedures.
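The same three checks can be scripted, for instance in a smoke test. The sketch below uses only the standard library and disables certificate verification, mirroring `curl -k`. Treating HTTP 200 as "healthy" is an assumption, since this README does not specify the response codes, and the function names are hypothetical.

```python
import ssl
import urllib.error
import urllib.request

HEALTH_PATHS = ["/health", "/health/live", "/health/ready"]


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    # Equivalent of `curl -k`: skip TLS certificate verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, TLS failure, ...


def check_health(base_url, fetch=fetch_status):
    """Map each health path to True when it answers with HTTP 200 (assumed healthy)."""
    return {path: fetch(base_url + path) == 200 for path in HEALTH_PATHS}
```

With the stack running, `check_health("https://localhost:9101")` returns one boolean per endpoint. The `fetch` parameter exists so that the logic can be exercised without a live server.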

Prometheus Integration

Prometheus scrapes the following endpoints:

  • api-server:9101/metrics - Application metrics (planned)
  • api-server:9101/health - Health status monitoring
  • host.docker.internal:9100/metrics - Worker metrics (when the worker runs on the host)
  • worker:9100/metrics - Worker metrics (when the worker runs as a container in the compose network)
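To see which of these targets Prometheus is actually reaching, its HTTP API exposes the current target list at `/api/v1/targets`. The following sketch reads that endpoint and reduces it to a scrape URL to health mapping. The function names are hypothetical, and the default URL assumes the Quick Start port mapping.

```python
import json
import urllib.request


def fetch_targets(prometheus_url="http://localhost:9090", timeout=5.0):
    """Fetch the raw /api/v1/targets response from Prometheus as a dict."""
    url = f"{prometheus_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def target_health(payload):
    """Map each active target's scrape URL to its reported health string."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return {t["scrapeUrl"]: t["health"] for t in targets}
```

With the stack running, `target_health(fetch_targets())` gives one entry per configured target. A target that is listed above but missing from the result is probably not present in prometheus/prometheus.yml.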

Cleanup (deprecated paths)

These legacy paths may still exist in the repo but are not used by the current dev compose config:

  • monitoring/dashboards/ (old dashboards location)
  • monitoring/prometheus.yml (old Prometheus config location)
  • monitoring/grafana/provisioning/dashboards/dashboard.yml (duplicate of dashboards.yml)
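A quick way to find out whether any of these legacy paths are still present in a checkout is sketched below. The path list is copied from the bullets above; `find_legacy_paths` is a hypothetical helper and does not delete anything.

```python
from pathlib import Path

# Deprecated locations listed in this section, relative to the repo root.
LEGACY_PATHS = [
    "monitoring/dashboards",
    "monitoring/prometheus.yml",
    "monitoring/grafana/provisioning/dashboards/dashboard.yml",
]


def find_legacy_paths(repo_root):
    """Return the deprecated paths that still exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in LEGACY_PATHS if (root / rel).exists()]
```

Running `find_legacy_paths(".")` from the repository root returns an empty list once the cleanup is complete.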