Monitoring Stack
Directory Structure (Canonical)
All monitoring configuration lives under monitoring/.
monitoring/
  prometheus/
    prometheus.yml          # Prometheus scrape configuration
  grafana/
    dashboards/             # Grafana dashboards (JSON)
    provisioning/
      datasources/          # Grafana data sources (Prometheus/Loki)
      dashboards/           # Grafana dashboard provider (points at dashboards/)
  loki-config.yml           # Loki configuration
  promtail-config.yml       # Promtail configuration
What is "Grafana provisioning"?
Grafana provisioning is how Grafana auto-configures itself on startup (no clicking in the UI):
- grafana/provisioning/datasources/*.yml - Defines where Grafana reads data from (e.g. Prometheus at http://prometheus:9090, Loki at http://loki:3100).
- grafana/provisioning/dashboards/*.yml - Tells Grafana to load dashboard JSON files from /var/lib/grafana/dashboards.
- grafana/dashboards/*.json - The dashboards themselves.
Source of truth
- Dashboards: edit/add JSON in monitoring/grafana/dashboards/.
- Grafana provisioning: edit files in monitoring/grafana/provisioning/.
- Prometheus scrape config: edit monitoring/prometheus/prometheus.yml.
scripts/setup_monitoring.py is intentionally provisioning-only:
- It (re)writes Grafana datasources and the dashboard provider.
- It does not create or overwrite any dashboard JSON files.
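The provisioning-only behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of scripts/setup_monitoring.py: the function name write_provisioning, the file name datasources.yml, and the exact YAML fields are assumptions, while the URLs and the /var/lib/grafana/dashboards path are taken from this README.

```python
from pathlib import Path

# Assumed YAML contents in Grafana's file-provisioning format; the real
# files written by scripts/setup_monitoring.py may differ.
DATASOURCES_YML = """\
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
"""

DASHBOARD_PROVIDER_YML = """\
apiVersion: 1
providers:
  - name: default
    type: file
    options:
      path: /var/lib/grafana/dashboards
"""


def write_provisioning(grafana_dir: Path) -> list[Path]:
    """(Re)write the provisioning files under grafana_dir.

    Only files below provisioning/ are written; nothing under
    dashboards/ is created or modified.
    """
    provisioning = grafana_dir / "provisioning"
    targets = {
        # "datasources.yml" is a hypothetical file name.
        provisioning / "datasources" / "datasources.yml": DATASOURCES_YML,
        provisioning / "dashboards" / "dashboards.yml": DASHBOARD_PROVIDER_YML,
    }
    for path, content in targets.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
    return sorted(targets)
```

Calling write_provisioning(Path("monitoring/grafana")) would rewrite both files in place, which keeps the dashboard JSON files as the only hand-edited artifacts.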
Quick Start
# Start deployment
make deploy-up
# Access services
open http://localhost:3000 # Grafana (admin/admin123)
open http://localhost:9090 # Prometheus
Services
Grafana (Port 3000)
Main monitoring dashboard
- Username: admin
- Password: admin123
- Data source: Prometheus (http://localhost:9090)
Prometheus (Port 9090)
Metrics collection and storage
Loki (Port 3100)
Log aggregation
Dashboards
Available dashboard configurations in grafana/dashboards/:
- load-test-performance.json - Load test metrics
- websocket-performance.json - WebSocket performance
- system-health.json - System health monitoring
- rsync-performance.json - Rsync performance metrics
Importing Dashboards
- Go to Grafana → "+" → "Import"
- Upload JSON files from the grafana/dashboards/ directory
- Select the Prometheus data source
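Before importing, it can help to confirm that each dashboard file at least parses as JSON. The sketch below is one possible pre-import check, not part of the repository: the function name check_dashboards is hypothetical, and requiring a top-level "title" is an assumed minimal sanity check rather than a full Grafana schema validation.

```python
import json
from pathlib import Path


def check_dashboards(dashboards_dir: Path) -> dict[str, str]:
    """Return {file name: problem} for dashboard files that fail basic checks."""
    problems: dict[str, str] = {}
    for path in sorted(dashboards_dir.glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems[path.name] = f"invalid JSON: {exc.msg}"
            continue
        # Assumed check: a dashboard is a JSON object with a non-empty title.
        if not isinstance(data, dict) or not data.get("title"):
            problems[path.name] = "missing top-level 'title'"
    return problems
```

An empty result from check_dashboards(Path("monitoring/grafana/dashboards")) means every file passed these basic checks; it does not guarantee that Grafana will render the panels.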
Configuration Files
- prometheus/prometheus.yml - Prometheus configuration
- loki-config.yml - Loki configuration
- promtail-config.yml - Promtail configuration
- security_rules.yml - Security rules
Usage
- Start monitoring stack: make deploy-up
- Access Grafana: http://localhost:3000 (admin/admin123)
- Import dashboards from the grafana/dashboards/ directory
- View metrics and test results in real time
Health Endpoints
The API server provides health check endpoints for monitoring:
- /health - Overall service health (for Docker healthcheck)
- /health/live - Liveness probe (is the service running?)
- /health/ready - Readiness probe (can the service accept traffic?)
Testing Health Endpoints
# Basic health check
curl -k https://localhost:9101/health
# Liveness check (for K8s or monitoring)
curl -k https://localhost:9101/health/live
# Readiness check (verifies dependencies)
curl -k https://localhost:9101/health/ready
See health-testing.md for detailed testing procedures.
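The same three checks can be scripted when curl output is awkward to consume. The following is a minimal sketch, not project code: the names fetch_status and probe are hypothetical, treating HTTP 200 as "healthy" is an assumption about how the API server reports status, and TLS verification is disabled to mirror curl -k against a self-signed certificate.

```python
import ssl
import urllib.error
import urllib.request
from typing import Callable

HEALTH_PATHS = ("/health", "/health/live", "/health/ready")


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for url, skipping TLS verification like `curl -k`."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before verify_mode
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # non-2xx responses still carry a status code


def probe(base_url: str, fetch: Callable[[str], int] = fetch_status) -> dict[str, bool]:
    """Map each health path to True when it answers with HTTP 200 (assumed healthy)."""
    results: dict[str, bool] = {}
    for path in HEALTH_PATHS:
        try:
            results[path] = fetch(base_url.rstrip("/") + path) == 200
        except OSError:  # connection refused, timeout, TLS failure
            results[path] = False
    return results
```

With the stack running, probe("https://localhost:9101") returns one boolean per endpoint. The fetch parameter exists so the logic can be exercised without a live server.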
Prometheus Integration
Prometheus scrapes the following endpoints:
- api-server:9101/metrics - Application metrics (future)
- api-server:9101/health - Health status monitoring
- host.docker.internal:9100/metrics - Worker metrics (when the worker runs on the host)
- worker:9100/metrics - Worker metrics (when the worker runs as a container in the compose network)
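To see which of these targets Prometheus is actually reaching, the JSON returned by its /api/v1/targets HTTP endpoint can be summarized. The helper below is an illustrative sketch: summarize_targets is a hypothetical name, and it only reads the scrapeUrl and health fields of each active target.

```python
import json


def summarize_targets(payload: str) -> dict[str, str]:
    """Map each active target's scrape URL to its health ("up", "down", "unknown")."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"unexpected API status: {body.get('status')!r}")
    return {
        target["scrapeUrl"]: target["health"]
        for target in body["data"]["activeTargets"]
    }
```

The payload would typically come from http://localhost:9090/api/v1/targets once the stack is up. Only one of the two worker targets is expected to be "up" at a time, depending on whether the worker runs on the host or in the compose network.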
Cleanup (deprecated paths)
These legacy paths may still exist in the repo but are not used by the current dev compose config:
- monitoring/dashboards/ (old dashboards location)
- monitoring/prometheus.yml (old Prometheus config location)
- monitoring/grafana/provisioning/dashboards/dashboard.yml (duplicate of dashboards.yml)