fetch_ml/monitoring/README.md
Jeremie Fraeys 7948639b1e
docs: update documentation for streamlined Makefile
- Replace 'make test-full' with 'make test' throughout docs
- Replace 'make self-cleanup' with 'make clean'
- Replace 'make tech-excellence' with 'make complete-suite'
- Replace 'make deploy-up' with 'make dev-up'
- Update docker-compose commands to docker compose v2
- Update CI workflow to use new Makefile targets
2026-03-04 13:22:29 -05:00

# Monitoring Stack
## Directory Structure (Canonical)
All monitoring configuration lives under `monitoring/`.
```text
monitoring/
  prometheus/
    prometheus.yml           # Prometheus scrape configuration
  grafana/
    dashboards/              # Grafana dashboards (JSON)
    provisioning/
      datasources/           # Grafana data sources (Prometheus/Loki)
      dashboards/            # Grafana dashboard provider (points at dashboards/)
  loki-config.yml            # Loki configuration
  promtail-config.yml        # Promtail configuration
```
### What is "Grafana provisioning"?
Grafana provisioning is how Grafana auto-configures itself on startup (no clicking in the UI):
- **`grafana/provisioning/datasources/*.yml`**
  - Defines where Grafana reads data from (e.g. Prometheus at `http://prometheus:9090`, Loki at `http://loki:3100`).
- **`grafana/provisioning/dashboards/*.yml`**
  - Tells Grafana to load dashboard JSON files from `/var/lib/grafana/dashboards`.
- **`grafana/dashboards/*.json`**
  - The dashboards themselves.
### Source of truth
- **Dashboards**: edit/add JSON in `monitoring/grafana/dashboards/`.
- **Grafana provisioning**: edit files in `monitoring/grafana/provisioning/`.
- **Prometheus scrape config**: edit `monitoring/prometheus/prometheus.yml`.

`scripts/setup_monitoring.py` is intentionally **provisioning-only**:
- It (re)writes Grafana **datasources** and the **dashboard provider**.
- It does **not** create or overwrite any dashboard JSON files.
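
The provisioning-only behavior can be sketched roughly as follows. This is a minimal illustration and not the contents of `scripts/setup_monitoring.py`: the function name, the datasource file name `datasources.yml`, and the provider name `default` are assumptions. The URLs and the dashboard path are taken from this README, and the YAML layout follows Grafana's provisioning file format.

```python
from pathlib import Path

# Datasource definitions; URLs are the compose-network addresses from this README.
DATASOURCES_YML = """\
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
"""

# Dashboard provider; "default" is a hypothetical provider name.
DASHBOARD_PROVIDER_YML = """\
apiVersion: 1
providers:
  - name: default
    type: file
    options:
      path: /var/lib/grafana/dashboards
"""


def write_provisioning(monitoring_dir: Path) -> list[Path]:
    """(Re)write the datasource and dashboard-provider files only.

    Dashboard JSON under grafana/dashboards/ is never read or written.
    """
    provisioning = monitoring_dir / "grafana" / "provisioning"
    targets = {
        # "datasources.yml" is an assumed file name.
        provisioning / "datasources" / "datasources.yml": DATASOURCES_YML,
        provisioning / "dashboards" / "dashboards.yml": DASHBOARD_PROVIDER_YML,
    }
    for path, content in targets.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
    return list(targets)
```

Calling `write_provisioning(Path("monitoring"))` from the repository root would overwrite only those two files, which is what makes it safe to re-run after editing dashboards by hand.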
## Quick Start
```bash
# Start deployment
make dev-up
# Access services
open http://localhost:3000 # Grafana (admin/admin123)
open http://localhost:9090 # Prometheus
```
## Services
### Grafana (Port 3000)
**Main monitoring dashboard**
- Username: `admin`
- Password: `admin123`
- Data source: Prometheus (provisioned as `http://prometheus:9090` inside the compose network; reachable from the host at `http://localhost:9090`)
### Prometheus (Port 9090)
**Metrics collection and storage**
### Loki (Port 3100)
**Log aggregation**
## Dashboards
Available dashboard configurations in `grafana/dashboards/`:
- `load-test-performance.json` - Load test metrics
- `websocket-performance.json` - WebSocket performance
- `system-health.json` - System health monitoring
- `rsync-performance.json` - Rsync performance metrics
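
Before committing a new or edited dashboard, it can help to confirm that every file in the directory parses as JSON and carries a title. The helper below is a small sketch for that purpose; `list_dashboards` is a hypothetical name and not part of this repository. It relies only on `title` being a standard top-level field in Grafana dashboard JSON.

```python
import json
from pathlib import Path


def list_dashboards(dashboards_dir: Path) -> list[tuple[str, str]]:
    """Return (file name, dashboard title) pairs for each *.json file.

    Raises json.JSONDecodeError if a file is not valid JSON.
    """
    results = []
    for path in sorted(dashboards_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            dashboard = json.load(fh)
        # Fall back to a placeholder when the title field is missing.
        results.append((path.name, dashboard.get("title", "<untitled>")))
    return results
```

For example, `list_dashboards(Path("monitoring/grafana/dashboards"))` returns one pair per dashboard file, sorted by file name.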
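### Importing Dashboards
If provisioning is active, the dashboards listed above should already appear in Grafana. To import one manually:
1. Go to Grafana → "+" → "Import"
2. Upload a JSON file from the `grafana/dashboards/` directory
3. Select the Prometheus data source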
## Configuration Files
- `prometheus/prometheus.yml` - Prometheus configuration
- `loki-config.yml` - Loki configuration
- `promtail-config.yml` - Promtail configuration
- `security_rules.yml` - Security rules
## Usage
1. Start monitoring stack: `make dev-up`
2. Access Grafana: http://localhost:3000 (admin/admin123)
3. Open the dashboards (import them from the `grafana/dashboards/` directory if they were not provisioned automatically)
4. View metrics and test results in real-time
## Health Endpoints
The API server provides health check endpoints for monitoring:
- **`/health`** - Overall service health (for Docker healthcheck)
- **`/health/live`** - Liveness probe (is the service running?)
- **`/health/ready`** - Readiness probe (can the service accept traffic?)
### Testing Health Endpoints
```bash
# Basic health check
curl -k https://localhost:9101/health
# Liveness check (for K8s or monitoring)
curl -k https://localhost:9101/health/live
# Readiness check (verifies dependencies)
curl -k https://localhost:9101/health/ready
```
See `health-testing.md` for detailed testing procedures.
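
The same three checks can be scripted, for instance as a smoke test. The sketch below uses only the Python standard library; the function names are hypothetical, and the default base URL is the one used in the `curl` commands above. Like `curl -k`, it skips certificate verification, so it is only appropriate for local development.

```python
import ssl
import urllib.error
import urllib.request
from typing import List, Optional

HEALTH_PATHS = ("/health", "/health/live", "/health/ready")


def health_urls(base_url: str = "https://localhost:9101") -> List[str]:
    """Build the full URL for each health endpoint."""
    return [base_url.rstrip("/") + path for path in HEALTH_PATHS]


def check(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if the service is unreachable."""
    # Disable certificate verification (equivalent to `curl -k`).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 503 when not ready).
        return exc.code
    except (urllib.error.URLError, OSError):
        return None
```

With the stack running, `{url: check(url) for url in health_urls()}` gives a status code per endpoint, with `None` marking an endpoint that could not be reached.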
## Prometheus Integration
Prometheus scrapes the following endpoints:
- `api-server:9101/metrics` - Application metrics (future)
- `api-server:9101/health` - Health status monitoring
- `host.docker.internal:9100/metrics` - Worker metrics (when the worker runs on the host)
- `worker:9100/metrics` - Worker metrics (when the worker runs as a container in the compose network)
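
To confirm which of these targets Prometheus is actually scraping, one option is to query its HTTP API at `/api/v1/targets`. The sketch below assumes Prometheus is reachable at `http://localhost:9090` as in the Quick Start; the function names are hypothetical.

```python
import json
import urllib.request


def summarize_targets(payload: dict) -> list[tuple[str, str]]:
    """Extract sorted (scrape URL, health) pairs from a /api/v1/targets response."""
    active = payload.get("data", {}).get("activeTargets", [])
    return sorted((target["scrapeUrl"], target["health"]) for target in active)


def fetch_targets(prometheus_url: str = "http://localhost:9090") -> dict:
    """Fetch the current target list from a running Prometheus instance."""
    url = f"{prometheus_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

Running `summarize_targets(fetch_targets())` against a live stack lists each scrape URL with its health state, which makes a missing or failing worker target easy to spot.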
## Cleanup (deprecated paths)
These legacy paths may still exist in the repo but are **not used** by the current dev compose config:
- `monitoring/dashboards/` (old dashboards location)
- `monitoring/prometheus.yml` (old Prometheus config location)
- `monitoring/grafana/provisioning/dashboards/dashboard.yml` (duplicate of `dashboards.yml`)