# Monitoring Stack
## Directory Structure (Canonical)

All monitoring configuration lives under `monitoring/`.

```text
monitoring/
  prometheus/
    prometheus.yml          # Prometheus scrape configuration
  grafana/
    dashboards/             # Grafana dashboards (JSON)
    provisioning/
      datasources/          # Grafana data sources (Prometheus/Loki)
      dashboards/           # Grafana dashboard provider (points at dashboards/)
  loki-config.yml           # Loki configuration
  promtail-config.yml       # Promtail configuration
```
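To confirm that a checkout matches the layout above, a small shell check is enough. This is a sketch: `check_monitoring_layout` is a hypothetical helper (not part of the repo), and it assumes it is run from the repository root or given the path to `monitoring/`.

```bash
# Hypothetical helper (not part of the repo): check the canonical monitoring/ layout.
# Prints every missing path to stderr and returns 1 if anything is absent.
check_monitoring_layout() {
  local root="${1:-monitoring}"
  local missing=0 path
  for path in \
    prometheus/prometheus.yml \
    grafana/dashboards \
    grafana/provisioning/datasources \
    grafana/provisioning/dashboards \
    loki-config.yml \
    promtail-config.yml; do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $root/$path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_monitoring_layout monitoring && echo ok` prints `ok` when every path exists and lists the missing ones otherwise.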
### What is "Grafana provisioning"?

Grafana provisioning is how Grafana configures itself automatically on startup, with no clicking in the UI:

- **`grafana/provisioning/datasources/*.yml`**
  - Defines where Grafana reads data from (e.g. Prometheus at `http://prometheus:9090`, Loki at `http://loki:3100`).
- **`grafana/provisioning/dashboards/*.yml`**
  - Tells Grafana to load dashboard JSON files from `/var/lib/grafana/dashboards` (a path inside the Grafana container).
- **`grafana/dashboards/*.json`**
  - The dashboards themselves.
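For orientation, the sketch below writes a minimal pair of provisioning files in Grafana's file format (`apiVersion: 1` followed by a `datasources` or `providers` list). It is illustrative only: `write_grafana_provisioning` and the file name `datasources.yml` are hypothetical, and the checked-in files are generated by `scripts/setup_monitoring.py`, whose exact output may differ.

```bash
# Hypothetical sketch (NOT the exact output of scripts/setup_monitoring.py).
# Service names/ports follow the compose network: prometheus:9090, loki:3100.
write_grafana_provisioning() {
  local dir="${1:?usage: write_grafana_provisioning <provisioning-dir>}"
  mkdir -p "$dir/datasources" "$dir/dashboards"

  # Data sources: Grafana reaches Prometheus/Loki by service name, not localhost.
  cat > "$dir/datasources/datasources.yml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
EOF

  # Dashboard provider: load every JSON file from the mounted dashboards folder.
  cat > "$dir/dashboards/dashboards.yml" <<'EOF'
apiVersion: 1
providers:
  - name: default
    type: file
    options:
      path: /var/lib/grafana/dashboards
EOF
}
```

Point it at a scratch directory (e.g. `write_grafana_provisioning /tmp/provisioning`) rather than at `monitoring/grafana/provisioning/`, so the checked-in files are not overwritten.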
### Source of truth

- **Dashboards**: edit or add JSON in `monitoring/grafana/dashboards/`.
- **Grafana provisioning**: edit files in `monitoring/grafana/provisioning/`.
- **Prometheus scrape config**: edit `monitoring/prometheus/prometheus.yml`.

`scripts/setup_monitoring.py` is intentionally **provisioning-only**:

- It (re)writes the Grafana **datasources** and the **dashboard provider**.
- It does **not** create or overwrite any dashboard JSON files.
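One way to verify the provisioning-only guarantee is to fingerprint the dashboard JSON before and after running the script. A sketch, assuming GNU coreutils `sha256sum` (on macOS, substitute `shasum -a 256`); `dashboards_checksum` is a hypothetical helper.

```bash
# Hypothetical helper: print one combined SHA-256 over every dashboard JSON file.
# File paths are part of the hashed listing, so renames are detected too.
dashboards_checksum() {
  local dir="${1:-monitoring/grafana/dashboards}"
  # Sort with LC_ALL=C for a stable order, hash each file, then hash that listing.
  find "$dir" -type f -name '*.json' -print0 \
    | LC_ALL=C sort -z \
    | xargs -0 sha256sum \
    | sha256sum \
    | cut -d' ' -f1
}
```

For example: `before=$(dashboards_checksum)`, then run the script (assumed here to be invoked as `python scripts/setup_monitoring.py`), then `[ "$before" = "$(dashboards_checksum)" ] && echo unchanged`.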
## Quick Start

```bash
# Start deployment
make deploy-up

# Access services (`open` is macOS-only; on Linux use `xdg-open`)
open http://localhost:3000  # Grafana (admin/admin123)
open http://localhost:9090  # Prometheus
```
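`make deploy-up` may return before every container is ready to serve requests. A small polling helper avoids racing the services; this is a sketch, and `wait_for_url` is hypothetical. It relies on `curl` plus the readiness endpoints Grafana (`/api/health`) and Prometheus (`/-/ready`) document.

```bash
# Hypothetical helper: poll a URL until curl gets a non-error response
# (HTTP status below 400) or the timeout in seconds expires.
wait_for_url() {
  local url="$1" timeout="${2:-60}"
  local deadline=$(( $(date +%s) + timeout ))
  until curl -fsS -o /dev/null --max-time 2 "$url"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 1
  done
}
```

For example: `make deploy-up && wait_for_url http://localhost:3000/api/health && wait_for_url http://localhost:9090/-/ready`. Loki exposes `/ready` as well (http://localhost:3100/ready), assuming its port is published to the host.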
## Services

### Grafana (Port 3000)

**Main monitoring UI**

- Username: `admin`
- Password: `admin123`
- Data sources: Prometheus (`http://prometheus:9090` inside the compose network; http://localhost:9090 from the host) and Loki (`http://loki:3100`)
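The two Prometheus URLs are easy to mix up: inside the Grafana container `localhost` refers to Grafana itself, so a provisioned data source pointing at `http://localhost:9090` would fail to connect. A hypothetical lint, `check_datasource_urls`, can catch that in the provisioning files; it is a plain text match, not a YAML parser.

```bash
# Hypothetical lint (not part of the repo): flag provisioned data source URLs that
# point at localhost/127.0.0.1. Inside the Grafana container, localhost is Grafana
# itself, so Prometheus/Loki must be addressed by their compose service names.
# Plain text match on "url:" lines; it does not parse YAML.
check_datasource_urls() {
  local dir="${1:-monitoring/grafana/provisioning/datasources}"
  local hits
  # "|| true" keeps a no-match grep from aborting callers that use "set -e".
  hits=$(find "$dir" -type f \( -name '*.yml' -o -name '*.yaml' \) \
    -exec grep -HnE 'url:.*https?://(localhost|127\.0\.0\.1)' {} + || true)
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits" >&2
    echo "data source URL points at localhost; use the service name (e.g. http://prometheus:9090)" >&2
    return 1
  fi
}
```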
### Prometheus (Port 9090)

**Metrics collection and storage**
### Loki (Port 3100)

**Log aggregation**
## Dashboards

Available dashboard configurations in `grafana/dashboards/`:

- `load-test-performance.json` - Load test metrics
- `websocket-performance.json` - WebSocket performance
- `system-health.json` - System health monitoring
- `rsync-performance.json` - Rsync performance metrics
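Because these files are loaded as-is, a syntax error in one of them means that dashboard will not load. A minimal pre-commit style check, assuming `python3` is available (its standard `json.tool` module serves as the parser); `validate_dashboards` is a hypothetical name.

```bash
# Hypothetical check: report every dashboard file that is not valid JSON.
# Returns 1 if any file fails to parse.
validate_dashboards() {
  local dir="${1:-monitoring/grafana/dashboards}"
  local f status=0
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if ! python3 -m json.tool "$f" > /dev/null 2>&1; then
      echo "invalid JSON: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```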
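### Importing Dashboards

The dashboard provider already loads these files on startup, so a manual import is only needed for a Grafana instance that does not use this provisioning setup:

1. Go to Grafana → "+" → "Import"
2. Upload a JSON file from the `grafana/dashboards/` directory
3. Select the Prometheus data source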
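The same import can be scripted against Grafana's HTTP API, whose `POST /api/dashboards/db` endpoint takes a body of the form `{"dashboard": <json>, "overwrite": true}`. This is a sketch under assumptions: basic auth with the default credentials above, and dashboard JSON whose `id` is `null` or absent. `build_import_payload`, `import_dashboard`, and the `GRAFANA_URL` / `GRAFANA_USER` / `GRAFANA_PASSWORD` variables are hypothetical. Grafana may refuse to overwrite a dashboard it already provisions, so this is mainly useful for other instances.

```bash
# Hypothetical helper: wrap a dashboard JSON file in the request body used by
# Grafana's POST /api/dashboards/db endpoint.
build_import_payload() {
  printf '{"dashboard": %s, "overwrite": true}\n' "$(cat "$1")"
}

# Hypothetical helper: import one dashboard through the Grafana HTTP API.
# GRAFANA_URL/GRAFANA_USER/GRAFANA_PASSWORD are assumed variable names.
import_dashboard() {
  local file="$1" grafana="${GRAFANA_URL:-http://localhost:3000}"
  build_import_payload "$file" \
    | curl -fsS -u "${GRAFANA_USER:-admin}:${GRAFANA_PASSWORD:-admin123}" \
        -H 'Content-Type: application/json' \
        -X POST --data-binary @- "$grafana/api/dashboards/db"
}
```

For example: `import_dashboard monitoring/grafana/dashboards/system-health.json`.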
## Configuration Files

- `prometheus/prometheus.yml` - Prometheus configuration
- `loki-config.yml` - Loki configuration
- `promtail-config.yml` - Promtail configuration
- `security_rules.yml` - Security rules
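`promtool`, which ships with Prometheus, can validate the scrape configuration before a restart. The helper below is a sketch (`check_prometheus_config` is hypothetical): it uses a local `promtool` if one is on `PATH` and otherwise runs the copy bundled in the `prom/prometheus` Docker image, mounting only the config's directory. If `prometheus.yml` references rule files outside that directory (possibly `security_rules.yml`, depending on where it lives), the Docker variant would need those mounted too.

```bash
# Hypothetical helper: validate prometheus.yml with promtool.
# Prefers a local promtool; falls back to the one inside the prom/prometheus image.
check_prometheus_config() {
  local cfg="${1:-monitoring/prometheus/prometheus.yml}"
  if command -v promtool >/dev/null 2>&1; then
    promtool check config "$cfg"
  else
    # Mount only the config's directory read-only and run promtool from the image.
    docker run --rm --entrypoint promtool \
      -v "$(cd "$(dirname "$cfg")" && pwd):/cfg:ro" \
      prom/prometheus check config "/cfg/$(basename "$cfg")"
  fi
}
```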
## Usage

1. Start the monitoring stack: `make deploy-up`
2. Open Grafana: http://localhost:3000 (admin/admin123)
3. Open the provisioned dashboards (or import them manually from `grafana/dashboards/` as described above)
4. View metrics and test results in real time
## Health Endpoints

The API server exposes these health check endpoints for monitoring:

- **`/health`** - Overall service health (for the Docker healthcheck)
- **`/health/live`** - Liveness probe (is the service running?)
- **`/health/ready`** - Readiness probe (can the service accept traffic?)

### Testing Health Endpoints

```bash
# -k skips TLS certificate verification (e.g. for a self-signed dev certificate)

# Basic health check
curl -k https://localhost:9101/health

# Liveness check (for K8s or monitoring)
curl -k https://localhost:9101/health/live

# Readiness check (verifies dependencies)
curl -k https://localhost:9101/health/ready
```

See `health-testing.md` for detailed testing procedures.
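To check all three endpoints in one go, the loop below prints each path with its HTTP status and fails if any status is outside 2xx; a status of `000` means no HTTP response was received. A sketch: `check_health` is hypothetical, the base URL defaults to the one used in the `curl` examples above, and `-k` is kept on the assumption of a self-signed certificate.

```bash
# Hypothetical helper: print "<path> <HTTP status>" for each health endpoint.
# Returns 1 if any endpoint does not answer with a 2xx status.
check_health() {
  local base="${1:-https://localhost:9101}"
  local path code status=0
  for path in /health /health/live /health/ready; do
    # curl still writes "000" via -w when the connection fails; "|| true" ignores its exit code.
    code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 5 "$base$path" || true)
    echo "$path $code"
    case "$code" in
      2??) ;;
      *) status=1 ;;
    esac
  done
  return "$status"
}
```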
## Prometheus Integration

Prometheus scrapes the following endpoints:

- `api-server:9101/metrics` - Application metrics (future)
- `api-server:9101/health` - Health status monitoring
- `host.docker.internal:9100/metrics` - Worker metrics (when the worker runs on the host)
- `worker:9100/metrics` - Worker metrics (when the worker runs as a container in the compose network)
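Prometheus reports the state of each of these targets at `/api/v1/targets`. A small summarizer (hypothetical `summarize_targets`, using `python3` only to parse the JSON) prints one `<health> <scrapeUrl>` line per active target and fails if any target is not `up`.

```bash
# Hypothetical helper: read /api/v1/targets JSON on stdin, print "<health> <scrapeUrl>"
# per active target, and exit non-zero if any target's health is not "up".
summarize_targets() {
  python3 -c '
import json, sys
targets = json.load(sys.stdin)["data"]["activeTargets"]
for t in targets:
    print(t["health"], t["scrapeUrl"])
sys.exit(0 if all(t["health"] == "up" for t in targets) else 1)
'
}
```

For example: `curl -s http://localhost:9090/api/v1/targets | summarize_targets`. If both worker targets are configured, expect the one that does not match your setup (host vs. container) to report `down`.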
## Cleanup (deprecated paths)

These legacy paths may still exist in the repo but are **not used** by the current dev compose config:

- `monitoring/dashboards/` (old dashboards location)
- `monitoring/prometheus.yml` (old Prometheus config location)
- `monitoring/grafana/provisioning/dashboards/dashboard.yml` (duplicate of `dashboards.yml`)
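Before deleting anything, it can help to see which of these legacy paths are still present. A read-only sketch (`list_legacy_paths` is hypothetical) that assumes it is run from, or given, the repository root:

```bash
# Hypothetical helper: print each deprecated monitoring path that still exists.
# Read-only: nothing is deleted.
list_legacy_paths() {
  local root="${1:-.}" path
  for path in \
    monitoring/dashboards \
    monitoring/prometheus.yml \
    monitoring/grafana/provisioning/dashboards/dashboard.yml; do
    [ -e "$root/$path" ] && echo "$path"
  done
  return 0
}
```

Review its output before removing anything, for example with `git rm`.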