fetch_ml/monitoring/health-testing.md

100 lines
2.3 KiB
Markdown

# Testing Health Endpoints with Monitoring Stack
## Verify Health Endpoints
```bash
# 1. Start the monitoring stack
cd deployments
docker-compose -f docker-compose.dev.yml up -d
# 2. Wait for services to start (30 seconds)
sleep 30
# 3. Test health endpoints
curl -k https://localhost:9101/health
# Expected: {"status":"healthy","timestamp":"...","checks":{}}
curl -k https://localhost:9101/health/live
# Expected: {"status":"alive","timestamp":"..."}
curl -k https://localhost:9101/health/ready
# Expected: {"status":"ready","timestamp":"...","checks":{"queue":"ok","experiments":"ok"}}
# 4. Check Docker health status
docker ps | grep api-server
# Should show: (healthy)
# 5. Access Grafana
open http://localhost:3000
# Login: admin / admin123
# 6. Access Prometheus
open http://localhost:9090
# Check targets: Status > Targets
# Should see: api-server, api-server-health
# 7. Query health metrics in Prometheus
# Go to Graph and enter: up{job="api-server-health"}
# Should show: value=1 (service is up)
```
## Health Check Integration
### Docker Compose
The health check is configured in `deployments/docker-compose.dev.yml`:
```yaml
healthcheck:
test: [ "CMD", "curl", "-k", "https://localhost:9101/health" ]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
```
### Prometheus Monitoring
Prometheus scrapes health status every 30s from:
- `/health` - Overall service health
- `/metrics` - Future Prometheus metrics (when implemented)
### Kubernetes (Future)
Health endpoints ready for K8s probes:
```yaml
livenessProbe:
httpGet:
path: /health/live
port: 9101
scheme: HTTPS
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health/ready
port: 9101
scheme: HTTPS
initialDelaySeconds: 10
periodSeconds: 5
```
## Monitoring Stack Services
- **Grafana** (port 3000): Dashboards and visualization
- **Prometheus** (port 9090): Metrics collection
- **Loki** (port 3100): Log aggregation
- **Promtail**: Log shipping
## Troubleshooting
```bash
# Check API server logs
docker logs ml-experiments-api
# Check Prometheus targets
curl http://localhost:9090/api/v1/targets
# Check health endpoint directly
docker exec ml-experiments-api curl -k https://localhost:9101/health
# Restart services
docker-compose -f deployments/docker-compose.dev.yml restart api-server
```