# Testing Health Endpoints with Monitoring Stack ## Verify Health Endpoints ```bash # 1. Start the monitoring stack cd deployments docker-compose -f docker-compose.dev.yml up -d # 2. Wait for services to start (30 seconds) sleep 30 # 3. Test health endpoints curl -k https://localhost:9101/health # Expected: {"status":"healthy","timestamp":"...","checks":{}} curl -k https://localhost:9101/health/live # Expected: {"status":"alive","timestamp":"..."} curl -k https://localhost:9101/health/ready # Expected: {"status":"ready","timestamp":"...","checks":{"queue":"ok","experiments":"ok"}} # 4. Check Docker health status docker ps | grep api-server # Should show: (healthy) # 5. Access Grafana open http://localhost:3000 # Login: admin / admin123 # 6. Access Prometheus open http://localhost:9090 # Check targets: Status > Targets # Should see: api-server, api-server-health # 7. Query health metrics in Prometheus # Go to Graph and enter: up{job="api-server-health"} # Should show: value=1 (service is up) ``` ## Health Check Integration ### Docker Compose The health check is configured in `deployments/docker-compose.dev.yml`: ```yaml healthcheck: test: [ "CMD", "curl", "-k", "https://localhost:9101/health" ] interval: 30s timeout: 10s retries: 3 start_period: 40s ``` ### Prometheus Monitoring Prometheus scrapes health status every 30s from: - `/health` - Overall service health - `/metrics` - Future Prometheus metrics (when implemented) ### Kubernetes (Future) Health endpoints ready for K8s probes: ```yaml livenessProbe: httpGet: path: /health/live port: 9101 scheme: HTTPS initialDelaySeconds: 30 periodSeconds: 10 readinessProbe: httpGet: path: /health/ready port: 9101 scheme: HTTPS initialDelaySeconds: 10 periodSeconds: 5 ``` ## Monitoring Stack Services - **Grafana** (port 3000): Dashboards and visualization - **Prometheus** (port 9090): Metrics collection - **Loki** (port 3100): Log aggregation - **Promtail**: Log shipping ## Troubleshooting ```bash # Check API server logs docker logs ml-experiments-api # Check Prometheus targets curl http://localhost:9090/api/v1/targets # Check health endpoint directly docker exec ml-experiments-api curl -k https://localhost:9101/health # Restart services docker-compose -f deployments/docker-compose.dev.yml restart api-server ```