# Testing Health Endpoints with Monitoring Stack

## Verify Health Endpoints

```bash
# 1. Start the monitoring stack
cd deployments
docker-compose -f docker-compose.dev.yml up -d

# 2. Wait for services to start (30 seconds)
sleep 30

# 3. Test health endpoints (-k skips TLS certificate verification)
curl -k https://localhost:9101/health
# Expected: {"status":"healthy","timestamp":"...","checks":{}}

curl -k https://localhost:9101/health/live
# Expected: {"status":"alive","timestamp":"..."}

curl -k https://localhost:9101/health/ready
# Expected: {"status":"ready","timestamp":"...","checks":{"queue":"ok","experiments":"ok"}}

# 4. Check Docker health status
docker ps | grep api-server
# The STATUS column should show: (healthy)

# 5. Access Grafana (`open` is macOS-only; use `xdg-open` on Linux)
open http://localhost:3000
# Login: admin / admin123

# 6. Access Prometheus
open http://localhost:9090
# Check targets: Status > Targets
# Should see: api-server, api-server-health

# 7. Query health metrics in Prometheus
# Go to Graph and enter: up{job="api-server-health"}
# Should show: value=1 (the last scrape of the target succeeded)
```
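
The fixed `sleep 30` in step 2 can be too short on a slow machine and wastes time on a fast one. As a minimal sketch, the wait could instead poll the readiness endpoint until it answers. The function name `wait_for_url` and its default attempt count and delay are illustrative assumptions, not part of this project.

```bash
# Poll a URL until it returns a successful HTTP status or the attempts run out.
# Usage: wait_for_url URL [MAX_ATTEMPTS] [DELAY_SECONDS]
# (wait_for_url is a hypothetical helper, not something shipped with the stack.)
wait_for_url() {
  url="$1"
  max_attempts="${2:-30}"
  delay="${3:-2}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -k: skip TLS certificate verification, as in the commands above
    # -f: exit non-zero on HTTP error statuses; -s: silent; -o: discard the body
    if curl -kfs -o /dev/null "$url"; then
      echo "ready after ${attempt} attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "not ready after ${max_attempts} attempts" >&2
  return 1
}
```

With this in place, step 2 could become `wait_for_url https://localhost:9101/health/ready`, which returns as soon as the endpoint responds and fails with a non-zero exit status if it never does.
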
## Health Check Integration

### Docker Compose

The health check is configured in `deployments/docker-compose.dev.yml`:

```yaml
healthcheck:
  test: [ "CMD", "curl", "-k", "https://localhost:9101/health" ]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```

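
With these settings Docker runs the check every 30 seconds and marks the container unhealthy after 3 consecutive failures; failures during the first 40 seconds do not count toward that limit. To read the resulting status from a script rather than by eye in `docker ps`, something like the following sketch could be used. The helper names `container_health` and `is_healthy` are illustrative assumptions; the container name `ml-experiments-api` is the one used in the troubleshooting commands.

```bash
# Print the health status Docker reports for a container:
# "starting", "healthy", or "unhealthy".
# (container_health and is_healthy are hypothetical helper names.)
container_health() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Succeed only if the container is currently reported as healthy.
is_healthy() {
  [ "$(container_health "$1")" = "healthy" ]
}
```

For example, `is_healthy ml-experiments-api && echo "API server is healthy"`. One caveat worth checking: `curl` without `-f` exits with status 0 even when the server returns an HTTP error, so the check as written fails only on connection errors or timeouts. Whether that matters depends on whether `/health` returns a non-2xx status when the service is unhealthy, which is not stated here.
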
### Prometheus Monitoring

Prometheus scrapes health status every 30s from:

- `/health` - Overall service health
- `/metrics` - Reserved for Prometheus metrics (not yet implemented)
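
The `up{job="api-server-health"}` check from the verification steps can also be run from the command line through the Prometheus HTTP API. The sketch below assumes Prometheus is reachable on `localhost:9090` as described in this document; the function names `query_up` and `extract_value` are illustrative, and the `sed` extraction only handles a response containing a single series.

```bash
# Run an instant query for up{job="<job>"} against the Prometheus HTTP API.
# (query_up and extract_value are hypothetical helper names.)
query_up() {
  # -G sends the URL-encoded data as a GET query string.
  curl -sG 'http://localhost:9090/api/v1/query' \
    --data-urlencode "query=up{job=\"$1\"}"
}

# Pull the sample value out of a single-series JSON response on stdin.
# This is a rough sed-based extraction; a JSON tool such as jq is more robust.
extract_value() {
  sed -n 's/.*"value":\[[^,]*,"\([^"]*\)"].*/\1/p'
}
```

Running `query_up api-server-health | extract_value` should then print `1` when the last scrape of the target succeeded and `0` when it failed.
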

### Kubernetes (Future)

The health endpoints are ready to be used as Kubernetes probes:

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 9101
    scheme: HTTPS
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /health/ready
    port: 9101
    scheme: HTTPS
  initialDelaySeconds: 10
  periodSeconds: 5
```

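
An `httpGet` probe passes when the endpoint answers with a status code of at least 200 and below 400, and HTTPS probes do not verify the certificate, which matches the `curl -k` calls used elsewhere in this document. As a rough local approximation of that rule, a status code could be classified like this; the function name `probe_passes` is an illustrative assumption.

```bash
# Succeed if an HTTP status code would count as a passing httpGet probe:
# any code from 200 up to, but not including, 400.
# (probe_passes is a hypothetical helper name.)
probe_passes() {
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}

# Example (not executed here):
#   code="$(curl -ks -o /dev/null -w '%{http_code}' https://localhost:9101/health/ready)"
#   probe_passes "$code" && echo "readiness probe would pass"
```

This only mirrors the status-code rule; it does not reproduce probe timing such as `initialDelaySeconds` or `periodSeconds`.
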
## Monitoring Stack Services

- **Grafana** (port 3000): Dashboards and visualization
- **Prometheus** (port 9090): Metrics collection
- **Loki** (port 3100): Log aggregation
- **Promtail**: Log shipping

## Troubleshooting

```bash
# Check API server logs
docker logs ml-experiments-api

# Check Prometheus targets
curl http://localhost:9090/api/v1/targets

# Check the health endpoint from inside the container
docker exec ml-experiments-api curl -k https://localhost:9101/health

# Restart the API server (run from the repository root)
docker-compose -f deployments/docker-compose.dev.yml restart api-server
```
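
When it is unclear which endpoint is failing, it can help to print the HTTP status of all three at once. The following is a minimal sketch under the assumption that the API server listens on `https://localhost:9101` as in the rest of this document; the function name `report_health` is illustrative.

```bash
# Print "<path> <http_status>" for each health endpoint.
# The base URL defaults to the address used throughout this document.
# (report_health is a hypothetical helper name.)
report_health() {
  base_url="${1:-https://localhost:9101}"
  for path in /health /health/live /health/ready; do
    # -w '%{http_code}' prints only the status code; 000 means no HTTP response.
    code="$(curl -ks -o /dev/null -w '%{http_code}' "${base_url}${path}")" || true
    echo "${path} ${code}"
  done
}
```

Calling `report_health` prints one line per endpoint. A `000` on every line suggests the server is not reachable at all, while a non-2xx code on `/health/ready` alone points at one of its dependency checks.
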