Testing Health Endpoints with the Monitoring Stack

Verify Health Endpoints

# 1. Start the monitoring stack
cd deployments
docker-compose -f docker-compose.dev.yml up -d

# 2. Wait for services to start (30 seconds)
sleep 30

# 3. Test health endpoints
curl -k https://localhost:9101/health
# Expected: {"status":"healthy","timestamp":"...","checks":{}}

curl -k https://localhost:9101/health/live
# Expected: {"status":"alive","timestamp":"..."}

curl -k https://localhost:9101/health/ready
# Expected: {"status":"ready","timestamp":"...","checks":{"queue":"ok","experiments":"ok"}}

# 4. Check Docker health status
docker ps | grep api-server
# Should show: (healthy)

# 5. Access Grafana (`open` is macOS-specific; use xdg-open on Linux)
open http://localhost:3000
# Login: admin / admin123

# 6. Access Prometheus (use xdg-open instead of open on Linux)
open http://localhost:9090
# Check targets: Status > Targets
# Should see: api-server, api-server-health

# 7. Query health metrics in Prometheus
# Go to Graph and enter: up{job="api-server-health"}
# Should show: value=1 (service is up)
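
The fixed 30-second wait in step 2 can be replaced by polling the health endpoint until it answers. The following is a minimal sketch, assuming the response body contains `"status":"healthy"` exactly as in the expected output above; `HEALTH_URL`, `MAX_ATTEMPTS`, `DELAY_SECONDS` and `wait_for_health` are illustrative names, not part of the project.

```shell
# Illustrative defaults (assumptions, not project settings).
HEALTH_URL="${HEALTH_URL:-https://localhost:9101/health}"
MAX_ATTEMPTS="${MAX_ATTEMPTS:-15}"
DELAY_SECONDS="${DELAY_SECONDS:-2}"

# Poll HEALTH_URL until the body reports "healthy" or attempts run out.
wait_for_health() {
  attempt=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    # -k skips certificate verification, -s silences progress output,
    # --max-time bounds each request so a hung server cannot stall the loop.
    body="$(curl -ks --max-time 5 "$HEALTH_URL" 2>/dev/null || true)"
    case "$body" in
      *'"status":"healthy"'*)
        echo "healthy after ${attempt} attempt(s)"
        return 0
        ;;
    esac
    # Pause between attempts, but not after the last one.
    if [ "$attempt" -lt "$MAX_ATTEMPTS" ]; then
      sleep "$DELAY_SECONDS"
    fi
    attempt=$((attempt + 1))
  done
  echo "not healthy after ${MAX_ATTEMPTS} attempts" >&2
  return 1
}
```

After sourcing the snippet, `wait_for_health && curl -k https://localhost:9101/health/ready` runs the readiness check only once the service responds.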

Health Check Integration

Docker Compose

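The health check is configured in deployments/docker-compose.dev.yml. Because curl is invoked without -f, the check passes on any HTTP response, including error statuses: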

healthcheck:
  test: [ "CMD", "curl", "-k", "https://localhost:9101/health" ]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
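
To see what Docker has concluded from this health check, the container state can be read with `docker inspect`. This is a small sketch, assuming the container is named `ml-experiments-api` as in the Troubleshooting commands; `container_health` and `require_healthy` are hypothetical helper names.

```shell
# Container name taken from the Troubleshooting section (assumption).
CONTAINER="${CONTAINER:-ml-experiments-api}"

# Print the health state Docker recorded: starting, healthy, or unhealthy.
container_health() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Print "<container>: <state>" and succeed only when the state is healthy.
require_healthy() {
  status="$(container_health "$1" 2>/dev/null || echo unknown)"
  echo "$1: $status"
  [ "$status" = "healthy" ]
}
```

Calling `require_healthy "$CONTAINER"` during the 40s start period is expected to report `starting`, since failed probes in that window do not count toward the retry limit.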

Prometheus Monitoring

Prometheus is configured to scrape the API server every 30s at:

  • /health - Overall service health
  • /metrics - Reserved for Prometheus metrics (not yet implemented)
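
The scrape result can also be checked from the command line through the Prometheus HTTP API instead of the Graph page. A minimal sketch, assuming Prometheus listens on http://localhost:9090 and the job is named `api-server-health` as listed under Status > Targets; `PROM_URL`, `query_up` and `is_up` are illustrative names.

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run the instant query up{job="<job>"} against the Prometheus HTTP API.
# -G turns the urlencoded data into a GET query string.
query_up() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=up{job=\"$1\"}"
}

# Succeed when at least one sample has the value "1".
# Instant-query samples are encoded as "value":[<timestamp>,"<value>"].
is_up() {
  query_up "$1" | grep -q '"value":\[[0-9.]*,"1"]'
}
```

For example, `is_up api-server-health && echo "scrape ok"` prints the message only when the last scrape of that job succeeded.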

Kubernetes (Future)

The health endpoints are ready to be used as Kubernetes liveness and readiness probes:

livenessProbe:
  httpGet:
    path: /health/live
    port: 9101
    scheme: HTTPS
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /health/ready
    port: 9101
    scheme: HTTPS
  initialDelaySeconds: 10
  periodSeconds: 5
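
Until a cluster exists, the probe behaviour can be approximated locally. Kubernetes treats an httpGet probe as successful when the HTTP status is at least 200 and below 400, and it does not verify the certificate for HTTPS probes. The sketch below mirrors that rule with curl; `probe` is a hypothetical helper and the 1-second timeout reflects the Kubernetes default for `timeoutSeconds`.

```shell
# Succeed when the URL answers with a status in the 200-399 range.
probe() {
  # -o /dev/null discards the body; -w prints only the status code.
  # curl prints 000 when no HTTP response was received.
  code="$(curl -ks -o /dev/null -w '%{http_code}' --max-time 1 "$1" || true)"
  [ "${code:-0}" -ge 200 ] && [ "${code:-0}" -lt 400 ]
}
```

For example, `probe https://localhost:9101/health/live` and `probe https://localhost:9101/health/ready` exercise the same paths the probes above would call.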

Monitoring Stack Services

  • Grafana (port 3000): Dashboards and visualization
  • Prometheus (port 9090): Metrics collection
  • Loki (port 3100): Log aggregation
  • Promtail: Log shipping
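
Each of these services exposes its own status endpoint: Grafana at `/api/health`, Prometheus at `/-/healthy`, and Loki at `/ready`. The following sketch checks all three, assuming they are published on localhost over plain HTTP at the ports listed above; `check_service` and `check_stack` are illustrative names.

```shell
# $1 = label, $2 = URL. -f makes curl exit non-zero on HTTP status >= 400.
check_service() {
  if curl -fs -o /dev/null --max-time 5 "$2"; then
    echo "$1: ok"
  else
    echo "$1: unreachable"
    return 1
  fi
}

# Check every service and return non-zero if any of them failed.
check_stack() {
  failed=0
  check_service grafana "http://localhost:3000/api/health" || failed=1
  check_service prometheus "http://localhost:9090/-/healthy" || failed=1
  check_service loki "http://localhost:3100/ready" || failed=1
  return "$failed"
}
```

Promtail is left out because the list above gives no port for it.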

Troubleshooting

# Check API server logs
docker logs ml-experiments-api

# Check Prometheus targets
curl http://localhost:9090/api/v1/targets

# Check health endpoint directly
docker exec ml-experiments-api curl -k https://localhost:9101/health

# Restart the API server (run from the repository root, not from deployments/)
docker-compose -f deployments/docker-compose.dev.yml restart api-server