# Troubleshooting
Common issues and solutions for Fetch ML.
## Quick Fixes
### Services Not Starting
```bash
# Check container status
docker ps --filter "name=ml-"
# Restart development stack
make dev-down
make dev-up
```
### API Not Responding
```bash
# Check health endpoint
curl http://localhost:8080/health
# Check if port is in use
lsof -i :8080
lsof -i :8443
# Kill process on port
kill -9 $(lsof -ti :8080)
```
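After a restart the API can take a few seconds to come up, so a single `curl` may fail even when nothing is wrong. The following is a minimal sketch of a retry wrapper for that case; the `retry` function and its argument order are illustrative, not part of Fetch ML.

```bash
#!/usr/bin/env bash
# retry ATTEMPTS DELAY CMD...
# Hypothetical helper: run CMD until it succeeds or ATTEMPTS is exhausted.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "gave up after ${attempts} attempts: $*" >&2
  return 1
}
```

For example, `retry 15 2 curl -fsS http://localhost:8080/health` polls the health endpoint every 2 seconds, up to 15 times, and returns non-zero if it never responds.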
### Database / Redis Issues
```bash
# Check Redis from container
docker exec ml-experiments-redis redis-cli ping
# Check API can reach database (via health endpoint)
curl -f http://localhost:8080/health || echo "API not healthy"
```
## Common Errors
### Authentication Errors
- **Invalid API key**: Check config and regenerate hash
- **JWT expired**: Check `jwt_expiry` setting
### Database Errors
- **Connection failed**: Verify database type and connection params
- **No such table**: Run migrations with `--migrate` (see [Quick Start](quick-start.md))
### Container Errors
- **Runtime not found**: Set `runtime: docker` in config (testing only)
- **Image pull failed**: Check registry access
## Performance Issues
- **High memory**: Adjust `resources.memory_limit`
- **Slow jobs**: Check worker count and queue size
## Development Issues
- **Build fails**: Run `go mod tidy`; for the CLI, clear the Zig build output with `cd cli && rm -rf zig-out zig-cache`
- **Tests fail**: Ensure dev stack is running with `make dev-up` or use `make test-auth`
## CLI Issues
- **Not found**: Build the CLI with `cd cli && zig build --release=fast`
- **Connection errors**: Check `--server` and `--api-key`
## Network Issues
- **Port conflicts**: Find the process with `lsof -i :8080` or `lsof -i :8443` and kill it
- **Firewall**: Allow ports 8080, 8443, 6379, 5432
## Configuration Issues
- **Invalid YAML**: `python3 -c "import yaml; yaml.safe_load(open('config.yaml'))"`
- **Missing fields**: See [Configuration Schema](configuration-schema.md)
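
As a quick first pass before consulting the schema, a textual check can flag keys that never appear in the file. This is a rough sketch: `missing_keys` is a hypothetical helper, it greps for `key:` at any indentation rather than parsing YAML, and which keys are actually required is an assumption to confirm against the schema.

```bash
#!/usr/bin/env bash
# missing_keys FILE KEY...
# Hypothetical helper: print each KEY that never appears as "KEY:" at the
# start of a line (any indentation). Returns 1 if any key is missing.
# Keys are used as regex fragments, so pass plain identifiers only.
missing_keys() {
  local file=$1 key status=0
  shift
  for key in "$@"; do
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*:" "$file"; then
      echo "$key"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_keys config.yaml runtime jwt_expiry memory_limit` prints whichever of those names are absent. It does not check nesting or values, so it complements the YAML syntax check above rather than replacing it.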
## Debug Information
```bash
./bin/api-server --version
docker ps --filter "name=ml-"
# docker logs relays the container's stderr on stderr, so merge it before filtering
docker logs ml-experiments-api 2>&1 | grep ERROR
```
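When attaching this information to a bug report, it can help to capture everything in one file even if some commands fail. A minimal sketch, assuming the binary path and container name shown above; `section` and `debug_report` are illustrative names, not Fetch ML commands.

```bash
#!/usr/bin/env bash
# section TITLE CMD...
# Hypothetical helper: print a header, then CMD's stdout and stderr.
# A failing command is noted in the output instead of aborting the report.
section() {
  local title=$1
  shift
  printf '== %s ==\n' "$title"
  "$@" 2>&1 || printf '(command failed with exit code %s)\n' "$?"
}

# Gather the same commands listed above into one stream.
debug_report() {
  section "version" ./bin/api-server --version
  section "containers" docker ps --filter "name=ml-"
  section "api logs (last 200 lines)" docker logs --tail 200 ml-experiments-api
}
```

Running `debug_report > debug-report.txt` writes the combined output to a file. Review it before sharing, since logs may contain API keys or other secrets.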
## Emergency Reset
```bash
# Stop and remove all dev containers and volumes
make dev-down
# Caution: this prunes unused volumes for every Docker project on this host, not only Fetch ML
docker volume prune
# Remove local data if needed
rm -rf data/ results/ *.db
# Start fresh dev stack
make dev-up
```
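Because the reset deletes local data irreversibly, a confirmation prompt in front of the destructive step can prevent accidents. A minimal sketch; `confirm` is a hypothetical helper, not something the Makefile provides.

```bash
#!/usr/bin/env bash
# confirm PROMPT
# Hypothetical helper: return 0 only if the reply read from stdin is
# "y" or "yes" (any case). Empty input or EOF counts as "no".
confirm() {
  local reply
  printf '%s [y/N] ' "$1" >&2
  read -r reply || return 1
  case "$reply" in
    [yY] | [yY][eE][sS]) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `confirm "Delete data/, results/ and *.db?" && rm -rf data/ results/ *.db` only removes the files after an explicit yes.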