fetch_ml/README.md
Jeremie Fraeys 69dc9e6af4 Clean up README files and enhance testing documentation
- Remove 14 duplicate README files from test fixtures
- Clean up and restructure docs/src/testing.md with comprehensive testing guide
- Update main README.md to highlight testing and reference docs/ structure
- Remove empty .new files
- Keep only valuable, directory-specific README files (reduced from 34 to 20)
2025-12-06 15:43:51 -05:00

# FetchML - Machine Learning Platform
A production-ready ML experiment platform with task queuing, monitoring, and a modern CLI/API.
## Features
- **🚀 Production Resilience** - Task leasing, smart retries, dead-letter queues
- **📊 Monitoring** - Grafana/Prometheus/Loki with auto-provisioned dashboards
- **🔐 Security** - API key auth, TLS, rate limiting, IP whitelisting
- **⚡ Performance** - Go API server + Zig CLI for speed
- **📦 Easy Deployment** - Docker Compose (dev) or systemd (prod)
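As an illustration of the resilience features listed above, here is a minimal in-memory sketch of retry with exponential backoff and a dead-letter list. It is not FetchML's implementation: `Task`, `Backoff`, `Process`, `FailUntil`, and `NoSleep` are hypothetical names, and the delay values are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Task is a hypothetical queue item; the real task type may differ.
type Task struct {
	ID       string
	Attempts int
}

// Backoff returns base * 2^(attempt-1), capped at limit.
func Backoff(attempt int, base, limit time.Duration) time.Duration {
	d := base
	for i := 1; i < attempt; i++ {
		d *= 2
		if d >= limit {
			return limit
		}
	}
	if d > limit {
		return limit
	}
	return d
}

// Process runs fn until it succeeds or maxAttempts is reached.
// On exhaustion the task is appended to deadLetter and false is returned.
func Process(t *Task, maxAttempts int, fn func(*Task) error, deadLetter *[]Task, sleep func(time.Duration)) bool {
	for t.Attempts < maxAttempts {
		t.Attempts++
		if err := fn(t); err == nil {
			return true
		}
		if t.Attempts < maxAttempts {
			sleep(Backoff(t.Attempts, 100*time.Millisecond, 5*time.Second))
		}
	}
	*deadLetter = append(*deadLetter, *t)
	return false
}

// FailUntil simulates a job that fails until its n-th attempt.
func FailUntil(n int) func(*Task) error {
	return func(t *Task) error {
		if t.Attempts < n {
			return errors.New("transient failure")
		}
		return nil
	}
}

// NoSleep skips real waiting so the demo finishes immediately.
func NoSleep(time.Duration) {}

func main() {
	var deadLetter []Task

	ok := Process(&Task{ID: "train-1"}, 5, FailUntil(3), &deadLetter, NoSleep)
	fmt.Println("train-1 succeeded:", ok, "dead-lettered:", len(deadLetter))

	ok = Process(&Task{ID: "train-2"}, 2, FailUntil(3), &deadLetter, NoSleep)
	fmt.Println("train-2 succeeded:", ok, "dead-lettered:", len(deadLetter))
}
```

Passing the sleep function as a parameter keeps the retry loop testable without real delays; a production queue would more likely re-enqueue the task with a scheduled retry time than block a worker.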
## Quick Start
### Development (macOS/Linux)
```bash
# Clone and start
git clone <your-repo>
cd fetch_ml
docker-compose up -d
# Access Grafana: http://localhost:3000 (admin/admin)
```
### Production (Linux)
```bash
# Setup application
sudo ./scripts/setup-prod.sh
# Setup monitoring
sudo ./scripts/setup-monitoring-prod.sh
# Build and install
make prod
make install
# Start services
sudo systemctl start fetchml-api fetchml-worker
sudo systemctl start prometheus grafana loki promtail
```
## Architecture
```
┌──────────────┐   WebSocket   ┌──────────────┐
│ Zig CLI/TUI  │◄─────────────►│  API Server  │
└──────────────┘               │     (Go)     │
                               └──────┬───────┘
                        ┌─────────────┼─────────────┐
                        │             │             │
                   ┌────▼────┐    ┌───▼────┐    ┌───▼────┐
                   │  Redis  │    │ Worker │    │  Loki  │
                   │ (Queue) │    │  (Go)  │    │ (Logs) │
                   └─────────┘    └────────┘    └────────┘
```
## Usage
### API Server
```bash
# Development (stderr logging)
go run cmd/api-server/main.go --config configs/config-dev.yaml
# Production (file logging)
go run cmd/api-server/main.go --config configs/config-no-tls.yaml
```
### CLI
```bash
# Build
cd cli && zig build prod
# Run experiment
./cli/zig-out/bin/ml run --config config.toml
# Check status
./cli/zig-out/bin/ml status
```
### Docker
```bash
make docker-run # Start all services
make docker-logs # View logs
make docker-stop # Stop services
```
## Development
### Prerequisites
- Go 1.21+
- Zig 0.11+
- Redis
- Docker (for local dev)
### Build
```bash
make build # All components
make dev # Fast dev build
make prod # Optimized production build
```
### Testing
```bash
make test # All tests
make test-unit # Unit tests only
make test-coverage # With coverage report
make test-auth # Multi-user authentication tests
```
**Quick Start Testing**: See **[Testing Guide](docs/src/testing.md)** for comprehensive testing documentation, including a 5-minute quick start guide.
## Configuration
### Development (`configs/config-dev.yaml`)
```yaml
logging:
  level: "info"
  file: "" # stderr only
redis:
  url: "redis://localhost:6379"
```
### Production (`configs/config-no-tls.yaml`)
```yaml
logging:
  level: "info"
  file: "./logs/fetch_ml.log" # file only
redis:
  url: "redis://redis:6379"
```
## Monitoring
### Grafana Dashboards (Auto-Provisioned)
- **ML Task Queue** - Queue depth, task duration, failure rates
- **Application Logs** - Log streams, error tracking, search

Access: `http://localhost:3000` (dev) or `http://YOUR_SERVER:3000` (prod)
### Metrics
- Queue depth and task processing rates
- Retry attempts by error category
- Dead letter queue size
- Lease expirations
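To make the "lease expirations" metric concrete, the sketch below shows one way a lease table could detect expired leases and count them. It is an in-memory illustration only: `Lease`, `LeaseTable`, `Acquire`, `Reclaim`, and `demoStart` are hypothetical names, and the actual lease state presumably lives in Redis rather than in a Go map.

```go
package main

import (
	"fmt"
	"time"
)

// Lease records which worker holds a task and until when.
type Lease struct {
	TaskID    string
	WorkerID  string
	ExpiresAt time.Time
}

// LeaseTable is an in-memory stand-in for lease state.
type LeaseTable struct {
	leases  map[string]Lease
	Expired int // running count, analogous to a lease-expirations counter
}

// NewLeaseTable returns an empty table.
func NewLeaseTable() *LeaseTable {
	return &LeaseTable{leases: make(map[string]Lease)}
}

// Acquire grants a lease unless another unexpired lease exists for the task.
func (lt *LeaseTable) Acquire(taskID, workerID string, now time.Time, ttl time.Duration) bool {
	if l, ok := lt.leases[taskID]; ok && now.Before(l.ExpiresAt) {
		return false
	}
	lt.leases[taskID] = Lease{TaskID: taskID, WorkerID: workerID, ExpiresAt: now.Add(ttl)}
	return true
}

// Reclaim removes leases whose deadline is at or before now and returns
// their task IDs so the caller can re-queue them.
func (lt *LeaseTable) Reclaim(now time.Time) []string {
	var ids []string
	for id, l := range lt.leases {
		if !now.Before(l.ExpiresAt) {
			delete(lt.leases, id)
			ids = append(ids, id)
			lt.Expired++
		}
	}
	return ids
}

// demoStart is a fixed reference time so the example is deterministic.
var demoStart = time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)

func main() {
	lt := NewLeaseTable()
	ttl := 30 * time.Second

	fmt.Println(lt.Acquire("task-1", "worker-a", demoStart, ttl))                     // lease granted
	fmt.Println(lt.Acquire("task-1", "worker-b", demoStart.Add(10*time.Second), ttl)) // still held
	fmt.Println(lt.Reclaim(demoStart.Add(31*time.Second)), lt.Expired)                // reclaimed after expiry
}
```

Taking `now` as an argument instead of calling `time.Now()` inside the methods keeps expiry logic deterministic under test.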
## Documentation
- **[Testing Guide](docs/src/testing.md)** - Comprehensive testing documentation
- **[Quick Start Testing](docs/src/quick-start-testing.md)** - 5-minute testing guide
- **[Installation](docs/src/installation.md)** - Setup instructions
- **[Architecture](docs/src/architecture.md)** - System design
- **[Configuration Reference](docs/src/configuration-reference.md)** - Configuration options
- **[CLI Reference](docs/src/cli-reference.md)** - Command-line interface
- **[Deployment](docs/src/deployment.md)** - Production deployment
- **[Troubleshooting](docs/src/troubleshooting.md)** - Common issues
## Makefile Targets
```bash
# Build
make build # Build all components
make prod # Production build
make clean # Clean artifacts
# Docker
make docker-build # Build image
make docker-run # Start services
make docker-stop # Stop services
# Test
make test # All tests
make test-coverage # With coverage
# Production (Linux only)
make setup # Setup app
make setup-monitoring # Setup monitoring
make install # Install binaries
```
## Security
- **TLS/HTTPS** - Encryption in transit
- **API Keys** - Hashed with SHA-256
- **Rate Limiting** - Per-user quotas
- **IP Whitelist** - Network restrictions
- **Audit Logging** - All API access logged
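The API key item above can be sketched as follows: store only a SHA-256 digest of each key and compare digests in constant time. This is an assumption about how such a check could look, not the project's code; `HashAPIKey` and `VerifyAPIKey` are hypothetical names, and the sketch uses only the Go standard library.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// HashAPIKey returns the hex-encoded SHA-256 digest of a key.
func HashAPIKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// VerifyAPIKey hashes the presented key and compares it to the stored
// digest in constant time to avoid leaking information through timing.
func VerifyAPIKey(presented, storedHash string) bool {
	got := HashAPIKey(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	stored := HashAPIKey("example-key") // only the digest would be persisted

	fmt.Println(len(stored))                         // 64 hex characters
	fmt.Println(VerifyAPIKey("example-key", stored)) // true
	fmt.Println(VerifyAPIKey("wrong-key", stored))   // false
}
```

A plain SHA-256 digest is reasonable for long, randomly generated API keys; it would not be a suitable choice for user-chosen passwords.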
## License
MIT - See [LICENSE](LICENSE)
## Contributing
Contributions welcome! This is a personal homelab project, but PRs are appreciated.