fetch_ml/README.md
Jeremie Fraeys c5049a2fdf feat: initialize FetchML ML platform with core project structure
- Add comprehensive README with architecture overview and quick start guide
- Set up Go module with production-ready dependencies
- Configure build system with Makefile for development and production builds
- Add Docker Compose for local development environment
- Include project configuration files (linting, Python, etc.)

This establishes the foundation for a production-ready ML experiment platform
with task queuing, monitoring, and modern CLI/API interface.
2025-12-04 16:52:09 -05:00


# FetchML - Machine Learning Platform
A production-ready ML experiment platform with task queuing, monitoring, and a modern CLI/API.
## Features
- **🚀 Production Resilience** - Task leasing, smart retries, dead-letter queues
- **📊 Monitoring** - Grafana/Prometheus/Loki with auto-provisioned dashboards
- **🔐 Security** - API key auth, TLS, rate limiting, IP whitelisting
- **⚡ Performance** - Go API server + Zig CLI for speed
- **📦 Easy Deployment** - Docker Compose (dev) or systemd (prod)
## Quick Start
### Development (macOS/Linux)
```bash
# Clone and start
git clone <your-repo>
cd fetch_ml
docker-compose up -d
# Access Grafana: http://localhost:3000 (admin/admin)
```
### Production (Linux)
```bash
# Setup application
sudo ./scripts/setup-prod.sh
# Setup monitoring
sudo ./scripts/setup-monitoring-prod.sh
# Build and install
make prod
make install
# Start services
sudo systemctl start fetchml-api fetchml-worker
sudo systemctl start prometheus grafana loki promtail
```
## Architecture
```
┌──────────────┐   WebSocket   ┌──────────────┐
│ Zig CLI/TUI  │◄─────────────►│  API Server  │
└──────────────┘               │     (Go)     │
                               └──────┬───────┘
                        ┌─────────────┼─────────────┐
                        │             │             │
                   ┌────▼────┐    ┌───▼────┐    ┌───▼────┐
                   │  Redis  │    │ Worker │    │  Loki  │
                   │ (Queue) │    │  (Go)  │    │ (Logs) │
                   └─────────┘    └────────┘    └────────┘
```
## Usage
### API Server
```bash
# Development (stderr logging)
go run cmd/api-server/main.go --config configs/config-dev.yaml
# Production (file logging)
go run cmd/api-server/main.go --config configs/config-no-tls.yaml
```
### CLI
```bash
# Build
cd cli && zig build prod
# Run experiment
./cli/zig-out/bin/ml run --config config.toml
# Check status
./cli/zig-out/bin/ml status
```
### Docker
```bash
make docker-run # Start all services
make docker-logs # View logs
make docker-stop # Stop services
```
## Development
### Prerequisites
- Go 1.21+
- Zig 0.11+
- Redis
- Docker (for local dev)
### Build
```bash
make build # All components
make dev # Fast dev build
make prod # Optimized production build
```
### Test
```bash
make test # All tests
make test-unit # Unit tests only
make test-coverage # With coverage report
```
## Configuration
### Development (`configs/config-dev.yaml`)
```yaml
logging:
  level: "info"
  file: "" # stderr only
redis:
  url: "redis://localhost:6379"
```
### Production (`configs/config-no-tls.yaml`)
```yaml
logging:
  level: "info"
  file: "./logs/fetch_ml.log" # file only
redis:
  url: "redis://redis:6379"
```
## Monitoring
### Grafana Dashboards (Auto-Provisioned)
- **ML Task Queue** - Queue depth, task duration, failure rates
- **Application Logs** - Log streams, error tracking, search

Access: `http://localhost:3000` (dev) or `http://YOUR_SERVER:3000` (prod)
### Metrics
- Queue depth and task processing rates
- Retry attempts by error category
- Dead letter queue size
- Lease expirations
## Documentation
- **[Getting Started](docs/getting-started.md)** - Detailed setup guide
- **[Production Deployment](docs/production-monitoring.md)** - Linux deployment
- **[WebSocket API](docs/api/)** - Protocol documentation
- **[Architecture](docs/architecture/)** - System design
## Makefile Targets
```bash
# Build
make build # Build all components
make prod # Production build
make clean # Clean artifacts
# Docker
make docker-build # Build image
make docker-run # Start services
make docker-stop # Stop services
# Test
make test # All tests
make test-coverage # With coverage
# Production (Linux only)
make setup # Setup app
make setup-monitoring # Setup monitoring
make install # Install binaries
```
## Security
- **TLS/HTTPS** - Encryption in transit between client and API server
- **API Keys** - Hashed with SHA-256
- **Rate Limiting** - Per-user quotas
- **IP Whitelist** - Network restrictions
- **Audit Logging** - All API access logged
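
As an illustration of the API key item above, here is a minimal sketch of storing only a SHA-256 digest and verifying a presented key in constant time. The function names are hypothetical and the code is not taken from FetchML; it assumes keys are long random strings, which is the case where a plain, unsalted SHA-256 digest is a reasonable choice.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// NewAPIKey returns a random 32-byte key, hex-encoded (64 characters).
func NewAPIKey() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// HashAPIKey returns the hex-encoded SHA-256 digest of the key.
// Only this digest would be stored server-side.
func HashAPIKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// VerifyAPIKey hashes the presented key and compares it to the stored
// digest in constant time to avoid leaking information through timing.
func VerifyAPIKey(presented, storedHash string) bool {
	got := HashAPIKey(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	key, err := NewAPIKey()
	if err != nil {
		panic(err)
	}
	stored := HashAPIKey(key)
	fmt.Println(VerifyAPIKey(key, stored))         // matching key
	fmt.Println(VerifyAPIKey("wrong-key", stored)) // non-matching key
}
```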
## License
MIT - See [LICENSE](LICENSE)
## Contributing
Contributions are welcome! This is a personal homelab project, but PRs are appreciated.