# FetchML - Machine Learning Platform

A production-ready ML experiment platform with task queuing, monitoring, and a modern CLI/API.

## Features

- **🚀 Production Resilience** - Task leasing, smart retries, dead-letter queues
- **📊 Monitoring** - Grafana/Prometheus/Loki with auto-provisioned dashboards
- **🔐 Security** - API key auth, TLS, rate limiting, IP whitelisting
- **⚡ Performance** - Go API server + Zig CLI for speed
- **📦 Easy Deployment** - Docker Compose (dev) or systemd (prod)

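To make the resilience terms above concrete, here is a minimal in-memory sketch of how task leasing, retries, and a dead-letter queue can fit together. It is written in Python for brevity and is not the FetchML implementation: the `LeaseQueue` class, its method names, and the `lease_seconds` / `max_attempts` parameters are all assumptions for illustration, and the real queue is backed by Redis.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    attempts: int = 0
    lease_expires_at: float = 0.0


class LeaseQueue:
    """In-memory stand-in for a Redis-backed queue (hypothetical API)."""

    def __init__(self, lease_seconds=30.0, max_attempts=3):
        self.lease_seconds = lease_seconds
        self.max_attempts = max_attempts
        self.pending = deque()   # tasks waiting for a worker
        self.leased = {}         # task_id -> Task currently held by a worker
        self.dead_letter = []    # tasks that exhausted their attempts

    def enqueue(self, task_id):
        self.pending.append(Task(task_id))

    def lease(self, now=None):
        """Hand the next pending task to a worker for a limited time."""
        now = time.monotonic() if now is None else now
        if not self.pending:
            return None
        task = self.pending.popleft()
        task.attempts += 1
        task.lease_expires_at = now + self.lease_seconds
        self.leased[task.task_id] = task
        return task

    def ack(self, task_id):
        """The worker finished successfully, so the lease is dropped."""
        self.leased.pop(task_id, None)

    def reap_expired(self, now=None):
        """Requeue tasks whose lease ran out, or dead-letter them."""
        now = time.monotonic() if now is None else now
        for task_id, task in list(self.leased.items()):
            if task.lease_expires_at > now:
                continue  # lease still valid
            del self.leased[task_id]
            if task.attempts >= self.max_attempts:
                self.dead_letter.append(task)
            else:
                self.pending.append(task)
```

A worker that crashes never calls `ack`, so its lease eventually expires and `reap_expired` makes the task available again. After `max_attempts` leases the task is parked in `dead_letter` instead of being retried forever.
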
## Quick Start

### Development (macOS/Linux)

```bash
# Clone and start
git clone <your-repo>
cd fetch_ml
docker-compose up -d

# Access Grafana: http://localhost:3000 (admin/admin)
```

### Production (Linux)

```bash
# Setup application
sudo ./scripts/setup-prod.sh

# Setup monitoring
sudo ./scripts/setup-monitoring-prod.sh

# Build and install
make prod
make install

# Start services
sudo systemctl start fetchml-api fetchml-worker
sudo systemctl start prometheus grafana loki promtail
```

## Architecture

```
┌──────────────┐   WebSocket   ┌──────────────┐
│ Zig CLI/TUI  │◄─────────────►│  API Server  │
└──────────────┘               │     (Go)     │
                               └──────┬───────┘
                                      │
                        ┌─────────────┼─────────────┐
                        │             │             │
                   ┌────▼────┐    ┌───▼────┐    ┌───▼────┐
                   │  Redis  │    │ Worker │    │  Loki  │
                   │ (Queue) │    │  (Go)  │    │ (Logs) │
                   └─────────┘    └────────┘    └────────┘
```

## Usage

### API Server

```bash
# Development (stderr logging)
go run cmd/api-server/main.go --config configs/config-dev.yaml

# Production (file logging)
go run cmd/api-server/main.go --config configs/config-no-tls.yaml
```

### CLI

```bash
# Build
cd cli && zig build prod

# Run experiment
./cli/zig-out/bin/ml run --config config.toml

# Check status
./cli/zig-out/bin/ml status
```

### Docker

```bash
make docker-run     # Start all services
make docker-logs    # View logs
make docker-stop    # Stop services
```

## Development

### Prerequisites

- Go 1.21+
- Zig 0.11+
- Redis
- Docker (for local dev)

### Build

```bash
make build          # All components
make dev            # Fast dev build
make prod           # Optimized production build
```

### Test

```bash
make test           # All tests
make test-unit      # Unit tests only
make test-coverage  # With coverage report
```

## Configuration

### Development (`configs/config-dev.yaml`)

```yaml
logging:
  level: "info"
  file: ""  # stderr only

redis:
  url: "redis://localhost:6379"
```

### Production (`configs/config-no-tls.yaml`)

```yaml
logging:
  level: "info"
  file: "./logs/fetch_ml.log"  # file only

redis:
  url: "redis://redis:6379"
```

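The only difference between the two `logging` blocks is the `file` value: an empty string means stderr, a path means a log file. The sketch below illustrates that selection rule in Python using the standard `logging` module. It is an assumption about the intended behavior, not the Go server's actual code, and `build_handler` is a hypothetical helper name.

```python
import logging
import sys


def build_handler(logging_config):
    """Pick a log destination from a dict shaped like the `logging:` block.

    Hypothetical helper: an empty `file` selects stderr, anything else
    selects a file at that path.
    """
    path = logging_config.get("file", "")
    if path:
        # delay=True defers opening the file until the first record is written.
        handler = logging.FileHandler(path, delay=True)
    else:
        handler = logging.StreamHandler(sys.stderr)
    # setLevel accepts level names such as "INFO" as well as integers.
    handler.setLevel(logging_config.get("level", "info").upper())
    return handler


dev_handler = build_handler({"level": "info", "file": ""})
prod_handler = build_handler({"level": "info", "file": "./logs/fetch_ml.log"})
```

With the dev values the handler writes to stderr. With the production values it targets `./logs/fetch_ml.log`, so the `logs` directory has to exist before the first record is written.
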
## Monitoring

### Grafana Dashboards (Auto-Provisioned)

- **ML Task Queue** - Queue depth, task duration, failure rates
- **Application Logs** - Log streams, error tracking, search

Access: `http://localhost:3000` (dev) or `http://YOUR_SERVER:3000` (prod)

### Metrics

- Queue depth and task processing rates
- Retry attempts by error category
- Dead letter queue size
- Lease expirations

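As an illustration of "retry attempts by error category", the following Python sketch counts retries per category and renders them in the Prometheus text exposition format. The category names, the `RetryMetrics` class, and the metric name `fetchml_task_retries_total` are all hypothetical; the metrics FetchML actually exports may be named and grouped differently.

```python
from collections import Counter


def categorize(error):
    """Map an exception to a coarse category (hypothetical categories)."""
    if isinstance(error, TimeoutError):
        return "timeout"
    if isinstance(error, ConnectionError):
        return "network"
    if isinstance(error, (ValueError, KeyError)):
        return "invalid_input"
    return "unknown"


class RetryMetrics:
    """Counts retry attempts, labeled by error category."""

    def __init__(self):
        self.retries = Counter()

    def record_retry(self, error):
        self.retries[categorize(error)] += 1

    def render(self, name="fetchml_task_retries_total"):
        """Render the counters in Prometheus text exposition format."""
        lines = [f"# TYPE {name} counter"]
        for category in sorted(self.retries):
            count = self.retries[category]
            lines.append(f'{name}{{category="{category}"}} {count}')
        return "\n".join(lines) + "\n"
```

Labeling by category rather than by raw error message keeps the number of time series small, which is the usual reason to classify errors before exporting them.
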
## Documentation

- **[Getting Started](docs/getting-started.md)** - Detailed setup guide
- **[Production Deployment](docs/production-monitoring.md)** - Linux deployment
- **[WebSocket API](docs/api/)** - Protocol documentation
- **[Architecture](docs/architecture/)** - System design

## Makefile Targets

```bash
# Build
make build             # Build all components
make prod              # Production build
make clean             # Clean artifacts

# Docker
make docker-build      # Build image
make docker-run        # Start services
make docker-stop       # Stop services

# Test
make test              # All tests
make test-coverage     # With coverage

# Production (Linux only)
make setup             # Setup app
make setup-monitoring  # Setup monitoring
make install           # Install binaries
```

## Security

- **TLS/HTTPS** - Encryption in transit
- **API Keys** - Stored as SHA-256 hashes
- **Rate Limiting** - Per-user quotas
- **IP Whitelist** - Network restrictions
- **Audit Logging** - All API access is logged

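Two of the controls above can be sketched with the Python standard library alone: storing only a SHA-256 digest of each API key, and checking a client address against allowed networks. This is a hedged illustration, not FetchML's Go code. The function names are hypothetical, and details such as salting or the on-disk key format are not described in this README.

```python
import hashlib
import hmac
import ipaddress


def hash_api_key(api_key):
    """Return the hex SHA-256 digest to store in place of the raw key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_api_key(presented_key, stored_hash):
    """Hash the presented key and compare digests in constant time."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)


def ip_allowed(client_ip, allowed_networks):
    """Check a client address against a list of CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in allowed_networks)


stored = hash_api_key("example-key")  # only this digest is persisted
print(verify_api_key("example-key", stored))
print(ip_allowed("192.168.1.20", ["192.168.1.0/24"]))
```

Storing only the digest means a leaked key store does not reveal usable keys, and `hmac.compare_digest` avoids timing differences that a plain `==` comparison could expose.
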
## License

MIT - See [LICENSE](LICENSE)

## Contributing

Contributions are welcome! This is a personal homelab project, but PRs are appreciated.