Configuration Reference
Overview
This document is a reference for FetchML configuration: API server environments, worker settings, CLI config files, the permission system, and environment variables.
Environment Configurations
Local Development
File: configs/api/dev.yaml
auth:
  enabled: true
  api_keys:
    dev_user:
      hash: "CHANGE_ME_SHA256_DEV_USER_KEY"
      admin: true
      roles: ["admin"]
      permissions:
        "*": true
server:
  address: ":9101"
  tls:
    enabled: false
security:
  rate_limit:
    enabled: false
  ip_whitelist:
    - "127.0.0.1"
    - "::1"
    - "localhost"
Multi-User Setup
File: configs/api/multi-user.yaml
auth:
  enabled: true
  api_keys:
    admin_user:
      hash: "CHANGE_ME_SHA256_ADMIN_USER_KEY"
      admin: true
      roles: ["user", "admin"]
      permissions:
        read: true
        write: true
        delete: true
    researcher1:
      hash: "CHANGE_ME_SHA256_RESEARCHER1_KEY"
      admin: false
      roles: ["user", "researcher"]
      permissions:
        jobs:read: true
        jobs:create: true
        jobs:update: true
        jobs:delete: false
    analyst1:
      hash: "CHANGE_ME_SHA256_ANALYST1_KEY"
      admin: false
      roles: ["user", "analyst"]
      permissions:
        jobs:read: true
        jobs:create: false
        jobs:update: false
        jobs:delete: false
Production
File: configs/api/prod.yaml
auth:
  enabled: true
  api_keys:
    # Production users configured here
server:
  address: ":9101"
  tls:
    enabled: true
    cert_file: "/app/ssl/cert.pem"
    key_file: "/app/ssl/key.pem"
security:
  rate_limit:
    enabled: true
    requests_per_minute: 30
  ip_whitelist:
    - "127.0.0.1"
    - "::1"
    - "192.168.0.0/16"
    - "10.0.0.0/8"
redis:
  addr: "redis:6379"
  password: ""
  db: 0
logging:
  level: "info"
  file: "/app/logs/app.log"
  audit_log: "/app/logs/audit.log"
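To illustrate what a setting like `requests_per_minute: 30` means in practice, here is a minimal sliding-window limiter sketch. It is not FetchML's implementation; the `RateLimiter` class, the per-client keying, and the 60-second window are assumptions made for illustration only.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Hypothetical sliding-window limiter: at most N requests per client per 60 s."""

    def __init__(self, requests_per_minute, clock=time.monotonic):
        self.limit = requests_per_minute
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.history = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id):
        now = self.clock()
        window = self.history[client_id]
        # Drop timestamps that have aged out of the 60-second window.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True


if __name__ == "__main__":
    limiter = RateLimiter(requests_per_minute=30)
    accepted = sum(limiter.allow("203.0.113.7") for _ in range(40))
    print(f"accepted {accepted} of 40 back-to-back requests")
```

With a limit of 30, a burst of 40 requests from one client inside the same minute would have the last 10 rejected under this model.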
Worker Configurations
Production Worker
File: configs/workers/worker-prod.toml
worker_id = "worker-prod-01"
base_path = "/data/ml-experiments"
max_workers = 4
redis_addr = "localhost:6379"
redis_password = "CHANGE_ME_REDIS_PASSWORD"
redis_db = 0
host = "localhost"
user = "ml-user"
port = 22
ssh_key = "~/.ssh/id_rsa"
podman_image = "ml-training:latest"
gpu_vendor = "none"
gpu_visible_devices = []
gpu_devices = []
container_workspace = "/workspace"
container_results = "/results"
train_script = "train.py"
[resources]
max_workers = 4
desired_rps_per_worker = 2
podman_cpus = "4"
podman_memory = "16g"
[metrics]
enabled = true
listen_addr = ":9100"
# Production Worker (NVIDIA, UUID-based GPU selection)
worker_id = "worker-prod-01"
base_path = "/data/ml-experiments"
podman_image = "ml-training:latest"
gpu_vendor = "nvidia"
gpu_visible_device_ids = ["GPU-REPLACE_WITH_REAL_UUID"]
gpu_devices = ["/dev/dri"]
container_workspace = "/workspace"
container_results = "/results"
train_script = "train.py"
Docker Worker
File: configs/workers/docker.yaml
worker_id: "docker-worker"
base_path: "/tmp/fetchml-jobs"
train_script: "train.py"
redis_addr: "redis:6379"
redis_password: ""
redis_db: 0
local_mode: true
max_workers: 1
poll_interval_seconds: 5
podman_image: "python:3.9-slim"
container_workspace: "/workspace"
container_results: "/results"
gpu_devices: []
gpu_vendor: "none"
gpu_visible_devices: []
metrics:
  enabled: true
  listen_addr: ":9100"
metrics_flush_interval: "500ms"
CLI Configuration
User Config File
Location: ~/.ml/config.toml
[server]
worker_host = "localhost"
worker_user = "appuser"
worker_base = "/app"
worker_port = 22
[auth]
api_key = "<your-api-key>"
[cli]
default_timeout = 30
verbose = false
Multi-User CLI Configs
Admin Config: ~/.ml/config-admin.toml
[server]
worker_host = "localhost"
worker_user = "appuser"
worker_base = "/app"
worker_port = 22
[auth]
api_key = "<admin-api-key>"
Researcher Config: ~/.ml/config-researcher.toml
[server]
worker_host = "localhost"
worker_user = "appuser"
worker_base = "/app"
worker_port = 22
[auth]
api_key = "<researcher-api-key>"
Analyst Config: ~/.ml/config-analyst.toml
[server]
worker_host = "localhost"
worker_user = "appuser"
worker_base = "/app"
worker_port = 22
[auth]
api_key = "<analyst-api-key>"
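The per-role files above follow a `config-<role>.toml` naming pattern. The sketch below shows one way a wrapper script could pick the right file. The helper `cli_config_path` is hypothetical, and the assumption that the `CLI_CONFIG` environment variable takes precedence over the default location is not confirmed by this document.

```python
import os
from pathlib import Path


def cli_config_path(role=None, environ=None, home=None):
    """Return the CLI config path for a role (hypothetical helper, not part of the ml CLI).

    Assumed lookup order: CLI_CONFIG if set, else ~/.ml/config-<role>.toml,
    else ~/.ml/config.toml when no role is given.
    """
    env = os.environ if environ is None else environ
    explicit = env.get("CLI_CONFIG")
    if explicit:
        return Path(explicit)
    base = (Path.home() if home is None else Path(home)) / ".ml"
    name = f"config-{role}.toml" if role else "config.toml"
    return base / name


if __name__ == "__main__":
    for role in (None, "admin", "researcher", "analyst"):
        print(role, "->", cli_config_path(role))
```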
Configuration Options
Authentication
| Option | Type | Default | Description |
|---|---|---|---|
| auth.enabled | bool | false | Enable authentication |
| auth.apikeys | map | {} | API key configurations |
| auth.apikeys.[user].hash | string | - | SHA256 hash of API key |
| auth.apikeys.[user].admin | bool | false | Admin privileges |
| auth.apikeys.[user].roles | array | [] | User roles |
| auth.apikeys.[user].permissions | map | {} | User permissions |
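The `hash` field holds a SHA256 hash of the API key. A minimal sketch for producing and checking such a value follows. It assumes the hash is the lowercase hex digest of the UTF-8 encoded key with no salt, which this document does not state explicitly; the function names are illustrative.

```python
import hashlib
import hmac


def hash_api_key(api_key):
    """Return the lowercase hex SHA-256 digest of an API key (assumed UTF-8, unsalted)."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def key_matches(presented_key, stored_hash):
    """Compare digests in constant time so timing does not leak matching prefixes."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash.lower())


if __name__ == "__main__":
    # Replace the example string with a real, randomly generated key.
    digest = hash_api_key("example-key-not-for-production")
    print(digest)  # 64 hex characters, a candidate value for the hash field
```

The printed digest would replace a placeholder such as `CHANGE_ME_SHA256_DEV_USER_KEY`, while the raw key goes into the CLI config's `api_key` field.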
Server
| Option | Type | Default | Description |
|---|---|---|---|
| server.address | string | ":9101" | Server bind address |
| server.tls.enabled | bool | false | Enable TLS |
| server.tls.cert_file | string | - | TLS certificate file |
| server.tls.key_file | string | - | TLS private key file |
Security
| Option | Type | Default | Description |
|---|---|---|---|
| security.rate_limit.enabled | bool | true | Enable rate limiting |
| security.rate_limit.requests_per_minute | int | 60 | Rate limit |
| security.ip_whitelist | array | [] | Allowed IP addresses |
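The example configs mix single addresses (`127.0.0.1`, `::1`) with CIDR ranges (`192.168.0.0/16`). The sketch below shows how such a list could be evaluated with Python's standard `ipaddress` module. It illustrates the matching logic only: how the server treats an empty list or a hostname entry such as `localhost` is not documented here, so this sketch simply skips entries that are not IPs or CIDR ranges.

```python
import ipaddress


def ip_allowed(client_ip, whitelist):
    """Return True if client_ip matches any whitelist entry (single IP or CIDR range)."""
    addr = ipaddress.ip_address(client_ip)
    for entry in whitelist:
        try:
            # strict=False accepts a bare address as a one-host network.
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            # Not an IP or CIDR (for example "localhost"); skipped in this sketch.
            continue
        # Membership is False when the IP versions differ (IPv4 vs IPv6).
        if addr in network:
            return True
    return False


if __name__ == "__main__":
    prod = ["127.0.0.1", "::1", "192.168.0.0/16", "10.0.0.0/8"]
    for ip in ("192.168.1.20", "10.4.5.6", "8.8.8.8", "::1"):
        print(ip, ip_allowed(ip, prod))
```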
Redis
| Option | Type | Default | Description |
|---|---|---|---|
| redis.url | string | "redis://localhost:6379" | Redis connection URL |
| redis.max_connections | int | 10 | Max Redis connections |
Logging
| Option | Type | Default | Description |
|---|---|---|---|
| logging.level | string | "info" | Log level |
| logging.file | string | - | Log file path |
| logging.audit_file | string | - | Audit log path |
Permission System
Permission Keys
| Permission | Description |
|---|---|
| jobs:read | Read job information |
| jobs:create | Create new jobs |
| jobs:update | Update existing jobs |
| jobs:delete | Delete jobs |
| * | All permissions (admin only) |
Role-Based Permissions
| Role | Default Permissions |
|---|---|
| admin | All permissions |
| researcher | jobs:read, jobs:create, jobs:update |
| analyst | jobs:read |
| user | No default permissions |
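The interaction between role defaults and a user's explicit `permissions` map can be sketched as follows. The precedence used here (admin flag or `*` first, then explicit entries, then role defaults) is an assumption for illustration and may differ from what the server actually does. `ROLE_DEFAULTS` mirrors the table above.

```python
# Role defaults copied from the Role-Based Permissions table.
ROLE_DEFAULTS = {
    "admin": {"*"},
    "researcher": {"jobs:read", "jobs:create", "jobs:update"},
    "analyst": {"jobs:read"},
    "user": set(),
}


def has_permission(user, permission):
    """Decide whether a user entry grants a permission (assumed precedence, see lead-in)."""
    explicit = user.get("permissions", {})
    # Admins and wildcard holders get everything.
    if user.get("admin") or explicit.get("*"):
        return True
    # An explicit true/false entry overrides any role default.
    if permission in explicit:
        return bool(explicit[permission])
    # Otherwise fall back to the defaults of each assigned role.
    for role in user.get("roles", []):
        defaults = ROLE_DEFAULTS.get(role, set())
        if "*" in defaults or permission in defaults:
            return True
    return False


if __name__ == "__main__":
    researcher1 = {
        "admin": False,
        "roles": ["user", "researcher"],
        "permissions": {"jobs:read": True, "jobs:create": True,
                        "jobs:update": True, "jobs:delete": False},
    }
    for perm in ("jobs:read", "jobs:delete"):
        print(perm, has_permission(researcher1, perm))
```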
Environment Variables
| Variable | Default | Description |
|---|---|---|
| FETCHML_CONFIG | - | Path to config file |
| FETCHML_LOG_LEVEL | "info" | Override log level |
| CLI_CONFIG | - | Path to CLI config file |
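Since `FETCHML_LOG_LEVEL` is described as overriding the log level, the resolution order can be pictured as environment first, then `logging.level` from the config file, then the documented default `"info"`. The helper below is a hypothetical sketch of that order, not code from the project.

```python
import os


def effective_log_level(config, environ=None):
    """Resolve the log level: env override, then config file, then the "info" default."""
    env = os.environ if environ is None else environ
    override = env.get("FETCHML_LOG_LEVEL")
    if override:
        return override
    return config.get("logging", {}).get("level", "info")


if __name__ == "__main__":
    parsed_config = {"logging": {"level": "warn"}}  # stand-in for a parsed YAML file
    print(effective_log_level(parsed_config))
```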
Troubleshooting
Common Configuration Issues
- Authentication Failures
  - Check that API key hashes are correct SHA256 digests
  - Verify YAML syntax
  - Ensure auth.enabled: true
- Connection Issues
  - Verify server address and ports
  - Check firewall settings
  - Validate network connectivity
- Permission Issues
  - Check user roles and permissions
  - Verify permission key format
  - Ensure admin users have "*": true
Configuration Validation
# Validate server configuration
go run cmd/api-server/main.go --config configs/api/dev.yaml --validate
# Test CLI configuration
./cli/zig-out/bin/ml status --debug
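The example configs in this document ship with placeholder values such as `CHANGE_ME_SHA256_DEV_USER_KEY` and `GPU-REPLACE_WITH_REAL_UUID`. As a complement to the commands above, a small pre-deployment check can flag any that were left in place. This is an illustrative helper, not a FetchML tool, and the placeholder patterns are inferred from the examples here.

```python
import re
import sys

# Patterns inferred from the placeholders used in this document's examples.
PLACEHOLDER = re.compile(r"CHANGE_ME\w*|REPLACE_WITH\w*")


def find_placeholders(config_text):
    """Return (line_number, placeholder) pairs for every leftover placeholder."""
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    # Usage: python check_placeholders.py configs/api/prod.yaml [more files...]
    failed = False
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for lineno, token in find_placeholders(handle.read()):
                print(f"{path}:{lineno}: unresolved placeholder {token}")
                failed = True
    sys.exit(1 if failed else 0)
```

Because it works on raw text, the same check applies to the YAML and TOML files alike.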