---
layout: page
title: "Task Queue Architecture"
permalink: /queue/
nav_order: 3
---

# Task Queue Architecture

The task queue system enables reliable job processing between the API server and workers using Redis.

## Overview

```mermaid
graph LR
    CLI[CLI/Client] -->|WebSocket| API[API Server]
    API -->|Enqueue| Redis[(Redis)]
    Redis -->|Dequeue| Worker[Worker]
    Worker -->|Update Status| Redis
```

## Components

### TaskQueue (`internal/queue`)

Shared package used by both the API server and the worker for job management.

#### Task Structure

```go
type Task struct {
    ID        string            // Unique task ID (UUID)
    JobName   string            // User-defined job name
    Args      string            // Job arguments
    Status    string            // queued, running, completed, failed
    Priority  int64             // Higher = executed first
    CreatedAt time.Time
    StartedAt *time.Time
    EndedAt   *time.Time
    WorkerID  string
    Error     string
    Datasets  []string
    Metadata  map[string]string // commit_id, user, etc.
}
```

#### TaskQueue Interface

```go
// Initialize queue (named tq to avoid shadowing the queue package)
tq, err := queue.NewTaskQueue(queue.Config{
    RedisAddr:     "localhost:6379",
    RedisPassword: "",
    RedisDB:       0,
})

// Add task (API server)
task := &queue.Task{
    ID:       uuid.New().String(),
    JobName:  "train-model",
    Status:   "queued",
    Priority: 5,
    Metadata: map[string]string{
        "commit_id": commitID,
        "user":      username,
    },
}
err = tq.AddTask(task)

// Get next task (Worker)
task, err := tq.GetNextTask()

// Update task status
task.Status = "running"
err = tq.UpdateTask(task)
```

## Data Flow

### Job Submission Flow

```mermaid
sequenceDiagram
    participant CLI
    participant API
    participant Redis
    participant Worker

    CLI->>API: Queue Job (WebSocket)
    API->>API: Create Task (UUID)
    API->>Redis: ZADD task:queue
    API->>Redis: SET task:{id}
    API->>CLI: Success Response

    Worker->>Redis: ZPOPMAX task:queue
    Redis->>Worker: Task ID
    Worker->>Redis: GET task:{id}
    Redis->>Worker: Task Data
    Worker->>Worker: Execute Job
    Worker->>Redis: Update Status
```

### Protocol

**CLI → API** (Binary WebSocket):

```
[opcode:1][api_key_hash:64][commit_id:64][priority:1][job_name_len:1][job_name:var]
```

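The fixed-width layout above can be exercised with a small encoder/decoder. This is a sketch only: the function names and the opcode value `0x01` are illustrative assumptions, not the actual `internal/api` implementation.

```go
package main

import (
    "bytes"
    "fmt"
)

// buildQueueJobPacket assembles the documented frame:
// [opcode:1][api_key_hash:64][commit_id:64][priority:1][job_name_len:1][job_name:var]
func buildQueueJobPacket(opcode byte, apiKeyHash, commitID [64]byte, priority byte, jobName string) ([]byte, error) {
    if len(jobName) > 255 {
        return nil, fmt.Errorf("job name exceeds one-byte length field: %d", len(jobName))
    }
    var buf bytes.Buffer
    buf.WriteByte(opcode)
    buf.Write(apiKeyHash[:])
    buf.Write(commitID[:])
    buf.WriteByte(priority)
    buf.WriteByte(byte(len(jobName)))
    buf.WriteString(jobName)
    return buf.Bytes(), nil
}

// parseQueueJobPacket reverses the layout, rejecting truncated frames.
func parseQueueJobPacket(p []byte) (opcode byte, apiKeyHash, commitID [64]byte, priority byte, jobName string, err error) {
    const fixed = 1 + 64 + 64 + 1 + 1 // bytes before the variable-length name
    if len(p) < fixed {
        err = fmt.Errorf("packet too short: %d bytes", len(p))
        return
    }
    opcode = p[0]
    copy(apiKeyHash[:], p[1:65])
    copy(commitID[:], p[65:129])
    priority = p[129]
    nameLen := int(p[130])
    if len(p) < fixed+nameLen {
        err = fmt.Errorf("job name truncated")
        return
    }
    jobName = string(p[fixed : fixed+nameLen])
    return
}

func main() {
    var hash, commit [64]byte
    pkt, _ := buildQueueJobPacket(0x01, hash, commit, 5, "train-model")
    _, _, _, prio, name, _ := parseQueueJobPacket(pkt)
    fmt.Println(prio, name) // 5 train-model
}
```

Because `job_name_len` is a single byte, job names longer than 255 bytes must be rejected at encode time.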
**API → Redis**:

- Priority queue: `ZADD task:queue {priority} {task_id}`
- Task data: `SET task:{id} {json}`
- Status: `HSET task:status:{job_name} ...`

**Worker ← Redis**:

- Poll: `ZPOPMAX task:queue 1` (highest priority first)
- Fetch: `GET task:{id}`

## Redis Data Structures

### Keys

```
task:queue              # ZSET: priority queue
task:{uuid}             # STRING: task JSON data
task:status:{job_name}  # HASH: job status
worker:heartbeat        # HASH: worker health
job:metrics:{job_name}  # HASH: job metrics
```

### Priority Queue (ZSET)

```redis
ZADD task:queue 10 "uuid-1"   # Priority 10
ZADD task:queue 5 "uuid-2"    # Priority 5
ZPOPMAX task:queue 1          # Returns uuid-1 (highest)
```

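One subtlety worth knowing: Redis breaks ZSET score ties by member order, so `ZPOPMAX` returns the lexicographically greatest member among equal scores. Equal-priority tasks therefore do not pop in strict FIFO order. The ordering can be modeled in pure Go (this is an illustration of the Redis semantics, not the production code path):

```go
package main

import (
    "fmt"
    "sort"
)

// zentry mirrors a ZSET member/score pair.
type zentry struct {
    member string
    score  float64
}

// zpopmax models Redis ZPOPMAX ordering: highest score first; score ties
// break by lexicographically greatest member, so equal-priority tasks are
// NOT strictly FIFO.
func zpopmax(zset []zentry) (zentry, []zentry) {
    sort.Slice(zset, func(i, j int) bool {
        if zset[i].score != zset[j].score {
            return zset[i].score > zset[j].score
        }
        return zset[i].member > zset[j].member
    })
    return zset[0], zset[1:]
}

func main() {
    q := []zentry{{"uuid-2", 5}, {"uuid-1", 10}, {"uuid-3", 5}}
    top, rest := zpopmax(q)
    fmt.Println(top.member, len(rest)) // uuid-1 2
}
```

If strict FIFO within a priority level matters, a common pattern is to encode a monotonic sequence number into the score or member.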
## API Server Integration

### Initialization

```go
// cmd/api-server/main.go
queueCfg := queue.Config{
    RedisAddr:     cfg.Redis.Addr,
    RedisPassword: cfg.Redis.Password,
    RedisDB:       cfg.Redis.DB,
}
taskQueue, err := queue.NewTaskQueue(queueCfg)
```

### WebSocket Handler

```go
// internal/api/ws.go
func (h *WSHandler) handleQueueJob(conn *websocket.Conn, payload []byte) error {
    // Parse request
    apiKeyHash, commitID, priority, jobName := parsePayload(payload)
    // user is resolved from apiKeyHash during authentication (not shown)

    // Create task with unique ID
    taskID := uuid.New().String()
    task := &queue.Task{
        ID:       taskID,
        JobName:  jobName,
        Status:   "queued",
        Priority: int64(priority),
        Metadata: map[string]string{
            "commit_id": commitID,
            "user":      user,
        },
    }

    // Enqueue
    if err := h.queue.AddTask(task); err != nil {
        return h.sendErrorPacket(conn, ErrorCodeDatabaseError, ...)
    }

    return h.sendSuccessPacket(conn, "Job queued")
}
```

## Worker Integration

### Task Polling

```go
// cmd/worker/worker_server.go
func (w *Worker) Start() error {
    ctx := context.Background()
    for {
        task, err := w.queue.WaitForNextTask(ctx, 5*time.Second)
        if err != nil {
            continue // timeout or transient error; poll again
        }
        if task != nil {
            go w.executeTask(task)
        }
    }
}
```

### Task Execution

```go
func (w *Worker) executeTask(task *queue.Task) {
    // Update status
    now := time.Now()
    task.Status = "running"
    task.StartedAt = &now
    w.queue.UpdateTaskWithMetrics(task, "start")

    // Execute
    err := w.runJob(task)

    // Finalize
    endTime := time.Now()
    task.EndedAt = &endTime
    if err != nil {
        task.Status = "failed"
        task.Error = err.Error()
    } else {
        task.Status = "completed"
    }
    w.queue.UpdateTaskWithMetrics(task, "final")
}
```

## Configuration

### API Server (`configs/config.yaml`)

```yaml
redis:
  addr: "localhost:6379"
  password: ""
  db: 0
```

### Worker (`configs/worker-config.yaml`)

```yaml
redis:
  addr: "localhost:6379"
  password: ""
  db: 0

metrics_flush_interval: 500ms
```

## Monitoring

### Queue Depth

```go
// tq is a *queue.TaskQueue from initialization
depth, err := tq.QueueDepth()
fmt.Printf("Pending tasks: %d\n", depth)
```

### Worker Heartbeat

```go
// Worker sends a heartbeat every 30s
err := tq.Heartbeat(workerID)
```

### Metrics

```redis
HGETALL job:metrics:{job_name}
# Returns: timestamp, tasks_start, tasks_final, etc.
```

## Error Handling

### Task Failures

```go
if err := w.runJob(task); err != nil {
    task.Status = "failed"
    task.Error = err.Error()
    w.queue.UpdateTask(task)
}
```

### Redis Connection Loss

```go
// TaskQueue reconnects automatically; workers should still
// retry transient errors with backoff.
var task *queue.Task
var err error
for retries := 0; retries < 3; retries++ {
    task, err = tq.GetNextTask()
    if err == nil {
        break
    }
    time.Sleep(time.Duration(retries+1) * time.Second) // simple linear backoff
}
```

## Testing

```go
// Tests use miniredis for an in-memory Redis server
s, _ := miniredis.Run()
defer s.Close()

tq, _ := queue.NewTaskQueue(queue.Config{
    RedisAddr: s.Addr(),
})

task := &queue.Task{ID: "test-1", JobName: "test"}
tq.AddTask(task)

fetched, _ := tq.GetNextTask()
// assert fetched.ID == "test-1"
```

## Best Practices

1. **Unique Task IDs**: Always use UUIDs to avoid conflicts
2. **Metadata**: Store commit_id and user in task metadata
3. **Priority**: Higher values execute first (0-255 range)
4. **Status Updates**: Update status at each lifecycle stage
5. **Error Logging**: Store detailed errors in task.Error
6. **Heartbeats**: Workers should send heartbeats regularly
7. **Metrics**: Use UpdateTaskWithMetrics for atomic updates
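The 0-255 priority range follows from the protocol's single priority byte, while `Task.Priority` is an `int64`; clamping before encoding keeps the two in agreement. A minimal sketch (the function name is illustrative, not part of the actual codebase):

```go
package main

import "fmt"

// clampPriority fits a Task.Priority (int64) into the protocol's
// single priority byte (0-255). Out-of-range values saturate.
func clampPriority(p int64) byte {
    if p < 0 {
        return 0
    }
    if p > 255 {
        return 255
    }
    return byte(p)
}

func main() {
    fmt.Println(clampPriority(-3), clampPriority(5), clampPriority(1000)) // 0 5 255
}
```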

---

For implementation details, see:

- [internal/queue/task.go](https://github.com/jfraeys/fetch_ml/blob/main/internal/queue/task.go)
- [internal/queue/queue.go](https://github.com/jfraeys/fetch_ml/blob/main/internal/queue/queue.go)