
# ADR-001: Use Go for API Server

## Status

Accepted

## Context

We needed to choose a programming language for the Fetch ML API server that would provide:

- High performance for ML experiment management
- Strong concurrency support for handling multiple experiments
- Good ecosystem for HTTP APIs and WebSocket connections
- Easy deployment and containerization
- Strong type safety and reliability

## Decision

We chose Go as the primary language for the API server implementation.

## Consequences

### Positive

- Excellent performance with a low memory footprint
- Built-in concurrency primitives (goroutines, channels) well suited to parallel ML experiment execution
- Rich ecosystem for HTTP servers, WebSockets, and database drivers
- Static compilation produces single-binary deployments
- Strong typing catches many errors at compile time
- Good tooling for testing, benchmarking, and profiling
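The goroutine/channel point above can be sketched as a small worker pool that fans queued experiments out to a fixed number of workers. The `experiment` type and result strings are placeholders, not Fetch ML's real job model.

```go
package main

import (
	"fmt"
	"sync"
)

// experiment stands in for a queued ML run; the real job type is assumed.
type experiment struct {
	ID int
}

// runAll fans experiments out to `workers` goroutines over a jobs
// channel and collects one result per experiment over a results channel.
func runAll(exps []experiment, workers int) []string {
	jobs := make(chan experiment)
	results := make(chan string)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for e := range jobs {
				// A real worker would launch and monitor the experiment here.
				results <- fmt.Sprintf("experiment %d: done", e.ID)
			}
		}()
	}

	// Feed jobs, then close results once every worker has drained the queue.
	go func() {
		for _, e := range exps {
			jobs <- e
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	out := runAll([]experiment{{ID: 1}, {ID: 2}, {ID: 3}}, 2)
	fmt.Println(len(out)) // prints 3
}
```

The same pattern extends naturally to bounded concurrency (cap the worker count) and cancellation (thread a `context.Context` through the workers).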

### Negative

- Steeper learning curve for team members unfamiliar with Go
- Less expressive than dynamic languages for rapid prototyping
- Smaller ecosystem for ML-specific libraries compared to Python

## Options Considered

### Python with FastAPI

Pros:

- Rich ML ecosystem (TensorFlow, PyTorch, scikit-learn)
- Easy to learn and write
- Great for data science teams
- FastAPI provides good performance

Cons:

- Global Interpreter Lock limits true parallelism
- Higher memory usage
- Slower performance in high-throughput scenarios
- More complex deployment (interpreter, dependencies, multiple files)

### Node.js with Express

Pros:

- Excellent WebSocket support
- Large ecosystem
- Fast development cycle

Cons:

- Single-threaded event loop can be limiting
- Not ideal for CPU-intensive ML operations
- Dynamic typing can lead to runtime errors

### Rust

Pros:

- Maximum performance and memory safety
- Strong type system
- Growing ecosystem

Cons:

- Very steep learning curve
- Longer development time
- Smaller ecosystem for web frameworks

### Java with Spring Boot

Pros:

- Mature ecosystem
- Good performance
- Strong typing

Cons:

- Higher memory usage
- More verbose syntax
- Slower startup time
- Heavier deployment footprint

## Rationale

Go provides the best balance of performance, concurrency support, and deployment simplicity for our API server needs. The ability to handle many concurrent ML experiments efficiently with goroutines is a key advantage. The single binary deployment model also simplifies our containerization and distribution strategy.
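The single-binary deployment model can be sketched as a multi-stage container build: compile a static binary in a full Go image, then copy only that binary into a minimal runtime image. The paths, module layout, and image tags below are assumptions for illustration, not the project's actual Dockerfile.

```dockerfile
# Build stage: disable CGO so the binary is fully static.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /fetch-api ./cmd/server

# Runtime stage: nothing but the compiled binary.
FROM gcr.io/distroless/static
COPY --from=build /fetch-api /fetch-api
ENTRYPOINT ["/fetch-api"]
```

The resulting image contains no shell, package manager, or interpreter, which keeps it small and shrinks the attack surface compared with shipping a Python or Java runtime.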