feat: add comprehensive setup scripts and management tools
- Add production setup scripts for automated deployment - Include monitoring setup and configuration validation - Add legacy setup scripts for various Linux distributions - Implement Bitwarden integration for secure credential management - Add development and production environment setup - Include comprehensive management tools and utilities - Add shell script library with common functions Provides complete automation for setup, deployment, and management of FetchML platform in development and production environments.
This commit is contained in:
parent
385d2cf386
commit
bb25743b0f
15 changed files with 3414 additions and 0 deletions
94
scripts/README.md
Normal file
94
scripts/README.md
Normal file
|
|
@ -0,0 +1,94 @@
|
||||||
|
# Scripts Directory
|
||||||
|
|
||||||
|
This directory contains setup and utility scripts for FetchML.
|
||||||
|
|
||||||
|
## Production Scripts
|
||||||
|
|
||||||
|
### `setup-prod.sh`
|
||||||
|
**Purpose**: Automated production setup for Rocky Linux bare metal deployment
|
||||||
|
**Usage**: `sudo ./scripts/setup-prod.sh [base_path] [user] [group]`
|
||||||
|
**What it does**:
|
||||||
|
- Creates system user and groups
|
||||||
|
- Sets up directory structure (`/data/ml-experiments/*`)
|
||||||
|
- Installs dependencies (Go, Podman, Redis)
|
||||||
|
- Configures GPU support for Podman
|
||||||
|
- Creates systemd service files
|
||||||
|
- Sets up log rotation
|
||||||
|
|
||||||
|
**Example**:
|
||||||
|
```bash
|
||||||
|
sudo ./scripts/setup-prod.sh /data/ml-experiments ml-user ml-group
|
||||||
|
```
|
||||||
|
|
||||||
|
### `validate-prod-config.sh`
|
||||||
|
**Purpose**: Validates production configuration files
|
||||||
|
**Usage**: `./scripts/validate-prod-config.sh [api-config] [worker-config]`
|
||||||
|
**What it does**:
|
||||||
|
- Checks config file syntax
|
||||||
|
- Verifies base_path consistency
|
||||||
|
- Tests Redis connectivity
|
||||||
|
- Validates Podman setup
|
||||||
|
- Checks directory permissions
|
||||||
|
|
||||||
|
**Example**:
|
||||||
|
```bash
|
||||||
|
./scripts/validate-prod-config.sh configs/config-prod.yaml configs/worker-prod.toml
|
||||||
|
```
|
||||||
|
|
||||||
|
## Legacy Setup Scripts (Deprecated)
|
||||||
|
|
||||||
|
The following scripts are from earlier iterations and are **deprecated** in favor of `setup-prod.sh`:
|
||||||
|
|
||||||
|
- `setup_rocky.sh` - Use `setup-prod.sh` instead
|
||||||
|
- `setup_ubuntu.sh` - Ubuntu support (not primary target)
|
||||||
|
- `auto_setup.sh` - Old automated setup (superseded)
|
||||||
|
- `setup_common.sh` - Common functions (integrated into setup-prod.sh)
|
||||||
|
- `quick_start.sh` - Quick dev setup (use docker-compose on macOS instead)
|
||||||
|
- `test_tools.sh` - Tool testing (integrated into validate-prod-config.sh)
|
||||||
|
|
||||||
|
### Cleanup Recommendation
|
||||||
|
These legacy scripts can be removed or archived. The current production setup only needs:
|
||||||
|
- `setup-prod.sh`
|
||||||
|
- `validate-prod-config.sh`
|
||||||
|
|
||||||
|
## Usage Workflow
|
||||||
|
|
||||||
|
### First-Time Production Setup
|
||||||
|
```bash
|
||||||
|
# 1. Run production setup
|
||||||
|
sudo ./scripts/setup-prod.sh
|
||||||
|
|
||||||
|
# 2. Copy and configure
|
||||||
|
sudo cp configs/config-prod.yaml /etc/fetch_ml/config.yaml
|
||||||
|
sudo cp configs/worker-prod.toml /etc/fetch_ml/worker.toml
|
||||||
|
sudo vim /etc/fetch_ml/config.yaml # Update API keys, etc.
|
||||||
|
|
||||||
|
# 3. Build and install
|
||||||
|
make prod
|
||||||
|
sudo make install
|
||||||
|
|
||||||
|
# 4. Validate
|
||||||
|
./scripts/validate-prod-config.sh /etc/fetch_ml/config.yaml /etc/fetch_ml/worker.toml
|
||||||
|
|
||||||
|
# 5. Start services
|
||||||
|
sudo systemctl start fetchml-api fetchml-worker
|
||||||
|
sudo systemctl enable fetchml-api fetchml-worker
|
||||||
|
```
|
||||||
|
|
||||||
|
### Development Setup (macOS)
|
||||||
|
```bash
|
||||||
|
# Use docker-compose for local development
|
||||||
|
docker-compose up -d
|
||||||
|
|
||||||
|
# Or run components directly
|
||||||
|
make dev
|
||||||
|
./bin/api-server -config configs/config-local.yaml
|
||||||
|
```
|
||||||
|
|
||||||
|
## Script Maintenance
|
||||||
|
|
||||||
|
When adding new scripts:
|
||||||
|
1. Add executable permission: `chmod +x scripts/new-script.sh`
|
||||||
|
2. Add header comment with purpose and usage
|
||||||
|
3. Update this README
|
||||||
|
4. Use consistent error handling and logging
|
||||||
49
scripts/create_bitwarden_fetchml_item.sh
Normal file
49
scripts/create_bitwarden_fetchml_item.sh
Normal file
|
|
@ -0,0 +1,49 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# Create a Bitwarden item for a FetchML API user.
|
||||||
|
#
|
||||||
|
# Usage:
|
||||||
|
# ./scripts/create_bitwarden_fetchml_item.sh <username> <api_key> <api_key_hash>
|
||||||
|
#
|
||||||
|
# Requirements:
|
||||||
|
# - Bitwarden CLI (bw) installed
|
||||||
|
# - You are logged in and unlocked (bw login; bw unlock)
|
||||||
|
# - jq installed
|
||||||
|
#
|
||||||
|
# This script does NOT run on the homelab server. Run it from your
|
||||||
|
# own machine where you manage Bitwarden.
|
||||||
|
|
||||||
|
if [[ $# -ne 3 ]]; then
|
||||||
|
echo "Usage: $0 <username> <api_key> <api_key_hash>" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
USER_NAME="$1"
|
||||||
|
API_KEY="$2"
|
||||||
|
API_KEY_HASH="$3"
|
||||||
|
|
||||||
|
ITEM_NAME="FetchML API $USER_NAME"
|
||||||
|
|
||||||
|
# Get base item template
|
||||||
|
TEMPLATE_JSON=$(bw get template item)
|
||||||
|
|
||||||
|
# Build item JSON with jq
|
||||||
|
ITEM_JSON=$(echo "$TEMPLATE_JSON" | jq \
|
||||||
|
--arg name "$ITEM_NAME" \
|
||||||
|
--arg username "$USER_NAME" \
|
||||||
|
--arg password "$API_KEY" \
|
||||||
|
--arg hash "$API_KEY_HASH" \
|
||||||
|
'.name = $name
|
||||||
|
| .login.username = $username
|
||||||
|
| .login.password = $password
|
||||||
|
| .notes = "FetchML API key for user " + $username
|
||||||
|
| .fields = [{"name":"api_key_hash","value":$hash,"type":1}]')
|
||||||
|
|
||||||
|
# Create item in Bitwarden
|
||||||
|
# If you ever want to edit instead, you can capture the ID from this call
|
||||||
|
# and use: bw edit item <id> <json>
|
||||||
|
|
||||||
|
echo "$ITEM_JSON" | bw encode | bw create item
|
||||||
|
|
||||||
|
echo "Created Bitwarden item: $ITEM_NAME"
|
||||||
455
scripts/legacy/auto_setup.sh
Executable file
455
scripts/legacy/auto_setup.sh
Executable file
|
|
@ -0,0 +1,455 @@
|
||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# Automatic Setup Script for ML Experiment Manager
|
||||||
|
# Handles complete environment setup with security features
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# Colors
|
||||||
|
RED='\033[0;31m'
|
||||||
|
GREEN='\033[0;32m'
|
||||||
|
YELLOW='\033[1;33m'
|
||||||
|
BLUE='\033[0;34m'
|
||||||
|
NC='\033[0m'
|
||||||
|
|
||||||
|
print_info() {
|
||||||
|
echo -e "${BLUE}[INFO]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_success() {
|
||||||
|
echo -e "${GREEN}[SUCCESS]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_warning() {
|
||||||
|
echo -e "${YELLOW}[WARNING]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_error() {
|
||||||
|
echo -e "${RED}[ERROR]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
detect_os() {
|
||||||
|
if [[ "$OSTYPE" == "darwin"* ]]; then
|
||||||
|
echo "macos"
|
||||||
|
elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
|
||||||
|
echo "linux"
|
||||||
|
else
|
||||||
|
echo "unknown"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
install_go() {
|
||||||
|
print_info "Installing Go..."
|
||||||
|
|
||||||
|
local os=$(detect_os)
|
||||||
|
local go_version="1.23.0"
|
||||||
|
|
||||||
|
if [[ "$os" == "macos" ]]; then
|
||||||
|
if command -v brew &> /dev/null; then
|
||||||
|
brew install go
|
||||||
|
else
|
||||||
|
print_error "Homebrew not found. Please install Go manually."
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
elif [[ "$os" == "linux" ]]; then
|
||||||
|
wget -q "https://go.dev/dl/go${go_version}.linux-amd64.tar.gz"
|
||||||
|
sudo rm -rf /usr/local/go
|
||||||
|
sudo tar -C /usr/local -xzf "go${go_version}.linux-amd64.tar.gz"
|
||||||
|
rm "go${go_version}.linux-amd64.tar.gz"
|
||||||
|
|
||||||
|
# Add to PATH
|
||||||
|
echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
|
||||||
|
export PATH=$PATH:/usr/local/go/bin
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_success "Go installed"
|
||||||
|
}
|
||||||
|
|
||||||
|
install_zig() {
|
||||||
|
print_info "Installing Zig..."
|
||||||
|
|
||||||
|
local os=$(detect_os)
|
||||||
|
|
||||||
|
if [[ "$os" == "macos" ]]; then
|
||||||
|
if command -v brew &> /dev/null; then
|
||||||
|
brew install zig
|
||||||
|
else
|
||||||
|
print_error "Homebrew not found. Please install Zig manually."
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
elif [[ "$os" == "linux" ]]; then
|
||||||
|
# Download Zig binary
|
||||||
|
local zig_version="0.13.0"
|
||||||
|
wget -q "https://ziglang.org/download/${zig_version}/zig-linux-x86_64-${zig_version}.tar.xz"
|
||||||
|
tar -xf "zig-linux-x86_64-${zig_version}.tar.xz"
|
||||||
|
sudo mv "zig-linux-x86_64-${zig_version}/zig" /usr/local/bin/
|
||||||
|
rm -rf "zig-linux-x86_64-${zig_version}.tar.xz" "zig-linux-x86_64-${zig_version}"
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_success "Zig installed"
|
||||||
|
}
|
||||||
|
|
||||||
|
install_docker() {
|
||||||
|
print_info "Installing Docker..."
|
||||||
|
|
||||||
|
local os=$(detect_os)
|
||||||
|
|
||||||
|
if [[ "$os" == "macos" ]]; then
|
||||||
|
if command -v brew &> /dev/null; then
|
||||||
|
brew install --cask docker
|
||||||
|
print_warning "Docker Desktop installed. Please start it manually."
|
||||||
|
else
|
||||||
|
print_error "Homebrew not found. Please install Docker manually."
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
elif [[ "$os" == "linux" ]]; then
|
||||||
|
# Install Docker using official script
|
||||||
|
curl -fsSL https://get.docker.com -o get-docker.sh
|
||||||
|
sudo sh get-docker.sh
|
||||||
|
sudo usermod -aG docker $USER
|
||||||
|
rm get-docker.sh
|
||||||
|
|
||||||
|
# Start Docker
|
||||||
|
sudo systemctl enable docker
|
||||||
|
sudo systemctl start docker
|
||||||
|
|
||||||
|
print_success "Docker installed. You may need to log out and log back in."
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
install_redis() {
|
||||||
|
print_info "Installing Redis..."
|
||||||
|
|
||||||
|
local os=$(detect_os)
|
||||||
|
|
||||||
|
if [[ "$os" == "macos" ]]; then
|
||||||
|
if command -v brew &> /dev/null; then
|
||||||
|
brew install redis
|
||||||
|
brew services start redis
|
||||||
|
else
|
||||||
|
print_error "Homebrew not found. Please install Redis manually."
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
elif [[ "$os" == "linux" ]]; then
|
||||||
|
sudo apt-get update
|
||||||
|
sudo apt-get install -y redis-server
|
||||||
|
sudo systemctl enable redis-server
|
||||||
|
sudo systemctl start redis-server
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_success "Redis installed and started"
|
||||||
|
}
|
||||||
|
|
||||||
|
install_dependencies() {
|
||||||
|
print_info "Installing dependencies..."
|
||||||
|
|
||||||
|
local os=$(detect_os)
|
||||||
|
|
||||||
|
# Install basic tools
|
||||||
|
if [[ "$os" == "macos" ]]; then
|
||||||
|
if command -v brew &> /dev/null; then
|
||||||
|
brew install openssl curl jq
|
||||||
|
fi
|
||||||
|
elif [[ "$os" == "linux" ]]; then
|
||||||
|
sudo apt-get update
|
||||||
|
sudo apt-get install -y openssl curl jq build-essential
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Install Go tools
|
||||||
|
if command -v go &> /dev/null; then
|
||||||
|
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
|
||||||
|
go install golang.org/x/tools/cmd/goimports@latest
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_success "Dependencies installed"
|
||||||
|
}
|
||||||
|
|
||||||
|
setup_project() {
|
||||||
|
print_info "Setting up project..."
|
||||||
|
|
||||||
|
# Create directories
|
||||||
|
mkdir -p bin
|
||||||
|
mkdir -p data
|
||||||
|
mkdir -p logs
|
||||||
|
mkdir -p db
|
||||||
|
mkdir -p ssl
|
||||||
|
mkdir -p configs
|
||||||
|
mkdir -p scripts
|
||||||
|
|
||||||
|
# Build project
|
||||||
|
if command -v make &> /dev/null; then
|
||||||
|
make build
|
||||||
|
if command -v zig &> /dev/null; then
|
||||||
|
make cli-build
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
print_warning "Make not found, building manually..."
|
||||||
|
go build -o bin/worker ./cmd/worker
|
||||||
|
go build -o bin/tui ./cmd/tui
|
||||||
|
go build -o bin/data_manager ./cmd/data_manager
|
||||||
|
go build -o bin/user_manager ./cmd/user_manager
|
||||||
|
go build -o bin/api-server ./cmd/api-server
|
||||||
|
|
||||||
|
if command -v zig &> /dev/null; then
|
||||||
|
cd cli && zig build && cd ..
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_success "Project setup completed"
|
||||||
|
}
|
||||||
|
|
||||||
|
setup_security() {
|
||||||
|
print_info "Setting up security features..."
|
||||||
|
|
||||||
|
# Generate SSL certificates
|
||||||
|
if command -v openssl &> /dev/null; then
|
||||||
|
openssl req -x509 -newkey rsa:4096 -keyout ssl/key.pem -out ssl/cert.pem \
|
||||||
|
-days 365 -nodes -subj "/C=US/ST=State/L=City/O=Organization/CN=localhost" \
|
||||||
|
-addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null || {
|
||||||
|
print_warning "Failed to generate SSL certificates"
|
||||||
|
}
|
||||||
|
print_success "SSL certificates generated"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Generate secure configuration
|
||||||
|
local redis_password=$(openssl rand -base64 32 2>/dev/null || echo "dev_redis_password_123")
|
||||||
|
local jwt_secret=$(openssl rand -base64 64 2>/dev/null || echo "dev_jwt_secret_1234567890123456789012345678901234567890123456789012345678901234")
|
||||||
|
|
||||||
|
cat > configs/security-config.yaml << EOF
|
||||||
|
base_path: "/data/ml-experiments"
|
||||||
|
|
||||||
|
auth:
|
||||||
|
enabled: true
|
||||||
|
api_keys:
|
||||||
|
test_user:
|
||||||
|
hash: "$(echo -n "dev_test_api_key_12345" | sha256sum | cut -d' ' -f1)"
|
||||||
|
admin: true
|
||||||
|
roles: ["data_scientist", "admin"]
|
||||||
|
permissions:
|
||||||
|
read: true
|
||||||
|
write: true
|
||||||
|
delete: true
|
||||||
|
|
||||||
|
server:
|
||||||
|
address: ":9101"
|
||||||
|
tls:
|
||||||
|
enabled: true
|
||||||
|
cert_file: "./ssl/cert.pem"
|
||||||
|
key_file: "./ssl/key.pem"
|
||||||
|
min_version: "1.3"
|
||||||
|
|
||||||
|
security:
|
||||||
|
rate_limit:
|
||||||
|
enabled: true
|
||||||
|
requests_per_minute: 60
|
||||||
|
burst_size: 10
|
||||||
|
ip_whitelist:
|
||||||
|
- "127.0.0.1"
|
||||||
|
- "::1"
|
||||||
|
- "10.0.0.0/8"
|
||||||
|
- "192.168.0.0/16"
|
||||||
|
- "172.16.0.0/12"
|
||||||
|
failed_login_lockout:
|
||||||
|
enabled: true
|
||||||
|
max_attempts: 5
|
||||||
|
lockout_duration: "15m"
|
||||||
|
|
||||||
|
redis:
|
||||||
|
url: "redis://localhost:6379"
|
||||||
|
password: "${redis_password}"
|
||||||
|
|
||||||
|
logging:
|
||||||
|
level: "info"
|
||||||
|
file: "logs/fetch_ml.log"
|
||||||
|
audit_log: "logs/audit.log"
|
||||||
|
EOF
|
||||||
|
|
||||||
|
cat > .env.dev << EOF
|
||||||
|
# Development environment variables
|
||||||
|
REDIS_PASSWORD=${redis_password}
|
||||||
|
JWT_SECRET=${jwt_secret}
|
||||||
|
GRAFANA_USER=admin
|
||||||
|
GRAFANA_PASSWORD=$(openssl rand -base64 16 2>/dev/null || echo "dev_grafana_password")
|
||||||
|
EOF
|
||||||
|
|
||||||
|
print_success "Security configuration created"
|
||||||
|
}
|
||||||
|
|
||||||
|
test_installation() {
|
||||||
|
print_info "Testing installation..."
|
||||||
|
|
||||||
|
local tests_passed=0
|
||||||
|
local tests_total=0
|
||||||
|
|
||||||
|
# Test Go
|
||||||
|
tests_total=$((tests_total + 1))
|
||||||
|
if command -v go &> /dev/null; then
|
||||||
|
print_success "Go: Installed"
|
||||||
|
tests_passed=$((tests_passed + 1))
|
||||||
|
else
|
||||||
|
print_error "Go: Not found"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Test Zig
|
||||||
|
tests_total=$((tests_total + 1))
|
||||||
|
if command -v zig &> /dev/null; then
|
||||||
|
print_success "Zig: Installed"
|
||||||
|
tests_passed=$((tests_passed + 1))
|
||||||
|
else
|
||||||
|
print_warning "Zig: Not found (optional)"
|
||||||
|
tests_total=$((tests_total - 1))
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Test Docker
|
||||||
|
tests_total=$((tests_total + 1))
|
||||||
|
if command -v docker &> /dev/null; then
|
||||||
|
print_success "Docker: Installed"
|
||||||
|
tests_passed=$((tests_passed + 1))
|
||||||
|
else
|
||||||
|
print_warning "Docker: Not found (optional)"
|
||||||
|
tests_total=$((tests_total - 1))
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Test Redis
|
||||||
|
tests_total=$((tests_total + 1))
|
||||||
|
if command -v redis-cli &> /dev/null; then
|
||||||
|
if redis-cli ping | grep -q "PONG"; then
|
||||||
|
print_success "Redis: Running"
|
||||||
|
tests_passed=$((tests_passed + 1))
|
||||||
|
else
|
||||||
|
print_warning "Redis: Not running"
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
print_warning "Redis: Not found (optional)"
|
||||||
|
tests_total=$((tests_total - 1))
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Test binaries
|
||||||
|
if [[ -f "bin/api-server" ]]; then
|
||||||
|
tests_total=$((tests_total + 1))
|
||||||
|
if ./bin/api-server --help > /dev/null 2>&1; then
|
||||||
|
print_success "API Server: Built"
|
||||||
|
tests_passed=$((tests_passed + 1))
|
||||||
|
else
|
||||||
|
print_error "API Server: Build failed"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ $tests_total -gt 0 ]]; then
|
||||||
|
local success_rate=$((tests_passed * 100 / tests_total))
|
||||||
|
print_info "Tests: $tests_passed/$tests_total passed ($success_rate%)"
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_success "Installation testing completed"
|
||||||
|
}
|
||||||
|
|
||||||
|
show_next_steps() {
|
||||||
|
print_success "Automatic setup completed!"
|
||||||
|
echo
|
||||||
|
echo "Next Steps:"
|
||||||
|
echo "==========="
|
||||||
|
echo ""
|
||||||
|
echo "1. Load environment variables:"
|
||||||
|
echo " source .env.dev"
|
||||||
|
echo ""
|
||||||
|
echo "2. Start the API server:"
|
||||||
|
echo " ./bin/api-server -config configs/config.yaml"
|
||||||
|
echo ""
|
||||||
|
echo "3. Test the Zig CLI (if installed):"
|
||||||
|
echo " ./cli/zig-out/bin/ml --help"
|
||||||
|
echo ""
|
||||||
|
echo "4. Deploy with Docker (optional):"
|
||||||
|
echo " make docker-run"
|
||||||
|
echo ""
|
||||||
|
echo "5. Docker Compose deployment:"
|
||||||
|
echo " docker-compose up -d"
|
||||||
|
echo ""
|
||||||
|
echo "Configuration Files:"
|
||||||
|
echo " configs/config.yaml # Main configuration"
|
||||||
|
echo " configs/config_local.yaml # Local development"
|
||||||
|
echo " ssl/cert.pem, ssl/key.pem # TLS certificates"
|
||||||
|
echo ""
|
||||||
|
echo "Documentation:"
|
||||||
|
echo " docs/DEPLOYMENT.md # Deployment guide"
|
||||||
|
echo ""
|
||||||
|
echo "Quick Commands:"
|
||||||
|
echo " make help # Show all commands"
|
||||||
|
echo " make test # Run tests"
|
||||||
|
echo " docker-compose up -d # Start services"
|
||||||
|
echo ""
|
||||||
|
print_success "Ready to use ML Experiment Manager!"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Main setup function
|
||||||
|
main() {
|
||||||
|
echo "ML Experiment Manager Automatic Setup"
|
||||||
|
echo "====================================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
print_info "Starting automatic setup..."
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Check and install dependencies
|
||||||
|
if ! command -v go &> /dev/null; then
|
||||||
|
print_info "Go not found, installing..."
|
||||||
|
install_go
|
||||||
|
fi
|
||||||
|
|
||||||
|
if ! command -v zig &> /dev/null; then
|
||||||
|
print_info "Zig not found, installing..."
|
||||||
|
install_zig
|
||||||
|
fi
|
||||||
|
|
||||||
|
if ! command -v docker &> /dev/null; then
|
||||||
|
print_info "Docker not found, installing..."
|
||||||
|
install_docker
|
||||||
|
fi
|
||||||
|
|
||||||
|
if ! command -v redis-cli &> /dev/null; then
|
||||||
|
print_info "Redis not found, installing..."
|
||||||
|
install_redis
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Install additional dependencies
|
||||||
|
install_dependencies
|
||||||
|
|
||||||
|
# Setup project
|
||||||
|
setup_project
|
||||||
|
|
||||||
|
# Setup security
|
||||||
|
setup_security
|
||||||
|
|
||||||
|
# Test installation
|
||||||
|
test_installation
|
||||||
|
|
||||||
|
# Show next steps
|
||||||
|
show_next_steps
|
||||||
|
}
|
||||||
|
|
||||||
|
# Handle command line arguments
|
||||||
|
case "${1:-setup}" in
|
||||||
|
"setup")
|
||||||
|
main
|
||||||
|
;;
|
||||||
|
"deps")
|
||||||
|
install_dependencies
|
||||||
|
;;
|
||||||
|
"test")
|
||||||
|
test_installation
|
||||||
|
;;
|
||||||
|
"help"|"-h"|"--help")
|
||||||
|
echo "Automatic Setup Script"
|
||||||
|
echo "Usage: $0 {setup|deps|test|help}"
|
||||||
|
echo ""
|
||||||
|
echo "Commands:"
|
||||||
|
echo " setup - Run full automatic setup"
|
||||||
|
echo " deps - Install dependencies only"
|
||||||
|
echo " test - Test installation"
|
||||||
|
echo " help - Show this help"
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
print_error "Unknown command: $1"
|
||||||
|
echo "Use '$0 help' for usage information"
|
||||||
|
exit 1
|
||||||
|
;;
|
||||||
|
esac
|
||||||
314
scripts/legacy/quick_start.sh
Executable file
314
scripts/legacy/quick_start.sh
Executable file
|
|
@ -0,0 +1,314 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
|
||||||
|
# Fetch ML Quick Start Script with Security
|
||||||
|
# Sets up development environment with security features and creates test user
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# Colors
|
||||||
|
RED='\033[0;31m'
|
||||||
|
GREEN='\033[0;32m'
|
||||||
|
YELLOW='\033[1;33m'
|
||||||
|
BLUE='\033[0;34m'
|
||||||
|
NC='\033[0m'
|
||||||
|
|
||||||
|
print_info() {
|
||||||
|
echo -e "${BLUE}[INFO]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_success() {
|
||||||
|
echo -e "${GREEN}[SUCCESS]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_warning() {
|
||||||
|
echo -e "${YELLOW}[WARNING]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_error() {
|
||||||
|
echo -e "${RED}[ERROR]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
check_prerequisites() {
|
||||||
|
print_info "Checking prerequisites..."
|
||||||
|
|
||||||
|
# Check Go
|
||||||
|
if ! command -v go &> /dev/null; then
|
||||||
|
print_error "Go is not installed. Please install Go 1.25 or later."
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
local go_version=$(go version | awk '{print $3}' | sed 's/go//')
|
||||||
|
print_info "Go version: $go_version"
|
||||||
|
|
||||||
|
# Check Zig
|
||||||
|
if ! command -v zig &> /dev/null; then
|
||||||
|
print_warning "Zig is not installed. CLI features will not be available."
|
||||||
|
else
|
||||||
|
local zig_version=$(zig version)
|
||||||
|
print_info "Zig version: $zig_version"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check Docker
|
||||||
|
if ! command -v docker &> /dev/null; then
|
||||||
|
print_warning "Docker is not installed. Container features will not work."
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check Redis
|
||||||
|
if ! command -v redis-server &> /dev/null && ! command -v redis-cli &> /dev/null; then
|
||||||
|
print_warning "Redis is not installed. Starting local Redis..."
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check OpenSSL for certificates
|
||||||
|
if ! command -v openssl &> /dev/null; then
|
||||||
|
print_warning "OpenSSL is not installed. TLS certificates will not be generated."
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_success "Prerequisites checked"
|
||||||
|
}
|
||||||
|
|
||||||
|
setup_project() {
|
||||||
|
print_info "Setting up Fetch ML project..."
|
||||||
|
|
||||||
|
# Create directories
|
||||||
|
mkdir -p bin
|
||||||
|
mkdir -p data
|
||||||
|
mkdir -p logs
|
||||||
|
mkdir -p db
|
||||||
|
mkdir -p ssl
|
||||||
|
mkdir -p configs
|
||||||
|
|
||||||
|
print_success "Project directories created"
|
||||||
|
}
|
||||||
|
|
||||||
|
build_project() {
|
||||||
|
print_info "Building Fetch ML..."
|
||||||
|
|
||||||
|
# Build Go binaries
|
||||||
|
make build
|
||||||
|
|
||||||
|
# Build Zig CLI if available
|
||||||
|
if command -v zig &> /dev/null; then
|
||||||
|
make cli-build
|
||||||
|
print_success "Zig CLI built"
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_success "Build completed"
|
||||||
|
}
|
||||||
|
|
||||||
|
generate_ssl_certificates() {
|
||||||
|
print_info "Generating SSL certificates..."
|
||||||
|
|
||||||
|
if command -v openssl &> /dev/null; then
|
||||||
|
# Generate self-signed certificate for development
|
||||||
|
openssl req -x509 -newkey rsa:4096 -keyout ssl/key.pem -out ssl/cert.pem \
|
||||||
|
-days 365 -nodes -subj "/C=US/ST=State/L=City/O=Organization/CN=localhost" \
|
||||||
|
-addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null || {
|
||||||
|
print_warning "Failed to generate SSL certificates"
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
print_success "SSL certificates generated in ssl/"
|
||||||
|
print_info "Certificates are self-signed (development only)"
|
||||||
|
else
|
||||||
|
print_warning "OpenSSL not available, skipping SSL certificates"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
setup_redis() {
|
||||||
|
print_info "Setting up Redis..."
|
||||||
|
|
||||||
|
if command -v redis-server &> /dev/null; then
|
||||||
|
if ! pgrep -f "redis-server" > /dev/null; then
|
||||||
|
redis-server --daemonize yes --port 6379
|
||||||
|
print_success "Redis started"
|
||||||
|
else
|
||||||
|
print_info "Redis already running"
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
print_warning "Redis not available, some features may be limited"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
create_secure_config() {
|
||||||
|
print_info "Creating secure development configuration..."
|
||||||
|
|
||||||
|
# Generate secure passwords and secrets
|
||||||
|
local redis_password=$(openssl rand -base64 32 2>/dev/null || echo "dev_redis_password_123")
|
||||||
|
local jwt_secret=$(openssl rand -base64 64 2>/dev/null || echo "dev_jwt_secret_1234567890123456789012345678901234567890123456789012345678901234")
|
||||||
|
|
||||||
|
# Create development config
|
||||||
|
cat > configs/config.yaml << EOF
|
||||||
|
base_path: "/data/ml-experiments"
|
||||||
|
|
||||||
|
auth:
|
||||||
|
enabled: true
|
||||||
|
api_keys:
|
||||||
|
test_user:
|
||||||
|
hash: "$(echo -n "dev_test_api_key_12345" | sha256sum | cut -d' ' -f1)"
|
||||||
|
admin: true
|
||||||
|
roles: ["data_scientist", "admin"]
|
||||||
|
permissions:
|
||||||
|
read: true
|
||||||
|
write: true
|
||||||
|
delete: true
|
||||||
|
|
||||||
|
server:
|
||||||
|
address: ":9101"
|
||||||
|
tls:
|
||||||
|
enabled: true
|
||||||
|
cert_file: "./ssl/cert.pem"
|
||||||
|
key_file: "./ssl/key.pem"
|
||||||
|
min_version: "1.3"
|
||||||
|
|
||||||
|
security:
|
||||||
|
rate_limit:
|
||||||
|
enabled: true
|
||||||
|
requests_per_minute: 60
|
||||||
|
burst_size: 10
|
||||||
|
ip_whitelist:
|
||||||
|
- "127.0.0.1"
|
||||||
|
- "::1"
|
||||||
|
- "10.0.0.0/8"
|
||||||
|
- "192.168.0.0/16"
|
||||||
|
- "172.16.0.0/12"
|
||||||
|
failed_login_lockout:
|
||||||
|
enabled: true
|
||||||
|
max_attempts: 5
|
||||||
|
lockout_duration: "15m"
|
||||||
|
|
||||||
|
redis:
|
||||||
|
url: "redis://localhost:6379"
|
||||||
|
password: "${redis_password}"
|
||||||
|
|
||||||
|
logging:
|
||||||
|
level: "info"
|
||||||
|
file: "logs/fetch_ml.log"
|
||||||
|
audit_log: "logs/audit.log"
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Create environment file
|
||||||
|
cat > .env.dev << EOF
|
||||||
|
# Development environment variables
|
||||||
|
REDIS_PASSWORD=${redis_password}
|
||||||
|
JWT_SECRET=${jwt_secret}
|
||||||
|
GRAFANA_USER=admin
|
||||||
|
GRAFANA_PASSWORD=$(openssl rand -base64 16 2>/dev/null || echo "dev_grafana_password")
|
||||||
|
EOF
|
||||||
|
|
||||||
|
print_success "Secure configuration created"
|
||||||
|
print_warning "Using development certificates and passwords"
|
||||||
|
}
|
||||||
|
|
||||||
|
create_test_user() {
|
||||||
|
print_info "Creating test user..."
|
||||||
|
|
||||||
|
# Generate API key for test user
|
||||||
|
local api_key="dev_test_api_key_12345"
|
||||||
|
local api_key_hash=$(echo -n "$api_key" | sha256sum | cut -d' ' -f1)
|
||||||
|
|
||||||
|
print_success "Test user created successfully"
|
||||||
|
echo "Username: test_user"
|
||||||
|
echo "API Key: $api_key"
|
||||||
|
echo "API Key Hash: $api_key_hash"
|
||||||
|
echo "Store this key safely!"
|
||||||
|
echo ""
|
||||||
|
echo "Environment variables in .env.dev"
|
||||||
|
echo "Run: source .env.dev"
|
||||||
|
}
|
||||||
|
|
||||||
|
test_setup() {
|
||||||
|
print_info "Testing setup..."
|
||||||
|
|
||||||
|
# Test Go binaries
|
||||||
|
if [[ -f "bin/api-server" ]]; then
|
||||||
|
./bin/api-server --help > /dev/null 2>&1 || true
|
||||||
|
print_success "API server binary OK"
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ -f "bin/worker" ]]; then
|
||||||
|
./bin/worker --help > /dev/null 2>&1 || true
|
||||||
|
print_success "Worker binary OK"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Test Zig CLI
|
||||||
|
if [[ -f "cli/zig-out/bin/ml" ]]; then
|
||||||
|
./cli/zig-out/bin/ml --help > /dev/null 2>&1 || true
|
||||||
|
print_success "Zig CLI binary OK"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Test Redis connection
|
||||||
|
if command -v redis-cli &> /dev/null; then
|
||||||
|
if redis-cli ping > /dev/null 2>&1; then
|
||||||
|
print_success "Redis connection OK"
|
||||||
|
else
|
||||||
|
print_warning "Redis not responding"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Test SSL certificates
|
||||||
|
if [[ -f "ssl/cert.pem" && -f "ssl/key.pem" ]]; then
|
||||||
|
if openssl x509 -in ssl/cert.pem -noout -checkend 86400 > /dev/null 2>&1; then
|
||||||
|
print_success "SSL certificates valid"
|
||||||
|
else
|
||||||
|
print_warning "SSL certificates expired or invalid"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
show_next_steps() {
|
||||||
|
print_success "Secure quick start completed!"
|
||||||
|
echo
|
||||||
|
echo "Next steps:"
|
||||||
|
echo "1. Load environment variables:"
|
||||||
|
echo " source .env.dev"
|
||||||
|
echo
|
||||||
|
echo "2. Start API server:"
|
||||||
|
echo " ./bin/api-server -config configs/config.yaml"
|
||||||
|
echo
|
||||||
|
echo "3. Test Zig CLI:"
|
||||||
|
echo " ./cli/zig-out/bin/ml --help"
|
||||||
|
echo
|
||||||
|
echo "4. Test with curl (HTTPS):"
|
||||||
|
echo " curl -k -H 'X-API-Key: dev_test_api_key_12345' https://localhost:9101/health"
|
||||||
|
echo
|
||||||
|
echo "5. Deploy with Docker:"
|
||||||
|
echo " docker-compose up -d"
|
||||||
|
echo
|
||||||
|
echo "Features Enabled:"
|
||||||
|
echo " ✅ HTTPS/TLS encryption"
|
||||||
|
echo " ✅ API key authentication"
|
||||||
|
echo " ✅ Rate limiting"
|
||||||
|
echo " ✅ IP whitelisting"
|
||||||
|
echo " ✅ Security headers"
|
||||||
|
echo " ✅ Audit logging"
|
||||||
|
echo
|
||||||
|
echo "Configuration Files:"
|
||||||
|
echo " configs/config.yaml # Main configuration"
|
||||||
|
echo " .env.dev # Environment variables"
|
||||||
|
echo " ssl/cert.pem, ssl/key.pem # TLS certificates"
|
||||||
|
echo
|
||||||
|
echo "Documentation:"
|
||||||
|
echo " docs/DEPLOYMENT.md # Deployment guide"
|
||||||
|
echo ""
|
||||||
|
print_success "Ready to run ML experiments!"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Main function
|
||||||
|
main() {
|
||||||
|
echo "Fetch ML Quick Start Script (with Security & Zig CLI)"
|
||||||
|
echo "===================================================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
check_prerequisites
|
||||||
|
setup_project
|
||||||
|
build_project
|
||||||
|
generate_ssl_certificates
|
||||||
|
setup_redis
|
||||||
|
create_secure_config
|
||||||
|
create_test_user
|
||||||
|
test_setup
|
||||||
|
show_next_steps
|
||||||
|
}
|
||||||
|
|
||||||
|
# Run main function
|
||||||
|
main "$@"
|
||||||
124
scripts/legacy/setup_common.sh
Executable file
124
scripts/legacy/setup_common.sh
Executable file
|
|
@ -0,0 +1,124 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
|
||||||
|
# Shared helper functions for Fetch ML setup scripts (Ubuntu/Rocky)
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# Colors
|
||||||
|
RED='\033[0;31m'
|
||||||
|
GREEN='\033[0;32m'
|
||||||
|
YELLOW='\033[1;33m'
|
||||||
|
BLUE='\033[0;34m'
|
||||||
|
NC='\033[0m'
|
||||||
|
|
||||||
|
# Configuration defaults
|
||||||
|
FETCH_ML_USER="fetchml"
|
||||||
|
FETCH_ML_HOME="/opt/fetchml"
|
||||||
|
SERVICE_DIR="/etc/systemd/system"
|
||||||
|
LOG_DIR="/var/log/fetchml"
|
||||||
|
DATA_DIR="/var/lib/fetchml"
|
||||||
|
CONFIG_DIR="$FETCH_ML_HOME/configs"
|
||||||
|
|
||||||
|
log_info() { echo -e "${BLUE}[INFO]${NC} $1"; }
|
||||||
|
log_success() { echo -e "${GREEN}[SUCCESS]${NC} $1"; }
|
||||||
|
log_warning() { echo -e "${YELLOW}[WARNING]${NC} $1"; }
|
||||||
|
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
|
||||||
|
|
||||||
|
# Download file with checksum verification
|
||||||
|
# Args: url, checksum, dest
|
||||||
|
secure_download() {
|
||||||
|
local url="$1" checksum="$2" dest="$3"
|
||||||
|
curl -fsSL "$url" -o "$dest"
|
||||||
|
echo "$checksum $dest" | sha256sum --check --status || {
|
||||||
|
log_error "Checksum verification failed for $dest"
|
||||||
|
rm -f "$dest"
|
||||||
|
exit 1
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
cleanup_temp() {
|
||||||
|
if [[ -n "${TMP_FILES:-}" ]]; then
|
||||||
|
rm -f $TMP_FILES || true
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
trap cleanup_temp EXIT
|
||||||
|
|
||||||
|
ensure_user() {
|
||||||
|
if ! id "$FETCH_ML_USER" &>/dev/null; then
|
||||||
|
useradd -m -d "$FETCH_ML_HOME" -s /bin/bash "$FETCH_ML_USER"
|
||||||
|
fi
|
||||||
|
usermod -aG podman "$FETCH_ML_USER" || true
|
||||||
|
}
|
||||||
|
|
||||||
|
create_directories() {
|
||||||
|
mkdir -p "$FETCH_ML_HOME" "$LOG_DIR" "$DATA_DIR" "$FETCH_ML_HOME/bin" "$CONFIG_DIR"
|
||||||
|
chown -R "$FETCH_ML_USER":"$FETCH_ML_USER" "$FETCH_ML_HOME" "$LOG_DIR" "$DATA_DIR"
|
||||||
|
}
|
||||||
|
|
||||||
|
setup_systemd_service() {
|
||||||
|
local name="$1" exec="$2"
|
||||||
|
cat > "$SERVICE_DIR/${name}.service" <<EOF
|
||||||
|
[Unit]
|
||||||
|
Description=Fetch ML ${name^} Service
|
||||||
|
After=network.target redis.service
|
||||||
|
Wants=redis.service
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
User=$FETCH_ML_USER
|
||||||
|
Group=$FETCH_ML_USER
|
||||||
|
WorkingDirectory=$FETCH_ML_HOME
|
||||||
|
Environment=PATH=$FETCH_ML_HOME/bin:/usr/local/go/bin:/usr/bin:/bin
|
||||||
|
ExecStart=$exec
|
||||||
|
Restart=on-failure
|
||||||
|
RestartSec=5
|
||||||
|
StandardOutput=journal
|
||||||
|
StandardError=journal
|
||||||
|
SyslogIdentifier=fetch_ml_${name}
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
setup_logrotate() {
|
||||||
|
cat > /etc/logrotate.d/fetch_ml <<'EOF'
|
||||||
|
/var/log/fetchml/*.log {
|
||||||
|
daily
|
||||||
|
missingok
|
||||||
|
rotate 14
|
||||||
|
compress
|
||||||
|
delaycompress
|
||||||
|
notifempty
|
||||||
|
create 0640 fetchml fetchml
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
hardening_steps() {
|
||||||
|
# Increase file limits
|
||||||
|
if ! grep -q fetchml /etc/security/limits.conf; then
|
||||||
|
cat >> /etc/security/limits.conf <<'EOF'
|
||||||
|
fetchml soft nofile 65536
|
||||||
|
fetchml hard nofile 65536
|
||||||
|
EOF
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Enable unattended security upgrades if available
|
||||||
|
if command -v apt-get &>/dev/null; then
|
||||||
|
apt-get install -y unattended-upgrades >/dev/null || true
|
||||||
|
elif command -v dnf &>/dev/null; then
|
||||||
|
dnf install -y dnf-automatic >/dev/null || true
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
selinux_guidance() {
|
||||||
|
if command -v getenforce &>/dev/null; then
|
||||||
|
local mode=$(getenforce)
|
||||||
|
log_info "SELinux mode: $mode"
|
||||||
|
if [[ "$mode" == "Enforcing" ]]; then
|
||||||
|
log_info "Ensure systemd units and directories have proper contexts. Example:"
|
||||||
|
echo " semanage fcontext -a -t bin_t '$FETCH_ML_HOME/bin(/.*)?'"
|
||||||
|
echo " restorecon -Rv $FETCH_ML_HOME/bin"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
}
|
||||||
417
scripts/legacy/setup_rocky.sh
Executable file
417
scripts/legacy/setup_rocky.sh
Executable file
|
|
@ -0,0 +1,417 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
|
||||||
|
# Fetch ML Rocky Linux Setup Script
|
||||||
|
# Optimized for ML experiments on Rocky Linux 8/9
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# shellcheck source=scripts/setup_common.sh
|
||||||
|
SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)
|
||||||
|
source "$SCRIPT_DIR/setup_common.sh"
|
||||||
|
|
||||||
|
check_root() {
|
||||||
|
if [[ $EUID -ne 0 ]]; then
|
||||||
|
log_error "This script must be run as root"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
check_rocky() {
|
||||||
|
if ! command -v dnf &> /dev/null && ! command -v yum &> /dev/null; then
|
||||||
|
log_error "This script is designed for Rocky Linux systems"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
local rocky_version=$(cat /etc/rocky-release | grep -oE '[0-9]+\.[0-9]+')
|
||||||
|
log_info "Rocky Linux version: $rocky_version"
|
||||||
|
|
||||||
|
# Use dnf for Rocky 9+, yum for Rocky 8
|
||||||
|
if command -v dnf &> /dev/null; then
|
||||||
|
PKG_MANAGER="dnf"
|
||||||
|
else
|
||||||
|
PKG_MANAGER="yum"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
update_system() {
|
||||||
|
log_info "Updating system packages..."
|
||||||
|
$PKG_MANAGER update -y
|
||||||
|
$PKG_MANAGER upgrade -y
|
||||||
|
$PKG_MANAGER install -y curl wget gnupg2
|
||||||
|
}
|
||||||
|
|
||||||
|
enable_epel() {
|
||||||
|
log_info "Enabling EPEL repository..."
|
||||||
|
|
||||||
|
if $PKG_MANAGER repolist | grep -q "epel"; then
|
||||||
|
log_info "EPEL already enabled"
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
$PKG_MANAGER install -y epel-release
|
||||||
|
$PKG_MANAGER config-manager --set-enabled powertools
|
||||||
|
|
||||||
|
log_success "EPEL repository enabled"
|
||||||
|
}
|
||||||
|
|
||||||
|
install_go() {
|
||||||
|
log_info "Installing Go 1.25..."
|
||||||
|
|
||||||
|
if command -v go &> /dev/null; then
|
||||||
|
local go_version=$(go version | awk '{print $3}' | sed 's/go//')
|
||||||
|
log_info "Go already installed: $go_version"
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
cd /tmp
|
||||||
|
TMP_FILES="/tmp/go1.25.0.linux-amd64.tar.gz"
|
||||||
|
secure_download "https://go.dev/dl/go1.25.0.linux-amd64.tar.gz" "b5b98c784d53115553848114fd3c74e565643b4e4c8e8db0c3bea3478fd8c345" "/tmp/go1.25.0.linux-amd64.tar.gz"
|
||||||
|
tar -C /usr/local -xzf go1.25.0.linux-amd64.tar.gz
|
||||||
|
|
||||||
|
# Add to PATH
|
||||||
|
echo 'export PATH=$PATH:/usr/local/go/bin' >> /etc/profile
|
||||||
|
echo 'export PATH=$PATH:$HOME/go/bin' >> /etc/profile
|
||||||
|
export PATH=$PATH:/usr/local/go/bin
|
||||||
|
|
||||||
|
log_success "Go 1.25 installed"
|
||||||
|
}
|
||||||
|
|
||||||
|
install_podman() {
|
||||||
|
log_info "Installing Podman..."
|
||||||
|
|
||||||
|
if command -v podman &> /dev/null; then
|
||||||
|
log_info "Podman already installed"
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Install Podman and related tools
|
||||||
|
$PKG_MANAGER install -y podman podman-compose containernetworking-plugins
|
||||||
|
|
||||||
|
# Configure Podman
|
||||||
|
mkdir -p /etc/containers
|
||||||
|
cat > /etc/containers/containers.conf << EOF
|
||||||
|
[containers]
|
||||||
|
user_namespace_enable = 1
|
||||||
|
runtime = "crun"
|
||||||
|
|
||||||
|
[network]
|
||||||
|
network_backend = "netavark"
|
||||||
|
|
||||||
|
[engine]
|
||||||
|
cgroup_manager = "systemd"
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Enable user namespaces
|
||||||
|
echo "user.max_user_namespaces=15000" >> /etc/sysctl.conf
|
||||||
|
sysctl -p user.max_user_namespaces=15000
|
||||||
|
|
||||||
|
log_success "Podman installed"
|
||||||
|
}
|
||||||
|
|
||||||
|
install_redis() {
|
||||||
|
log_info "Installing Redis..."
|
||||||
|
|
||||||
|
if command -v redis-server &> /dev/null; then
|
||||||
|
log_info "Redis already installed"
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
$PKG_MANAGER install -y redis
|
||||||
|
|
||||||
|
# Configure Redis for production
|
||||||
|
sed -i 's/supervised no/supervised systemd/' /etc/redis.conf
|
||||||
|
sed -i 's/bind 127.0.0.1 ::1/bind 127.0.0.1/' /etc/redis.conf
|
||||||
|
|
||||||
|
systemctl enable redis
|
||||||
|
systemctl start redis
|
||||||
|
|
||||||
|
log_success "Redis installed and configured"
|
||||||
|
}
|
||||||
|
|
||||||
|
install_nvidia_drivers() {
|
||||||
|
log_info "Checking for NVIDIA GPU..."
|
||||||
|
|
||||||
|
if command -v nvidia-smi &> /dev/null; then
|
||||||
|
log_info "NVIDIA drivers already installed"
|
||||||
|
nvidia-smi
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
if lspci | grep -i nvidia &> /dev/null; then
|
||||||
|
log_info "NVIDIA GPU detected, installing drivers..."
|
||||||
|
|
||||||
|
# Enable NVIDIA repository
|
||||||
|
$PKG_MANAGER config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/rhel$(rpm -E %rhel)/x86_64/cuda-rhel.repo
|
||||||
|
|
||||||
|
# Clean and install
|
||||||
|
$PKG_MANAGER clean all
|
||||||
|
$PKG_MANAGER module enable -y nvidia-driver:latest-dkms
|
||||||
|
$PKG_MANAGER install -y nvidia-driver nvidia-cuda-toolkit
|
||||||
|
|
||||||
|
# Configure Podman for NVIDIA (only if needed)
|
||||||
|
if ! podman run --rm --device nvidia.com/gpu=all alpine echo "NVIDIA GPU access configured" 2>/dev/null; then
|
||||||
|
log_warning "NVIDIA GPU access test failed, you may need to reboot"
|
||||||
|
else
|
||||||
|
log_success "NVIDIA drivers installed and GPU access verified"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Reboot required
|
||||||
|
log_warning "System reboot required for NVIDIA drivers"
|
||||||
|
log_info "Run: reboot"
|
||||||
|
else
|
||||||
|
log_info "No NVIDIA GPU detected, skipping driver installation"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
install_ml_tools() {
|
||||||
|
log_info "Installing ML tools and dependencies..."
|
||||||
|
|
||||||
|
# Python and ML packages
|
||||||
|
$PKG_MANAGER install -y python3 python3-pip python3-devel
|
||||||
|
|
||||||
|
# System dependencies for ML
|
||||||
|
$PKG_MANAGER groupinstall -y "Development Tools"
|
||||||
|
$PKG_MANAGER install -y cmake git pkgconfig
|
||||||
|
$PKG_MANAGER install -y libjpeg-turbo-devel libpng-devel libtiff-devel
|
||||||
|
$PKG_MANAGER install -y mesa-libGL-devel mesa-libGLU-devel
|
||||||
|
$PKG_MANAGER install -y gtk3-devel
|
||||||
|
$PKG_MANAGER install -y atlas-devel blas-devel lapack-devel
|
||||||
|
|
||||||
|
# Install common ML libraries
|
||||||
|
pip3 install --upgrade pip
|
||||||
|
pip3 install numpy scipy scikit-learn pandas
|
||||||
|
pip3 install jupyter matplotlib seaborn
|
||||||
|
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
|
||||||
|
|
||||||
|
log_success "ML tools installed"
|
||||||
|
}
|
||||||
|
|
||||||
|
create_user() {
|
||||||
|
log_info "Creating fetchml user..."
|
||||||
|
|
||||||
|
if id "$FETCH_ML_USER" &>/dev/null; then
|
||||||
|
log_info "User $FETCH_ML_USER already exists"
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
useradd -m -d $FETCH_ML_HOME -s /bin/bash $FETCH_ML_USER
|
||||||
|
usermod -aG podman $FETCH_ML_USER
|
||||||
|
|
||||||
|
# Create directories
|
||||||
|
mkdir -p $FETCH_ML_HOME/.config/containers
|
||||||
|
mkdir -p $FETCH_ML_HOME/go/bin
|
||||||
|
mkdir -p $LOG_DIR
|
||||||
|
mkdir -p $DATA_DIR
|
||||||
|
|
||||||
|
chown -R $FETCH_ML_USER:$FETCH_ML_USER $FETCH_ML_HOME
|
||||||
|
chown -R $FETCH_ML_USER:$FETCH_ML_USER $LOG_DIR
|
||||||
|
chown -R $FETCH_ML_USER:$FETCH_ML_USER $DATA_DIR
|
||||||
|
|
||||||
|
log_success "User $FETCH_ML_USER created"
|
||||||
|
}
|
||||||
|
|
||||||
|
setup_firewall() {
|
||||||
|
log_info "Configuring firewall..."
|
||||||
|
|
||||||
|
if command -v firewall-cmd &> /dev/null; then
|
||||||
|
systemctl enable firewalld
|
||||||
|
systemctl start firewalld
|
||||||
|
|
||||||
|
firewall-cmd --permanent --add-service=ssh
|
||||||
|
firewall-cmd --permanent --add-port=8080/tcp # Worker API
|
||||||
|
firewall-cmd --permanent --add-port=8081/tcp # Data manager API
|
||||||
|
firewall-cmd --permanent --add-port=6379/tcp # Redis
|
||||||
|
firewall-cmd --reload
|
||||||
|
|
||||||
|
firewall-cmd --list-all
|
||||||
|
else
|
||||||
|
log_warning "Firewalld not available, skipping firewall configuration"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
setup_systemd_services() {
|
||||||
|
log_info "Setting up systemd services..."
|
||||||
|
|
||||||
|
# Fetch ML Worker service
|
||||||
|
cat > $SERVICE_DIR/fetch_ml_worker.service << EOF
|
||||||
|
[Unit]
|
||||||
|
Description=Fetch ML Worker Service
|
||||||
|
After=network.target redis.service
|
||||||
|
Wants=redis.service
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
User=$FETCH_ML_USER
|
||||||
|
Group=$FETCH_ML_USER
|
||||||
|
WorkingDirectory=$FETCH_ML_HOME
|
||||||
|
Environment=FETCH_ML_HOME=$FETCH_ML_HOME
|
||||||
|
Environment=PATH=$FETCH_ML_HOME/go/bin:/usr/local/go/bin:/usr/bin:/bin
|
||||||
|
ExecStart=$FETCH_ML_HOME/bin/worker --config $FETCH_ML_HOME/configs/config-local.yaml
|
||||||
|
Restart=always
|
||||||
|
RestartSec=5
|
||||||
|
StandardOutput=journal
|
||||||
|
StandardError=journal
|
||||||
|
SyslogIdentifier=fetch_ml_worker
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Fetch ML Data Manager service
|
||||||
|
cat > $SERVICE_DIR/fetch_ml_data_manager.service << EOF
|
||||||
|
[Unit]
|
||||||
|
Description=Fetch ML Data Manager Service
|
||||||
|
After=network.target redis.service
|
||||||
|
Wants=redis.service
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
User=$FETCH_ML_USER
|
||||||
|
Group=$FETCH_ML_USER
|
||||||
|
WorkingDirectory=$FETCH_ML_HOME
|
||||||
|
Environment=FETCH_ML_HOME=$FETCH_ML_HOME
|
||||||
|
Environment=PATH=$FETCH_ML_HOME/go/bin:/usr/local/go/bin:/usr/bin:/bin
|
||||||
|
ExecStart=$FETCH_ML_HOME/bin/data_manager --config $FETCH_ML_HOME/configs/config-local.yaml
|
||||||
|
Restart=always
|
||||||
|
RestartSec=5
|
||||||
|
StandardOutput=journal
|
||||||
|
StandardError=journal
|
||||||
|
SyslogIdentifier=fetch_ml_data_manager
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Enable services
|
||||||
|
systemctl daemon-reload
|
||||||
|
systemctl enable fetch_ml_worker
|
||||||
|
systemctl enable fetch_ml_data_manager
|
||||||
|
|
||||||
|
log_success "Systemd services configured"
|
||||||
|
}
|
||||||
|
|
||||||
|
setup_log_rotation() {
|
||||||
|
log_info "Setting up log rotation..."
|
||||||
|
|
||||||
|
cat > /etc/logrotate.d/fetch_ml << EOF
|
||||||
|
$LOG_DIR/*.log {
|
||||||
|
daily
|
||||||
|
missingok
|
||||||
|
rotate 30
|
||||||
|
compress
|
||||||
|
delaycompress
|
||||||
|
notifempty
|
||||||
|
create 0644 $FETCH_ML_USER $FETCH_ML_USER
|
||||||
|
postrotate
|
||||||
|
systemctl reload fetch_ml_worker || true
|
||||||
|
systemctl reload fetch_ml_data_manager || true
|
||||||
|
endscript
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
|
||||||
|
log_success "Log rotation configured"
|
||||||
|
}
|
||||||
|
|
||||||
|
optimize_system() {
|
||||||
|
log_info "Optimizing system for ML workloads..."
|
||||||
|
|
||||||
|
# Increase file limits
|
||||||
|
echo "* soft nofile 65536" >> /etc/security/limits.conf
|
||||||
|
echo "* hard nofile 65536" >> /etc/security/limits.conf
|
||||||
|
|
||||||
|
# Optimize kernel parameters for ML
|
||||||
|
cat >> /etc/sysctl.conf << EOF
|
||||||
|
# ML Optimization
|
||||||
|
net.core.rmem_max = 134217728
|
||||||
|
net.core.wmem_max = 134217728
|
||||||
|
vm.swappiness = 10
|
||||||
|
vm.dirty_ratio = 15
|
||||||
|
vm.dirty_background_ratio = 5
|
||||||
|
EOF
|
||||||
|
|
||||||
|
sysctl -p
|
||||||
|
|
||||||
|
# Configure GPU persistence mode if NVIDIA available
|
||||||
|
if command -v nvidia-smi &> /dev/null; then
|
||||||
|
nvidia-smi -pm 1 || log_warning "Could not enable GPU persistence mode"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Disable SELinux for better container compatibility (optional)
|
||||||
|
if [[ -f /etc/selinux/config ]]; then
|
||||||
|
log_warning "Consider setting SELinux to permissive mode for better container compatibility"
|
||||||
|
log_info "Edit /etc/selinux/config and set SELINUX=permissive"
|
||||||
|
fi
|
||||||
|
|
||||||
|
log_success "System optimized for ML workloads"
|
||||||
|
}
|
||||||
|
|
||||||
|
install_fetch_ml() {
|
||||||
|
log_info "Installing Fetch ML..."
|
||||||
|
|
||||||
|
# Clone or copy Fetch ML
|
||||||
|
cd $FETCH_ML_HOME
|
||||||
|
|
||||||
|
if [[ ! -d "fetch_ml" ]]; then
|
||||||
|
log_warning "Please clone Fetch ML repository manually to $FETCH_ML_HOME/fetch_ml"
|
||||||
|
log_info "Example: git clone https://github.com/your-org/fetch_ml.git"
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
cd fetch_ml
|
||||||
|
|
||||||
|
# Build
|
||||||
|
export PATH=$PATH:/usr/local/go/bin
|
||||||
|
make build
|
||||||
|
|
||||||
|
# Copy binaries
|
||||||
|
cp bin/* $FETCH_ML_HOME/bin/
|
||||||
|
chmod +x $FETCH_ML_HOME/bin/*
|
||||||
|
|
||||||
|
# Copy configs
|
||||||
|
mkdir -p $FETCH_ML_HOME/configs
|
||||||
|
cp configs/config-local.yaml.example $FETCH_ML_HOME/configs/config-local.yaml
|
||||||
|
|
||||||
|
# Set permissions
|
||||||
|
chown -R $FETCH_ML_USER:$FETCH_ML_USER $FETCH_ML_HOME
|
||||||
|
|
||||||
|
log_success "Fetch ML installed"
|
||||||
|
}
|
||||||
|
|
||||||
|
main() {
|
||||||
|
log_info "Starting Fetch ML Rocky Linux server setup..."
|
||||||
|
|
||||||
|
check_root
|
||||||
|
check_rocky
|
||||||
|
|
||||||
|
update_system
|
||||||
|
enable_epel
|
||||||
|
install_go
|
||||||
|
install_podman
|
||||||
|
install_redis
|
||||||
|
install_nvidia_drivers
|
||||||
|
install_ml_tools
|
||||||
|
ensure_user
|
||||||
|
create_directories
|
||||||
|
setup_firewall
|
||||||
|
setup_systemd_services
|
||||||
|
setup_logrotate
|
||||||
|
hardening_steps
|
||||||
|
selinux_guidance
|
||||||
|
install_fetch_ml
|
||||||
|
|
||||||
|
log_success "Fetch ML setup complete!"
|
||||||
|
echo
|
||||||
|
log_info "Next steps:"
|
||||||
|
echo "1. Clone Fetch ML repository: git clone https://github.com/your-org/fetch_ml.git $FETCH_ML_HOME/fetch_ml"
|
||||||
|
echo "2. Configure: $FETCH_ML_HOME/configs/config-local.yaml"
|
||||||
|
echo "3. Start services: systemctl start fetch_ml_worker fetch_ml_data_manager"
|
||||||
|
echo "4. Check status: systemctl status fetch_ml_worker fetch_ml_data_manager"
|
||||||
|
echo "5. View logs: journalctl -u fetch_ml_worker -f"
|
||||||
|
echo
|
||||||
|
log_info "Services will be available at:"
|
||||||
|
echo "- Worker API: http://$(hostname -I | awk '{print $1}'):8080"
|
||||||
|
echo "- Data Manager: http://$(hostname -I | awk '{print $1}'):8081"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Run main function
|
||||||
|
main "$@"
|
||||||
294
scripts/legacy/setup_ubuntu.sh
Executable file
294
scripts/legacy/setup_ubuntu.sh
Executable file
|
|
@ -0,0 +1,294 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
|
||||||
|
# Fetch ML Ubuntu Server Setup Script
|
||||||
|
# Optimized for ML experiments on Ubuntu 20.04/22.04
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# shellcheck source=scripts/setup_common.sh
|
||||||
|
SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)
|
||||||
|
source "$SCRIPT_DIR/setup_common.sh"
|
||||||
|
|
||||||
|
check_root() {
|
||||||
|
if [[ $EUID -ne 0 ]]; then
|
||||||
|
log_error "This script must be run as root"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
check_ubuntu() {
|
||||||
|
if ! command -v apt-get &> /dev/null; then
|
||||||
|
log_error "This script is designed for Ubuntu systems"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
local ubuntu_version=$(lsb_release -rs)
|
||||||
|
log_info "Ubuntu version: $ubuntu_version"
|
||||||
|
|
||||||
|
if (( $(echo "$ubuntu_version < 20.04" | bc -l) == 1 )); then
|
||||||
|
log_warning "Ubuntu version < 20.04 may not support all features"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
update_system() {
|
||||||
|
log_info "Updating system packages..."
|
||||||
|
apt-get update -y
|
||||||
|
apt-get upgrade -y
|
||||||
|
apt-get install -y curl wget gnupg lsb-release software-properties-common
|
||||||
|
}
|
||||||
|
|
||||||
|
install_go() {
|
||||||
|
log_info "Installing Go 1.25..."
|
||||||
|
|
||||||
|
if command -v go &> /dev/null; then
|
||||||
|
local go_version=$(go version | awk '{print $3}' | sed 's/go//')
|
||||||
|
log_info "Go already installed: $go_version"
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
cd /tmp
|
||||||
|
TMP_FILES="/tmp/go1.25.0.linux-amd64.tar.gz"
|
||||||
|
secure_download "https://go.dev/dl/go1.25.0.linux-amd64.tar.gz" "b5b98c784d53115553848114fd3c74e565643b4e4c8e8db0c3bea3478fd8c345" "/tmp/go1.25.0.linux-amd64.tar.gz"
|
||||||
|
tar -C /usr/local -xzf go1.25.0.linux-amd64.tar.gz
|
||||||
|
|
||||||
|
# Add to PATH
|
||||||
|
echo 'export PATH=$PATH:/usr/local/go/bin' >> /etc/profile
|
||||||
|
echo 'export PATH=$PATH:$HOME/go/bin' >> /etc/profile
|
||||||
|
export PATH=$PATH:/usr/local/go/bin
|
||||||
|
|
||||||
|
log_success "Go 1.25 installed"
|
||||||
|
}
|
||||||
|
|
||||||
|
install_podman() {
|
||||||
|
log_info "Installing Podman..."
|
||||||
|
|
||||||
|
if command -v podman &> /dev/null; then
|
||||||
|
log_info "Podman already installed"
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Add official Podman repository
|
||||||
|
echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_$(lsb_release -rs)/ /" | tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
|
||||||
|
curl -L "https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_$(lsb_release -rs)/Release.key" | apt-key add -
|
||||||
|
|
||||||
|
apt-get update -y
|
||||||
|
apt-get install -y podman podman-compose
|
||||||
|
|
||||||
|
# Configure Podman for rootless operation
|
||||||
|
echo "user_namespace_enable = 1" >> /etc/containers/containers.conf
|
||||||
|
echo "runtime = \"crun\"" >> /etc/containers/containers.conf
|
||||||
|
|
||||||
|
log_success "Podman installed"
|
||||||
|
}
|
||||||
|
|
||||||
|
install_redis() {
|
||||||
|
log_info "Installing Redis..."
|
||||||
|
|
||||||
|
if command -v redis-server &> /dev/null; then
|
||||||
|
log_info "Redis already installed"
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
apt-get install -y redis-server
|
||||||
|
|
||||||
|
# Configure Redis for production
|
||||||
|
sed -i 's/supervised no/supervised systemd/' /etc/redis/redis.conf
|
||||||
|
sed -i 's/bind 127.0.0.1 ::1/bind 127.0.0.1/' /etc/redis/redis.conf
|
||||||
|
|
||||||
|
systemctl enable redis-server
|
||||||
|
systemctl start redis-server
|
||||||
|
|
||||||
|
log_success "Redis installed and configured"
|
||||||
|
}
|
||||||
|
|
||||||
|
install_nvidia_drivers() {
|
||||||
|
log_info "Checking for NVIDIA GPU..."
|
||||||
|
|
||||||
|
if command -v nvidia-smi &> /dev/null; then
|
||||||
|
log_info "NVIDIA drivers already installed"
|
||||||
|
nvidia-smi
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
if lspci | grep -i nvidia &> /dev/null; then
|
||||||
|
log_info "NVIDIA GPU detected, installing drivers..."
|
||||||
|
|
||||||
|
# Add NVIDIA repository
|
||||||
|
TMP_FILES="/tmp/cuda-keyring_1.1-1_all.deb"
|
||||||
|
secure_download "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu$(lsb_release -rs | cut -d. -f1)/x86_64/cuda-keyring_1.1-1_all.deb" "cfa6b4109e7e3d9be060a016b7dc07e8edcd5356c0eabcc0c537a76e6c603d76" "/tmp/cuda-keyring_1.1-1_all.deb"
|
||||||
|
dpkg -i /tmp/cuda-keyring_1.1-1_all.deb
|
||||||
|
apt-get update -y
|
||||||
|
|
||||||
|
# Install drivers
|
||||||
|
apt-get install -y nvidia-driver-535 nvidia-cuda-toolkit
|
||||||
|
|
||||||
|
# Configure Podman for NVIDIA (only if needed)
|
||||||
|
if ! podman run --rm --device nvidia.com/gpu=all alpine echo "NVIDIA GPU access configured" 2>/dev/null; then
|
||||||
|
log_warning "NVIDIA GPU access test failed, you may need to reboot"
|
||||||
|
else
|
||||||
|
log_success "NVIDIA drivers installed and GPU access verified"
|
||||||
|
fi
|
||||||
|
|
||||||
|
else
|
||||||
|
log_info "No NVIDIA GPU detected, skipping driver installation"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
install_ml_tools() {
|
||||||
|
log_info "Installing ML tools and dependencies..."
|
||||||
|
|
||||||
|
# Python and ML packages
|
||||||
|
apt-get install -y python3 python3-pip python3-venv
|
||||||
|
|
||||||
|
# System dependencies for ML
|
||||||
|
apt-get install -y build-essential cmake git pkg-config
|
||||||
|
apt-get install -y libjpeg-dev libpng-dev libtiff-dev
|
||||||
|
apt-get install -y libavcodec-dev libavformat-dev libswscale-dev
|
||||||
|
apt-get install -y libgtk2.0-dev libcanberra-gtk-module
|
||||||
|
apt-get install -y libxvidcore-dev libx264-dev
|
||||||
|
apt-get install -y libatlas-base-dev gfortran
|
||||||
|
|
||||||
|
# Install common ML libraries
|
||||||
|
pip3 install --upgrade pip
|
||||||
|
pip3 install numpy scipy scikit-learn pandas
|
||||||
|
pip3 install jupyter matplotlib seaborn
|
||||||
|
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
|
||||||
|
|
||||||
|
log_success "ML tools installed"
|
||||||
|
}
|
||||||
|
|
||||||
|
create_user() {
|
||||||
|
log_info "Creating fetchml user..."
|
||||||
|
ensure_user
|
||||||
|
create_directories
|
||||||
|
log_success "User $FETCH_ML_USER and directories created"
|
||||||
|
}
|
||||||
|
|
||||||
|
setup_firewall() {
|
||||||
|
log_info "Configuring firewall..."
|
||||||
|
|
||||||
|
if command -v ufw &> /dev/null; then
|
||||||
|
ufw --force enable
|
||||||
|
ufw allow ssh
|
||||||
|
ufw allow 8080/tcp # Worker API
|
||||||
|
ufw allow 8081/tcp # Data manager API
|
||||||
|
ufw allow 6379/tcp # Redis
|
||||||
|
ufw status
|
||||||
|
else
|
||||||
|
log_warning "UFW not available, skipping firewall configuration"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
setup_systemd_services() {
|
||||||
|
log_info "Setting up systemd services..."
|
||||||
|
|
||||||
|
setup_systemd_service "fetch_ml_worker" "$FETCH_ML_HOME/bin/worker --config $FETCH_ML_HOME/configs/config-local.yaml"
|
||||||
|
setup_systemd_service "fetch_ml_data_manager" "$FETCH_ML_HOME/bin/data_manager --config $FETCH_ML_HOME/configs/config-local.yaml"
|
||||||
|
|
||||||
|
# Enable services
|
||||||
|
systemctl daemon-reload
|
||||||
|
systemctl enable fetch_ml_worker
|
||||||
|
systemctl enable fetch_ml_data_manager
|
||||||
|
|
||||||
|
log_success "Systemd services configured"
|
||||||
|
}
|
||||||
|
|
||||||
|
setup_log_rotation() {
|
||||||
|
log_info "Setting up log rotation..."
|
||||||
|
setup_logrotate
|
||||||
|
log_success "Log rotation configured"
|
||||||
|
}
|
||||||
|
|
||||||
|
optimize_system() {
|
||||||
|
log_info "Optimizing system for ML workloads..."
|
||||||
|
hardening_steps
|
||||||
|
|
||||||
|
# Optimize kernel parameters for ML
|
||||||
|
cat >> /etc/sysctl.conf << EOF
|
||||||
|
# ML Optimization
|
||||||
|
net.core.rmem_max = 134217728
|
||||||
|
net.core.wmem_max = 134217728
|
||||||
|
vm.swappiness = 10
|
||||||
|
vm.dirty_ratio = 15
|
||||||
|
vm.dirty_background_ratio = 5
|
||||||
|
EOF
|
||||||
|
|
||||||
|
sysctl -p
|
||||||
|
|
||||||
|
# Configure GPU persistence mode if NVIDIA available
|
||||||
|
if command -v nvidia-smi &> /dev/null; then
|
||||||
|
nvidia-smi -pm 1 || log_warning "Could not enable GPU persistence mode"
|
||||||
|
fi
|
||||||
|
|
||||||
|
log_success "System optimized for ML workloads"
|
||||||
|
}
|
||||||
|
|
||||||
|
install_fetch_ml() {
|
||||||
|
log_info "Installing Fetch ML..."
|
||||||
|
|
||||||
|
# Clone or copy Fetch ML
|
||||||
|
cd $FETCH_ML_HOME
|
||||||
|
|
||||||
|
if [[ ! -d "fetch_ml" ]]; then
|
||||||
|
# This would be replaced with actual repository URL
|
||||||
|
log_warning "Please clone Fetch ML repository manually to $FETCH_ML_HOME/fetch_ml"
|
||||||
|
log_info "Example: git clone https://github.com/your-org/fetch_ml.git"
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
cd fetch_ml
|
||||||
|
|
||||||
|
# Build
|
||||||
|
export PATH=$PATH:/usr/local/go/bin
|
||||||
|
make build
|
||||||
|
|
||||||
|
# Copy binaries
|
||||||
|
cp bin/* $FETCH_ML_HOME/bin/
|
||||||
|
chmod +x $FETCH_ML_HOME/bin/*
|
||||||
|
|
||||||
|
# Copy configs
|
||||||
|
mkdir -p $FETCH_ML_HOME/configs
|
||||||
|
cp configs/config-local.yaml.example $FETCH_ML_HOME/configs/config-local.yaml
|
||||||
|
|
||||||
|
# Set permissions
|
||||||
|
chown -R $FETCH_ML_USER:$FETCH_ML_USER $FETCH_ML_HOME
|
||||||
|
|
||||||
|
log_success "Fetch ML installed"
|
||||||
|
}
|
||||||
|
|
||||||
|
main() {
|
||||||
|
log_info "Starting Fetch ML Ubuntu server setup..."
|
||||||
|
|
||||||
|
check_root
|
||||||
|
check_ubuntu
|
||||||
|
|
||||||
|
update_system
|
||||||
|
install_go
|
||||||
|
install_podman
|
||||||
|
install_redis
|
||||||
|
install_nvidia_drivers
|
||||||
|
install_ml_tools
|
||||||
|
ensure_user
|
||||||
|
create_directories
|
||||||
|
setup_firewall
|
||||||
|
setup_systemd_services
|
||||||
|
setup_logrotate
|
||||||
|
hardening_steps
|
||||||
|
install_fetch_ml
|
||||||
|
|
||||||
|
log_success "Fetch ML setup complete!"
|
||||||
|
echo
|
||||||
|
log_info "Next steps:"
|
||||||
|
echo "1. Clone Fetch ML repository: git clone https://github.com/your-org/fetch_ml.git $FETCH_ML_HOME/fetch_ml"
|
||||||
|
echo "2. Configure: $FETCH_ML_HOME/configs/config-local.yaml"
|
||||||
|
echo "3. Start services: systemctl start fetch_ml_worker fetch_ml_data_manager"
|
||||||
|
echo "4. Check status: systemctl status fetch_ml_worker fetch_ml_data_manager"
|
||||||
|
echo "5. View logs: journalctl -u fetch_ml_worker -f"
|
||||||
|
echo
|
||||||
|
log_info "Services will be available at:"
|
||||||
|
echo "- Worker API: http://$(hostname -I | awk '{print $1}'):8080"
|
||||||
|
echo "- Data Manager: http://$(hostname -I | awk '{print $1}'):8081"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Run main function
|
||||||
|
main "$@"
|
||||||
67
scripts/legacy/test_tools.sh
Executable file
67
scripts/legacy/test_tools.sh
Executable file
|
|
@ -0,0 +1,67 @@
|
||||||
|
#!/bin/bash
|
||||||
|
set -e
|
||||||
|
|
||||||
|
echo "=== Test Tools Harness ==="
|
||||||
|
|
||||||
|
# Function to check if Redis is running, start temporary instance if needed
|
||||||
|
ensure_redis() {
|
||||||
|
if ! redis-cli ping >/dev/null 2>&1; then
|
||||||
|
echo "Starting temporary Redis instance..."
|
||||||
|
redis-server --daemonize yes --port 6379
|
||||||
|
sleep 2
|
||||||
|
if ! redis-cli ping >/dev/null 2>&1; then
|
||||||
|
echo "Failed to start Redis"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
echo "Redis started successfully"
|
||||||
|
# Set up cleanup trap
|
||||||
|
trap 'echo "Stopping temporary Redis..."; redis-cli shutdown || true' EXIT
|
||||||
|
else
|
||||||
|
echo "Redis is already running"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# Step 1: Build Go binaries
|
||||||
|
echo "Building Go binaries..."
|
||||||
|
go build -o bin/api-server ./cmd/api-server
|
||||||
|
go build -o bin/worker ./cmd/worker
|
||||||
|
go build -o bin/data_manager ./cmd/data_manager
|
||||||
|
go build -o bin/user_manager ./cmd/user_manager
|
||||||
|
|
||||||
|
# Step 2: Build Zig CLI
|
||||||
|
echo "Building Zig CLI..."
|
||||||
|
cd cli
|
||||||
|
zig build
|
||||||
|
cd ..
|
||||||
|
|
||||||
|
# Step 3: Ensure Redis is running
|
||||||
|
ensure_redis
|
||||||
|
|
||||||
|
# Step 4: Run Go tests
|
||||||
|
echo "Running Go tests..."
|
||||||
|
go test ./...
|
||||||
|
|
||||||
|
# Step 5: Run Zig tests
|
||||||
|
echo "Running Zig CLI tests..."
|
||||||
|
cd cli
|
||||||
|
zig test
|
||||||
|
cd ..
|
||||||
|
|
||||||
|
# Step 6: Run Go E2E tests (Redis is already available)
|
||||||
|
echo "Running Go E2E tests..."
|
||||||
|
go test ./tests/e2e/...
|
||||||
|
|
||||||
|
# Step 7: Smoke test API server and CLI
|
||||||
|
echo "Running smoke test..."
|
||||||
|
# Start API server in background on different port
|
||||||
|
./bin/api-server -config configs/config.yaml -port 19101 -no-tls > /tmp/api-server.log 2>&1 &
|
||||||
|
API_PID=$!
|
||||||
|
sleep 2
|
||||||
|
|
||||||
|
# Test CLI status
|
||||||
|
./cli/zig-out/bin/ml status -server http://localhost:19101
|
||||||
|
|
||||||
|
# Clean up
|
||||||
|
kill $API_PID 2>/dev/null || true
|
||||||
|
|
||||||
|
echo "=== All tests completed successfully ==="
|
||||||
183
scripts/lib/common.sh
Executable file
183
scripts/lib/common.sh
Executable file
|
|
@ -0,0 +1,183 @@
|
||||||
|
#!/bin/bash
|
||||||
|
# Common shell functions for FetchML scripts
|
||||||
|
# Source this file in other scripts: source "$(dirname "$0")/lib/common.sh"
|
||||||
|
|
||||||
|
# Colors for output
|
||||||
|
export BOLD='\033[1m'
|
||||||
|
export GREEN='\033[0;32m'
|
||||||
|
export BLUE='\033[0;34m'
|
||||||
|
export YELLOW='\033[0;33m'
|
||||||
|
export RED='\033[0;31m'
|
||||||
|
export NC='\033[0m'
|
||||||
|
|
||||||
|
###################
|
||||||
|
# Logging functions
|
||||||
|
###################
|
||||||
|
|
||||||
|
log_info() {
|
||||||
|
echo -e "${BLUE}[INFO]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
log_success() {
|
||||||
|
echo -e "${GREEN}✓${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
log_warn() {
|
||||||
|
echo -e "${YELLOW}[WARN]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
log_error() {
|
||||||
|
echo -e "${RED}[ERROR]${NC} $1" >&2
|
||||||
|
}
|
||||||
|
|
||||||
|
log_step() {
|
||||||
|
local step=$1
|
||||||
|
local total=$2
|
||||||
|
local message=$3
|
||||||
|
echo -e "${BLUE}[$step/$total]${NC} $message"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_header() {
|
||||||
|
local title=$1
|
||||||
|
echo ""
|
||||||
|
echo -e "${BOLD}=== $title ===${NC}"
|
||||||
|
echo ""
|
||||||
|
}
|
||||||
|
|
||||||
|
###################
|
||||||
|
# System detection
|
||||||
|
###################
|
||||||
|
|
||||||
|
detect_distro() {
|
||||||
|
if [ -f /etc/os-release ]; then
|
||||||
|
. /etc/os-release
|
||||||
|
export DISTRO=$ID
|
||||||
|
export DISTRO_VERSION=$VERSION_ID
|
||||||
|
elif [ -f /etc/redhat-release ]; then
|
||||||
|
export DISTRO="rhel"
|
||||||
|
export DISTRO_VERSION="unknown"
|
||||||
|
else
|
||||||
|
export DISTRO="unknown"
|
||||||
|
export DISTRO_VERSION="unknown"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Detect package manager
|
||||||
|
if command -v dnf &>/dev/null; then
|
||||||
|
export PKG_MANAGER="dnf"
|
||||||
|
elif command -v yum &>/dev/null; then
|
||||||
|
export PKG_MANAGER="yum"
|
||||||
|
elif command -v apt-get &>/dev/null; then
|
||||||
|
export PKG_MANAGER="apt"
|
||||||
|
elif command -v pacman &>/dev/null; then
|
||||||
|
export PKG_MANAGER="pacman"
|
||||||
|
elif command -v zypper &>/dev/null; then
|
||||||
|
export PKG_MANAGER="zypper"
|
||||||
|
else
|
||||||
|
log_warn "No known package manager found"
|
||||||
|
export PKG_MANAGER="unknown"
|
||||||
|
fi
|
||||||
|
|
||||||
|
log_info "Detected: $DISTRO $DISTRO_VERSION (using $PKG_MANAGER)"
|
||||||
|
}
|
||||||
|
|
||||||
|
###################
|
||||||
|
# Utility functions
|
||||||
|
###################
|
||||||
|
|
||||||
|
check_command() {
|
||||||
|
local cmd=$1
|
||||||
|
local install_hint=$2
|
||||||
|
|
||||||
|
if ! command -v "$cmd" &>/dev/null; then
|
||||||
|
log_error "$cmd not found"
|
||||||
|
if [ -n "$install_hint" ]; then
|
||||||
|
log_info "Install with: $install_hint"
|
||||||
|
fi
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
check_root() {
|
||||||
|
if [ "$EUID" -ne 0 ]; then
|
||||||
|
log_error "This script must be run with sudo"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
confirm() {
|
||||||
|
local prompt=$1
|
||||||
|
local default=${2:-n}
|
||||||
|
|
||||||
|
if [ "$default" = "y" ]; then
|
||||||
|
read -p "$prompt [Y/n]: " -n 1 -r
|
||||||
|
else
|
||||||
|
read -p "$prompt [y/N]: " -n 1 -r
|
||||||
|
fi
|
||||||
|
echo
|
||||||
|
|
||||||
|
if [[ $REPLY =~ ^[Yy]$ ]]; then
|
||||||
|
return 0
|
||||||
|
else
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
###################
|
||||||
|
# Firewall management
|
||||||
|
###################
|
||||||
|
|
||||||
|
setup_firewall() {
|
||||||
|
local port=$1
|
||||||
|
local comment=${2:-"FetchML"}
|
||||||
|
|
||||||
|
if command -v firewall-cmd &>/dev/null; then
|
||||||
|
# RHEL/Rocky/Fedora (firewalld)
|
||||||
|
sudo firewall-cmd --permanent --add-port="${port}/tcp" >/dev/null 2>&1
|
||||||
|
log_success "Firewall rule added (firewalld): ${port}/tcp"
|
||||||
|
return 0
|
||||||
|
elif command -v ufw &>/dev/null; then
|
||||||
|
# Ubuntu/Debian (ufw)
|
||||||
|
sudo ufw allow "${port}/tcp" comment "$comment" >/dev/null 2>&1
|
||||||
|
log_success "Firewall rule added (ufw): ${port}/tcp"
|
||||||
|
return 0
|
||||||
|
else
|
||||||
|
log_warn "No firewall detected. Manually open port ${port}/tcp"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
reload_firewall() {
|
||||||
|
if command -v firewall-cmd &>/dev/null; then
|
||||||
|
sudo firewall-cmd --reload >/dev/null 2>&1
|
||||||
|
log_success "Firewall reloaded (firewalld)"
|
||||||
|
elif command -v ufw &>/dev/null; then
|
||||||
|
sudo ufw reload >/dev/null 2>&1 || true
|
||||||
|
log_success "Firewall reloaded (ufw)"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
###################
|
||||||
|
# File/directory management
|
||||||
|
###################
|
||||||
|
|
||||||
|
create_dir() {
|
||||||
|
local dir=$1
|
||||||
|
local owner=${2:-$USER}
|
||||||
|
local group=${3:-$(id -gn)}
|
||||||
|
|
||||||
|
sudo mkdir -p "$dir"
|
||||||
|
sudo chown "$owner:$group" "$dir"
|
||||||
|
sudo chmod 755 "$dir"
|
||||||
|
log_success "Created: $dir"
|
||||||
|
}
|
||||||
|
|
||||||
|
check_service() {
|
||||||
|
local service=$1
|
||||||
|
|
||||||
|
if systemctl list-unit-files | grep -q "^${service}"; then
|
||||||
|
return 0
|
||||||
|
else
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
}
|
||||||
275
scripts/setup-monitoring-prod.sh
Executable file
275
scripts/setup-monitoring-prod.sh
Executable file
|
|
@ -0,0 +1,275 @@
|
||||||
|
#!/bin/bash
|
||||||
|
# Production Monitoring Stack Setup for Linux
|
||||||
|
# Deploys Prometheus/Grafana/Loki/Promtail as Podman containers with systemd
|
||||||
|
# Compatible with: Rocky/RHEL/CentOS, Ubuntu/Debian, Arch, SUSE, etc.
|
||||||
|
|
||||||
|
set -e
|
||||||
|
|
||||||
|
BOLD='\033[1m'
|
||||||
|
GREEN='\033[0;32m'
|
||||||
|
BLUE='\033[0;34m'
|
||||||
|
YELLOW='\033[0;33m'
|
||||||
|
NC='\033[0m'
|
||||||
|
|
||||||
|
echo -e "${BOLD}=== FetchML Monitoring Stack Setup (Linux) ===${NC}\n"
|
||||||
|
|
||||||
|
# Detect Linux distribution and package manager
|
||||||
|
detect_distro() {
|
||||||
|
if [ -f /etc/os-release ]; then
|
||||||
|
. /etc/os-release
|
||||||
|
DISTRO=$ID
|
||||||
|
DISTRO_VERSION=$VERSION_ID
|
||||||
|
elif [ -f /etc/redhat-release ]; then
|
||||||
|
DISTRO="rhel"
|
||||||
|
else
|
||||||
|
DISTRO="unknown"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Detect package manager
|
||||||
|
if command -v dnf &>/dev/null; then
|
||||||
|
PKG_MANAGER="dnf"
|
||||||
|
elif command -v yum &>/dev/null; then
|
||||||
|
PKG_MANAGER="yum"
|
||||||
|
elif command -v apt-get &>/dev/null; then
|
||||||
|
PKG_MANAGER="apt"
|
||||||
|
elif command -v pacman &>/dev/null; then
|
||||||
|
PKG_MANAGER="pacman"
|
||||||
|
elif command -v zypper &>/dev/null; then
|
||||||
|
PKG_MANAGER="zypper"
|
||||||
|
else
|
||||||
|
echo -e "${YELLOW}Warning: No known package manager found${NC}"
|
||||||
|
PKG_MANAGER="unknown"
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "Detected distribution: $DISTRO (using $PKG_MANAGER)"
|
||||||
|
}
|
||||||
|
|
||||||
|
detect_distro
|
||||||
|
|
||||||
|
# Configuration
|
||||||
|
DATA_PATH="${1:-/data/monitoring}"
|
||||||
|
ML_USER="${2:-ml-user}"
|
||||||
|
ML_GROUP="${3:-ml-group}"
|
||||||
|
|
||||||
|
echo "Configuration:"
|
||||||
|
echo " Monitoring data path: $DATA_PATH"
|
||||||
|
echo " User: $ML_USER"
|
||||||
|
echo " Group: $ML_GROUP"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Create pod for monitoring stack
|
||||||
|
POD_NAME="monitoring"
|
||||||
|
|
||||||
|
# 1. Create directories
|
||||||
|
echo -e "${BLUE}[1/6]${NC} Creating directory structure..."
|
||||||
|
sudo mkdir -p "${DATA_PATH}"/{prometheus,grafana,loki,promtail-config}
|
||||||
|
sudo mkdir -p /etc/fetch_ml/monitoring
|
||||||
|
sudo mkdir -p /var/lib/grafana/dashboards
|
||||||
|
|
||||||
|
sudo chown -R $ML_USER:$ML_GROUP $DATA_PATH
|
||||||
|
sudo chmod 755 $DATA_PATH
|
||||||
|
|
||||||
|
echo -e "${GREEN}✓${NC} Directories created"
|
||||||
|
|
||||||
|
# 2. Copy configuration files
|
||||||
|
echo -e "${BLUE}[2/6]${NC} Copying configuration files..."
|
||||||
|
sudo cp monitoring/prometheus.yml /etc/fetch_ml/monitoring/
|
||||||
|
sudo cp monitoring/loki-config.yml /etc/fetch_ml/monitoring/
|
||||||
|
sudo cp monitoring/promtail-config.yml /etc/fetch_ml/monitoring/
|
||||||
|
sudo cp monitoring/grafana/provisioning /etc/fetch_ml/monitoring/ -r
|
||||||
|
sudo cp monitoring/grafana-dashboard.json /var/lib/grafana/dashboards/ml-queue.json
|
||||||
|
sudo cp monitoring/logs-dashboard.json /var/lib/grafana/dashboards/logs.json
|
||||||
|
|
||||||
|
sudo chown -R $ML_USER:$ML_GROUP /etc/fetch_ml/monitoring
|
||||||
|
sudo chown -R $ML_USER:$ML_GROUP /var/lib/grafana
|
||||||
|
|
||||||
|
echo -e "${GREEN}✓${NC} Configuration copied"
|
||||||
|
|
||||||
|
# 3. Create Podman pod
|
||||||
|
echo -e "${BLUE}[3/6]${NC} Creating Podman pod..."
|
||||||
|
sudo -u $ML_USER podman pod create \\
|
||||||
|
--name $POD_NAME \\
|
||||||
|
-p 3000:3000 \\
|
||||||
|
-p 9090:9090 \\
|
||||||
|
-p 3100:3100 \\
|
||||||
|
|| echo "Pod may already exist"
|
||||||
|
|
||||||
|
echo -e "${GREEN}✓${NC} Pod created"
|
||||||
|
|
||||||
|
# 4. Create systemd service for monitoring pod
|
||||||
|
echo -e "${BLUE}[4/6]${NC} Creating systemd services..."
|
||||||
|
|
||||||
|
# Prometheus service
|
||||||
|
sudo tee /etc/systemd/system/prometheus.service >/dev/null <<EOF
|
||||||
|
[Unit]
|
||||||
|
Description=Prometheus Monitoring
|
||||||
|
After=network.target
|
||||||
|
PartOf=$POD_NAME-pod.service
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
User=$ML_USER
|
||||||
|
Group=$ML_GROUP
|
||||||
|
Restart=always
|
||||||
|
RestartSec=10
|
||||||
|
|
||||||
|
ExecStartPre=/usr/bin/podman pod exists $POD_NAME || /usr/bin/podman pod create --name $POD_NAME -p 9090:9090
|
||||||
|
ExecStart=/usr/bin/podman run --rm --name prometheus \\
|
||||||
|
--pod $POD_NAME \\
|
||||||
|
-v /etc/fetch_ml/monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro \\
|
||||||
|
-v ${DATA_PATH}/prometheus:/prometheus \\
|
||||||
|
docker.io/prom/prometheus:latest \\
|
||||||
|
--config.file=/etc/prometheus/prometheus.yml \\
|
||||||
|
--storage.tsdb.path=/prometheus \\
|
||||||
|
--web.enable-lifecycle
|
||||||
|
|
||||||
|
ExecStop=/usr/bin/podman stop -t 10 prometheus
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Loki service
|
||||||
|
sudo tee /etc/systemd/system/loki.service >/dev/null <<EOF
|
||||||
|
[Unit]
|
||||||
|
Description=Loki Log Aggregation
|
||||||
|
After=network.target
|
||||||
|
PartOf=$POD_NAME-pod.service
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
User=$ML_USER
|
||||||
|
Group=$ML_GROUP
|
||||||
|
Restart=always
|
||||||
|
RestartSec=10
|
||||||
|
|
||||||
|
ExecStartPre=/usr/bin/podman pod exists $POD_NAME || /usr/bin/podman pod create --name $POD_NAME -p 3100:3100
|
||||||
|
ExecStart=/usr/bin/podman run --rm --name loki \\
|
||||||
|
--pod $POD_NAME \\
|
||||||
|
-v /etc/fetch_ml/monitoring/loki-config.yml:/etc/loki/local-config.yaml:ro \\
|
||||||
|
-v ${DATA_PATH}/loki:/loki \\
|
||||||
|
docker.io/grafana/loki:latest \\
|
||||||
|
-config.file=/etc/loki/local-config.yaml
|
||||||
|
|
||||||
|
ExecStop=/usr/bin/podman stop -t 10 loki
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Grafana service
|
||||||
|
sudo tee /etc/systemd/system/grafana.service >/dev/null <<EOF
|
||||||
|
[Unit]
|
||||||
|
Description=Grafana Visualization
|
||||||
|
After=network.target prometheus.service loki.service
|
||||||
|
PartOf=$POD_NAME-pod.service
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
User=$ML_USER
|
||||||
|
Group=$ML_GROUP
|
||||||
|
Restart=always
|
||||||
|
RestartSec=10
|
||||||
|
|
||||||
|
ExecStartPre=/usr/bin/podman pod exists $POD_NAME || /usr/bin/podman pod create --name $POD_NAME -p 3000:3000
|
||||||
|
ExecStart=/usr/bin/podman run --rm --name grafana \\
|
||||||
|
--pod $POD_NAME \\
|
||||||
|
-v ${DATA_PATH}/grafana:/var/lib/grafana \\
|
||||||
|
-v /etc/fetch_ml/monitoring/grafana/provisioning:/etc/grafana/provisioning:ro \\
|
||||||
|
-v /var/lib/grafana/dashboards:/var/lib/grafana/dashboards:ro \\
|
||||||
|
-e GF_SECURITY_ADMIN_PASSWORD=\${GRAFANA_ADMIN_PASSWORD:-$(openssl rand -base64 32)} \\
|
||||||
|
-e GF_USERS_ALLOW_SIGN_UP=false \\
|
||||||
|
-e GF_AUTH_ANONYMOUS_ENABLED=false \\
|
||||||
|
docker.io/grafana/grafana:latest
|
||||||
|
|
||||||
|
ExecStop=/usr/bin/podman stop -t 10 grafana
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Promtail service
|
||||||
|
sudo tee /etc/systemd/system/promtail.service >/dev/null <<EOF
|
||||||
|
[Unit]
|
||||||
|
Description=Promtail Log Collector
|
||||||
|
After=network.target loki.service
|
||||||
|
PartOf=$POD_NAME-pod.service
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
User=$ML_USER
|
||||||
|
Group=$ML_GROUP
|
||||||
|
Restart=always
|
||||||
|
RestartSec=10
|
||||||
|
|
||||||
|
ExecStartPre=/usr/bin/podman pod exists $POD_NAME || /usr/bin/podman pod create --name $POD_NAME
|
||||||
|
ExecStart=/usr/bin/podman run --rm --name promtail \\
|
||||||
|
--pod $POD_NAME \\
|
||||||
|
-v /etc/fetch_ml/monitoring/promtail-config.yml:/etc/promtail/config.yml:ro \\
|
||||||
|
-v /var/log/fetch_ml:/var/log/app:ro \\
|
||||||
|
docker.io/grafana/promtail:latest \\
|
||||||
|
-config.file=/etc/promtail/config.yml
|
||||||
|
|
||||||
|
ExecStop=/usr/bin/podman stop -t 10 promtail
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
EOF
|
||||||
|
|
||||||
|
sudo systemctl daemon-reload
|
||||||
|
echo -e "${GREEN}✓${NC} Systemd services created"
|
||||||
|
|
||||||
|
# 5. Create monitoring pod service
|
||||||
|
echo -e "${BLUE}[5/6]${NC} Creating pod management service..."
|
||||||
|
sudo -u $ML_USER podman generate systemd --new --name $POD_NAME \\
|
||||||
|
| sudo tee /etc/systemd/system/$POD_NAME-pod.service >/dev/null
|
||||||
|
|
||||||
|
sudo systemctl daemon-reload
|
||||||
|
echo -e "${GREEN}✓${NC} Pod service created"
|
||||||
|
|
||||||
|
# 6. Setup firewall rules
|
||||||
|
echo -e "${BLUE}[6/6]${NC} Configuring firewall..."
|
||||||
|
if command -v firewall-cmd &>/dev/null; then
|
||||||
|
# RHEL/Rocky/Fedora (firewalld)
|
||||||
|
sudo firewall-cmd --permanent --add-port=3000/tcp # Grafana
|
||||||
|
sudo firewall-cmd --permanent --add-port=9090/tcp # Prometheus
|
||||||
|
sudo firewall-cmd --reload
|
||||||
|
echo -e "${GREEN}✓${NC} Firewall configured (firewalld)"
|
||||||
|
elif command -v ufw &>/dev/null; then
|
||||||
|
# Ubuntu/Debian (ufw)
|
||||||
|
sudo ufw allow 3000/tcp comment 'Grafana'
|
||||||
|
sudo ufw allow 9090/tcp comment 'Prometheus'
|
||||||
|
echo -e "${GREEN}✓${NC} Firewall configured (ufw)"
|
||||||
|
else
|
||||||
|
echo -e "${YELLOW}!${NC} No firewall detected. You may need to manually open ports 3000 and 9090"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Summary
|
||||||
|
echo ""
|
||||||
|
echo -e "${BOLD}=== Monitoring Stack Setup Complete! ===${NC}"
|
||||||
|
echo ""
|
||||||
|
echo "Services created:"
|
||||||
|
echo " - prometheus.service (Metrics collection)"
|
||||||
|
echo " - loki.service (Log aggregation)"
|
||||||
|
echo " - grafana.service (Visualization)"
|
||||||
|
echo " - promtail.service (Log shipping)"
|
||||||
|
echo ""
|
||||||
|
echo -e "${BOLD}Next steps:${NC}"
|
||||||
|
echo "1. Start services:"
|
||||||
|
echo " sudo systemctl start prometheus"
|
||||||
|
echo " sudo systemctl start loki"
|
||||||
|
echo " sudo systemctl start promtail"
|
||||||
|
echo " sudo systemctl start grafana"
|
||||||
|
echo ""
|
||||||
|
echo "2. Enable on boot:"
|
||||||
|
echo " sudo systemctl enable prometheus loki promtail grafana"
|
||||||
|
echo ""
|
||||||
|
echo "3. Access Grafana:"
|
||||||
|
echo " http://YOUR_SERVER_IP:3000"
|
||||||
|
echo " Username: admin"
|
||||||
|
echo " Password: admin (change on first login)"
|
||||||
|
echo ""
|
||||||
|
echo "4. Check logs:"
|
||||||
|
echo " sudo journalctl -u prometheus -f"
|
||||||
|
echo " sudo journalctl -u grafana -f"
|
||||||
|
echo ""
|
||||||
229
scripts/setup-prod.sh
Executable file
229
scripts/setup-prod.sh
Executable file
|
|
@ -0,0 +1,229 @@
|
||||||
|
#!/bin/bash
|
||||||
|
# Production Setup Script for Rocky Linux (Bare Metal)
|
||||||
|
# This script sets up the complete FetchML environment on bare metal
|
||||||
|
|
||||||
|
set -e
|
||||||
|
|
||||||
|
BOLD='\033[1m'
|
||||||
|
GREEN='\033[0;32m'
|
||||||
|
BLUE='\033[0;34m'
|
||||||
|
NC='\033[0m'
|
||||||
|
|
||||||
|
echo -e "${BOLD}=== FetchML Production Setup (Rocky Linux Bare Metal) ===${NC}\n"
|
||||||
|
|
||||||
|
# Configuration
|
||||||
|
BASE_PATH="${1:-/data/ml-experiments}"
|
||||||
|
ML_USER="${2:-ml-user}"
|
||||||
|
ML_GROUP="${3:-ml-group}"
|
||||||
|
|
||||||
|
echo "Configuration:"
|
||||||
|
echo " Base path: $BASE_PATH"
|
||||||
|
echo " ML user: $ML_USER"
|
||||||
|
echo " ML group: $ML_GROUP"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# 1. Create system user if it doesn't exist
|
||||||
|
echo -e "${BLUE}[1/8]${NC} Creating system user..."
|
||||||
|
if id "$ML_USER" &>/dev/null; then
|
||||||
|
echo " User $ML_USER already exists"
|
||||||
|
else
|
||||||
|
sudo useradd -r -s /bin/bash -m -d /home/$ML_USER -c "ML System User" $ML_USER
|
||||||
|
echo -e "${GREEN}✓${NC} Created user: $ML_USER"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# 2. Create directory structure
|
||||||
|
echo -e "${BLUE}[2/8]${NC} Creating directory structure..."
|
||||||
|
sudo mkdir -p "${BASE_PATH}"/{experiments,pending,running,finished,failed,datasets}
|
||||||
|
sudo mkdir -p /var/log/fetch_ml
|
||||||
|
sudo mkdir -p /etc/fetch_ml
|
||||||
|
|
||||||
|
echo -e "${GREEN}✓${NC} Created directories:"
|
||||||
|
echo " $BASE_PATH/experiments/"
|
||||||
|
echo " $BASE_PATH/pending/"
|
||||||
|
echo " $BASE_PATH/running/"
|
||||||
|
echo " $BASE_PATH/finished/"
|
||||||
|
echo " $BASE_PATH/failed/"
|
||||||
|
echo " $BASE_PATH/datasets/"
|
||||||
|
echo " /var/log/fetch_ml/"
|
||||||
|
echo " /etc/fetch_ml/"
|
||||||
|
|
||||||
|
# 3. Set ownership and permissions
|
||||||
|
echo -e "${BLUE}[3/8]${NC} Setting permissions..."
|
||||||
|
sudo chown -R $ML_USER:$ML_GROUP $BASE_PATH
|
||||||
|
sudo chmod 755 $BASE_PATH
|
||||||
|
sudo chmod 700 $BASE_PATH/experiments # Restrict experiment data
|
||||||
|
|
||||||
|
sudo chown -R $ML_USER:$ML_GROUP /var/log/fetch_ml
|
||||||
|
sudo chmod 755 /var/log/fetch_ml
|
||||||
|
|
||||||
|
echo -e "${GREEN}✓${NC} Permissions set"
|
||||||
|
|
||||||
|
# 4. Install system dependencies (Rocky Linux)
|
||||||
|
echo -e "${BLUE}[4/8]${NC} Installing system dependencies..."
|
||||||
|
sudo dnf install -y \
|
||||||
|
golang \
|
||||||
|
podman \
|
||||||
|
redis \
|
||||||
|
git \
|
||||||
|
make \
|
||||||
|
gcc \
|
||||||
|
|| echo "Some packages may already be installed"
|
||||||
|
|
||||||
|
echo -e "${GREEN}✓${NC} Dependencies installed"
|
||||||
|
|
||||||
|
# 5. Configure Podman for GPU access (if NVIDIA GPU present)
|
||||||
|
echo -e "${BLUE}[5/8]${NC} Configuring Podman..."
|
||||||
|
if lspci | grep -i nvidia &>/dev/null; then
|
||||||
|
echo " NVIDIA GPU detected, configuring GPU access..."
|
||||||
|
|
||||||
|
# Install nvidia-container-toolkit if not present
|
||||||
|
if ! command -v nvidia-container-toolkit &>/dev/null; then
|
||||||
|
echo " Installing nvidia-container-toolkit..."
|
||||||
|
sudo dnf config-manager --add-repo \
|
||||||
|
https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
|
||||||
|
sudo dnf install -y nvidia-container-toolkit
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Configure Podman CDI
|
||||||
|
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
|
||||||
|
echo -e "${GREEN}✓${NC} GPU support configured"
|
||||||
|
else
|
||||||
|
echo " No NVIDIA GPU detected, skipping GPU setup"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# 6. Configure Redis
|
||||||
|
echo -e "${BLUE}[6/8]${NC} Configuring Redis..."
|
||||||
|
sudo systemctl enable redis
|
||||||
|
sudo systemctl start redis || echo "Redis may already be running"
|
||||||
|
|
||||||
|
# Set Redis password if not already configured
|
||||||
|
if ! sudo grep -q "^requirepass" /etc/redis/redis.conf 2>/dev/null; then
|
||||||
|
REDIS_PASSWORD=$(openssl rand -base64 32)
|
||||||
|
echo "requirepass $REDIS_PASSWORD" | sudo tee -a /etc/redis/redis.conf >/dev/null
|
||||||
|
sudo systemctl restart redis
|
||||||
|
echo " Generated Redis password: $REDIS_PASSWORD"
|
||||||
|
echo " Save this password for your configuration!"
|
||||||
|
else
|
||||||
|
echo " Redis password already configured"
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo -e "${GREEN}✓${NC} Redis configured"
|
||||||
|
|
||||||
|
# 7. Setup systemd services
|
||||||
|
echo -e "${BLUE}[7/8]${NC} Creating systemd services..."
|
||||||
|
|
||||||
|
# API Server service
|
||||||
|
sudo tee /etc/systemd/system/fetchml-api.service >/dev/null <<EOF
|
||||||
|
[Unit]
|
||||||
|
Description=FetchML API Server
|
||||||
|
After=network.target redis.service
|
||||||
|
Wants=redis.service
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
User=$ML_USER
|
||||||
|
Group=$ML_GROUP
|
||||||
|
WorkingDirectory=/opt/fetch_ml
|
||||||
|
ExecStart=/usr/local/bin/fetchml-api -config /etc/fetch_ml/config.yaml
|
||||||
|
Restart=always
|
||||||
|
RestartSec=10
|
||||||
|
StandardOutput=append:/var/log/fetch_ml/api.log
|
||||||
|
StandardError=append:/var/log/fetch_ml/api-error.log
|
||||||
|
|
||||||
|
# Security hardening
|
||||||
|
NoNewPrivileges=true
|
||||||
|
PrivateTmp=true
|
||||||
|
ProtectSystem=strict
|
||||||
|
ProtectHome=true
|
||||||
|
ReadWritePaths=$BASE_PATH /var/log/fetch_ml
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Worker service
|
||||||
|
sudo tee /etc/systemd/system/fetchml-worker.service >/dev/null <<EOF
|
||||||
|
[Unit]
|
||||||
|
Description=FetchML Worker
|
||||||
|
After=network.target redis.service fetchml-api.service
|
||||||
|
Wants=redis.service
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
User=$ML_USER
|
||||||
|
Group=$ML_GROUP
|
||||||
|
WorkingDirectory=/opt/fetch_ml
|
||||||
|
ExecStart=/usr/local/bin/fetchml-worker -config /etc/fetch_ml/worker.toml
|
||||||
|
Restart=always
|
||||||
|
RestartSec=10
|
||||||
|
StandardOutput=append:/var/log/fetch_ml/worker.log
|
||||||
|
StandardError=append:/var/log/fetch_ml/worker-error.log
|
||||||
|
|
||||||
|
# Security hardening
|
||||||
|
NoNewPrivileges=true
|
||||||
|
PrivateTmp=true
|
||||||
|
ProtectSystem=strict
|
||||||
|
ProtectHome=true
|
||||||
|
ReadWritePaths=$BASE_PATH /var/log/fetch_ml
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
EOF
|
||||||
|
|
||||||
|
sudo systemctl daemon-reload
|
||||||
|
echo -e "${GREEN}✓${NC} Systemd services created"
|
||||||
|
|
||||||
|
# 8. Setup logrotate
|
||||||
|
echo -e "${BLUE}[8/8]${NC} Configuring log rotation..."
|
||||||
|
sudo tee /etc/logrotate.d/fetchml >/dev/null <<EOF
|
||||||
|
/var/log/fetch_ml/*.log {
|
||||||
|
daily
|
||||||
|
rotate 14
|
||||||
|
compress
|
||||||
|
delaycompress
|
||||||
|
notifempty
|
||||||
|
missingok
|
||||||
|
create 0640 $ML_USER $ML_GROUP
|
||||||
|
sharedscripts
|
||||||
|
postrotate
|
||||||
|
systemctl reload fetchml-api >/dev/null 2>&1 || true
|
||||||
|
systemctl reload fetchml-worker >/dev/null 2>&1 || true
|
||||||
|
endscript
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
|
||||||
|
echo -e "${GREEN}✓${NC} Log rotation configured"
|
||||||
|
|
||||||
|
# Summary
|
||||||
|
echo ""
|
||||||
|
echo -e "${BOLD}=== Setup Complete! ===${NC}"
|
||||||
|
echo ""
|
||||||
|
echo "Directory structure created at: $BASE_PATH"
|
||||||
|
echo "Logs will be written to: /var/log/fetch_ml/"
|
||||||
|
echo "Configuration directory: /etc/fetch_ml/"
|
||||||
|
echo ""
|
||||||
|
echo -e "${BOLD}Next steps:${NC}"
|
||||||
|
echo "1. Copy your config files:"
|
||||||
|
echo " sudo cp configs/config-prod.yaml /etc/fetch_ml/config.yaml"
|
||||||
|
echo " sudo cp configs/worker-prod.toml /etc/fetch_ml/worker.toml"
|
||||||
|
echo ""
|
||||||
|
echo "2. Build and install binaries:"
|
||||||
|
echo " make build"
|
||||||
|
echo " sudo cp bin/api-server /usr/local/bin/fetchml-api"
|
||||||
|
echo " sudo cp bin/worker /usr/local/bin/fetchml-worker"
|
||||||
|
echo ""
|
||||||
|
echo "3. Update config files with your settings (Redis password, API keys, etc.)"
|
||||||
|
echo ""
|
||||||
|
echo "4. Start services:"
|
||||||
|
echo " sudo systemctl start fetchml-api"
|
||||||
|
echo " sudo systemctl start fetchml-worker"
|
||||||
|
echo ""
|
||||||
|
echo "5. Enable services to start on boot:"
|
||||||
|
echo " sudo systemctl enable fetchml-api"
|
||||||
|
echo " sudo systemctl enable fetchml-worker"
|
||||||
|
echo ""
|
||||||
|
echo "6. Check status:"
|
||||||
|
echo " sudo systemctl status fetchml-api"
|
||||||
|
echo " sudo systemctl status fetchml-worker"
|
||||||
|
echo " sudo journalctl -u fetchml-api -f"
|
||||||
|
echo ""
|
||||||
0
scripts/setup-production.sh
Normal file
0
scripts/setup-production.sh
Normal file
204
scripts/validate-prod-config.sh
Executable file
204
scripts/validate-prod-config.sh
Executable file
|
|
@ -0,0 +1,204 @@
|
||||||
|
#!/bin/bash
|
||||||
|
# Production Configuration Validator
|
||||||
|
# Verifies all paths and configs are consistent for experiment lifecycle
|
||||||
|
|
||||||
|
set -e
|
||||||
|
|
||||||
|
BOLD='\033[1m'
|
||||||
|
GREEN='\033[0;32m'
|
||||||
|
RED='\033[0;31m'
|
||||||
|
YELLOW='\033[1;33m'
|
||||||
|
NC='\033[0m' # No Color
|
||||||
|
|
||||||
|
echo -e "${BOLD}=== FetchML Production Configuration Validator ===${NC}\n"
|
||||||
|
|
||||||
|
# Configuration file paths
|
||||||
|
API_CONFIG="${1:-configs/config-prod.yaml}"
|
||||||
|
WORKER_CONFIG="${2:-configs/worker-prod.toml}"
|
||||||
|
|
||||||
|
errors=0
|
||||||
|
warnings=0
|
||||||
|
|
||||||
|
check_pass() {
|
||||||
|
echo -e "${GREEN}✓${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
check_fail() {
|
||||||
|
echo -e "${RED}✗${NC} $1"
|
||||||
|
((errors++))
|
||||||
|
}
|
||||||
|
|
||||||
|
check_warn() {
|
||||||
|
echo -e "${YELLOW}⚠${NC} $1"
|
||||||
|
((warnings++))
|
||||||
|
}
|
||||||
|
|
||||||
|
# 1. Check API server config exists
|
||||||
|
echo -e "${BOLD}Checking API Server Configuration${NC}"
|
||||||
|
if [ ! -f "$API_CONFIG" ]; then
|
||||||
|
check_fail "API config not found: $API_CONFIG"
|
||||||
|
else
|
||||||
|
check_pass "API config found: $API_CONFIG"
|
||||||
|
|
||||||
|
# Extract base_path from API config
|
||||||
|
API_BASE_PATH=$(grep 'base_path:' "$API_CONFIG" | head -1 | awk '{print $2}' | tr -d '"')
|
||||||
|
echo " Base path: $API_BASE_PATH"
|
||||||
|
|
||||||
|
# Check if path is absolute
|
||||||
|
if [[ "$API_BASE_PATH" != /* ]]; then
|
||||||
|
check_fail "base_path must be absolute: $API_BASE_PATH"
|
||||||
|
else
|
||||||
|
check_pass "base_path is absolute"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check Redis config
|
||||||
|
if grep -q 'redis:' "$API_CONFIG"; then
|
||||||
|
check_pass "Redis configuration present"
|
||||||
|
else
|
||||||
|
check_fail "Redis configuration missing"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check auth enabled
|
||||||
|
if grep -q 'enabled: true' "$API_CONFIG"; then
|
||||||
|
check_pass "Authentication enabled"
|
||||||
|
else
|
||||||
|
check_warn "Authentication disabled (not recommended for production)"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# 2. Check Worker config (if provided)
|
||||||
|
if [ -f "$WORKER_CONFIG" ]; then
|
||||||
|
echo -e "${BOLD}Checking Worker Configuration${NC}"
|
||||||
|
check_pass "Worker config found: $WORKER_CONFIG"
|
||||||
|
|
||||||
|
# Extract base_path from worker config
|
||||||
|
WORKER_BASE_PATH=$(grep 'base_path' "$WORKER_CONFIG" | awk -F '=' '{print $2}' | tr -d ' "')
|
||||||
|
echo " Base path: $WORKER_BASE_PATH"
|
||||||
|
|
||||||
|
# Compare paths
|
||||||
|
if [ "$API_BASE_PATH" = "$WORKER_BASE_PATH" ]; then
|
||||||
|
check_pass "API and Worker base_path match"
|
||||||
|
else
|
||||||
|
check_fail "base_path mismatch! API: $API_BASE_PATH, Worker: $WORKER_BASE_PATH"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check podman_image configured
|
||||||
|
if grep -q 'podman_image' "$WORKER_CONFIG"; then
|
||||||
|
PODMAN_IMAGE=$(grep 'podman_image' "$WORKER_CONFIG" | awk -F '=' '{print $2}' | tr -d ' "')
|
||||||
|
check_pass "Podman image configured: $PODMAN_IMAGE"
|
||||||
|
else
|
||||||
|
check_fail "podman_image not configured"
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
check_warn "Worker config not found: $WORKER_CONFIG (optional for API server only)"
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# 3. Check directory structure (if base_path exists)
|
||||||
|
if [ -n "$API_BASE_PATH" ] && [ -d "$API_BASE_PATH" ]; then
|
||||||
|
echo -e "${BOLD}Checking Directory Structure${NC}"
|
||||||
|
check_pass "Base directory exists: $API_BASE_PATH"
|
||||||
|
|
||||||
|
# Check subdirectories
|
||||||
|
for dir in experiments pending running finished failed; do
|
||||||
|
if [ -d "$API_BASE_PATH/$dir" ]; then
|
||||||
|
check_pass "$dir/ directory exists"
|
||||||
|
else
|
||||||
|
check_warn "$dir/ directory missing (will be created automatically)"
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
|
||||||
|
# Check permissions
|
||||||
|
if [ -w "$API_BASE_PATH" ]; then
|
||||||
|
check_pass "Base directory is writable"
|
||||||
|
else
|
||||||
|
check_fail "Base directory is not writable (check permissions)"
|
||||||
|
fi
|
||||||
|
|
||||||
|
elif [ -n "$API_BASE_PATH" ]; then
|
||||||
|
check_warn "Base directory does not exist: $API_BASE_PATH (will need to be created)"
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# 4. Check Redis connectivity (if server is running)
|
||||||
|
echo -e "${BOLD}Checking Redis Connectivity${NC}"
|
||||||
|
if command -v redis-cli &> /dev/null; then
|
||||||
|
if redis-cli ping &> /dev/null; then
|
||||||
|
check_pass "Redis server is running and accessible"
|
||||||
|
|
||||||
|
# Check queue
|
||||||
|
QUEUE_SIZE=$(redis-cli llen fetchml:tasks:queue 2>/dev/null || echo "0")
|
||||||
|
echo " Queue size: $QUEUE_SIZE tasks"
|
||||||
|
else
|
||||||
|
check_warn "Redis server not accessible (start with: redis-server)"
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
check_warn "redis-cli not installed (cannot verify Redis connectivity)"
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# 5. Check Podman (if worker config exists)
|
||||||
|
if [ -f "$WORKER_CONFIG" ]; then
|
||||||
|
echo -e "${BOLD}Checking Podman${NC}"
|
||||||
|
if command -v podman &> /dev/null; then
|
||||||
|
check_pass "Podman is installed"
|
||||||
|
|
||||||
|
# Check if image exists
|
||||||
|
if [ -n "$PODMAN_IMAGE" ]; then
|
||||||
|
if podman image exists "$PODMAN_IMAGE" 2>/dev/null; then
|
||||||
|
check_pass "Podman image exists: $PODMAN_IMAGE"
|
||||||
|
else
|
||||||
|
check_warn "Podman image not found: $PODMAN_IMAGE (needs to be built)"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check GPU access (if configured)
|
||||||
|
if grep -q 'gpu_access.*true' "$WORKER_CONFIG" 2>/dev/null; then
|
||||||
|
if podman run --rm --device nvidia.com/gpu=all nvidia/cuda:11.8.0-base nvidia-smi &>/dev/null; then
|
||||||
|
check_pass "GPU access working"
|
||||||
|
else
|
||||||
|
check_warn "GPU access configured but not working (check nvidia-container-toolkit)"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
check_fail "Podman not installed (required for worker)"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# 6. Check CLI config consistency
|
||||||
|
echo -e "${BOLD}Checking CLI Configuration${NC}"
|
||||||
|
CLI_CONFIG="$HOME/.ml/config.toml"
|
||||||
|
if [ -f "$CLI_CONFIG" ]; then
|
||||||
|
check_pass "CLI config found: $CLI_CONFIG"
|
||||||
|
|
||||||
|
CLI_BASE=$(grep 'worker_base' "$CLI_CONFIG" | awk -F '=' '{print $2}' | tr -d ' "')
|
||||||
|
if [ "$CLI_BASE" = "$API_BASE_PATH" ]; then
|
||||||
|
check_pass "CLI worker_base matches server base_path"
|
||||||
|
else
|
||||||
|
check_warn "CLI worker_base ($CLI_BASE) differs from server ($API_BASE_PATH)"
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
check_warn "CLI config not found (run: ml init)"
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Summary
|
||||||
|
echo -e "${BOLD}=== Summary ===${NC}"
|
||||||
|
if [ $errors -eq 0 ] && [ $warnings -eq 0 ]; then
|
||||||
|
echo -e "${GREEN}All checks passed! Configuration is ready for production.${NC}"
|
||||||
|
exit 0
|
||||||
|
elif [ $errors -eq 0 ]; then
|
||||||
|
echo -e "${YELLOW}Configuration has $warnings warning(s). Review before deployment.${NC}"
|
||||||
|
exit 0
|
||||||
|
else
|
||||||
|
echo -e "${RED}Configuration has $errors error(s) and $warnings warning(s). Fix before deployment.${NC}"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
313
setup.sh
Executable file
313
setup.sh
Executable file
|
|
@ -0,0 +1,313 @@
|
||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# Balanced Homelab Setup Script
|
||||||
|
# Keeps essential security (Fail2Ban, monitoring) while simplifying complexity
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# Colors
|
||||||
|
RED='\033[0;31m'
|
||||||
|
GREEN='\033[0;32m'
|
||||||
|
YELLOW='\033[1;33m'
|
||||||
|
BLUE='\033[0;34m'
|
||||||
|
NC='\033[0m'
|
||||||
|
|
||||||
|
print_info() {
|
||||||
|
echo -e "${BLUE}[INFO]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_success() {
|
||||||
|
echo -e "${GREEN}[SUCCESS]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_warning() {
|
||||||
|
echo -e "${YELLOW}[WARNING]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_error() {
|
||||||
|
echo -e "${RED}[ERROR]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Simple dependency check
|
||||||
|
check_deps() {
|
||||||
|
print_info "Checking dependencies..."
|
||||||
|
|
||||||
|
local missing=()
|
||||||
|
|
||||||
|
if ! command -v go &> /dev/null; then
|
||||||
|
missing+=("go")
|
||||||
|
fi
|
||||||
|
|
||||||
|
if ! command -v zig &> /dev/null; then
|
||||||
|
missing+=("zig")
|
||||||
|
fi
|
||||||
|
|
||||||
|
if ! command -v redis-server &> /dev/null; then
|
||||||
|
missing+=("redis-server")
|
||||||
|
fi
|
||||||
|
|
||||||
|
if ! command -v docker &> /dev/null; then
|
||||||
|
missing+=("docker")
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ ${#missing[@]} -gt 0 ]]; then
|
||||||
|
print_error "Missing dependencies: ${missing[*]}"
|
||||||
|
echo ""
|
||||||
|
echo "Install with:"
|
||||||
|
echo " macOS: brew install ${missing[*]}"
|
||||||
|
echo " Ubuntu: sudo apt-get install ${missing[*]}"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_success "Dependencies OK"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Simple setup
|
||||||
|
setup_project() {
|
||||||
|
print_info "Setting up project..."
|
||||||
|
|
||||||
|
# Create essential directories
|
||||||
|
mkdir -p ssl logs configs data monitoring
|
||||||
|
|
||||||
|
# Generate simple SSL cert
|
||||||
|
if [[ ! -f "ssl/cert.pem" ]]; then
|
||||||
|
openssl req -x509 -newkey rsa:2048 -keyout ssl/key.pem -out ssl/cert.pem \
|
||||||
|
-days 365 -nodes -subj "/C=US/ST=State/L=City/O=Homelab/CN=localhost" \
|
||||||
|
-addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null
|
||||||
|
print_success "SSL certificates generated"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Create balanced config
|
||||||
|
cat > configs/config.yaml << 'EOF'
|
||||||
|
base_path: "./data/experiments"
|
||||||
|
|
||||||
|
auth:
|
||||||
|
enabled: true
|
||||||
|
api_keys:
|
||||||
|
homelab_user:
|
||||||
|
hash: "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8" # "password"
|
||||||
|
admin: true
|
||||||
|
roles: ["user", "admin"]
|
||||||
|
permissions:
|
||||||
|
read: true
|
||||||
|
write: true
|
||||||
|
delete: true
|
||||||
|
|
||||||
|
server:
|
||||||
|
address: ":9101"
|
||||||
|
tls:
|
||||||
|
enabled: true
|
||||||
|
cert_file: "./ssl/cert.pem"
|
||||||
|
key_file: "./ssl/key.pem"
|
||||||
|
|
||||||
|
security:
|
||||||
|
rate_limit:
|
||||||
|
enabled: true
|
||||||
|
requests_per_minute: 30
|
||||||
|
burst_size: 5
|
||||||
|
ip_whitelist:
|
||||||
|
- "127.0.0.1"
|
||||||
|
- "::1"
|
||||||
|
- "192.168.0.0/16"
|
||||||
|
- "10.0.0.0/8"
|
||||||
|
- "172.16.0.0/12"
|
||||||
|
failed_login_lockout:
|
||||||
|
enabled: true
|
||||||
|
max_attempts: 3
|
||||||
|
lockout_duration: "15m"
|
||||||
|
|
||||||
|
redis:
|
||||||
|
url: "redis://localhost:6379"
|
||||||
|
|
||||||
|
logging:
|
||||||
|
level: "info"
|
||||||
|
file: "./logs/app.log"
|
||||||
|
audit_log: "./logs/audit.log"
|
||||||
|
access_log: "./logs/access.log"
|
||||||
|
|
||||||
|
monitoring:
|
||||||
|
enabled: true
|
||||||
|
metrics_port: 9090
|
||||||
|
health_check_interval: "30s"
|
||||||
|
EOF
|
||||||
|
|
||||||
|
print_success "Configuration created"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Simple build
|
||||||
|
build_project() {
|
||||||
|
print_info "Building project..."
|
||||||
|
|
||||||
|
# Build Go apps
|
||||||
|
go build -o bin/api-server ./cmd/api-server
|
||||||
|
go build -o bin/worker ./cmd/worker
|
||||||
|
go build -o bin/tui ./cmd/tui
|
||||||
|
|
||||||
|
# Build Zig CLI
|
||||||
|
cd cli && zig build && cd ..
|
||||||
|
|
||||||
|
print_success "Build completed"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Setup Fail2Ban
|
||||||
|
setup_fail2ban() {
|
||||||
|
print_info "Setting up Fail2Ban..."
|
||||||
|
|
||||||
|
if ! command -v fail2ban-server &> /dev/null; then
|
||||||
|
print_warning "Fail2Ban not installed, skipping..."
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Create Fail2Ban configuration
|
||||||
|
sudo mkdir -p /etc/fail2ban/jail.d 2>/dev/null || true
|
||||||
|
|
||||||
|
cat > /tmp/ml-experiments-jail.conf << 'EOF'
|
||||||
|
[DEFAULT]
|
||||||
|
bantime = 3600
|
||||||
|
findtime = 600
|
||||||
|
maxretry = 3
|
||||||
|
backend = systemd
|
||||||
|
|
||||||
|
[sshd]
|
||||||
|
enabled = true
|
||||||
|
port = ssh
|
||||||
|
logpath = /var/log/auth.log
|
||||||
|
maxretry = 3
|
||||||
|
|
||||||
|
[ml-experiments-api]
|
||||||
|
enabled = true
|
||||||
|
port = 9101
|
||||||
|
filter = ml-experiments-api
|
||||||
|
logpath = ./logs/audit.log
|
||||||
|
maxretry = 5
|
||||||
|
bantime = 7200
|
||||||
|
|
||||||
|
[ml-experiments-auth]
|
||||||
|
enabled = true
|
||||||
|
filter = ml-experiments-auth
|
||||||
|
logpath = ./logs/audit.log
|
||||||
|
maxretry = 3
|
||||||
|
bantime = 3600
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Create filter definitions
|
||||||
|
cat > /tmp/ml-experiments-api.conf << 'EOF'
|
||||||
|
[Definition]
|
||||||
|
failregex = ^.*<HOST>.*"status":40[13].*$
|
||||||
|
ignoreregex =
|
||||||
|
EOF
|
||||||
|
|
||||||
|
cat > /tmp/ml-experiments-auth.conf << 'EOF'
|
||||||
|
[Definition]
|
||||||
|
failregex = ^.*"event":"failed_login".*"client_ip":"<HOST>".*$
|
||||||
|
ignoreregex =
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Try to install configurations
|
||||||
|
if sudo cp /tmp/ml-experiments-jail.conf /etc/fail2ban/jail.d/ 2>/dev/null; then
|
||||||
|
sudo cp /tmp/ml-experiments-*.conf /etc/fail2ban/filter.d/ 2>/dev/null || true
|
||||||
|
sudo systemctl restart fail2ban 2>/dev/null || true
|
||||||
|
print_success "Fail2Ban configured"
|
||||||
|
else
|
||||||
|
print_warning "Could not configure Fail2Ban (requires sudo)"
|
||||||
|
fi
|
||||||
|
|
||||||
|
rm -f /tmp/ml-experiments-*.conf
|
||||||
|
}
|
||||||
|
|
||||||
|
# Setup Redis
|
||||||
|
setup_redis() {
|
||||||
|
print_info "Setting up Redis..."
|
||||||
|
|
||||||
|
if ! pgrep -f "redis-server" > /dev/null; then
|
||||||
|
redis-server --daemonize yes --port 6379
|
||||||
|
print_success "Redis started"
|
||||||
|
else
|
||||||
|
print_info "Redis already running"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# Create simple management script
|
||||||
|
create_manage_script() {
|
||||||
|
cat > manage.sh << 'EOF'
|
||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# Simple management script
|
||||||
|
|
||||||
|
case "${1:-status}" in
|
||||||
|
"start")
|
||||||
|
echo "Starting services..."
|
||||||
|
redis-server --daemonize yes --port 6379 2>/dev/null || true
|
||||||
|
./bin/api-server -config configs/config.yaml &
|
||||||
|
echo "Services started"
|
||||||
|
;;
|
||||||
|
"stop")
|
||||||
|
echo "Stopping services..."
|
||||||
|
pkill -f "api-server" || true
|
||||||
|
redis-cli shutdown 2>/dev/null || true
|
||||||
|
echo "Services stopped"
|
||||||
|
;;
|
||||||
|
"status")
|
||||||
|
echo "=== Status ==="
|
||||||
|
if pgrep -f "redis-server" > /dev/null; then
|
||||||
|
echo "✅ Redis: Running"
|
||||||
|
else
|
||||||
|
echo "❌ Redis: Stopped"
|
||||||
|
fi
|
||||||
|
|
||||||
|
if pgrep -f "api-server" > /dev/null; then
|
||||||
|
echo "✅ API Server: Running"
|
||||||
|
else
|
||||||
|
echo "❌ API Server: Stopped"
|
||||||
|
fi
|
||||||
|
;;
|
||||||
|
"logs")
|
||||||
|
echo "=== Recent Logs ==="
|
||||||
|
tail -20 logs/app.log 2>/dev/null || echo "No logs yet"
|
||||||
|
;;
|
||||||
|
"test")
|
||||||
|
echo "=== Testing ==="
|
||||||
|
curl -k -s https://localhost:9101/health || echo "API server not responding"
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
echo "Usage: $0 {start|stop|status|logs|test}"
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
EOF
|
||||||
|
|
||||||
|
chmod +x manage.sh
|
||||||
|
print_success "Management script created"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Show next steps
|
||||||
|
show_next_steps() {
|
||||||
|
print_success "Setup completed!"
|
||||||
|
echo ""
|
||||||
|
echo "🎉 Setup complete!"
|
||||||
|
echo ""
|
||||||
|
echo "Next steps:"
|
||||||
|
echo " 1. Start services: ./tools/manage.sh start"
|
||||||
|
echo " 2. Check status: ./tools/manage.sh status"
|
||||||
|
echo " 3. Test API: curl -k -H 'X-API-Key: password' https://localhost:9101/health"
|
||||||
|
echo ""
|
||||||
|
echo "Configuration: configs/config.yaml"
|
||||||
|
echo "Logs: logs/app.log and logs/audit.log"
|
||||||
|
echo ""
|
||||||
|
print_success "Ready for homelab use!"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Main setup
|
||||||
|
main() {
|
||||||
|
echo "ML Experiment Manager - Homelab Setup"
|
||||||
|
echo "====================================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
check_deps
|
||||||
|
setup_project
|
||||||
|
build_project
|
||||||
|
setup_redis
|
||||||
|
create_manage_script
|
||||||
|
show_next_steps
|
||||||
|
}
|
||||||
|
|
||||||
|
main "$@"
|
||||||
396
tools/manage.sh
Executable file
396
tools/manage.sh
Executable file
|
|
@ -0,0 +1,396 @@
|
||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# Project Management Script for ML Experiment Manager
|
||||||
|
# Provides unified interface for managing all components
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# Colors
|
||||||
|
RED='\033[0;31m'
|
||||||
|
GREEN='\033[0;32m'
|
||||||
|
YELLOW='\033[1;33m'
|
||||||
|
BLUE='\033[0;34m'
|
||||||
|
PURPLE='\033[0;35m'
|
||||||
|
CYAN='\033[0;36m'
|
||||||
|
NC='\033[0m'
|
||||||
|
|
||||||
|
print_info() {
|
||||||
|
echo -e "${BLUE}[INFO]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_success() {
|
||||||
|
echo -e "${GREEN}[SUCCESS]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_warning() {
|
||||||
|
echo -e "${YELLOW}[WARNING]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_error() {
|
||||||
|
echo -e "${RED}[ERROR]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_header() {
|
||||||
|
echo -e "${PURPLE}$1${NC}"
|
||||||
|
}
|
||||||
|
|
||||||
|
print_app() {
|
||||||
|
echo -e "${CYAN}$1${NC}"
|
||||||
|
}
|
||||||
|
|
||||||
|
show_status() {
|
||||||
|
print_header "ML Experiment Manager Status"
|
||||||
|
echo "=================================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Check Go apps
|
||||||
|
print_app "Go Applications:"
|
||||||
|
local go_apps=("api-server" "worker" "tui" "data_manager" "user_manager")
|
||||||
|
for app in "${go_apps[@]}"; do
|
||||||
|
if [[ -f "bin/$app" ]]; then
|
||||||
|
echo " ✅ $app: Built"
|
||||||
|
else
|
||||||
|
echo " ❌ $app: Not built"
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Check Zig CLI
|
||||||
|
print_app "Zig CLI:"
|
||||||
|
if [[ -f "cli/zig-out/bin/ml" ]]; then
|
||||||
|
echo " ✅ CLI: Built"
|
||||||
|
else
|
||||||
|
echo " ❌ CLI: Not built"
|
||||||
|
fi
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Check services
|
||||||
|
print_app "Services:"
|
||||||
|
if command -v redis-cli &> /dev/null; then
|
||||||
|
if redis-cli ping | grep -q "PONG"; then
|
||||||
|
echo " ✅ Redis: Running"
|
||||||
|
else
|
||||||
|
echo " ⚠️ Redis: Not running"
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
echo " ❌ Redis: Not installed"
|
||||||
|
fi
|
||||||
|
|
||||||
|
if command -v docker &> /dev/null; then
|
||||||
|
echo " ✅ Docker: Available"
|
||||||
|
else
|
||||||
|
echo " ❌ Docker: Not installed"
|
||||||
|
fi
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Check configuration
|
||||||
|
print_app "Configuration:"
|
||||||
|
if [[ -f "configs/config-local.yaml" ]]; then
|
||||||
|
echo " ✅ Security config: Found"
|
||||||
|
else
|
||||||
|
echo " ⚠️ Security config: Not found"
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ -f ".env.dev" ]]; then
|
||||||
|
echo " ✅ Environment: Found"
|
||||||
|
else
|
||||||
|
echo " ⚠️ Environment: Not found"
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ -f "ssl/cert.pem" && -f "ssl/key.pem" ]]; then
|
||||||
|
echo " ✅ SSL certificates: Found"
|
||||||
|
else
|
||||||
|
echo " ⚠️ SSL certificates: Not found"
|
||||||
|
fi
|
||||||
|
echo ""
|
||||||
|
}
|
||||||
|
|
||||||
|
build_all() {
|
||||||
|
print_header "Building All Components"
|
||||||
|
echo "============================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
print_info "Building Go applications..."
|
||||||
|
make build
|
||||||
|
|
||||||
|
if command -v zig &> /dev/null; then
|
||||||
|
print_info "Building Zig CLI..."
|
||||||
|
make cli-build
|
||||||
|
else
|
||||||
|
print_warning "Zig not found, skipping CLI build"
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_success "Build completed!"
|
||||||
|
}
|
||||||
|
|
||||||
|
test_all() {
|
||||||
|
print_header "Running All Tests"
|
||||||
|
echo "===================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
print_info "Running main test suite..."
|
||||||
|
make test
|
||||||
|
|
||||||
|
print_info "Running comprehensive tests..."
|
||||||
|
make test-all
|
||||||
|
|
||||||
|
print_success "All tests completed!"
|
||||||
|
}
|
||||||
|
|
||||||
|
start_services() {
|
||||||
|
print_header "Starting Services"
|
||||||
|
echo "==================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Start Redis if available
|
||||||
|
if command -v redis-server &> /dev/null; then
|
||||||
|
if ! pgrep -f "redis-server" > /dev/null; then
|
||||||
|
print_info "Starting Redis..."
|
||||||
|
redis-server --daemonize yes --port 6379
|
||||||
|
print_success "Redis started"
|
||||||
|
else
|
||||||
|
print_info "Redis already running"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Start API server if built
|
||||||
|
if [[ -f "bin/api-server" ]]; then
|
||||||
|
print_info "Starting API server..."
|
||||||
|
if [[ -f "configs/config-local.yaml" ]]; then
|
||||||
|
./bin/api-server --config configs/config-local.yaml &
|
||||||
|
else
|
||||||
|
print_warning "No config found, using defaults"
|
||||||
|
./bin/api-server &
|
||||||
|
fi
|
||||||
|
print_success "API server started (PID: $!)"
|
||||||
|
else
|
||||||
|
print_error "API server not built. Run 'make build' first."
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_success "Services started!"
|
||||||
|
}
|
||||||
|
|
||||||
|
check_health() {
|
||||||
|
print_header "API Health Check"
|
||||||
|
echo "=================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
print_info "Checking if API port is open..."
|
||||||
|
|
||||||
|
# First check if port 9101 is open
|
||||||
|
if ! nc -z localhost 9101 2>/dev/null; then
|
||||||
|
print_error "API port 9101 not open - is it running?"
|
||||||
|
print_info "Start with: ./tools/manage.sh start"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_info "Port 9101 is open, checking API health endpoint..."
|
||||||
|
|
||||||
|
# Try the health endpoint
|
||||||
|
response=$(curl -k -s --max-time 3 -H 'X-API-Key: password' -H 'X-Forwarded-For: 127.0.0.1' https://localhost:9101/health 2>/dev/null)
|
||||||
|
|
||||||
|
if [[ "$response" == "OK" ]]; then
|
||||||
|
print_success "API is healthy: $response"
|
||||||
|
elif [[ "$response" == *"IP not whitelisted"* ]]; then
|
||||||
|
print_warning "API running but IP not whitelisted (expected behavior)"
|
||||||
|
print_info "Try: curl -k -H 'X-API-Key: password' -H 'X-Forwarded-For: 127.0.0.1' https://localhost:9101/health"
|
||||||
|
else
|
||||||
|
print_error "Unexpected response: $response"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
stop_services() {
|
||||||
|
print_header "Stopping Services"
|
||||||
|
echo "=================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Stop API server
|
||||||
|
if pgrep -f "api-server" > /dev/null; then
|
||||||
|
print_info "Stopping API server..."
|
||||||
|
pkill -f "api-server"
|
||||||
|
print_success "API server stopped"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Stop Redis
|
||||||
|
if command -v redis-cli &> /dev/null; then
|
||||||
|
print_info "Stopping Redis..."
|
||||||
|
redis-cli shutdown 2>/dev/null || true
|
||||||
|
print_success "Redis stopped"
|
||||||
|
fi
|
||||||
|
|
||||||
|
print_success "All services stopped!"
|
||||||
|
}
|
||||||
|
|
||||||
|
run_security() {
|
||||||
|
print_header "Security Management"
|
||||||
|
echo "===================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
case "${1:-check}" in
|
||||||
|
"check")
|
||||||
|
print_info "Running security checks..."
|
||||||
|
make security-check
|
||||||
|
;;
|
||||||
|
"monitor")
|
||||||
|
print_info "Starting security monitoring..."
|
||||||
|
make security-monitor
|
||||||
|
;;
|
||||||
|
"deploy")
|
||||||
|
print_info "Deploying with security..."
|
||||||
|
make security-deploy
|
||||||
|
;;
|
||||||
|
"audit")
|
||||||
|
print_info "Running security audit..."
|
||||||
|
make security-audit
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
echo "Usage: $0 security {check|monitor|deploy|audit}"
|
||||||
|
exit 1
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
}
|
||||||
|
|
||||||
|
run_development() {
|
||||||
|
print_header "Development Environment"
|
||||||
|
echo "========================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
case "${1:-setup}" in
|
||||||
|
"setup")
|
||||||
|
print_info "Setting up development environment..."
|
||||||
|
./scripts/auto_setup.sh
|
||||||
|
;;
|
||||||
|
"quick")
|
||||||
|
print_info "Running quick start..."
|
||||||
|
./scripts/quick_start.sh
|
||||||
|
;;
|
||||||
|
"deps")
|
||||||
|
print_info "Installing dependencies..."
|
||||||
|
make install-deps
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
echo "Usage: $0 dev {setup|quick|deps}"
|
||||||
|
exit 1
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
}
|
||||||
|
|
||||||
|
show_logs() {
|
||||||
|
print_header "Application Logs"
|
||||||
|
echo "=================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Show application logs
|
||||||
|
if [[ -f "logs/fetch_ml.log" ]]; then
|
||||||
|
print_app "Application Log:"
|
||||||
|
tail -20 logs/fetch_ml.log
|
||||||
|
echo ""
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ -f "logs/audit.log" ]]; then
|
||||||
|
print_app "Security Log:"
|
||||||
|
tail -20 logs/audit.log
|
||||||
|
echo ""
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Show Docker logs if running
|
||||||
|
if command -v docker &> /dev/null; then
|
||||||
|
local containers=$(docker ps --format "table {{.Names}}" | grep "ml-experiment" || true)
|
||||||
|
if [[ -n "$containers" ]]; then
|
||||||
|
print_app "Docker Logs:"
|
||||||
|
docker logs --tail=20 $(echo "$containers" | tail -1) 2>/dev/null || true
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
cleanup() {
|
||||||
|
print_header "Cleanup Project"
|
||||||
|
echo "================"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
print_info "Cleaning project artifacts..."
|
||||||
|
make clean-all
|
||||||
|
|
||||||
|
print_info "Stopping services..."
|
||||||
|
stop_services
|
||||||
|
|
||||||
|
print_success "Cleanup completed!"
|
||||||
|
}
|
||||||
|
|
||||||
|
show_help() {
|
||||||
|
print_header "Project Management Script"
|
||||||
|
echo "==========================="
|
||||||
|
echo ""
|
||||||
|
echo "Usage: ./tools/manage.sh {status|build|test|start|stop|health|security|dev|logs|cleanup|help}"
|
||||||
|
echo ""
|
||||||
|
echo "Commands:"
|
||||||
|
echo " status - Show project status"
|
||||||
|
echo " build - Build all components"
|
||||||
|
echo " test - Run all tests"
|
||||||
|
echo " start - Start all services"
|
||||||
|
echo " stop - Stop all services"
|
||||||
|
echo " health - Check API health endpoint"
|
||||||
|
echo " security - Security management (check|monitor|deploy|audit)"
|
||||||
|
echo " dev - Development environment (setup|quick|deps)"
|
||||||
|
echo " logs - Show application logs"
|
||||||
|
echo " cleanup - Clean project artifacts and stop services"
|
||||||
|
echo " help - Show this help"
|
||||||
|
echo ""
|
||||||
|
echo "Examples:"
|
||||||
|
echo " $0 status # Show current status"
|
||||||
|
echo " $0 health # Check API health"
|
||||||
|
echo " $0 build && $0 test # Build and test everything"
|
||||||
|
echo " $0 start # Start all services"
|
||||||
|
echo " $0 security monitor # Start security monitoring"
|
||||||
|
echo " $0 dev setup # Setup development environment"
|
||||||
|
echo ""
|
||||||
|
echo "Quick Start:"
|
||||||
|
echo " $0 dev setup && $0 start && $0 status"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Main function
|
||||||
|
main() {
|
||||||
|
case "${1:-help}" in
|
||||||
|
"status")
|
||||||
|
show_status
|
||||||
|
;;
|
||||||
|
"build")
|
||||||
|
build_all
|
||||||
|
;;
|
||||||
|
"test")
|
||||||
|
test_all
|
||||||
|
;;
|
||||||
|
"start")
|
||||||
|
start_services
|
||||||
|
;;
|
||||||
|
"stop")
|
||||||
|
stop_services
|
||||||
|
;;
|
||||||
|
"health")
|
||||||
|
check_health
|
||||||
|
;;
|
||||||
|
"security")
|
||||||
|
run_security "${2:-check}"
|
||||||
|
;;
|
||||||
|
"dev")
|
||||||
|
run_development "${2:-setup}"
|
||||||
|
;;
|
||||||
|
"logs")
|
||||||
|
show_logs
|
||||||
|
;;
|
||||||
|
"cleanup")
|
||||||
|
cleanup
|
||||||
|
;;
|
||||||
|
"help"|"-h"|"--help")
|
||||||
|
show_help
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
print_error "Unknown command: $1"
|
||||||
|
echo "Use '$0 help' for usage information"
|
||||||
|
exit 1
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
}
|
||||||
|
|
||||||
|
# Run main function
|
||||||
|
main "$@"
|
||||||
Loading…
Reference in a new issue