feat: add comprehensive setup scripts and management tools
- Add production setup scripts for automated deployment
- Include monitoring setup and configuration validation
- Add legacy setup scripts for various Linux distributions
- Implement Bitwarden integration for secure credential management
- Add development and production environment setup
- Include comprehensive management tools and utilities
- Add shell script library with common functions

Provides complete automation for setup, deployment, and management of the FetchML platform in development and production environments.
parent 385d2cf386
commit bb25743b0f
15 changed files with 3414 additions and 0 deletions
94
scripts/README.md
Normal file
@@ -0,0 +1,94 @@
# Scripts Directory

This directory contains setup and utility scripts for FetchML.

## Production Scripts

### `setup-prod.sh`
**Purpose**: Automated production setup for Rocky Linux bare-metal deployment
**Usage**: `sudo ./scripts/setup-prod.sh [base_path] [user] [group]`
**What it does**:
- Creates system user and groups
- Sets up directory structure (`/data/ml-experiments/*`)
- Installs dependencies (Go, Podman, Redis)
- Configures GPU support for Podman
- Creates systemd service files
- Sets up log rotation

**Example**:
```bash
sudo ./scripts/setup-prod.sh /data/ml-experiments ml-user ml-group
```
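
The usage line above takes three optional positional arguments. A minimal sketch of how such arguments could be resolved to defaults is shown here; it is not the actual `setup-prod.sh` implementation. The function name `resolve_args`, the `DEFAULT_*` variables, and the default values (borrowed from the example invocation) are assumptions for illustration.

```bash
set -euo pipefail

# Hypothetical defaults, borrowed from the example invocation above.
DEFAULT_BASE_PATH="/data/ml-experiments"
DEFAULT_USER="ml-user"
DEFAULT_GROUP="ml-group"

# resolve_args [base_path] [user] [group]
# Prints the three effective values, one per line.
resolve_args() {
    local base_path="${1:-$DEFAULT_BASE_PATH}"
    local user="${2:-$DEFAULT_USER}"
    local group="${3:-$DEFAULT_GROUP}"

    # Reject relative paths early so later mkdir/chown calls cannot
    # end up operating on the current working directory.
    if [[ "$base_path" != /* ]]; then
        echo "base_path must be absolute: $base_path" >&2
        return 1
    fi

    printf '%s\n%s\n%s\n' "$base_path" "$user" "$group"
}
```

Calling `resolve_args /srv/ml` would print `/srv/ml` followed by the default user and group, while a relative path is rejected with a non-zero status.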

### `validate-prod-config.sh`
**Purpose**: Validates production configuration files
**Usage**: `./scripts/validate-prod-config.sh [api-config] [worker-config]`
**What it does**:
- Checks config file syntax
- Verifies base_path consistency
- Tests Redis connectivity
- Validates Podman setup
- Checks directory permissions

**Example**:
```bash
./scripts/validate-prod-config.sh configs/config-prod.yaml configs/worker-prod.toml
```
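
One of the checks listed above, `base_path` consistency, can be sketched as follows. This is an illustration rather than the code in `validate-prod-config.sh`: the function names are hypothetical, and it assumes both files carry a top-level `base_path` key (`base_path: "..."` in the YAML, `base_path = "..."` in the TOML).

```bash
set -euo pipefail

# extract_base_path FILE
# Prints the value of every top-level base_path key, written either as
# `base_path: "/x"` (YAML) or `base_path = "/x"` (TOML).
extract_base_path() {
    sed -nE 's/^base_path[[:space:]]*[:=][[:space:]]*"?([^"#]*[^"#[:space:]])"?.*$/\1/p' "$1"
}

# check_base_path_consistency API_CONFIG WORKER_CONFIG
# Succeeds only if both files define the same base_path.
check_base_path_consistency() {
    local api_path worker_path
    api_path=$(extract_base_path "$1")
    worker_path=$(extract_base_path "$2")

    if [[ -z "$api_path" || -z "$worker_path" ]]; then
        echo "base_path missing in one of the config files" >&2
        return 1
    fi
    if [[ "$api_path" != "$worker_path" ]]; then
        echo "base_path mismatch: '$api_path' vs '$worker_path'" >&2
        return 1
    fi
    echo "base_path consistent: $api_path"
}
```

A line-oriented `sed` match keeps the sketch free of extra dependencies; a real validator might prefer a proper YAML/TOML parser, since this approach ignores nested or multi-line values.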

## Legacy Setup Scripts (Deprecated)

The following scripts are from earlier iterations and are **deprecated** in favor of `setup-prod.sh`:

- `setup_rocky.sh` - Use `setup-prod.sh` instead
- `setup_ubuntu.sh` - Ubuntu support (not primary target)
- `auto_setup.sh` - Old automated setup (superseded)
- `setup_common.sh` - Common functions (integrated into setup-prod.sh)
- `quick_start.sh` - Quick dev setup (use docker-compose on macOS instead)
- `test_tools.sh` - Tool testing (integrated into validate-prod-config.sh)

### Cleanup Recommendation
These legacy scripts can be removed or archived. The current production setup only needs:
- `setup-prod.sh`
- `validate-prod-config.sh`

## Usage Workflow

### First-Time Production Setup
```bash
# 1. Run production setup
sudo ./scripts/setup-prod.sh

# 2. Copy and configure
sudo cp configs/config-prod.yaml /etc/fetch_ml/config.yaml
sudo cp configs/worker-prod.toml /etc/fetch_ml/worker.toml
sudo vim /etc/fetch_ml/config.yaml  # Update API keys, etc.

# 3. Build and install
make prod
sudo make install

# 4. Validate
./scripts/validate-prod-config.sh /etc/fetch_ml/config.yaml /etc/fetch_ml/worker.toml

# 5. Start services
sudo systemctl start fetchml-api fetchml-worker
sudo systemctl enable fetchml-api fetchml-worker
```

### Development Setup (macOS)
```bash
# Use docker-compose for local development
docker-compose up -d

# Or run components directly
make dev
./bin/api-server -config configs/config-local.yaml
```

## Script Maintenance

When adding new scripts:
1. Add executable permission: `chmod +x scripts/new-script.sh`
2. Add header comment with purpose and usage
3. Update this README
4. Use consistent error handling and logging
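
The checklist above could translate into a starting skeleton like the one below. It is a sketch under assumptions, not a template taken from this repository: `new-script.sh` is the placeholder name from step 1, and the helper names (`log_info`, `log_error`, `die`, `require_cmd`) are illustrative.

```bash
#!/usr/bin/env bash
#
# new-script.sh: one-line statement of purpose (placeholder name).
#
# Usage:
#   ./scripts/new-script.sh <target_dir>

set -euo pipefail

# Logging helpers; errors go to stderr so they survive stdout redirection.
log_info()  { echo "[INFO] $*"; }
log_error() { echo "[ERROR] $*" >&2; }

# die MESSAGE [EXIT_CODE]
# Logs the message and exits with EXIT_CODE (default 1).
die() {
    log_error "$1"
    exit "${2:-1}"
}

# require_cmd NAME...
# Logs the first missing command and returns non-zero.
require_cmd() {
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            log_error "Required command not found: $cmd"
            return 1
        fi
    done
}
```

A script built on this skeleton might begin its body with `require_cmd curl jq || die "Missing prerequisites"`, so that every failure path goes through the same logging format.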
49
scripts/create_bitwarden_fetchml_item.sh
Normal file
@@ -0,0 +1,49 @@
#!/usr/bin/env bash
set -euo pipefail

# Create a Bitwarden item for a FetchML API user.
#
# Usage:
#   ./scripts/create_bitwarden_fetchml_item.sh <username> <api_key> <api_key_hash>
#
# Requirements:
#   - Bitwarden CLI (bw) installed
#   - You are logged in and unlocked (bw login; bw unlock)
#   - jq installed
#
# This script does NOT run on the homelab server. Run it from your
# own machine where you manage Bitwarden.

if [[ $# -ne 3 ]]; then
    echo "Usage: $0 <username> <api_key> <api_key_hash>" >&2
    exit 1
fi

USER_NAME="$1"
API_KEY="$2"
API_KEY_HASH="$3"

ITEM_NAME="FetchML API $USER_NAME"

# Get base item template
TEMPLATE_JSON=$(bw get template item)

# Build item JSON with jq
ITEM_JSON=$(echo "$TEMPLATE_JSON" | jq \
    --arg name "$ITEM_NAME" \
    --arg username "$USER_NAME" \
    --arg password "$API_KEY" \
    --arg hash "$API_KEY_HASH" \
    '.name = $name
    | .login.username = $username
    | .login.password = $password
    | .notes = "FetchML API key for user " + $username
    | .fields = [{"name":"api_key_hash","value":$hash,"type":1}]')

# Create item in Bitwarden
# If you ever want to edit instead, you can capture the ID from this call
# and use: bw edit item <id> <json>

echo "$ITEM_JSON" | bw encode | bw create item

echo "Created Bitwarden item: $ITEM_NAME"
455
scripts/legacy/auto_setup.sh
Executable file
@@ -0,0 +1,455 @@
#!/bin/bash

# Automatic Setup Script for ML Experiment Manager
# Handles complete environment setup with security features

set -euo pipefail

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

print_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

print_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

print_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

print_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

detect_os() {
    if [[ "$OSTYPE" == "darwin"* ]]; then
        echo "macos"
    elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
        echo "linux"
    else
        echo "unknown"
    fi
}

install_go() {
    print_info "Installing Go..."

    local os=$(detect_os)
    local go_version="1.23.0"

    if [[ "$os" == "macos" ]]; then
        if command -v brew &> /dev/null; then
            brew install go
        else
            print_error "Homebrew not found. Please install Go manually."
            return 1
        fi
    elif [[ "$os" == "linux" ]]; then
        wget -q "https://go.dev/dl/go${go_version}.linux-amd64.tar.gz"
        sudo rm -rf /usr/local/go
        sudo tar -C /usr/local -xzf "go${go_version}.linux-amd64.tar.gz"
        rm "go${go_version}.linux-amd64.tar.gz"

        # Add to PATH
        echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
        export PATH=$PATH:/usr/local/go/bin
    fi

    print_success "Go installed"
}

install_zig() {
    print_info "Installing Zig..."

    local os=$(detect_os)

    if [[ "$os" == "macos" ]]; then
        if command -v brew &> /dev/null; then
            brew install zig
        else
            print_error "Homebrew not found. Please install Zig manually."
            return 1
        fi
    elif [[ "$os" == "linux" ]]; then
        # Download Zig binary
        local zig_version="0.13.0"
        wget -q "https://ziglang.org/download/${zig_version}/zig-linux-x86_64-${zig_version}.tar.xz"
        tar -xf "zig-linux-x86_64-${zig_version}.tar.xz"
        # Keep the bundled lib/ directory next to the binary; Zig needs it at runtime
        sudo rm -rf /usr/local/zig
        sudo mv "zig-linux-x86_64-${zig_version}" /usr/local/zig
        sudo ln -sf /usr/local/zig/zig /usr/local/bin/zig
        rm -f "zig-linux-x86_64-${zig_version}.tar.xz"
    fi

    print_success "Zig installed"
}

install_docker() {
    print_info "Installing Docker..."

    local os=$(detect_os)

    if [[ "$os" == "macos" ]]; then
        if command -v brew &> /dev/null; then
            brew install --cask docker
            print_warning "Docker Desktop installed. Please start it manually."
        else
            print_error "Homebrew not found. Please install Docker manually."
            return 1
        fi
    elif [[ "$os" == "linux" ]]; then
        # Install Docker using official script
        curl -fsSL https://get.docker.com -o get-docker.sh
        sudo sh get-docker.sh
        sudo usermod -aG docker "$USER"
        rm get-docker.sh

        # Start Docker
        sudo systemctl enable docker
        sudo systemctl start docker

        print_success "Docker installed. You may need to log out and log back in."
    fi
}

install_redis() {
    print_info "Installing Redis..."

    local os=$(detect_os)

    if [[ "$os" == "macos" ]]; then
        if command -v brew &> /dev/null; then
            brew install redis
            brew services start redis
        else
            print_error "Homebrew not found. Please install Redis manually."
            return 1
        fi
    elif [[ "$os" == "linux" ]]; then
        sudo apt-get update
        sudo apt-get install -y redis-server
        sudo systemctl enable redis-server
        sudo systemctl start redis-server
    fi

    print_success "Redis installed and started"
}

install_dependencies() {
    print_info "Installing dependencies..."

    local os=$(detect_os)

    # Install basic tools
    if [[ "$os" == "macos" ]]; then
        if command -v brew &> /dev/null; then
            brew install openssl curl jq
        fi
    elif [[ "$os" == "linux" ]]; then
        sudo apt-get update
        sudo apt-get install -y openssl curl jq build-essential
    fi

    # Install Go tools
    if command -v go &> /dev/null; then
        go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
        go install golang.org/x/tools/cmd/goimports@latest
    fi

    print_success "Dependencies installed"
}

setup_project() {
    print_info "Setting up project..."

    # Create directories
    mkdir -p bin
    mkdir -p data
    mkdir -p logs
    mkdir -p db
    mkdir -p ssl
    mkdir -p configs
    mkdir -p scripts

    # Build project
    if command -v make &> /dev/null; then
        make build
        if command -v zig &> /dev/null; then
            make cli-build
        fi
    else
        print_warning "Make not found, building manually..."
        go build -o bin/worker ./cmd/worker
        go build -o bin/tui ./cmd/tui
        go build -o bin/data_manager ./cmd/data_manager
        go build -o bin/user_manager ./cmd/user_manager
        go build -o bin/api-server ./cmd/api-server

        if command -v zig &> /dev/null; then
            cd cli && zig build && cd ..
        fi
    fi

    print_success "Project setup completed"
}

setup_security() {
    print_info "Setting up security features..."

    # Generate SSL certificates; report success only if openssl actually succeeded
    if command -v openssl &> /dev/null; then
        if openssl req -x509 -newkey rsa:4096 -keyout ssl/key.pem -out ssl/cert.pem \
            -days 365 -nodes -subj "/C=US/ST=State/L=City/O=Organization/CN=localhost" \
            -addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null; then
            print_success "SSL certificates generated"
        else
            print_warning "Failed to generate SSL certificates"
        fi
    fi

    # Generate secure configuration
    local redis_password=$(openssl rand -base64 32 2>/dev/null || echo "dev_redis_password_123")
    local jwt_secret=$(openssl rand -base64 64 2>/dev/null || echo "dev_jwt_secret_1234567890123456789012345678901234567890123456789012345678901234")

    cat > configs/security-config.yaml << EOF
base_path: "/data/ml-experiments"

auth:
  enabled: true
  api_keys:
    test_user:
      hash: "$(echo -n "dev_test_api_key_12345" | sha256sum | cut -d' ' -f1)"
      admin: true
      roles: ["data_scientist", "admin"]
      permissions:
        read: true
        write: true
        delete: true

server:
  address: ":9101"
  tls:
    enabled: true
    cert_file: "./ssl/cert.pem"
    key_file: "./ssl/key.pem"
    min_version: "1.3"

security:
  rate_limit:
    enabled: true
    requests_per_minute: 60
    burst_size: 10
  ip_whitelist:
    - "127.0.0.1"
    - "::1"
    - "10.0.0.0/8"
    - "192.168.0.0/16"
    - "172.16.0.0/12"
  failed_login_lockout:
    enabled: true
    max_attempts: 5
    lockout_duration: "15m"

redis:
  url: "redis://localhost:6379"
  password: "${redis_password}"

logging:
  level: "info"
  file: "logs/fetch_ml.log"
  audit_log: "logs/audit.log"
EOF

    cat > .env.dev << EOF
# Development environment variables
REDIS_PASSWORD=${redis_password}
JWT_SECRET=${jwt_secret}
GRAFANA_USER=admin
GRAFANA_PASSWORD=$(openssl rand -base64 16 2>/dev/null || echo "dev_grafana_password")
EOF

    print_success "Security configuration created"
}

test_installation() {
    print_info "Testing installation..."

    local tests_passed=0
    local tests_total=0

    # Test Go
    tests_total=$((tests_total + 1))
    if command -v go &> /dev/null; then
        print_success "Go: Installed"
        tests_passed=$((tests_passed + 1))
    else
        print_error "Go: Not found"
    fi

    # Test Zig
    tests_total=$((tests_total + 1))
    if command -v zig &> /dev/null; then
        print_success "Zig: Installed"
        tests_passed=$((tests_passed + 1))
    else
        print_warning "Zig: Not found (optional)"
        tests_total=$((tests_total - 1))
    fi

    # Test Docker
    tests_total=$((tests_total + 1))
    if command -v docker &> /dev/null; then
        print_success "Docker: Installed"
        tests_passed=$((tests_passed + 1))
    else
        print_warning "Docker: Not found (optional)"
        tests_total=$((tests_total - 1))
    fi

    # Test Redis
    tests_total=$((tests_total + 1))
    if command -v redis-cli &> /dev/null; then
        if redis-cli ping | grep -q "PONG"; then
            print_success "Redis: Running"
            tests_passed=$((tests_passed + 1))
        else
            print_warning "Redis: Not running"
        fi
    else
        print_warning "Redis: Not found (optional)"
        tests_total=$((tests_total - 1))
    fi

    # Test binaries
    if [[ -f "bin/api-server" ]]; then
        tests_total=$((tests_total + 1))
        if ./bin/api-server --help > /dev/null 2>&1; then
            print_success "API Server: Built"
            tests_passed=$((tests_passed + 1))
        else
            print_error "API Server: Build failed"
        fi
    fi

    if [[ $tests_total -gt 0 ]]; then
        local success_rate=$((tests_passed * 100 / tests_total))
        print_info "Tests: $tests_passed/$tests_total passed ($success_rate%)"
    fi

    print_success "Installation testing completed"
}

show_next_steps() {
    print_success "Automatic setup completed!"
    echo
    echo "Next Steps:"
    echo "==========="
    echo ""
    echo "1. Load environment variables:"
    echo " source .env.dev"
    echo ""
    echo "2. Start the API server:"
    echo " ./bin/api-server -config configs/config.yaml"
    echo ""
    echo "3. Test the Zig CLI (if installed):"
    echo " ./cli/zig-out/bin/ml --help"
    echo ""
    echo "4. Deploy with Docker (optional):"
    echo " make docker-run"
    echo ""
    echo "5. Docker Compose deployment:"
    echo " docker-compose up -d"
    echo ""
    echo "Configuration Files:"
    echo " configs/config.yaml # Main configuration"
    echo " configs/config_local.yaml # Local development"
    echo " ssl/cert.pem, ssl/key.pem # TLS certificates"
    echo ""
    echo "Documentation:"
    echo " docs/DEPLOYMENT.md # Deployment guide"
    echo ""
    echo "Quick Commands:"
    echo " make help # Show all commands"
    echo " make test # Run tests"
    echo " docker-compose up -d # Start services"
    echo ""
    print_success "Ready to use ML Experiment Manager!"
}

# Main setup function
main() {
    echo "ML Experiment Manager Automatic Setup"
    echo "====================================="
    echo ""

    print_info "Starting automatic setup..."
    echo ""

    # Check and install dependencies
    if ! command -v go &> /dev/null; then
        print_info "Go not found, installing..."
        install_go
    fi

    if ! command -v zig &> /dev/null; then
        print_info "Zig not found, installing..."
        install_zig
    fi

    if ! command -v docker &> /dev/null; then
        print_info "Docker not found, installing..."
        install_docker
    fi

    if ! command -v redis-cli &> /dev/null; then
        print_info "Redis not found, installing..."
        install_redis
    fi

    # Install additional dependencies
    install_dependencies

    # Setup project
    setup_project

    # Setup security
    setup_security

    # Test installation
    test_installation

    # Show next steps
    show_next_steps
}

# Handle command line arguments
case "${1:-setup}" in
    "setup")
        main
        ;;
    "deps")
        install_dependencies
        ;;
    "test")
        test_installation
        ;;
    "help"|"-h"|"--help")
        echo "Automatic Setup Script"
        echo "Usage: $0 {setup|deps|test|help}"
        echo ""
        echo "Commands:"
        echo " setup - Run full automatic setup"
        echo " deps - Install dependencies only"
        echo " test - Test installation"
        echo " help - Show this help"
        ;;
    *)
        print_error "Unknown command: $1"
        echo "Use '$0 help' for usage information"
        exit 1
        ;;
esac
314
scripts/legacy/quick_start.sh
Executable file
@@ -0,0 +1,314 @@
#!/usr/bin/env bash

# Fetch ML Quick Start Script with Security
# Sets up development environment with security features and creates test user

set -euo pipefail

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

print_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

print_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

print_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

print_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

check_prerequisites() {
    print_info "Checking prerequisites..."

    # Check Go
    if ! command -v go &> /dev/null; then
        print_error "Go is not installed. Please install Go 1.25 or later."
        exit 1
    fi

    local go_version=$(go version | awk '{print $3}' | sed 's/go//')
    print_info "Go version: $go_version"

    # Check Zig
    if ! command -v zig &> /dev/null; then
        print_warning "Zig is not installed. CLI features will not be available."
    else
        local zig_version=$(zig version)
        print_info "Zig version: $zig_version"
    fi

    # Check Docker
    if ! command -v docker &> /dev/null; then
        print_warning "Docker is not installed. Container features will not work."
    fi

    # Check Redis (this function only reports; setup_redis starts the server if present)
    if ! command -v redis-server &> /dev/null && ! command -v redis-cli &> /dev/null; then
        print_warning "Redis is not installed. Redis-backed features may be limited."
    fi

    # Check OpenSSL for certificates
    if ! command -v openssl &> /dev/null; then
        print_warning "OpenSSL is not installed. TLS certificates will not be generated."
    fi

    print_success "Prerequisites checked"
}

setup_project() {
    print_info "Setting up Fetch ML project..."

    # Create directories
    mkdir -p bin
    mkdir -p data
    mkdir -p logs
    mkdir -p db
    mkdir -p ssl
    mkdir -p configs

    print_success "Project directories created"
}

build_project() {
    print_info "Building Fetch ML..."

    # Build Go binaries
    make build

    # Build Zig CLI if available
    if command -v zig &> /dev/null; then
        make cli-build
        print_success "Zig CLI built"
    fi

    print_success "Build completed"
}

generate_ssl_certificates() {
    print_info "Generating SSL certificates..."

    if command -v openssl &> /dev/null; then
        # Generate self-signed certificate for development
        openssl req -x509 -newkey rsa:4096 -keyout ssl/key.pem -out ssl/cert.pem \
            -days 365 -nodes -subj "/C=US/ST=State/L=City/O=Organization/CN=localhost" \
            -addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null || {
            print_warning "Failed to generate SSL certificates"
            return 1
        }

        print_success "SSL certificates generated in ssl/"
        print_info "Certificates are self-signed (development only)"
    else
        print_warning "OpenSSL not available, skipping SSL certificates"
    fi
}

setup_redis() {
    print_info "Setting up Redis..."

    if command -v redis-server &> /dev/null; then
        if ! pgrep -f "redis-server" > /dev/null; then
            redis-server --daemonize yes --port 6379
            print_success "Redis started"
        else
            print_info "Redis already running"
        fi
    else
        print_warning "Redis not available, some features may be limited"
    fi
}

create_secure_config() {
    print_info "Creating secure development configuration..."

    # Generate secure passwords and secrets
    local redis_password=$(openssl rand -base64 32 2>/dev/null || echo "dev_redis_password_123")
    local jwt_secret=$(openssl rand -base64 64 2>/dev/null || echo "dev_jwt_secret_1234567890123456789012345678901234567890123456789012345678901234")

    # Create development config
    cat > configs/config.yaml << EOF
base_path: "/data/ml-experiments"

auth:
  enabled: true
  api_keys:
    test_user:
      hash: "$(echo -n "dev_test_api_key_12345" | sha256sum | cut -d' ' -f1)"
      admin: true
      roles: ["data_scientist", "admin"]
      permissions:
        read: true
        write: true
        delete: true

server:
  address: ":9101"
  tls:
    enabled: true
    cert_file: "./ssl/cert.pem"
    key_file: "./ssl/key.pem"
    min_version: "1.3"

security:
  rate_limit:
    enabled: true
    requests_per_minute: 60
    burst_size: 10
  ip_whitelist:
    - "127.0.0.1"
    - "::1"
    - "10.0.0.0/8"
    - "192.168.0.0/16"
    - "172.16.0.0/12"
  failed_login_lockout:
    enabled: true
    max_attempts: 5
    lockout_duration: "15m"

redis:
  url: "redis://localhost:6379"
  password: "${redis_password}"

logging:
  level: "info"
  file: "logs/fetch_ml.log"
  audit_log: "logs/audit.log"
EOF

    # Create environment file
    cat > .env.dev << EOF
# Development environment variables
REDIS_PASSWORD=${redis_password}
JWT_SECRET=${jwt_secret}
GRAFANA_USER=admin
GRAFANA_PASSWORD=$(openssl rand -base64 16 2>/dev/null || echo "dev_grafana_password")
EOF

    print_success "Secure configuration created"
    print_warning "Using development certificates and passwords"
}

create_test_user() {
    print_info "Creating test user..."

    # Generate API key for test user
    local api_key="dev_test_api_key_12345"
    local api_key_hash=$(echo -n "$api_key" | sha256sum | cut -d' ' -f1)

    print_success "Test user created successfully"
    echo "Username: test_user"
    echo "API Key: $api_key"
    echo "API Key Hash: $api_key_hash"
    echo "Store this key safely!"
    echo ""
    echo "Environment variables in .env.dev"
    echo "Run: source .env.dev"
}

test_setup() {
    print_info "Testing setup..."

    # Test Go binaries
    if [[ -f "bin/api-server" ]]; then
        ./bin/api-server --help > /dev/null 2>&1 || true
        print_success "API server binary OK"
    fi

    if [[ -f "bin/worker" ]]; then
        ./bin/worker --help > /dev/null 2>&1 || true
        print_success "Worker binary OK"
    fi

    # Test Zig CLI
    if [[ -f "cli/zig-out/bin/ml" ]]; then
        ./cli/zig-out/bin/ml --help > /dev/null 2>&1 || true
        print_success "Zig CLI binary OK"
    fi

    # Test Redis connection
    if command -v redis-cli &> /dev/null; then
        if redis-cli ping > /dev/null 2>&1; then
            print_success "Redis connection OK"
        else
            print_warning "Redis not responding"
        fi
    fi

    # Test SSL certificates
    if [[ -f "ssl/cert.pem" && -f "ssl/key.pem" ]]; then
        if openssl x509 -in ssl/cert.pem -noout -checkend 86400 > /dev/null 2>&1; then
            print_success "SSL certificates valid"
        else
            print_warning "SSL certificates expired or invalid"
        fi
    fi
}

show_next_steps() {
    print_success "Secure quick start completed!"
    echo
    echo "Next steps:"
    echo "1. Load environment variables:"
    echo " source .env.dev"
    echo
    echo "2. Start API server:"
    echo " ./bin/api-server -config configs/config.yaml"
    echo
    echo "3. Test Zig CLI:"
    echo " ./cli/zig-out/bin/ml --help"
    echo
    echo "4. Test with curl (HTTPS):"
    echo " curl -k -H 'X-API-Key: dev_test_api_key_12345' https://localhost:9101/health"
    echo
    echo "5. Deploy with Docker:"
    echo " docker-compose up -d"
    echo
    echo "Features Enabled:"
    echo " ✅ HTTPS/TLS encryption"
    echo " ✅ API key authentication"
    echo " ✅ Rate limiting"
    echo " ✅ IP whitelisting"
    echo " ✅ Security headers"
    echo " ✅ Audit logging"
    echo
    echo "Configuration Files:"
    echo " configs/config.yaml # Main configuration"
    echo " .env.dev # Environment variables"
    echo " ssl/cert.pem, ssl/key.pem # TLS certificates"
    echo
    echo "Documentation:"
    echo " docs/DEPLOYMENT.md # Deployment guide"
    echo ""
    print_success "Ready to run ML experiments!"
}

# Main function
main() {
    echo "Fetch ML Quick Start Script (with Security & Zig CLI)"
    echo "===================================================="
    echo ""

    check_prerequisites
    setup_project
    build_project
    generate_ssl_certificates
    setup_redis
    create_secure_config
    create_test_user
    test_setup
    show_next_steps
}

# Run main function
main "$@"
124
scripts/legacy/setup_common.sh
Executable file
@@ -0,0 +1,124 @@
#!/usr/bin/env bash

# Shared helper functions for Fetch ML setup scripts (Ubuntu/Rocky)
set -euo pipefail

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

# Configuration defaults
FETCH_ML_USER="fetchml"
FETCH_ML_HOME="/opt/fetchml"
SERVICE_DIR="/etc/systemd/system"
LOG_DIR="/var/log/fetchml"
DATA_DIR="/var/lib/fetchml"
CONFIG_DIR="$FETCH_ML_HOME/configs"

log_info() { echo -e "${BLUE}[INFO]${NC} $1"; }
log_success() { echo -e "${GREEN}[SUCCESS]${NC} $1"; }
log_warning() { echo -e "${YELLOW}[WARNING]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }

# Download file with checksum verification
# Args: url, checksum, dest
secure_download() {
    local url="$1" checksum="$2" dest="$3"
    curl -fsSL "$url" -o "$dest"
    # sha256sum --check expects "<hash>  <file>" (hash, two spaces, file name)
    printf '%s  %s\n' "$checksum" "$dest" | sha256sum --check --status || {
        log_error "Checksum verification failed for $dest"
        rm -f "$dest"
        exit 1
    }
}

cleanup_temp() {
    if [[ -n "${TMP_FILES:-}" ]]; then
        # TMP_FILES is a space-separated list, so it is left unquoted on purpose
        rm -f $TMP_FILES || true
    fi
}
trap cleanup_temp EXIT

ensure_user() {
    if ! id "$FETCH_ML_USER" &>/dev/null; then
        useradd -m -d "$FETCH_ML_HOME" -s /bin/bash "$FETCH_ML_USER"
    fi
    usermod -aG podman "$FETCH_ML_USER" || true
}

create_directories() {
    mkdir -p "$FETCH_ML_HOME" "$LOG_DIR" "$DATA_DIR" "$FETCH_ML_HOME/bin" "$CONFIG_DIR"
    chown -R "$FETCH_ML_USER":"$FETCH_ML_USER" "$FETCH_ML_HOME" "$LOG_DIR" "$DATA_DIR"
}

setup_systemd_service() {
    local name="$1" exec="$2"
    cat > "$SERVICE_DIR/${name}.service" <<EOF
[Unit]
Description=Fetch ML ${name^} Service
After=network.target redis.service
Wants=redis.service

[Service]
Type=simple
User=$FETCH_ML_USER
Group=$FETCH_ML_USER
WorkingDirectory=$FETCH_ML_HOME
Environment=PATH=$FETCH_ML_HOME/bin:/usr/local/go/bin:/usr/bin:/bin
ExecStart=$exec
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=fetch_ml_${name}

[Install]
WantedBy=multi-user.target
EOF
}

setup_logrotate() {
    cat > /etc/logrotate.d/fetch_ml <<'EOF'
/var/log/fetchml/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 fetchml fetchml
}
EOF
}
|
||||
|
||||
hardening_steps() {
|
||||
# Increase file limits
|
||||
if ! grep -q fetchml /etc/security/limits.conf; then
|
||||
cat >> /etc/security/limits.conf <<'EOF'
|
||||
fetchml soft nofile 65536
|
||||
fetchml hard nofile 65536
|
||||
EOF
|
||||
fi
|
||||
|
||||
# Enable unattended security upgrades if available
|
||||
if command -v apt-get &>/dev/null; then
|
||||
apt-get install -y unattended-upgrades >/dev/null || true
|
||||
elif command -v dnf &>/dev/null; then
|
||||
dnf install -y dnf-automatic >/dev/null || true
|
||||
fi
|
||||
}
|
||||
|
||||
selinux_guidance() {
|
||||
if command -v getenforce &>/dev/null; then
|
||||
local mode=$(getenforce)
|
||||
log_info "SELinux mode: $mode"
|
||||
if [[ "$mode" == "Enforcing" ]]; then
|
||||
log_info "Ensure systemd units and directories have proper contexts. Example:"
|
||||
echo " semanage fcontext -a -t bin_t '$FETCH_ML_HOME/bin(/.*)?'"
|
||||
echo " restorecon -Rv $FETCH_ML_HOME/bin"
|
||||
fi
|
||||
fi
|
||||
}
|
||||
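The setup scripts call a `secure_download` helper with a URL, an expected SHA-256 digest, and a destination path; its body is not part of this excerpt. A minimal sketch of what the verification half of such a helper could look like, assuming `sha256sum` from coreutils is available. `verify_sha256` is a hypothetical name, not a function from this repository:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE hashes to EXPECTED (hex SHA-256).
verify_sha256() {
    local file=$1 expected=$2 actual
    [[ -f "$file" ]] || return 1
    # sha256sum prints "<digest>  <filename>"; keep the digest only
    actual=$(sha256sum "$file" | awk '{print $1}')
    # Compare case-insensitively (bash 4+ lowercase expansion)
    [[ "${actual,,}" == "${expected,,}" ]]
}
```

A caller would typically delete the downloaded file and abort when the check fails, rather than continue with an unverified archive.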
417
scripts/legacy/setup_rocky.sh
Executable file
|
|
@ -0,0 +1,417 @@
|
|||
#!/usr/bin/env bash
|
||||
|
||||
# Fetch ML Rocky Linux Setup Script
|
||||
# Optimized for ML experiments on Rocky Linux 8/9
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)
# shellcheck source=scripts/setup_common.sh
source "$SCRIPT_DIR/setup_common.sh"
|
||||
|
||||
check_root() {
|
||||
if [[ $EUID -ne 0 ]]; then
|
||||
log_error "This script must be run as root"
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
check_rocky() {
|
||||
if ! command -v dnf &> /dev/null && ! command -v yum &> /dev/null; then
|
||||
log_error "This script is designed for Rocky Linux systems"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
local rocky_version
rocky_version=$(grep -oE '[0-9]+\.[0-9]+' /etc/rocky-release 2>/dev/null || echo "unknown")
|
||||
log_info "Rocky Linux version: $rocky_version"
|
||||
|
||||
# Prefer dnf when it is available (Rocky 8 and 9 both ship it); otherwise fall back to yum
|
||||
if command -v dnf &> /dev/null; then
|
||||
PKG_MANAGER="dnf"
|
||||
else
|
||||
PKG_MANAGER="yum"
|
||||
fi
|
||||
}
|
||||
|
||||
update_system() {
|
||||
log_info "Updating system packages..."
|
||||
$PKG_MANAGER update -y
|
||||
$PKG_MANAGER upgrade -y
|
||||
$PKG_MANAGER install -y curl wget gnupg2
|
||||
}
|
||||
|
||||
enable_epel() {
|
||||
log_info "Enabling EPEL repository..."
|
||||
|
||||
if $PKG_MANAGER repolist | grep -q "epel"; then
|
||||
log_info "EPEL already enabled"
|
||||
return
|
||||
fi
|
||||
|
||||
$PKG_MANAGER install -y epel-release
|
||||
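# The CodeReady Builder repo is named "powertools" on Rocky 8 and "crb" on Rocky 9
$PKG_MANAGER config-manager --set-enabled powertools 2>/dev/null || $PKG_MANAGER config-manager --set-enabled crb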
|
||||
|
||||
log_success "EPEL repository enabled"
|
||||
}
|
||||
|
||||
install_go() {
|
||||
log_info "Installing Go 1.25..."
|
||||
|
||||
if command -v go &> /dev/null; then
|
||||
local go_version=$(go version | awk '{print $3}' | sed 's/go//')
|
||||
log_info "Go already installed: $go_version"
|
||||
return
|
||||
fi
|
||||
|
||||
cd /tmp
|
||||
TMP_FILES="/tmp/go1.25.0.linux-amd64.tar.gz"
|
||||
secure_download "https://go.dev/dl/go1.25.0.linux-amd64.tar.gz" "b5b98c784d53115553848114fd3c74e565643b4e4c8e8db0c3bea3478fd8c345" "/tmp/go1.25.0.linux-amd64.tar.gz"
|
||||
tar -C /usr/local -xzf go1.25.0.linux-amd64.tar.gz
|
||||
|
||||
# Add to PATH
|
||||
echo 'export PATH=$PATH:/usr/local/go/bin' >> /etc/profile
|
||||
echo 'export PATH=$PATH:$HOME/go/bin' >> /etc/profile
|
||||
export PATH=$PATH:/usr/local/go/bin
|
||||
|
||||
log_success "Go 1.25 installed"
|
||||
}
|
||||
|
||||
install_podman() {
|
||||
log_info "Installing Podman..."
|
||||
|
||||
if command -v podman &> /dev/null; then
|
||||
log_info "Podman already installed"
|
||||
return
|
||||
fi
|
||||
|
||||
# Install Podman and related tools
|
||||
$PKG_MANAGER install -y podman podman-compose containernetworking-plugins
|
||||
|
||||
# Configure Podman
|
||||
mkdir -p /etc/containers
|
||||
cat > /etc/containers/containers.conf << EOF
[containers]
# User namespaces are enabled through the user.max_user_namespaces sysctl below

[network]
network_backend = "netavark"

[engine]
cgroup_manager = "systemd"
runtime = "crun"
EOF
|
||||
|
||||
# Enable user namespaces
|
||||
echo "user.max_user_namespaces=15000" >> /etc/sysctl.conf
|
||||
sysctl -w user.max_user_namespaces=15000
|
||||
|
||||
log_success "Podman installed"
|
||||
}
|
||||
|
||||
install_redis() {
|
||||
log_info "Installing Redis..."
|
||||
|
||||
if command -v redis-server &> /dev/null; then
|
||||
log_info "Redis already installed"
|
||||
return
|
||||
fi
|
||||
|
||||
$PKG_MANAGER install -y redis
|
||||
|
||||
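# Configure Redis for production
# Some releases ship the config as /etc/redis/redis.conf rather than /etc/redis.conf
local redis_conf=/etc/redis.conf
if [[ -f /etc/redis/redis.conf ]]; then
    redis_conf=/etc/redis/redis.conf
fi
sed -i 's/supervised no/supervised systemd/' "$redis_conf"
sed -i 's/bind 127.0.0.1 ::1/bind 127.0.0.1/' "$redis_conf"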
|
||||
|
||||
systemctl enable redis
|
||||
systemctl start redis
|
||||
|
||||
log_success "Redis installed and configured"
|
||||
}
|
||||
|
||||
install_nvidia_drivers() {
|
||||
log_info "Checking for NVIDIA GPU..."
|
||||
|
||||
if command -v nvidia-smi &> /dev/null; then
|
||||
log_info "NVIDIA drivers already installed"
|
||||
nvidia-smi
|
||||
return
|
||||
fi
|
||||
|
||||
if lspci | grep -i nvidia &> /dev/null; then
|
||||
log_info "NVIDIA GPU detected, installing drivers..."
|
||||
|
||||
# Enable NVIDIA repository
|
||||
$PKG_MANAGER config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/rhel$(rpm -E %rhel)/x86_64/cuda-rhel.repo
|
||||
|
||||
# Clean and install
|
||||
$PKG_MANAGER clean all
|
||||
$PKG_MANAGER module enable -y nvidia-driver:latest-dkms
|
||||
$PKG_MANAGER install -y nvidia-driver cuda-toolkit
|
||||
|
||||
# Smoke-test GPU access from Podman via CDI (needs nvidia-container-toolkit and a generated CDI spec)
|
||||
if ! podman run --rm --device nvidia.com/gpu=all alpine echo "NVIDIA GPU access configured" 2>/dev/null; then
|
||||
log_warning "NVIDIA GPU access test failed, you may need to reboot"
|
||||
else
|
||||
log_success "NVIDIA drivers installed and GPU access verified"
|
||||
fi
|
||||
|
||||
# Reboot required
|
||||
log_warning "System reboot required for NVIDIA drivers"
|
||||
log_info "Run: reboot"
|
||||
else
|
||||
log_info "No NVIDIA GPU detected, skipping driver installation"
|
||||
fi
|
||||
}
|
||||
|
||||
install_ml_tools() {
|
||||
log_info "Installing ML tools and dependencies..."
|
||||
|
||||
# Python and ML packages
|
||||
$PKG_MANAGER install -y python3 python3-pip python3-devel
|
||||
|
||||
# System dependencies for ML
|
||||
$PKG_MANAGER groupinstall -y "Development Tools"
|
||||
$PKG_MANAGER install -y cmake git pkgconfig
|
||||
$PKG_MANAGER install -y libjpeg-turbo-devel libpng-devel libtiff-devel
|
||||
$PKG_MANAGER install -y mesa-libGL-devel mesa-libGLU-devel
|
||||
$PKG_MANAGER install -y gtk3-devel
|
||||
$PKG_MANAGER install -y atlas-devel blas-devel lapack-devel
|
||||
|
||||
# Install common ML libraries
|
||||
pip3 install --upgrade pip
|
||||
pip3 install numpy scipy scikit-learn pandas
|
||||
pip3 install jupyter matplotlib seaborn
|
||||
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
|
||||
|
||||
log_success "ML tools installed"
|
||||
}
|
||||
|
||||
create_user() {
|
||||
log_info "Creating fetchml user..."
|
||||
|
||||
if id "$FETCH_ML_USER" &>/dev/null; then
|
||||
log_info "User $FETCH_ML_USER already exists"
|
||||
return
|
||||
fi
|
||||
|
||||
useradd -m -d "$FETCH_ML_HOME" -s /bin/bash "$FETCH_ML_USER"
# The podman group may not exist; do not abort the setup over it
usermod -aG podman "$FETCH_ML_USER" || true

# Create directories
mkdir -p "$FETCH_ML_HOME/.config/containers"
mkdir -p "$FETCH_ML_HOME/go/bin"
mkdir -p "$LOG_DIR"
mkdir -p "$DATA_DIR"

chown -R "$FETCH_ML_USER":"$FETCH_ML_USER" "$FETCH_ML_HOME"
chown -R "$FETCH_ML_USER":"$FETCH_ML_USER" "$LOG_DIR"
chown -R "$FETCH_ML_USER":"$FETCH_ML_USER" "$DATA_DIR"
|
||||
|
||||
log_success "User $FETCH_ML_USER created"
|
||||
}
|
||||
|
||||
setup_firewall() {
|
||||
log_info "Configuring firewall..."
|
||||
|
||||
if command -v firewall-cmd &> /dev/null; then
|
||||
systemctl enable firewalld
|
||||
systemctl start firewalld
|
||||
|
||||
firewall-cmd --permanent --add-service=ssh
|
||||
firewall-cmd --permanent --add-port=8080/tcp # Worker API
|
||||
firewall-cmd --permanent --add-port=8081/tcp # Data manager API
|
||||
firewall-cmd --permanent --add-port=6379/tcp # Redis
|
||||
firewall-cmd --reload
|
||||
|
||||
firewall-cmd --list-all
|
||||
else
|
||||
log_warning "Firewalld not available, skipping firewall configuration"
|
||||
fi
|
||||
}
|
||||
|
||||
setup_systemd_services() {
|
||||
log_info "Setting up systemd services..."
|
||||
|
||||
# Fetch ML Worker service
|
||||
cat > "$SERVICE_DIR/fetch_ml_worker.service" << EOF
|
||||
[Unit]
|
||||
Description=Fetch ML Worker Service
|
||||
After=network.target redis.service
|
||||
Wants=redis.service
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=$FETCH_ML_USER
|
||||
Group=$FETCH_ML_USER
|
||||
WorkingDirectory=$FETCH_ML_HOME
|
||||
Environment=FETCH_ML_HOME=$FETCH_ML_HOME
|
||||
Environment=PATH=$FETCH_ML_HOME/go/bin:/usr/local/go/bin:/usr/bin:/bin
|
||||
ExecStart=$FETCH_ML_HOME/bin/worker --config $FETCH_ML_HOME/configs/config-local.yaml
|
||||
Restart=always
|
||||
RestartSec=5
|
||||
StandardOutput=journal
|
||||
StandardError=journal
|
||||
SyslogIdentifier=fetch_ml_worker
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
EOF
|
||||
|
||||
# Fetch ML Data Manager service
|
||||
cat > "$SERVICE_DIR/fetch_ml_data_manager.service" << EOF
|
||||
[Unit]
|
||||
Description=Fetch ML Data Manager Service
|
||||
After=network.target redis.service
|
||||
Wants=redis.service
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=$FETCH_ML_USER
|
||||
Group=$FETCH_ML_USER
|
||||
WorkingDirectory=$FETCH_ML_HOME
|
||||
Environment=FETCH_ML_HOME=$FETCH_ML_HOME
|
||||
Environment=PATH=$FETCH_ML_HOME/go/bin:/usr/local/go/bin:/usr/bin:/bin
|
||||
ExecStart=$FETCH_ML_HOME/bin/data_manager --config $FETCH_ML_HOME/configs/config-local.yaml
|
||||
Restart=always
|
||||
RestartSec=5
|
||||
StandardOutput=journal
|
||||
StandardError=journal
|
||||
SyslogIdentifier=fetch_ml_data_manager
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
EOF
|
||||
|
||||
# Enable services
|
||||
systemctl daemon-reload
|
||||
systemctl enable fetch_ml_worker
|
||||
systemctl enable fetch_ml_data_manager
|
||||
|
||||
log_success "Systemd services configured"
|
||||
}
|
||||
|
||||
setup_log_rotation() {
|
||||
log_info "Setting up log rotation..."
|
||||
|
||||
cat > /etc/logrotate.d/fetch_ml << EOF
|
||||
$LOG_DIR/*.log {
|
||||
daily
|
||||
missingok
|
||||
rotate 30
|
||||
compress
|
||||
delaycompress
|
||||
notifempty
|
||||
create 0644 $FETCH_ML_USER $FETCH_ML_USER
|
||||
postrotate
|
||||
systemctl reload fetch_ml_worker || true
|
||||
systemctl reload fetch_ml_data_manager || true
|
||||
endscript
|
||||
}
|
||||
EOF
|
||||
|
||||
log_success "Log rotation configured"
|
||||
}
|
||||
|
||||
optimize_system() {
|
||||
log_info "Optimizing system for ML workloads..."
|
||||
|
||||
# Increase file limits
|
||||
echo "* soft nofile 65536" >> /etc/security/limits.conf
|
||||
echo "* hard nofile 65536" >> /etc/security/limits.conf
|
||||
|
||||
# Optimize kernel parameters for ML
|
||||
cat >> /etc/sysctl.conf << EOF
|
||||
# ML Optimization
|
||||
net.core.rmem_max = 134217728
|
||||
net.core.wmem_max = 134217728
|
||||
vm.swappiness = 10
|
||||
vm.dirty_ratio = 15
|
||||
vm.dirty_background_ratio = 5
|
||||
EOF
|
||||
|
||||
sysctl -p
|
||||
|
||||
# Configure GPU persistence mode if NVIDIA available
|
||||
if command -v nvidia-smi &> /dev/null; then
|
||||
nvidia-smi -pm 1 || log_warning "Could not enable GPU persistence mode"
|
||||
fi
|
||||
|
||||
# Suggest SELinux permissive mode for container compatibility (advisory only; nothing is changed here)
|
||||
if [[ -f /etc/selinux/config ]]; then
|
||||
log_warning "Consider setting SELinux to permissive mode for better container compatibility"
|
||||
log_info "Edit /etc/selinux/config and set SELINUX=permissive"
|
||||
fi
|
||||
|
||||
log_success "System optimized for ML workloads"
|
||||
}
|
||||
|
||||
install_fetch_ml() {
|
||||
log_info "Installing Fetch ML..."
|
||||
|
||||
# Clone or copy Fetch ML
|
||||
cd "$FETCH_ML_HOME"
|
||||
|
||||
if [[ ! -d "fetch_ml" ]]; then
|
||||
log_warning "Please clone Fetch ML repository manually to $FETCH_ML_HOME/fetch_ml"
|
||||
log_info "Example: git clone https://github.com/your-org/fetch_ml.git"
|
||||
return
|
||||
fi
|
||||
|
||||
cd fetch_ml
|
||||
|
||||
# Build
|
||||
export PATH=$PATH:/usr/local/go/bin
|
||||
make build
|
||||
|
||||
# Copy binaries
|
||||
cp bin/* "$FETCH_ML_HOME/bin/"
chmod +x "$FETCH_ML_HOME"/bin/*

# Copy configs
mkdir -p "$FETCH_ML_HOME/configs"
cp configs/config-local.yaml.example "$FETCH_ML_HOME/configs/config-local.yaml"

# Set permissions
chown -R "$FETCH_ML_USER":"$FETCH_ML_USER" "$FETCH_ML_HOME"
|
||||
|
||||
log_success "Fetch ML installed"
|
||||
}
|
||||
|
||||
main() {
|
||||
log_info "Starting Fetch ML Rocky Linux server setup..."
|
||||
|
||||
check_root
|
||||
check_rocky
|
||||
|
||||
update_system
|
||||
enable_epel
|
||||
install_go
|
||||
install_podman
|
||||
install_redis
|
||||
install_nvidia_drivers
|
||||
install_ml_tools
|
||||
ensure_user
|
||||
create_directories
|
||||
setup_firewall
|
||||
setup_systemd_services
|
||||
setup_logrotate
|
||||
hardening_steps
|
||||
selinux_guidance
|
||||
install_fetch_ml
|
||||
|
||||
log_success "Fetch ML setup complete!"
|
||||
echo
|
||||
log_info "Next steps:"
|
||||
echo "1. Clone Fetch ML repository: git clone https://github.com/your-org/fetch_ml.git $FETCH_ML_HOME/fetch_ml"
|
||||
echo "2. Configure: $FETCH_ML_HOME/configs/config-local.yaml"
|
||||
echo "3. Start services: systemctl start fetch_ml_worker fetch_ml_data_manager"
|
||||
echo "4. Check status: systemctl status fetch_ml_worker fetch_ml_data_manager"
|
||||
echo "5. View logs: journalctl -u fetch_ml_worker -f"
|
||||
echo
|
||||
log_info "Services will be available at:"
|
||||
echo "- Worker API: http://$(hostname -I | awk '{print $1}'):8080"
|
||||
echo "- Data Manager: http://$(hostname -I | awk '{print $1}'):8081"
|
||||
}
|
||||
|
||||
# Run main function
|
||||
main "$@"
|
||||
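`optimize_system` appends lines to `/etc/security/limits.conf` and `/etc/sysctl.conf` each time it runs, so repeated runs can accumulate duplicates. One way to make such appends idempotent is sketched below. `append_once` is a hypothetical helper, not part of the repository:

```bash
#!/usr/bin/env bash
# Hypothetical helper: append LINE to FILE only if that exact line is absent.
append_once() {
    local line=$1 file=$2
    touch "$file"
    # -x matches whole lines, -F treats LINE as a fixed string, -q stays quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Under that assumption, a call such as `append_once "vm.swappiness = 10" /etc/sysctl.conf` could stand in for a bare `echo ... >>` and would be safe to repeat.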
294
scripts/legacy/setup_ubuntu.sh
Executable file
|
|
@ -0,0 +1,294 @@
|
|||
#!/usr/bin/env bash
|
||||
|
||||
# Fetch ML Ubuntu Server Setup Script
|
||||
# Optimized for ML experiments on Ubuntu 20.04/22.04
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)
# shellcheck source=scripts/setup_common.sh
source "$SCRIPT_DIR/setup_common.sh"
|
||||
|
||||
check_root() {
|
||||
if [[ $EUID -ne 0 ]]; then
|
||||
log_error "This script must be run as root"
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
check_ubuntu() {
|
||||
if ! command -v apt-get &> /dev/null; then
|
||||
log_error "This script is designed for Ubuntu systems"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
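# Read the release from /etc/os-release; lsb_release and bc may be missing on minimal installs
local ubuntu_version
ubuntu_version=$(. /etc/os-release && echo "$VERSION_ID")
log_info "Ubuntu version: $ubuntu_version"

if dpkg --compare-versions "$ubuntu_version" lt "20.04"; then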
|
||||
log_warning "Ubuntu version < 20.04 may not support all features"
|
||||
fi
|
||||
}
|
||||
|
||||
update_system() {
|
||||
log_info "Updating system packages..."
|
||||
apt-get update -y
|
||||
apt-get upgrade -y
|
||||
apt-get install -y curl wget gnupg lsb-release software-properties-common
|
||||
}
|
||||
|
||||
install_go() {
|
||||
log_info "Installing Go 1.25..."
|
||||
|
||||
if command -v go &> /dev/null; then
|
||||
local go_version=$(go version | awk '{print $3}' | sed 's/go//')
|
||||
log_info "Go already installed: $go_version"
|
||||
return
|
||||
fi
|
||||
|
||||
cd /tmp
|
||||
TMP_FILES="/tmp/go1.25.0.linux-amd64.tar.gz"
|
||||
secure_download "https://go.dev/dl/go1.25.0.linux-amd64.tar.gz" "b5b98c784d53115553848114fd3c74e565643b4e4c8e8db0c3bea3478fd8c345" "/tmp/go1.25.0.linux-amd64.tar.gz"
|
||||
tar -C /usr/local -xzf go1.25.0.linux-amd64.tar.gz
|
||||
|
||||
# Add to PATH
|
||||
echo 'export PATH=$PATH:/usr/local/go/bin' >> /etc/profile
|
||||
echo 'export PATH=$PATH:$HOME/go/bin' >> /etc/profile
|
||||
export PATH=$PATH:/usr/local/go/bin
|
||||
|
||||
log_success "Go 1.25 installed"
|
||||
}
|
||||
|
||||
install_podman() {
|
||||
log_info "Installing Podman..."
|
||||
|
||||
if command -v podman &> /dev/null; then
|
||||
log_info "Podman already installed"
|
||||
return
|
||||
fi
|
||||
|
||||
# Add the Kubic libcontainers repository (openSUSE Build Service), using a
# dedicated keyring because apt-key is deprecated
local kubic_url
kubic_url="https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_$(lsb_release -rs)"
mkdir -p /etc/apt/keyrings
curl -fsSL "$kubic_url/Release.key" | gpg --batch --yes --dearmor -o /etc/apt/keyrings/libcontainers.gpg
echo "deb [signed-by=/etc/apt/keyrings/libcontainers.gpg] $kubic_url/ /" > /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
|
||||
|
||||
apt-get update -y
|
||||
apt-get install -y podman podman-compose
|
||||
|
||||
# Select crun as the OCI runtime; the key belongs in the [engine] table,
# so write a minimal config only when none exists yet
mkdir -p /etc/containers
if [[ ! -f /etc/containers/containers.conf ]]; then
    printf '[engine]\nruntime = "crun"\n' > /etc/containers/containers.conf
fi
|
||||
|
||||
log_success "Podman installed"
|
||||
}
|
||||
|
||||
install_redis() {
|
||||
log_info "Installing Redis..."
|
||||
|
||||
if command -v redis-server &> /dev/null; then
|
||||
log_info "Redis already installed"
|
||||
return
|
||||
fi
|
||||
|
||||
apt-get install -y redis-server
|
||||
|
||||
# Configure Redis for production
|
||||
sed -i 's/supervised no/supervised systemd/' /etc/redis/redis.conf
|
||||
sed -i 's/bind 127.0.0.1 ::1/bind 127.0.0.1/' /etc/redis/redis.conf
|
||||
|
||||
systemctl enable redis-server
|
||||
systemctl start redis-server
|
||||
|
||||
log_success "Redis installed and configured"
|
||||
}
|
||||
|
||||
install_nvidia_drivers() {
|
||||
log_info "Checking for NVIDIA GPU..."
|
||||
|
||||
if command -v nvidia-smi &> /dev/null; then
|
||||
log_info "NVIDIA drivers already installed"
|
||||
nvidia-smi
|
||||
return
|
||||
fi
|
||||
|
||||
if lspci | grep -i nvidia &> /dev/null; then
|
||||
log_info "NVIDIA GPU detected, installing drivers..."
|
||||
|
||||
# Add NVIDIA repository
|
||||
TMP_FILES="/tmp/cuda-keyring_1.1-1_all.deb"
|
||||
secure_download "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu$(lsb_release -rs | cut -d. -f1)/x86_64/cuda-keyring_1.1-1_all.deb" "cfa6b4109e7e3d9be060a016b7dc07e8edcd5356c0eabcc0c537a76e6c603d76" "/tmp/cuda-keyring_1.1-1_all.deb"
|
||||
dpkg -i /tmp/cuda-keyring_1.1-1_all.deb
|
||||
apt-get update -y
|
||||
|
||||
# Install drivers
|
||||
apt-get install -y nvidia-driver-535 nvidia-cuda-toolkit
|
||||
|
||||
# Smoke-test GPU access from Podman via CDI (needs nvidia-container-toolkit and a generated CDI spec)
|
||||
if ! podman run --rm --device nvidia.com/gpu=all alpine echo "NVIDIA GPU access configured" 2>/dev/null; then
|
||||
log_warning "NVIDIA GPU access test failed, you may need to reboot"
|
||||
else
|
||||
log_success "NVIDIA drivers installed and GPU access verified"
|
||||
fi
|
||||
|
||||
else
|
||||
log_info "No NVIDIA GPU detected, skipping driver installation"
|
||||
fi
|
||||
}
|
||||
|
||||
install_ml_tools() {
|
||||
log_info "Installing ML tools and dependencies..."
|
||||
|
||||
# Python and ML packages
|
||||
apt-get install -y python3 python3-pip python3-venv
|
||||
|
||||
# System dependencies for ML
|
||||
apt-get install -y build-essential cmake git pkg-config
|
||||
apt-get install -y libjpeg-dev libpng-dev libtiff-dev
|
||||
apt-get install -y libavcodec-dev libavformat-dev libswscale-dev
|
||||
apt-get install -y libgtk2.0-dev libcanberra-gtk-module
|
||||
apt-get install -y libxvidcore-dev libx264-dev
|
||||
apt-get install -y libatlas-base-dev gfortran
|
||||
|
||||
# Install common ML libraries
|
||||
pip3 install --upgrade pip
|
||||
pip3 install numpy scipy scikit-learn pandas
|
||||
pip3 install jupyter matplotlib seaborn
|
||||
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
|
||||
|
||||
log_success "ML tools installed"
|
||||
}
|
||||
|
||||
create_user() {
|
||||
log_info "Creating fetchml user..."
|
||||
ensure_user
|
||||
create_directories
|
||||
log_success "User $FETCH_ML_USER and directories created"
|
||||
}
|
||||
|
||||
setup_firewall() {
|
||||
log_info "Configuring firewall..."
|
||||
|
||||
if command -v ufw &> /dev/null; then
|
||||
ufw --force enable
|
||||
ufw allow ssh
|
||||
ufw allow 8080/tcp # Worker API
|
||||
ufw allow 8081/tcp # Data manager API
|
||||
ufw allow 6379/tcp # Redis
|
||||
ufw status
|
||||
else
|
||||
log_warning "UFW not available, skipping firewall configuration"
|
||||
fi
|
||||
}
|
||||
|
||||
setup_systemd_services() {
|
||||
log_info "Setting up systemd services..."
|
||||
|
||||
setup_systemd_service "fetch_ml_worker" "$FETCH_ML_HOME/bin/worker --config $FETCH_ML_HOME/configs/config-local.yaml"
|
||||
setup_systemd_service "fetch_ml_data_manager" "$FETCH_ML_HOME/bin/data_manager --config $FETCH_ML_HOME/configs/config-local.yaml"
|
||||
|
||||
# Enable services
|
||||
systemctl daemon-reload
|
||||
systemctl enable fetch_ml_worker
|
||||
systemctl enable fetch_ml_data_manager
|
||||
|
||||
log_success "Systemd services configured"
|
||||
}
|
||||
|
||||
setup_log_rotation() {
|
||||
log_info "Setting up log rotation..."
|
||||
setup_logrotate
|
||||
log_success "Log rotation configured"
|
||||
}
|
||||
|
||||
optimize_system() {
|
||||
log_info "Optimizing system for ML workloads..."
|
||||
hardening_steps
|
||||
|
||||
# Optimize kernel parameters for ML
|
||||
cat >> /etc/sysctl.conf << EOF
|
||||
# ML Optimization
|
||||
net.core.rmem_max = 134217728
|
||||
net.core.wmem_max = 134217728
|
||||
vm.swappiness = 10
|
||||
vm.dirty_ratio = 15
|
||||
vm.dirty_background_ratio = 5
|
||||
EOF
|
||||
|
||||
sysctl -p
|
||||
|
||||
# Configure GPU persistence mode if NVIDIA available
|
||||
if command -v nvidia-smi &> /dev/null; then
|
||||
nvidia-smi -pm 1 || log_warning "Could not enable GPU persistence mode"
|
||||
fi
|
||||
|
||||
log_success "System optimized for ML workloads"
|
||||
}
|
||||
|
||||
install_fetch_ml() {
|
||||
log_info "Installing Fetch ML..."
|
||||
|
||||
# Clone or copy Fetch ML
|
||||
cd "$FETCH_ML_HOME"
|
||||
|
||||
if [[ ! -d "fetch_ml" ]]; then
|
||||
# This would be replaced with actual repository URL
|
||||
log_warning "Please clone Fetch ML repository manually to $FETCH_ML_HOME/fetch_ml"
|
||||
log_info "Example: git clone https://github.com/your-org/fetch_ml.git"
|
||||
return
|
||||
fi
|
||||
|
||||
cd fetch_ml
|
||||
|
||||
# Build
|
||||
export PATH=$PATH:/usr/local/go/bin
|
||||
make build
|
||||
|
||||
# Copy binaries
|
||||
cp bin/* "$FETCH_ML_HOME/bin/"
chmod +x "$FETCH_ML_HOME"/bin/*

# Copy configs
mkdir -p "$FETCH_ML_HOME/configs"
cp configs/config-local.yaml.example "$FETCH_ML_HOME/configs/config-local.yaml"

# Set permissions
chown -R "$FETCH_ML_USER":"$FETCH_ML_USER" "$FETCH_ML_HOME"
|
||||
|
||||
log_success "Fetch ML installed"
|
||||
}
|
||||
|
||||
main() {
|
||||
log_info "Starting Fetch ML Ubuntu server setup..."
|
||||
|
||||
check_root
|
||||
check_ubuntu
|
||||
|
||||
update_system
|
||||
install_go
|
||||
install_podman
|
||||
install_redis
|
||||
install_nvidia_drivers
|
||||
install_ml_tools
|
||||
ensure_user
|
||||
create_directories
|
||||
setup_firewall
|
||||
setup_systemd_services
|
||||
setup_logrotate
|
||||
hardening_steps
|
||||
install_fetch_ml
|
||||
|
||||
log_success "Fetch ML setup complete!"
|
||||
echo
|
||||
log_info "Next steps:"
|
||||
echo "1. Clone Fetch ML repository: git clone https://github.com/your-org/fetch_ml.git $FETCH_ML_HOME/fetch_ml"
|
||||
echo "2. Configure: $FETCH_ML_HOME/configs/config-local.yaml"
|
||||
echo "3. Start services: systemctl start fetch_ml_worker fetch_ml_data_manager"
|
||||
echo "4. Check status: systemctl status fetch_ml_worker fetch_ml_data_manager"
|
||||
echo "5. View logs: journalctl -u fetch_ml_worker -f"
|
||||
echo
|
||||
log_info "Services will be available at:"
|
||||
echo "- Worker API: http://$(hostname -I | awk '{print $1}'):8080"
|
||||
echo "- Data Manager: http://$(hostname -I | awk '{print $1}'):8081"
|
||||
}
|
||||
|
||||
# Run main function
|
||||
main "$@"
|
||||
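Release checks such as the one in `check_ubuntu` need a comparison that understands dotted version numbers. A distro-neutral sketch using `sort -V` from GNU coreutils follows. `version_lt` is a hypothetical name and is not defined anywhere in these scripts:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if version A sorts strictly before version B.
version_lt() {
    local a=$1 b=$2
    # Equal versions are not "less than"; otherwise the smaller one sorts first
    [[ "$a" != "$b" ]] && [[ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n1)" == "$a" ]]
}

if version_lt "18.04" "20.04"; then
    echo "older than 20.04"
fi
```

Because `sort -V` compares each numeric component as a number, `9.10` sorts before `20.04`, which a plain string comparison would get wrong.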
67
scripts/legacy/test_tools.sh
Executable file
|
|
@ -0,0 +1,67 @@
|
|||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
echo "=== Test Tools Harness ==="
|
||||
|
||||
# Function to check if Redis is running, start temporary instance if needed
|
||||
ensure_redis() {
|
||||
if ! redis-cli ping >/dev/null 2>&1; then
|
||||
echo "Starting temporary Redis instance..."
|
||||
redis-server --daemonize yes --port 6379
|
||||
sleep 2
|
||||
if ! redis-cli ping >/dev/null 2>&1; then
|
||||
echo "Failed to start Redis"
|
||||
exit 1
|
||||
fi
|
||||
echo "Redis started successfully"
|
||||
# Set up cleanup trap
|
||||
trap 'echo "Stopping temporary Redis..."; redis-cli shutdown || true' EXIT
|
||||
else
|
||||
echo "Redis is already running"
|
||||
fi
|
||||
}
|
||||
|
||||
# Step 1: Build Go binaries
|
||||
echo "Building Go binaries..."
|
||||
go build -o bin/api-server ./cmd/api-server
|
||||
go build -o bin/worker ./cmd/worker
|
||||
go build -o bin/data_manager ./cmd/data_manager
|
||||
go build -o bin/user_manager ./cmd/user_manager
|
||||
|
||||
# Step 2: Build Zig CLI
|
||||
echo "Building Zig CLI..."
|
||||
cd cli
|
||||
zig build
|
||||
cd ..
|
||||
|
||||
# Step 3: Ensure Redis is running
|
||||
ensure_redis
|
||||
|
||||
# Step 4: Run Go tests
|
||||
echo "Running Go tests..."
|
||||
go test ./...
|
||||
|
||||
# Step 5: Run Zig tests
|
||||
echo "Running Zig CLI tests..."
|
||||
cd cli
|
||||
zig build test  # bare "zig test" needs a source file; assumes build.zig defines a "test" step
|
||||
cd ..
|
||||
|
||||
# Step 6: Run Go E2E tests (Redis is already available)
|
||||
echo "Running Go E2E tests..."
|
||||
go test ./tests/e2e/...
|
||||
|
||||
# Step 7: Smoke test API server and CLI
|
||||
echo "Running smoke test..."
|
||||
# Start API server in background on different port
|
||||
./bin/api-server -config configs/config.yaml -port 19101 -no-tls > /tmp/api-server.log 2>&1 &
|
||||
API_PID=$!
|
||||
sleep 2
|
||||
|
||||
# Test CLI status
|
||||
./cli/zig-out/bin/ml status -server http://localhost:19101 || {
    # Stop the background API server before failing
    kill "$API_PID" 2>/dev/null || true
    exit 1
}

# Clean up
kill "$API_PID" 2>/dev/null || true
|
||||
|
||||
echo "=== All tests completed successfully ==="
|
||||
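The harness waits a fixed `sleep 2` after starting Redis and again after starting the API server, which can be too short on a loaded machine and is wasted time on a fast one. A sketch of polling for readiness instead is shown below. `wait_for` is a hypothetical helper, not part of the repository:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run COMMAND up to ATTEMPTS times, one second apart,
# and succeed as soon as it succeeds.
wait_for() {
    local attempts=$1
    shift
    local i
    for (( i = 0; i < attempts; i++ )); do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
    done
    return 1
}
```

For the Redis step, a call such as `wait_for 10 redis-cli ping` could take the place of the fixed sleep and the follow-up ping check.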
183
scripts/lib/common.sh
Executable file
|
|
@ -0,0 +1,183 @@
|
|||
#!/bin/bash
|
||||
# Common shell functions for FetchML scripts
|
||||
# Source this file in other scripts: source "$(dirname "$0")/lib/common.sh"
|
||||
|
||||
# Colors for output
|
||||
export BOLD='\033[1m'
|
||||
export GREEN='\033[0;32m'
|
||||
export BLUE='\033[0;34m'
|
||||
export YELLOW='\033[0;33m'
|
||||
export RED='\033[0;31m'
|
||||
export NC='\033[0m'
|
||||
|
||||
###################
|
||||
# Logging functions
|
||||
###################
|
||||
|
||||
log_info() {
|
||||
echo -e "${BLUE}[INFO]${NC} $1"
|
||||
}
|
||||
|
||||
log_success() {
|
||||
echo -e "${GREEN}✓${NC} $1"
|
||||
}
|
||||
|
||||
log_warn() {
|
||||
echo -e "${YELLOW}[WARN]${NC} $1"
|
||||
}
|
||||
|
||||
log_error() {
|
||||
echo -e "${RED}[ERROR]${NC} $1" >&2
|
||||
}
|
||||
|
||||
log_step() {
|
||||
local step=$1
|
||||
local total=$2
|
||||
local message=$3
|
||||
echo -e "${BLUE}[$step/$total]${NC} $message"
|
||||
}
|
||||
|
||||
print_header() {
|
||||
local title=$1
|
||||
echo ""
|
||||
echo -e "${BOLD}=== $title ===${NC}"
|
||||
echo ""
|
||||
}
|
||||
|
||||
###################
|
||||
# System detection
|
||||
###################
|
||||
|
||||
detect_distro() {
|
||||
if [ -f /etc/os-release ]; then
|
||||
. /etc/os-release
|
||||
export DISTRO=$ID
|
||||
export DISTRO_VERSION=$VERSION_ID
|
||||
elif [ -f /etc/redhat-release ]; then
|
||||
export DISTRO="rhel"
|
||||
export DISTRO_VERSION="unknown"
|
||||
else
|
||||
export DISTRO="unknown"
|
||||
export DISTRO_VERSION="unknown"
|
||||
fi
|
||||
|
||||
# Detect package manager
|
||||
if command -v dnf &>/dev/null; then
|
||||
export PKG_MANAGER="dnf"
|
||||
elif command -v yum &>/dev/null; then
|
||||
export PKG_MANAGER="yum"
|
||||
elif command -v apt-get &>/dev/null; then
|
||||
export PKG_MANAGER="apt"
|
||||
elif command -v pacman &>/dev/null; then
|
||||
export PKG_MANAGER="pacman"
|
||||
elif command -v zypper &>/dev/null; then
|
||||
export PKG_MANAGER="zypper"
|
||||
else
|
||||
log_warn "No known package manager found"
|
||||
export PKG_MANAGER="unknown"
|
||||
fi
|
||||
|
||||
log_info "Detected: $DISTRO $DISTRO_VERSION (using $PKG_MANAGER)"
|
||||
}
|
||||
|
||||
###################
|
||||
# Utility functions
|
||||
###################
|
||||
|
||||
check_command() {
|
||||
local cmd=$1
|
||||
local install_hint=$2
|
||||
|
||||
if ! command -v "$cmd" &>/dev/null; then
|
||||
log_error "$cmd not found"
|
||||
if [ -n "$install_hint" ]; then
|
||||
log_info "Install with: $install_hint"
|
||||
fi
|
||||
return 1
|
||||
fi
|
||||
return 0
|
||||
}
|
||||
|
||||
check_root() {
|
||||
if [ "$EUID" -ne 0 ]; then
|
||||
log_error "This script must be run with sudo"
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
confirm() {
|
||||
local prompt=$1
|
||||
local default=${2:-n}
|
||||
|
||||
if [ "$default" = "y" ]; then
|
||||
read -p "$prompt [Y/n]: " -n 1 -r
|
||||
else
|
||||
read -p "$prompt [y/N]: " -n 1 -r
|
||||
fi
|
||||
echo
|
||||
|
||||
# An empty reply (just Enter) falls back to the default answer
if [[ ${REPLY:-$default} =~ ^[Yy]$ ]]; then
|
||||
return 0
|
||||
else
|
||||
return 1
|
||||
fi
|
||||
}
|
||||
|
||||
###################
|
||||
# Firewall management
|
||||
###################
|
||||
|
||||
setup_firewall() {
|
||||
local port=$1
|
||||
local comment=${2:-"FetchML"}
|
||||
|
||||
if command -v firewall-cmd &>/dev/null; then
    # RHEL/Rocky/Fedora (firewalld)
    if ! sudo firewall-cmd --permanent --add-port="${port}/tcp" >/dev/null 2>&1; then
        log_error "Failed to add firewall rule (firewalld): ${port}/tcp"
        return 1
    fi
    log_success "Firewall rule added (firewalld): ${port}/tcp"
    return 0
elif command -v ufw &>/dev/null; then
    # Ubuntu/Debian (ufw)
    if ! sudo ufw allow "${port}/tcp" comment "$comment" >/dev/null 2>&1; then
        log_error "Failed to add firewall rule (ufw): ${port}/tcp"
        return 1
    fi
    log_success "Firewall rule added (ufw): ${port}/tcp"
    return 0
else
    log_warn "No firewall detected. Manually open port ${port}/tcp"
    return 1
fi
|
||||
}
|
||||
|
||||
reload_firewall() {
|
||||
if command -v firewall-cmd &>/dev/null; then
|
||||
sudo firewall-cmd --reload >/dev/null 2>&1
|
||||
log_success "Firewall reloaded (firewalld)"
|
||||
elif command -v ufw &>/dev/null; then
|
||||
sudo ufw reload >/dev/null 2>&1 || true
|
||||
log_success "Firewall reloaded (ufw)"
|
||||
fi
|
||||
}
|
||||
|
||||
###################
|
||||
# File/directory management
|
||||
###################
|
||||
|
||||
create_dir() {
|
||||
local dir=$1
|
||||
local owner=${2:-$USER}
|
||||
local group=${3:-$(id -gn)}
|
||||
|
||||
sudo mkdir -p "$dir"
|
||||
sudo chown "$owner:$group" "$dir"
|
||||
sudo chmod 755 "$dir"
|
||||
log_success "Created: $dir"
|
||||
}
|
||||
|
||||
check_service() {
|
||||
local service=$1
|
||||
|
||||
# The function's exit status is grep's: 0 if a matching unit file exists
systemctl list-unit-files | grep -q "^${service}"
|
||||
}
|
||||
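The `confirm` helper maps a single keypress to a yes/no result, with a default shown in the prompt. Separating that decision from `read` makes it testable without a terminal. A minimal sketch, where `reply_is_yes` is a hypothetical name and not part of this library:

```bash
#!/usr/bin/env bash
# Hypothetical helper: interpret REPLY, falling back to DEFAULT when it is empty.
reply_is_yes() {
    local reply=${1:-} default=${2:-n}
    [[ "${reply:-$default}" =~ ^[Yy]$ ]]
}
```

With this split, `confirm` would only need to prompt, read one character, and pass `$REPLY` and its default to the helper.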
275
scripts/setup-monitoring-prod.sh
Executable file
|
|
@ -0,0 +1,275 @@
|
|||
#!/bin/bash
|
||||
# Production Monitoring Stack Setup for Linux
|
||||
# Deploys Prometheus/Grafana/Loki/Promtail as Podman containers with systemd
|
||||
# Compatible with: Rocky/RHEL/CentOS, Ubuntu/Debian, Arch, SUSE, etc.
|
||||
|
||||
set -e
|
||||
|
||||
BOLD='\033[1m'
|
||||
GREEN='\033[0;32m'
|
||||
BLUE='\033[0;34m'
|
||||
YELLOW='\033[0;33m'
|
||||
NC='\033[0m'
|
||||
|
||||
echo -e "${BOLD}=== FetchML Monitoring Stack Setup (Linux) ===${NC}\n"
|
||||
|
||||
# Detect Linux distribution and package manager
|
||||
detect_distro() {
|
||||
if [ -f /etc/os-release ]; then
|
||||
. /etc/os-release
|
||||
DISTRO=$ID
|
||||
DISTRO_VERSION=$VERSION_ID
|
||||
elif [ -f /etc/redhat-release ]; then
|
||||
DISTRO="rhel"
|
||||
else
|
||||
DISTRO="unknown"
|
||||
fi
|
||||
|
||||
# Detect package manager
|
||||
if command -v dnf &>/dev/null; then
|
||||
PKG_MANAGER="dnf"
|
||||
elif command -v yum &>/dev/null; then
|
||||
PKG_MANAGER="yum"
|
||||
elif command -v apt-get &>/dev/null; then
|
||||
PKG_MANAGER="apt"
|
||||
elif command -v pacman &>/dev/null; then
|
||||
PKG_MANAGER="pacman"
|
||||
elif command -v zypper &>/dev/null; then
|
||||
PKG_MANAGER="zypper"
|
||||
else
|
||||
echo -e "${YELLOW}Warning: No known package manager found${NC}"
|
||||
PKG_MANAGER="unknown"
|
||||
fi
|
||||
|
||||
echo "Detected distribution: $DISTRO (using $PKG_MANAGER)"
|
||||
}
|
||||
|
||||
detect_distro
|
||||
|
||||
# Configuration
|
||||
DATA_PATH="${1:-/data/monitoring}"
|
||||
ML_USER="${2:-ml-user}"
|
||||
ML_GROUP="${3:-ml-group}"
|
||||
|
||||
echo "Configuration:"
|
||||
echo " Monitoring data path: $DATA_PATH"
|
||||
echo " User: $ML_USER"
|
||||
echo " Group: $ML_GROUP"
|
||||
echo ""
|
||||
|
||||
# Create pod for monitoring stack
|
||||
POD_NAME="monitoring"
|
||||
|
||||
# 1. Create directories
|
||||
echo -e "${BLUE}[1/6]${NC} Creating directory structure..."
|
||||
sudo mkdir -p "${DATA_PATH}"/{prometheus,grafana,loki,promtail-config}
|
||||
sudo mkdir -p /etc/fetch_ml/monitoring
|
||||
sudo mkdir -p /var/lib/grafana/dashboards
|
||||
|
||||
sudo chown -R "$ML_USER:$ML_GROUP" "$DATA_PATH"
sudo chmod 755 "$DATA_PATH"
|
||||
|
||||
echo -e "${GREEN}✓${NC} Directories created"
|
||||
|
||||
# 2. Copy configuration files
|
||||
echo -e "${BLUE}[2/6]${NC} Copying configuration files..."
|
||||
sudo cp monitoring/prometheus.yml /etc/fetch_ml/monitoring/
|
||||
sudo cp monitoring/loki-config.yml /etc/fetch_ml/monitoring/
|
||||
sudo cp monitoring/promtail-config.yml /etc/fetch_ml/monitoring/
|
||||
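# grafana.service mounts /etc/fetch_ml/monitoring/grafana/provisioning
sudo mkdir -p /etc/fetch_ml/monitoring/grafana
sudo cp -r monitoring/grafana/provisioning /etc/fetch_ml/monitoring/grafana/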
|
||||
sudo cp monitoring/grafana-dashboard.json /var/lib/grafana/dashboards/ml-queue.json
|
||||
sudo cp monitoring/logs-dashboard.json /var/lib/grafana/dashboards/logs.json
|
||||
|
||||
sudo chown -R $ML_USER:$ML_GROUP /etc/fetch_ml/monitoring
|
||||
sudo chown -R $ML_USER:$ML_GROUP /var/lib/grafana
|
||||
|
||||
echo -e "${GREEN}✓${NC} Configuration copied"
|
||||
|
||||
# 3. Create Podman pod
|
||||
echo -e "${BLUE}[3/6]${NC} Creating Podman pod..."
|
||||
sudo -u "$ML_USER" podman pod create \
    --name "$POD_NAME" \
    -p 3000:3000 \
    -p 9090:9090 \
    -p 3100:3100 \
    || echo "Pod may already exist"
|
||||
|
||||
echo -e "${GREEN}✓${NC} Pod created"
|
||||
|
||||
# 4. Create systemd service for monitoring pod
|
||||
echo -e "${BLUE}[4/6]${NC} Creating systemd services..."
|
||||
|
||||
# Prometheus service
|
||||
sudo tee /etc/systemd/system/prometheus.service >/dev/null <<EOF
|
||||
[Unit]
|
||||
Description=Prometheus Monitoring
|
||||
After=network.target
|
||||
PartOf=$POD_NAME-pod.service
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=$ML_USER
|
||||
Group=$ML_GROUP
|
||||
Restart=always
|
||||
RestartSec=10
|
||||
|
||||
ExecStartPre=/bin/sh -c '/usr/bin/podman pod exists $POD_NAME || /usr/bin/podman pod create --name $POD_NAME -p 9090:9090'
|
||||
ExecStart=/usr/bin/podman run --rm --name prometheus \\
|
||||
--pod $POD_NAME \\
|
||||
-v /etc/fetch_ml/monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro \\
|
||||
-v ${DATA_PATH}/prometheus:/prometheus \\
|
||||
docker.io/prom/prometheus:latest \\
|
||||
--config.file=/etc/prometheus/prometheus.yml \\
|
||||
--storage.tsdb.path=/prometheus \\
|
||||
--web.enable-lifecycle
|
||||
|
||||
ExecStop=/usr/bin/podman stop -t 10 prometheus
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
EOF
|
||||
|
||||
# Loki service
|
||||
sudo tee /etc/systemd/system/loki.service >/dev/null <<EOF
|
||||
[Unit]
|
||||
Description=Loki Log Aggregation
|
||||
After=network.target
|
||||
PartOf=$POD_NAME-pod.service
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=$ML_USER
|
||||
Group=$ML_GROUP
|
||||
Restart=always
|
||||
RestartSec=10
|
||||
|
||||
ExecStartPre=/bin/sh -c '/usr/bin/podman pod exists $POD_NAME || /usr/bin/podman pod create --name $POD_NAME -p 3100:3100'
|
||||
ExecStart=/usr/bin/podman run --rm --name loki \\
|
||||
--pod $POD_NAME \\
|
||||
-v /etc/fetch_ml/monitoring/loki-config.yml:/etc/loki/local-config.yaml:ro \\
|
||||
-v ${DATA_PATH}/loki:/loki \\
|
||||
docker.io/grafana/loki:latest \\
|
||||
-config.file=/etc/loki/local-config.yaml
|
||||
|
||||
ExecStop=/usr/bin/podman stop -t 10 loki
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
EOF
|
||||
|
||||
# Grafana service
|
||||
sudo tee /etc/systemd/system/grafana.service >/dev/null <<EOF
|
||||
[Unit]
|
||||
Description=Grafana Visualization
|
||||
After=network.target prometheus.service loki.service
|
||||
PartOf=$POD_NAME-pod.service
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=$ML_USER
|
||||
Group=$ML_GROUP
|
||||
Restart=always
|
||||
RestartSec=10
|
||||
|
||||
ExecStartPre=/bin/sh -c '/usr/bin/podman pod exists $POD_NAME || /usr/bin/podman pod create --name $POD_NAME -p 3000:3000'
|
||||
ExecStart=/usr/bin/podman run --rm --name grafana \\
|
||||
--pod $POD_NAME \\
|
||||
-v ${DATA_PATH}/grafana:/var/lib/grafana \\
|
||||
-v /etc/fetch_ml/monitoring/grafana/provisioning:/etc/grafana/provisioning:ro \\
|
||||
-v /var/lib/grafana/dashboards:/var/lib/grafana/dashboards:ro \\
|
||||
    -e GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:-$(openssl rand -base64 32)} \\
|
||||
-e GF_USERS_ALLOW_SIGN_UP=false \\
|
||||
-e GF_AUTH_ANONYMOUS_ENABLED=false \\
|
||||
docker.io/grafana/grafana:latest
|
||||
|
||||
ExecStop=/usr/bin/podman stop -t 10 grafana
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
EOF
|
||||
|
||||
# Promtail service
|
||||
sudo tee /etc/systemd/system/promtail.service >/dev/null <<EOF
|
||||
[Unit]
|
||||
Description=Promtail Log Collector
|
||||
After=network.target loki.service
|
||||
PartOf=$POD_NAME-pod.service
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=$ML_USER
|
||||
Group=$ML_GROUP
|
||||
Restart=always
|
||||
RestartSec=10
|
||||
|
||||
ExecStartPre=/bin/sh -c '/usr/bin/podman pod exists $POD_NAME || /usr/bin/podman pod create --name $POD_NAME'
|
||||
ExecStart=/usr/bin/podman run --rm --name promtail \\
|
||||
--pod $POD_NAME \\
|
||||
-v /etc/fetch_ml/monitoring/promtail-config.yml:/etc/promtail/config.yml:ro \\
|
||||
-v /var/log/fetch_ml:/var/log/app:ro \\
|
||||
docker.io/grafana/promtail:latest \\
|
||||
-config.file=/etc/promtail/config.yml
|
||||
|
||||
ExecStop=/usr/bin/podman stop -t 10 promtail
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
EOF
|
||||
|
||||
sudo systemctl daemon-reload
|
||||
echo -e "${GREEN}✓${NC} Systemd services created"
|
||||
|
||||
# 5. Create monitoring pod service
|
||||
echo -e "${BLUE}[5/6]${NC} Creating pod management service..."
|
||||
sudo -u "$ML_USER" podman generate systemd --new --name "$POD_NAME" \
    | sudo tee "/etc/systemd/system/$POD_NAME-pod.service" >/dev/null
|
||||
|
||||
sudo systemctl daemon-reload
|
||||
echo -e "${GREEN}✓${NC} Pod service created"
|
||||
|
||||
# 6. Setup firewall rules
|
||||
echo -e "${BLUE}[6/6]${NC} Configuring firewall..."
|
||||
if command -v firewall-cmd &>/dev/null; then
|
||||
# RHEL/Rocky/Fedora (firewalld)
|
||||
sudo firewall-cmd --permanent --add-port=3000/tcp # Grafana
|
||||
sudo firewall-cmd --permanent --add-port=9090/tcp # Prometheus
|
||||
sudo firewall-cmd --reload
|
||||
echo -e "${GREEN}✓${NC} Firewall configured (firewalld)"
|
||||
elif command -v ufw &>/dev/null; then
|
||||
# Ubuntu/Debian (ufw)
|
||||
sudo ufw allow 3000/tcp comment 'Grafana'
|
||||
sudo ufw allow 9090/tcp comment 'Prometheus'
|
||||
echo -e "${GREEN}✓${NC} Firewall configured (ufw)"
|
||||
else
|
||||
echo -e "${YELLOW}!${NC} No firewall detected. You may need to manually open ports 3000 and 9090"
|
||||
fi
|
||||
|
||||
# Summary
|
||||
echo ""
|
||||
echo -e "${BOLD}=== Monitoring Stack Setup Complete! ===${NC}"
|
||||
echo ""
|
||||
echo "Services created:"
|
||||
echo " - prometheus.service (Metrics collection)"
|
||||
echo " - loki.service (Log aggregation)"
|
||||
echo " - grafana.service (Visualization)"
|
||||
echo " - promtail.service (Log shipping)"
|
||||
echo ""
|
||||
echo -e "${BOLD}Next steps:${NC}"
|
||||
echo "1. Start services:"
|
||||
echo " sudo systemctl start prometheus"
|
||||
echo " sudo systemctl start loki"
|
||||
echo " sudo systemctl start promtail"
|
||||
echo " sudo systemctl start grafana"
|
||||
echo ""
|
||||
echo "2. Enable on boot:"
|
||||
echo " sudo systemctl enable prometheus loki promtail grafana"
|
||||
echo ""
|
||||
echo "3. Access Grafana:"
|
||||
echo " http://YOUR_SERVER_IP:3000"
|
||||
echo " Username: admin"
|
||||
echo "   Password: see GF_SECURITY_ADMIN_PASSWORD in /etc/systemd/system/grafana.service"
|
||||
echo ""
|
||||
echo "4. Check logs:"
|
||||
echo " sudo journalctl -u prometheus -f"
|
||||
echo " sudo journalctl -u grafana -f"
|
||||
echo ""
|
||||
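systemd does not run `Exec*=` lines through a shell, so an "exists, else create" test written with `||` in a unit file needs an explicit `/bin/sh -c` wrapper. Otherwise `||` is passed to the command as a literal argument. The sketch below shows one way to generate such a line. `pod_prestart_line` is a hypothetical helper, and publishing each port as `PORT:PORT` is an assumption made for illustration.

```shell
#!/bin/bash
# pod_prestart_line POD [PORT...]: print an ExecStartPre= line that wraps the
# "exists or create" test in /bin/sh -c so the || operator is interpreted by
# a shell rather than handed to podman as an argument.
# (Hypothetical helper; each PORT is published as PORT:PORT.)
pod_prestart_line() {
    local pod="$1"
    shift
    local publish=""
    local port
    for port in "$@"; do
        publish+=" -p ${port}:${port}"
    done
    printf "ExecStartPre=/bin/sh -c '/usr/bin/podman pod exists %s || /usr/bin/podman pod create --name %s%s'\n" \
        "$pod" "$pod" "$publish"
}

pod_prestart_line monitoring 9090
```

The printed line can be placed in the `[Service]` section of a unit file as-is.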
229
scripts/setup-prod.sh
Executable file
|
|
@ -0,0 +1,229 @@
|
|||
#!/bin/bash
|
||||
# Production Setup Script for Rocky Linux (Bare Metal)
|
||||
# This script sets up the complete FetchML environment on bare metal
|
||||
|
||||
set -e
|
||||
|
||||
BOLD='\033[1m'
|
||||
GREEN='\033[0;32m'
|
||||
BLUE='\033[0;34m'
|
||||
NC='\033[0m'
|
||||
|
||||
echo -e "${BOLD}=== FetchML Production Setup (Rocky Linux Bare Metal) ===${NC}\n"
|
||||
|
||||
# Configuration
|
||||
BASE_PATH="${1:-/data/ml-experiments}"
|
||||
ML_USER="${2:-ml-user}"
|
||||
ML_GROUP="${3:-ml-group}"
|
||||
|
||||
echo "Configuration:"
|
||||
echo " Base path: $BASE_PATH"
|
||||
echo " ML user: $ML_USER"
|
||||
echo " ML group: $ML_GROUP"
|
||||
echo ""
|
||||
|
||||
# 1. Create system user if it doesn't exist
|
||||
echo -e "${BLUE}[1/8]${NC} Creating system user..."
|
||||
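# useradd does not create $ML_GROUP on its own, so make sure the group exists first
if ! getent group "$ML_GROUP" &>/dev/null; then
    sudo groupadd -r "$ML_GROUP"
fi
if id "$ML_USER" &>/dev/null; then
    echo "  User $ML_USER already exists"
else
    sudo useradd -r -g "$ML_GROUP" -s /bin/bash -m -d "/home/$ML_USER" -c "ML System User" "$ML_USER"
    echo -e "${GREEN}✓${NC} Created user: $ML_USER"
fi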
|
||||
|
||||
# 2. Create directory structure
|
||||
echo -e "${BLUE}[2/8]${NC} Creating directory structure..."
|
||||
sudo mkdir -p "${BASE_PATH}"/{experiments,pending,running,finished,failed,datasets}
|
||||
sudo mkdir -p /var/log/fetch_ml
|
||||
sudo mkdir -p /etc/fetch_ml /opt/fetch_ml  # /opt/fetch_ml is the services' WorkingDirectory
|
||||
|
||||
echo -e "${GREEN}✓${NC} Created directories:"
|
||||
echo " $BASE_PATH/experiments/"
|
||||
echo " $BASE_PATH/pending/"
|
||||
echo " $BASE_PATH/running/"
|
||||
echo " $BASE_PATH/finished/"
|
||||
echo " $BASE_PATH/failed/"
|
||||
echo " $BASE_PATH/datasets/"
|
||||
echo " /var/log/fetch_ml/"
|
||||
echo " /etc/fetch_ml/"
|
||||
|
||||
# 3. Set ownership and permissions
|
||||
echo -e "${BLUE}[3/8]${NC} Setting permissions..."
|
||||
sudo chown -R "$ML_USER:$ML_GROUP" "$BASE_PATH"
sudo chmod 755 "$BASE_PATH"
sudo chmod 700 "$BASE_PATH/experiments"  # Restrict experiment data

sudo chown -R "$ML_USER:$ML_GROUP" /var/log/fetch_ml
sudo chmod 755 /var/log/fetch_ml
|
||||
|
||||
echo -e "${GREEN}✓${NC} Permissions set"
|
||||
|
||||
# 4. Install system dependencies (Rocky Linux)
|
||||
echo -e "${BLUE}[4/8]${NC} Installing system dependencies..."
|
||||
sudo dnf install -y \
|
||||
golang \
|
||||
podman \
|
||||
redis \
|
||||
git \
|
||||
make \
|
||||
gcc \
|
||||
|| echo "Some packages may already be installed"
|
||||
|
||||
echo -e "${GREEN}✓${NC} Dependencies installed"
|
||||
|
||||
# 5. Configure Podman for GPU access (if NVIDIA GPU present)
|
||||
echo -e "${BLUE}[5/8]${NC} Configuring Podman..."
|
||||
if lspci | grep -i nvidia &>/dev/null; then
|
||||
echo " NVIDIA GPU detected, configuring GPU access..."
|
||||
|
||||
# Install nvidia-container-toolkit if not present
|
||||
    if ! command -v nvidia-ctk &>/dev/null; then
|
||||
echo " Installing nvidia-container-toolkit..."
|
||||
sudo dnf config-manager --add-repo \
|
||||
https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
|
||||
sudo dnf install -y nvidia-container-toolkit
|
||||
fi
|
||||
|
||||
# Configure Podman CDI
|
||||
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
|
||||
echo -e "${GREEN}✓${NC} GPU support configured"
|
||||
else
|
||||
echo " No NVIDIA GPU detected, skipping GPU setup"
|
||||
fi
|
||||
|
||||
# 6. Configure Redis
|
||||
echo -e "${BLUE}[6/8]${NC} Configuring Redis..."
|
||||
sudo systemctl enable redis
|
||||
sudo systemctl start redis || echo "Redis may already be running"
|
||||
|
||||
# Set Redis password if not already configured
|
||||
if ! sudo grep -q "^requirepass" /etc/redis/redis.conf 2>/dev/null; then
|
||||
REDIS_PASSWORD=$(openssl rand -base64 32)
|
||||
echo "requirepass $REDIS_PASSWORD" | sudo tee -a /etc/redis/redis.conf >/dev/null
|
||||
sudo systemctl restart redis
|
||||
echo " Generated Redis password: $REDIS_PASSWORD"
|
||||
echo " Save this password for your configuration!"
|
||||
else
|
||||
echo " Redis password already configured"
|
||||
fi
|
||||
|
||||
echo -e "${GREEN}✓${NC} Redis configured"
|
||||
|
||||
# 7. Setup systemd services
|
||||
echo -e "${BLUE}[7/8]${NC} Creating systemd services..."
|
||||
|
||||
# API Server service
|
||||
sudo tee /etc/systemd/system/fetchml-api.service >/dev/null <<EOF
|
||||
[Unit]
|
||||
Description=FetchML API Server
|
||||
After=network.target redis.service
|
||||
Wants=redis.service
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=$ML_USER
|
||||
Group=$ML_GROUP
|
||||
WorkingDirectory=/opt/fetch_ml
|
||||
ExecStart=/usr/local/bin/fetchml-api -config /etc/fetch_ml/config.yaml
|
||||
Restart=always
|
||||
RestartSec=10
|
||||
StandardOutput=append:/var/log/fetch_ml/api.log
|
||||
StandardError=append:/var/log/fetch_ml/api-error.log
|
||||
|
||||
# Security hardening
|
||||
NoNewPrivileges=true
|
||||
PrivateTmp=true
|
||||
ProtectSystem=strict
|
||||
ProtectHome=true
|
||||
ReadWritePaths=$BASE_PATH /var/log/fetch_ml
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
EOF
|
||||
|
||||
# Worker service
|
||||
sudo tee /etc/systemd/system/fetchml-worker.service >/dev/null <<EOF
|
||||
[Unit]
|
||||
Description=FetchML Worker
|
||||
After=network.target redis.service fetchml-api.service
|
||||
Wants=redis.service
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=$ML_USER
|
||||
Group=$ML_GROUP
|
||||
WorkingDirectory=/opt/fetch_ml
|
||||
ExecStart=/usr/local/bin/fetchml-worker -config /etc/fetch_ml/worker.toml
|
||||
Restart=always
|
||||
RestartSec=10
|
||||
StandardOutput=append:/var/log/fetch_ml/worker.log
|
||||
StandardError=append:/var/log/fetch_ml/worker-error.log
|
||||
|
||||
# Security hardening
|
||||
NoNewPrivileges=true
|
||||
PrivateTmp=true
|
||||
ProtectSystem=strict
|
||||
ProtectHome=true
|
||||
ReadWritePaths=$BASE_PATH /var/log/fetch_ml
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
EOF
|
||||
|
||||
sudo systemctl daemon-reload
|
||||
echo -e "${GREEN}✓${NC} Systemd services created"
|
||||
|
||||
# 8. Setup logrotate
|
||||
echo -e "${BLUE}[8/8]${NC} Configuring log rotation..."
|
||||
sudo tee /etc/logrotate.d/fetchml >/dev/null <<EOF
|
||||
/var/log/fetch_ml/*.log {
|
||||
daily
|
||||
rotate 14
|
||||
compress
|
||||
delaycompress
|
||||
notifempty
|
||||
missingok
|
||||
    # The services keep their log files open (StandardOutput=append:) and define
    # no ExecReload, so truncate in place instead of renaming and reloading.
    copytruncate
|
||||
}
|
||||
EOF
|
||||
|
||||
echo -e "${GREEN}✓${NC} Log rotation configured"
|
||||
|
||||
# Summary
|
||||
echo ""
|
||||
echo -e "${BOLD}=== Setup Complete! ===${NC}"
|
||||
echo ""
|
||||
echo "Directory structure created at: $BASE_PATH"
|
||||
echo "Logs will be written to: /var/log/fetch_ml/"
|
||||
echo "Configuration directory: /etc/fetch_ml/"
|
||||
echo ""
|
||||
echo -e "${BOLD}Next steps:${NC}"
|
||||
echo "1. Copy your config files:"
|
||||
echo " sudo cp configs/config-prod.yaml /etc/fetch_ml/config.yaml"
|
||||
echo " sudo cp configs/worker-prod.toml /etc/fetch_ml/worker.toml"
|
||||
echo ""
|
||||
echo "2. Build and install binaries:"
|
||||
echo " make build"
|
||||
echo " sudo cp bin/api-server /usr/local/bin/fetchml-api"
|
||||
echo " sudo cp bin/worker /usr/local/bin/fetchml-worker"
|
||||
echo ""
|
||||
echo "3. Update config files with your settings (Redis password, API keys, etc.)"
|
||||
echo ""
|
||||
echo "4. Start services:"
|
||||
echo " sudo systemctl start fetchml-api"
|
||||
echo " sudo systemctl start fetchml-worker"
|
||||
echo ""
|
||||
echo "5. Enable services to start on boot:"
|
||||
echo " sudo systemctl enable fetchml-api"
|
||||
echo " sudo systemctl enable fetchml-worker"
|
||||
echo ""
|
||||
echo "6. Check status:"
|
||||
echo " sudo systemctl status fetchml-api"
|
||||
echo " sudo systemctl status fetchml-worker"
|
||||
echo " sudo journalctl -u fetchml-api -f"
|
||||
echo ""
|
||||
0
scripts/setup-production.sh
Normal file
204
scripts/validate-prod-config.sh
Executable file
|
|
@ -0,0 +1,204 @@
|
|||
#!/bin/bash
|
||||
# Production Configuration Validator
|
||||
# Verifies all paths and configs are consistent for experiment lifecycle
|
||||
|
||||
set -e
|
||||
|
||||
BOLD='\033[1m'
|
||||
GREEN='\033[0;32m'
|
||||
RED='\033[0;31m'
|
||||
YELLOW='\033[1;33m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
echo -e "${BOLD}=== FetchML Production Configuration Validator ===${NC}\n"
|
||||
|
||||
# Configuration file paths
|
||||
API_CONFIG="${1:-configs/config-prod.yaml}"
|
||||
WORKER_CONFIG="${2:-configs/worker-prod.toml}"
|
||||
|
||||
errors=0
|
||||
warnings=0
|
||||
|
||||
check_pass() {
|
||||
echo -e "${GREEN}✓${NC} $1"
|
||||
}
|
||||
|
||||
check_fail() {
|
||||
echo -e "${RED}✗${NC} $1"
|
||||
    errors=$((errors + 1))  # ((errors++)) returns 1 when errors is 0 and would trip set -e
|
||||
}
|
||||
|
||||
check_warn() {
|
||||
echo -e "${YELLOW}⚠${NC} $1"
|
||||
    warnings=$((warnings + 1))  # ((warnings++)) returns 1 when warnings is 0 and would trip set -e
|
||||
}
|
||||
|
||||
# 1. Check API server config exists
|
||||
echo -e "${BOLD}Checking API Server Configuration${NC}"
|
||||
if [ ! -f "$API_CONFIG" ]; then
|
||||
check_fail "API config not found: $API_CONFIG"
|
||||
else
|
||||
check_pass "API config found: $API_CONFIG"
|
||||
|
||||
# Extract base_path from API config
|
||||
API_BASE_PATH=$(grep 'base_path:' "$API_CONFIG" | head -1 | awk '{print $2}' | tr -d '"')
|
||||
echo " Base path: $API_BASE_PATH"
|
||||
|
||||
# Check if path is absolute
|
||||
if [[ "$API_BASE_PATH" != /* ]]; then
|
||||
check_fail "base_path must be absolute: $API_BASE_PATH"
|
||||
else
|
||||
check_pass "base_path is absolute"
|
||||
fi
|
||||
|
||||
# Check Redis config
|
||||
if grep -q 'redis:' "$API_CONFIG"; then
|
||||
check_pass "Redis configuration present"
|
||||
else
|
||||
check_fail "Redis configuration missing"
|
||||
fi
|
||||
|
||||
# Check auth enabled
|
||||
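    # Look only at the lines directly under "auth:" so other "enabled: true" keys do not count
    if grep -A3 '^auth:' "$API_CONFIG" | grep -q 'enabled: true'; then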
|
||||
check_pass "Authentication enabled"
|
||||
else
|
||||
check_warn "Authentication disabled (not recommended for production)"
|
||||
fi
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# 2. Check Worker config (if provided)
|
||||
if [ -f "$WORKER_CONFIG" ]; then
|
||||
echo -e "${BOLD}Checking Worker Configuration${NC}"
|
||||
check_pass "Worker config found: $WORKER_CONFIG"
|
||||
|
||||
# Extract base_path from worker config
|
||||
WORKER_BASE_PATH=$(grep 'base_path' "$WORKER_CONFIG" | awk -F '=' '{print $2}' | tr -d ' "')
|
||||
echo " Base path: $WORKER_BASE_PATH"
|
||||
|
||||
# Compare paths
|
||||
if [ "$API_BASE_PATH" = "$WORKER_BASE_PATH" ]; then
|
||||
check_pass "API and Worker base_path match"
|
||||
else
|
||||
check_fail "base_path mismatch! API: $API_BASE_PATH, Worker: $WORKER_BASE_PATH"
|
||||
fi
|
||||
|
||||
# Check podman_image configured
|
||||
if grep -q 'podman_image' "$WORKER_CONFIG"; then
|
||||
PODMAN_IMAGE=$(grep 'podman_image' "$WORKER_CONFIG" | awk -F '=' '{print $2}' | tr -d ' "')
|
||||
check_pass "Podman image configured: $PODMAN_IMAGE"
|
||||
else
|
||||
check_fail "podman_image not configured"
|
||||
fi
|
||||
else
|
||||
check_warn "Worker config not found: $WORKER_CONFIG (optional for API server only)"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# 3. Check directory structure (if base_path exists)
|
||||
if [ -n "$API_BASE_PATH" ] && [ -d "$API_BASE_PATH" ]; then
|
||||
echo -e "${BOLD}Checking Directory Structure${NC}"
|
||||
check_pass "Base directory exists: $API_BASE_PATH"
|
||||
|
||||
# Check subdirectories
|
||||
for dir in experiments pending running finished failed; do
|
||||
if [ -d "$API_BASE_PATH/$dir" ]; then
|
||||
check_pass "$dir/ directory exists"
|
||||
else
|
||||
check_warn "$dir/ directory missing (will be created automatically)"
|
||||
fi
|
||||
done
|
||||
|
||||
# Check permissions
|
||||
if [ -w "$API_BASE_PATH" ]; then
|
||||
check_pass "Base directory is writable"
|
||||
else
|
||||
check_fail "Base directory is not writable (check permissions)"
|
||||
fi
|
||||
|
||||
elif [ -n "$API_BASE_PATH" ]; then
|
||||
check_warn "Base directory does not exist: $API_BASE_PATH (will need to be created)"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# 4. Check Redis connectivity (if server is running)
|
||||
echo -e "${BOLD}Checking Redis Connectivity${NC}"
|
||||
if command -v redis-cli &> /dev/null; then
|
||||
if redis-cli ping &> /dev/null; then
|
||||
check_pass "Redis server is running and accessible"
|
||||
|
||||
# Check queue
|
||||
QUEUE_SIZE=$(redis-cli llen fetchml:tasks:queue 2>/dev/null || echo "0")
|
||||
echo " Queue size: $QUEUE_SIZE tasks"
|
||||
else
|
||||
check_warn "Redis server not accessible (start with: redis-server)"
|
||||
fi
|
||||
else
|
||||
check_warn "redis-cli not installed (cannot verify Redis connectivity)"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# 5. Check Podman (if worker config exists)
|
||||
if [ -f "$WORKER_CONFIG" ]; then
|
||||
echo -e "${BOLD}Checking Podman${NC}"
|
||||
if command -v podman &> /dev/null; then
|
||||
check_pass "Podman is installed"
|
||||
|
||||
# Check if image exists
|
||||
if [ -n "$PODMAN_IMAGE" ]; then
|
||||
if podman image exists "$PODMAN_IMAGE" 2>/dev/null; then
|
||||
check_pass "Podman image exists: $PODMAN_IMAGE"
|
||||
else
|
||||
check_warn "Podman image not found: $PODMAN_IMAGE (needs to be built)"
|
||||
fi
|
||||
fi
|
||||
|
||||
# Check GPU access (if configured)
|
||||
if grep -q 'gpu_access.*true' "$WORKER_CONFIG" 2>/dev/null; then
|
||||
if podman run --rm --device nvidia.com/gpu=all nvidia/cuda:11.8.0-base nvidia-smi &>/dev/null; then
|
||||
check_pass "GPU access working"
|
||||
else
|
||||
check_warn "GPU access configured but not working (check nvidia-container-toolkit)"
|
||||
fi
|
||||
fi
|
||||
else
|
||||
check_fail "Podman not installed (required for worker)"
|
||||
fi
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# 6. Check CLI config consistency
|
||||
echo -e "${BOLD}Checking CLI Configuration${NC}"
|
||||
CLI_CONFIG="$HOME/.ml/config.toml"
|
||||
if [ -f "$CLI_CONFIG" ]; then
|
||||
check_pass "CLI config found: $CLI_CONFIG"
|
||||
|
||||
CLI_BASE=$(grep 'worker_base' "$CLI_CONFIG" | awk -F '=' '{print $2}' | tr -d ' "')
|
||||
if [ "$CLI_BASE" = "$API_BASE_PATH" ]; then
|
||||
check_pass "CLI worker_base matches server base_path"
|
||||
else
|
||||
check_warn "CLI worker_base ($CLI_BASE) differs from server ($API_BASE_PATH)"
|
||||
fi
|
||||
else
|
||||
check_warn "CLI config not found (run: ml init)"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Summary
|
||||
echo -e "${BOLD}=== Summary ===${NC}"
|
||||
if [ $errors -eq 0 ] && [ $warnings -eq 0 ]; then
|
||||
echo -e "${GREEN}All checks passed! Configuration is ready for production.${NC}"
|
||||
exit 0
|
||||
elif [ $errors -eq 0 ]; then
|
||||
echo -e "${YELLOW}Configuration has $warnings warning(s). Review before deployment.${NC}"
|
||||
exit 0
|
||||
else
|
||||
echo -e "${RED}Configuration has $errors error(s) and $warnings warning(s). Fix before deployment.${NC}"
|
||||
exit 1
|
||||
fi
|
||||
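Counters such as `errors` and `warnings` interact badly with `set -e` when they are incremented with `((n++))`: the arithmetic command returns status 1 whenever the old value is 0, which ends the script at the first recorded failure. The sketch below contrasts the two forms. `record_error` is a hypothetical stand-in for the validator's helpers, and the `bash -c` calls exist only to show the behaviour in a separate shell.

```shell
#!/bin/bash
# Hypothetical counter mirroring the validator's "errors" variable.
errors=0

# Safe under `set -e`: an assignment with arithmetic expansion returns 0,
# whereas ((errors++)) returns 1 when the previous value is 0.
record_error() {
    echo "✗ $1"
    errors=$((errors + 1))
}

# Run both forms in a child shell so `set -e` cannot stop this script.
unsafe_out=$(bash -c 'set -e; n=0; ((n++)); echo reached' 2>/dev/null) || true
safe_out=$(bash -c 'set -e; n=0; n=$((n + 1)); echo reached')

echo "unsafe form printed: '${unsafe_out}'"
echo "safe form printed:   '${safe_out}'"
```

The child shell using `((n++))` exits before its `echo`, while the assignment form runs to completion.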
313
setup.sh
Executable file
|
|
@ -0,0 +1,313 @@
|
|||
#!/bin/bash
|
||||
|
||||
# Balanced Homelab Setup Script
|
||||
# Keeps essential security (Fail2Ban, monitoring) while simplifying complexity
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
# Colors
|
||||
RED='\033[0;31m'
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[1;33m'
|
||||
BLUE='\033[0;34m'
|
||||
NC='\033[0m'
|
||||
|
||||
print_info() {
|
||||
echo -e "${BLUE}[INFO]${NC} $1"
|
||||
}
|
||||
|
||||
print_success() {
|
||||
echo -e "${GREEN}[SUCCESS]${NC} $1"
|
||||
}
|
||||
|
||||
print_warning() {
|
||||
echo -e "${YELLOW}[WARNING]${NC} $1"
|
||||
}
|
||||
|
||||
print_error() {
|
||||
echo -e "${RED}[ERROR]${NC} $1"
|
||||
}
|
||||
|
||||
# Simple dependency check
|
||||
check_deps() {
|
||||
print_info "Checking dependencies..."
|
||||
|
||||
local missing=()
|
||||
|
||||
if ! command -v go &> /dev/null; then
|
||||
missing+=("go")
|
||||
fi
|
||||
|
||||
if ! command -v zig &> /dev/null; then
|
||||
missing+=("zig")
|
||||
fi
|
||||
|
||||
if ! command -v redis-server &> /dev/null; then
|
||||
missing+=("redis-server")
|
||||
fi
|
||||
|
||||
if ! command -v docker &> /dev/null; then
|
||||
missing+=("docker")
|
||||
fi
|
||||
|
||||
if [[ ${#missing[@]} -gt 0 ]]; then
|
||||
print_error "Missing dependencies: ${missing[*]}"
|
||||
echo ""
|
||||
echo "Install with:"
|
||||
echo " macOS: brew install ${missing[*]}"
|
||||
echo " Ubuntu: sudo apt-get install ${missing[*]}"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
print_success "Dependencies OK"
|
||||
}
|
||||
|
||||
# Simple setup
|
||||
setup_project() {
|
||||
print_info "Setting up project..."
|
||||
|
||||
# Create essential directories
|
||||
mkdir -p ssl logs configs data monitoring
|
||||
|
||||
# Generate simple SSL cert
|
||||
if [[ ! -f "ssl/cert.pem" ]]; then
|
||||
openssl req -x509 -newkey rsa:2048 -keyout ssl/key.pem -out ssl/cert.pem \
|
||||
-days 365 -nodes -subj "/C=US/ST=State/L=City/O=Homelab/CN=localhost" \
|
||||
-addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null
|
||||
print_success "SSL certificates generated"
|
||||
fi
|
||||
|
||||
# Create balanced config
|
||||
cat > configs/config.yaml << 'EOF'
|
||||
base_path: "./data/experiments"
|
||||
|
||||
auth:
|
||||
enabled: true
|
||||
api_keys:
|
||||
homelab_user:
|
||||
hash: "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8" # "password"
|
||||
admin: true
|
||||
roles: ["user", "admin"]
|
||||
permissions:
|
||||
read: true
|
||||
write: true
|
||||
delete: true
|
||||
|
||||
server:
|
||||
address: ":9101"
|
||||
tls:
|
||||
enabled: true
|
||||
cert_file: "./ssl/cert.pem"
|
||||
key_file: "./ssl/key.pem"
|
||||
|
||||
security:
|
||||
rate_limit:
|
||||
enabled: true
|
||||
requests_per_minute: 30
|
||||
burst_size: 5
|
||||
ip_whitelist:
|
||||
- "127.0.0.1"
|
||||
- "::1"
|
||||
- "192.168.0.0/16"
|
||||
- "10.0.0.0/8"
|
||||
- "172.16.0.0/12"
|
||||
failed_login_lockout:
|
||||
enabled: true
|
||||
max_attempts: 3
|
||||
lockout_duration: "15m"
|
||||
|
||||
redis:
|
||||
url: "redis://localhost:6379"
|
||||
|
||||
logging:
|
||||
level: "info"
|
||||
file: "./logs/app.log"
|
||||
audit_log: "./logs/audit.log"
|
||||
access_log: "./logs/access.log"
|
||||
|
||||
monitoring:
|
||||
enabled: true
|
||||
metrics_port: 9090
|
||||
health_check_interval: "30s"
|
||||
EOF
|
||||
|
||||
print_success "Configuration created"
|
||||
}
|
||||
|
||||
# Simple build
|
||||
build_project() {
|
||||
print_info "Building project..."
|
||||
|
||||
# Build Go apps
|
||||
go build -o bin/api-server ./cmd/api-server
|
||||
go build -o bin/worker ./cmd/worker
|
||||
go build -o bin/tui ./cmd/tui
|
||||
|
||||
# Build Zig CLI
|
||||
    # Subshell: a failed zig build aborts under set -e and the working directory is untouched
    (cd cli && zig build)
|
||||
|
||||
print_success "Build completed"
|
||||
}
|
||||
|
||||
# Setup Fail2Ban
|
||||
setup_fail2ban() {
|
||||
print_info "Setting up Fail2Ban..."
|
||||
|
||||
if ! command -v fail2ban-server &> /dev/null; then
|
||||
print_warning "Fail2Ban not installed, skipping..."
|
||||
return
|
||||
fi
|
||||
|
||||
# Create Fail2Ban configuration
|
||||
sudo mkdir -p /etc/fail2ban/jail.d 2>/dev/null || true
|
||||
|
||||
cat > /tmp/ml-experiments-jail.conf << 'EOF'
|
||||
[DEFAULT]
|
||||
bantime = 3600
|
||||
findtime = 600
|
||||
maxretry = 3
|
||||
backend = systemd
|
||||
|
||||
[sshd]
|
||||
enabled = true
|
||||
port = ssh
|
||||
logpath = /var/log/auth.log
|
||||
maxretry = 3
|
||||
|
||||
[ml-experiments-api]
|
||||
enabled = true
|
||||
port = 9101
|
||||
filter = ml-experiments-api
|
||||
logpath = ./logs/audit.log
|
||||
maxretry = 5
|
||||
bantime = 7200
|
||||
|
||||
[ml-experiments-auth]
|
||||
enabled = true
|
||||
filter = ml-experiments-auth
|
||||
logpath = ./logs/audit.log
|
||||
maxretry = 3
|
||||
bantime = 3600
|
||||
EOF
|
||||
|
||||
# Create filter definitions
|
||||
cat > /tmp/ml-experiments-api.conf << 'EOF'
|
||||
[Definition]
|
||||
failregex = ^.*<HOST>.*"status":40[13].*$
|
||||
ignoreregex =
|
||||
EOF
|
||||
|
||||
cat > /tmp/ml-experiments-auth.conf << 'EOF'
|
||||
[Definition]
|
||||
failregex = ^.*"event":"failed_login".*"client_ip":"<HOST>".*$
|
||||
ignoreregex =
|
||||
EOF
|
||||
|
||||
# Try to install configurations
|
||||
    if sudo cp /tmp/ml-experiments-jail.conf /etc/fail2ban/jail.d/ 2>/dev/null; then
        # Only the two filter definitions belong in filter.d, not the jail file
        sudo cp /tmp/ml-experiments-api.conf /tmp/ml-experiments-auth.conf /etc/fail2ban/filter.d/ 2>/dev/null || true
        sudo systemctl restart fail2ban 2>/dev/null || true
|
||||
print_success "Fail2Ban configured"
|
||||
else
|
||||
print_warning "Could not configure Fail2Ban (requires sudo)"
|
||||
fi
|
||||
|
||||
rm -f /tmp/ml-experiments-*.conf
|
||||
}
|
||||
|
||||
# Setup Redis
|
||||
setup_redis() {
|
||||
print_info "Setting up Redis..."
|
||||
|
||||
if ! pgrep -f "redis-server" > /dev/null; then
|
||||
redis-server --daemonize yes --port 6379
|
||||
print_success "Redis started"
|
||||
else
|
||||
print_info "Redis already running"
|
||||
fi
|
||||
}
|
||||
|
||||
# Create simple management script
|
||||
create_manage_script() {
|
||||
cat > manage.sh << 'EOF'
|
||||
#!/bin/bash
|
||||
|
||||
# Simple management script
|
||||
|
||||
case "${1:-status}" in
|
||||
"start")
|
||||
echo "Starting services..."
|
||||
redis-server --daemonize yes --port 6379 2>/dev/null || true
|
||||
./bin/api-server -config configs/config.yaml &
|
||||
echo "Services started"
|
||||
;;
|
||||
"stop")
|
||||
echo "Stopping services..."
|
||||
pkill -f "api-server" || true
|
||||
redis-cli shutdown 2>/dev/null || true
|
||||
echo "Services stopped"
|
||||
;;
|
||||
"status")
|
||||
echo "=== Status ==="
|
||||
if pgrep -f "redis-server" > /dev/null; then
|
||||
echo "✅ Redis: Running"
|
||||
else
|
||||
echo "❌ Redis: Stopped"
|
||||
fi
|
||||
|
||||
if pgrep -f "api-server" > /dev/null; then
|
||||
echo "✅ API Server: Running"
|
||||
else
|
||||
echo "❌ API Server: Stopped"
|
||||
fi
|
||||
;;
|
||||
"logs")
|
||||
echo "=== Recent Logs ==="
|
||||
tail -20 logs/app.log 2>/dev/null || echo "No logs yet"
|
||||
;;
|
||||
"test")
|
||||
echo "=== Testing ==="
|
||||
curl -k -s https://localhost:9101/health || echo "API server not responding"
|
||||
;;
|
||||
*)
|
||||
echo "Usage: $0 {start|stop|status|logs|test}"
|
||||
;;
|
||||
esac
|
||||
EOF
|
||||
|
||||
chmod +x manage.sh
|
||||
print_success "Management script created"
|
||||
}
|
||||
|
||||
# Show next steps
|
||||
show_next_steps() {
|
||||
print_success "Setup completed!"
|
||||
echo ""
|
||||
echo "🎉 Setup complete!"
|
||||
echo ""
|
||||
echo "Next steps:"
|
||||
echo " 1. Start services: ./tools/manage.sh start"
|
||||
echo " 2. Check status: ./tools/manage.sh status"
|
||||
echo " 3. Test API: curl -k -H 'X-API-Key: password' https://localhost:9101/health"
|
||||
echo ""
|
||||
echo "Configuration: configs/config.yaml"
|
||||
echo "Logs: logs/app.log and logs/audit.log"
|
||||
echo ""
|
||||
print_success "Ready for homelab use!"
|
||||
}
|
||||
|
||||
# Main setup
|
||||
main() {
|
||||
echo "ML Experiment Manager - Homelab Setup"
|
||||
echo "====================================="
|
||||
echo ""
|
||||
|
||||
check_deps
|
||||
setup_project
|
||||
build_project
|
||||
setup_redis
|
||||
create_manage_script
|
||||
show_next_steps
|
||||
}
|
||||
|
||||
main "$@"
|
||||
396
tools/manage.sh
Executable file
|
|
@ -0,0 +1,396 @@
|
|||
#!/bin/bash

# Project Management Script for ML Experiment Manager
# Provides unified interface for managing all components

set -euo pipefail

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
PURPLE='\033[0;35m'
CYAN='\033[0;36m'
NC='\033[0m'

print_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

print_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

print_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

# Errors go to stderr so they are not mixed into captured stdout
print_error() {
    echo -e "${RED}[ERROR]${NC} $1" >&2
}

print_header() {
    echo -e "${PURPLE}$1${NC}"
}

print_app() {
    echo -e "${CYAN}$1${NC}"
}

show_status() {
    print_header "ML Experiment Manager Status"
    echo "=================================="
    echo ""

    # Check Go apps
    print_app "Go Applications:"
    local go_apps=("api-server" "worker" "tui" "data_manager" "user_manager")
    local app
    for app in "${go_apps[@]}"; do
        if [[ -f "bin/$app" ]]; then
            echo " ✅ $app: Built"
        else
            echo " ❌ $app: Not built"
        fi
    done
    echo ""

    # Check Zig CLI
    print_app "Zig CLI:"
    if [[ -f "cli/zig-out/bin/ml" ]]; then
        echo " ✅ CLI: Built"
    else
        echo " ❌ CLI: Not built"
    fi
    echo ""

    # Check services
    print_app "Services:"
    if command -v redis-cli &> /dev/null; then
        # Suppress the connection error redis-cli prints when the server is down
        if redis-cli ping 2>/dev/null | grep -q "PONG"; then
            echo " ✅ Redis: Running"
        else
            echo " ⚠️ Redis: Not running"
        fi
    else
        echo " ❌ Redis: Not installed"
    fi

    if command -v docker &> /dev/null; then
        echo " ✅ Docker: Available"
    else
        echo " ❌ Docker: Not installed"
    fi
    echo ""

    # Check configuration
    print_app "Configuration:"
    if [[ -f "configs/config-local.yaml" ]]; then
        echo " ✅ Security config: Found"
    else
        echo " ⚠️ Security config: Not found"
    fi

    if [[ -f ".env.dev" ]]; then
        echo " ✅ Environment: Found"
    else
        echo " ⚠️ Environment: Not found"
    fi

    if [[ -f "ssl/cert.pem" && -f "ssl/key.pem" ]]; then
        echo " ✅ SSL certificates: Found"
    else
        echo " ⚠️ SSL certificates: Not found"
    fi
    echo ""
}

build_all() {
    print_header "Building All Components"
    echo "============================="
    echo ""

    print_info "Building Go applications..."
    make build

    if command -v zig &> /dev/null; then
        print_info "Building Zig CLI..."
        make cli-build
    else
        print_warning "Zig not found, skipping CLI build"
    fi

    print_success "Build completed!"
}

test_all() {
    print_header "Running All Tests"
    echo "===================="
    echo ""

    print_info "Running main test suite..."
    make test

    print_info "Running comprehensive tests..."
    make test-all

    print_success "All tests completed!"
}

start_services() {
    print_header "Starting Services"
    echo "==================="
    echo ""

    # Start Redis if available
    if command -v redis-server &> /dev/null; then
        if ! pgrep -f "redis-server" > /dev/null; then
            print_info "Starting Redis..."
            redis-server --daemonize yes --port 6379
            print_success "Redis started"
        else
            print_info "Redis already running"
        fi
    fi

    # Start API server if built
    if [[ -f "bin/api-server" ]]; then
        print_info "Starting API server..."
        if [[ -f "configs/config-local.yaml" ]]; then
            ./bin/api-server --config configs/config-local.yaml &
        else
            print_warning "No config found, using defaults"
            ./bin/api-server &
        fi
        print_success "API server started (PID: $!)"
    else
        print_error "API server not built. Run 'make build' first."
        # Do not report overall success when the API server could not be started
        return 1
    fi

    print_success "Services started!"
}

check_health() {
    print_header "API Health Check"
    echo "=================="
    echo ""

    print_info "Checking if API port is open..."

    # First check if port 9101 is open
    if ! nc -z localhost 9101 2>/dev/null; then
        print_error "API port 9101 not open - is it running?"
        print_info "Start with: ./tools/manage.sh start"
        return 1
    fi

    print_info "Port 9101 is open, checking API health endpoint..."

    # Try the health endpoint. The trailing "|| true" keeps "set -e" from
    # aborting the script when curl fails or times out, so the empty
    # response is reported by the branches below instead.
    local response
    response=$(curl -k -s --max-time 3 -H 'X-API-Key: password' -H 'X-Forwarded-For: 127.0.0.1' https://localhost:9101/health 2>/dev/null) || true

    if [[ "$response" == "OK" ]]; then
        print_success "API is healthy: $response"
    elif [[ "$response" == *"IP not whitelisted"* ]]; then
        print_warning "API running but IP not whitelisted (expected behavior)"
        print_info "Try: curl -k -H 'X-API-Key: password' -H 'X-Forwarded-For: 127.0.0.1' https://localhost:9101/health"
    else
        print_error "Unexpected response: $response"
        return 1
    fi
}

stop_services() {
    print_header "Stopping Services"
    echo "=================="
    echo ""

    # Stop API server
    if pgrep -f "api-server" > /dev/null; then
        print_info "Stopping API server..."
        # Tolerate the process exiting between pgrep and pkill under "set -e"
        pkill -f "api-server" || true
        print_success "API server stopped"
    fi

    # Stop Redis
    if command -v redis-cli &> /dev/null; then
        print_info "Stopping Redis..."
        redis-cli shutdown 2>/dev/null || true
        print_success "Redis stopped"
    fi

    print_success "All services stopped!"
}

run_security() {
    print_header "Security Management"
    echo "===================="
    echo ""

    case "${1:-check}" in
        "check")
            print_info "Running security checks..."
            make security-check
            ;;
        "monitor")
            print_info "Starting security monitoring..."
            make security-monitor
            ;;
        "deploy")
            print_info "Deploying with security..."
            make security-deploy
            ;;
        "audit")
            print_info "Running security audit..."
            make security-audit
            ;;
        *)
            echo "Usage: $0 security {check|monitor|deploy|audit}"
            exit 1
            ;;
    esac
}

run_development() {
    print_header "Development Environment"
    echo "========================="
    echo ""

    case "${1:-setup}" in
        "setup")
            print_info "Setting up development environment..."
            ./scripts/auto_setup.sh
            ;;
        "quick")
            print_info "Running quick start..."
            ./scripts/quick_start.sh
            ;;
        "deps")
            print_info "Installing dependencies..."
            make install-deps
            ;;
        *)
            echo "Usage: $0 dev {setup|quick|deps}"
            exit 1
            ;;
    esac
}

show_logs() {
    print_header "Application Logs"
    echo "=================="
    echo ""

    # Show application logs
    if [[ -f "logs/fetch_ml.log" ]]; then
        print_app "Application Log:"
        tail -20 logs/fetch_ml.log
        echo ""
    fi

    if [[ -f "logs/audit.log" ]]; then
        print_app "Security Log:"
        tail -20 logs/audit.log
        echo ""
    fi

    # Show Docker logs if running
    if command -v docker &> /dev/null; then
        # Declare separately so "local" does not mask the pipeline's exit status;
        # plain {{.Names}} avoids the "NAMES" header row that "table" adds.
        local containers
        containers=$(docker ps --format '{{.Names}}' 2>/dev/null | grep "ml-experiment" || true)
        if [[ -n "$containers" ]]; then
            print_app "Docker Logs:"
            docker logs --tail=20 "$(echo "$containers" | tail -1)" 2>/dev/null || true
        fi
    fi
}

cleanup() {
    print_header "Cleanup Project"
    echo "================"
    echo ""

    print_info "Cleaning project artifacts..."
    make clean-all

    print_info "Stopping services..."
    stop_services

    print_success "Cleanup completed!"
}

show_help() {
    print_header "Project Management Script"
    echo "==========================="
    echo ""
    echo "Usage: ./tools/manage.sh {status|build|test|start|stop|health|security|dev|logs|cleanup|help}"
    echo ""
    echo "Commands:"
    echo " status - Show project status"
    echo " build - Build all components"
    echo " test - Run all tests"
    echo " start - Start all services"
    echo " stop - Stop all services"
    echo " health - Check API health endpoint"
    echo " security - Security management (check|monitor|deploy|audit)"
    echo " dev - Development environment (setup|quick|deps)"
    echo " logs - Show application logs"
    echo " cleanup - Clean project artifacts and stop services"
    echo " help - Show this help"
    echo ""
    echo "Examples:"
    echo " $0 status # Show current status"
    echo " $0 health # Check API health"
    echo " $0 build && $0 test # Build and test everything"
    echo " $0 start # Start all services"
    echo " $0 security monitor # Start security monitoring"
    echo " $0 dev setup # Setup development environment"
    echo ""
    echo "Quick Start:"
    echo " $0 dev setup && $0 start && $0 status"
}

# Main function
main() {
    case "${1:-help}" in
        "status")
            show_status
            ;;
        "build")
            build_all
            ;;
        "test")
            test_all
            ;;
        "start")
            start_services
            ;;
        "stop")
            stop_services
            ;;
        "health")
            check_health
            ;;
        "security")
            run_security "${2:-check}"
            ;;
        "dev")
            run_development "${2:-setup}"
            ;;
        "logs")
            show_logs
            ;;
        "cleanup")
            cleanup
            ;;
        "help"|"-h"|"--help")
            show_help
            ;;
        *)
            print_error "Unknown command: $1"
            echo "Use '$0 help' for usage information"
            exit 1
            ;;
    esac
}

# Run main function
main "$@"