# infra

## Overview

This repo manages two hosts:

- `web` (`jfraeys.com`)
- `services` (`services.jfraeys.com`)

The routing convention is `service.server.jfraeys.com`.

Examples:

- `git.jfraeys.com` -> services host (Forgejo)
- `auth.jfraeys.com` -> services host (Authelia)
- `app.jfraeys.com` -> services host (App)

Traefik runs on both servers and routes only the services running on that server.

## Quickstart

This repo is intended to be driven by `setup`:

```bash
./setup
```

For options:

```bash
./setup --help
```

What it does:

- Applies Terraform from `terraform/`
- Writes `inventory/hosts.yml` and `inventory/host_vars/web.yml` (gitignored)
- Runs `playbooks/services.yml` and `playbooks/web.yml`

If you want Terraform only:

```bash
./setup --no-ansible
```

If you want Ansible only (requires an existing `inventory/hosts.yml`):

```bash
./setup --ansible-only
```

## Prereqs (local)

- `terraform`
- `ansible`
- `python3` (for helper scripts)
- `pip` / `python3 -m pip`
- SSH access to the hosts

If your SSH key is passphrase-protected, you must load it into your agent before running Ansible non-interactively:

```bash
ssh-add --apple-use-keychain ~/.ssh/id_ed25519
```

## DNS (Cloudflare)

Create A/CNAME records that point to the correct server IP.

**Active records:**

- `jfraeys.com` -> A record to web server IPv4
- `services.jfraeys.com` -> A record to services server IPv4
- `git.jfraeys.com` -> A/CNAME to services (Forgejo)
- `auth.jfraeys.com` -> A/CNAME to services (Authelia)
- `app.jfraeys.com` -> A/CNAME to services (App)

**Commented out (unused):**

- `grafana.jfraeys.com` -> A/CNAME to services (Grafana - currently disabled)
- `prometheus.jfraeys.com` -> A/CNAME to services (Prometheus - currently disabled)

To enable, uncomment the records in `terraform/main.tf`.

## TLS

Traefik uses Let’s Encrypt via Cloudflare DNS-01.
You must provide a Cloudflare API token in your local environment when running Ansible:

- `CF_DNS_API_TOKEN`
- `CF_ZONE_API_TOKEN`

## SSO (Authelia OIDC)

Authelia is exposed at:

- `https://auth.jfraeys.com` (issuer)
- `https://auth.jfraeys.com/.well-known/openid-configuration` (discovery)

Grafana is configured via `roles/grafana` using the Generic OAuth provider.

Forgejo is configured via `roles/forgejo` using the Forgejo admin CLI with `--provider=openidConnect` and `--auto-discover-url`.

Note: Forgejo pages that ask for an "OpenID URI" are legacy OpenID 2.0 and are not used for OIDC.

## Email (Postfix + Postmark)

Transactional email is delivered via Postfix relay to Postmark:

- **Sender**: `notifications@jfraeys.com`
- **Relay**: `smtp.postmarkapp.com:2525`
- **Auth**: Server token authentication

Services using email:

- Authelia (password resets)
- Alertmanager (monitoring alerts)
- Forgejo (CI/CD notifications)

### DNS Records for Email

Terraform manages these Cloudflare records:

| Record | Type | Purpose |
|--------|------|---------|
| `YYYYMMDDDDpm._domainkey` | TXT | DKIM signature |
| `pm-bounces` | CNAME | Return-path for bounces |
| `_dmarc` | TXT | DMARC policy |

Postmark validates these during account setup.
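
Once Terraform has applied the records in the table above, they can be checked from any machine with `dig`. The following is a minimal sketch and not part of this repo: the function names `email_records` and `check_email_records` are hypothetical, and the DKIM selector is passed as an argument because the real value comes from Postmark.

```bash
# Hypothetical helper: print "TYPE NAME" pairs for the three email records.
# $1 = domain, $2 = DKIM selector label (the part before ._domainkey).
email_records() {
  local domain="$1" selector="$2"
  printf 'TXT %s._domainkey.%s\n' "$selector" "$domain"
  printf 'CNAME pm-bounces.%s\n' "$domain"
  printf 'TXT _dmarc.%s\n' "$domain"
}

# Hypothetical helper: look up each record and print whatever DNS returns.
# Requires dig and network access; an empty result means the record is
# missing or has not propagated yet.
check_email_records() {
  local type name
  email_records "$1" "$2" | while read -r type name; do
    printf '%s %s -> %s\n' "$type" "$name" "$(dig +short "$name" "$type" | tr '\n' ' ')"
  done
}
```

Usage would look like `check_email_records jfraeys.com YYYYMMDDDDpm`, with the selector placeholder replaced by the one Postmark assigns.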
### Vault Variables

Add to `secrets/vault.yml`:

**Email (Postfix + Postmark):**

```yaml
POSTFIX_RELAYHOST_USERNAME: "your-postmark-server-token"
POSTFIX_RELAYHOST_PASSWORD: "your-postmark-server-token"
AUTHELIA_SMTP_SENDER: "notifications@jfraeys.com"
AUTHELIA_SMTP_IDENTIFIER: "jfraeys.com"
```

**Backups (Restic):**

```yaml
RESTIC_REPOSITORY: "s3:https://us-east-1.linodeobjects.com/mybucket/backups"
RESTIC_PASSWORD: "strong-encryption-password"
RESTIC_AWS_ACCESS_KEY_ID: "your-linode-access-key"
RESTIC_AWS_SECRET_ACCESS_KEY: "your-linode-secret-key"
# Optional:
RESTIC_AWS_DEFAULT_REGION: "us-east-1"
RESTIC_KEEP_DAILY: 7
RESTIC_KEEP_WEEKLY: 4
RESTIC_KEEP_MONTHLY: 6
INFRA_BACKUP_ONCALENDAR: "daily"  # systemd calendar spec
```

**Alerting (set exactly one):**

```yaml
# Slack option:
ALERTMANAGER_SLACK_WEBHOOK_URL: "https://hooks.slack.com/services/..."
ALERTMANAGER_SLACK_CHANNEL: "#alerts"
ALERTMANAGER_SLACK_USERNAME: "alertmanager"
# Discord option:
ALERTMANAGER_DISCORD_WEBHOOK_URL: "https://discord.com/api/webhooks/..."
```

## Secrets (Ansible Vault)

Secrets are stored in `secrets/vault.yml` (encrypted).
Create your vault from the template:

- `secrets/vault.example.yml` -> `secrets/vault.yml`

Run playbooks with either:

- `--ask-vault-pass`
- or a local password file (not committed): `--vault-password-file .vault_pass`

Notes:

- `secrets/vault.yml` is intentionally gitignored
- `inventory/hosts.yml` and `inventory/host_vars/web.yml` are generated by `setup` and intentionally gitignored

## Playbooks

- `playbooks/services.yml`: deploy observability + forgejo on `services`
- `playbooks/web.yml`: deploy app-side dependencies on `web`
- `playbooks/test_config.yml`: smoke test host config and deployed stacks
- `playbooks/deploy.yml`: legacy/all-in-one deploy for the services host (no tags)

## Configuration split

- Vault (`secrets/vault.yml`): secrets (API tokens, passwords, access keys, and sensitive Terraform `TF_VAR_*` values)
- `.env`: non-secret configuration (still treated as sensitive), such as region/instance type and non-secret endpoints

## Linode Object Storage (demo apps)

If you already have a Linode Object Storage bucket, demo apps can use it via the S3-compatible API.

Recommended env vars (see `.env.example`):

- `S3_BUCKET`
- `S3_ENDPOINT` (example: `https://us-east-1.linodeobjects.com`)
- `S3_REGION`

Secrets (store in `secrets/vault.yml`):

- `S3_ACCESS_KEY_ID`
- `S3_SECRET_ACCESS_KEY`

Create a dedicated access key for demos and scope permissions as tightly as possible.

## Grafana provisioning

Grafana is provisioned with Prometheus and Loki datasources via the Grafana provisioning mechanism (no manual UI setup required).

**Note**: Grafana is deployed but DNS records are commented out. Access via `grafana.jfraeys.com` by uncommenting the records in `terraform/main.tf`, or access directly via the services host IP.

## Host vars

Set `inventory/host_vars/web.yml`:

- `public_ipv4`: public IPv4 of `jfraeys.com`

This is used to allowlist Loki (`services:3100`) to only the web host.
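
For illustration, the allowlist that `public_ipv4` feeds amounts to a UFW rule of roughly the following shape. This is a sketch under assumptions: the helper name `loki_allow_rule` is hypothetical, the address in the usage note is a documentation-range example, and the real rule is managed by the Ansible roles, so its exact form may differ. The function only prints the command so it can be reviewed before anything is run.

```bash
# Hypothetical helper: print (do not run) a UFW rule that lets a single
# IPv4 address reach Loki on TCP port 3100.
loki_allow_rule() {
  local web_ip="$1"
  # Reject input that does not look like a dotted-quad IPv4 address.
  if ! printf '%s' "$web_ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "invalid IPv4 address: $web_ip" >&2
    return 1
  fi
  printf 'ufw allow from %s to any port 3100 proto tcp\n' "$web_ip"
}
```

Calling `loki_allow_rule 203.0.113.10` prints the rule for that address; anything that is not IPv4-shaped is rejected with a non-zero exit status.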
## Forgejo Actions runner (web host)

A Forgejo runner is deployed on the `web` host (`roles/forgejo_runner`).

- Requires `FORGEJO_RUNNER_REGISTRATION_TOKEN` in `secrets/vault.yml`.
- Uses a single `self-hosted` label by default.
- The role automatically re-registers the runner if labels change.

To force re-registration (e.g. after deleting the runner in the Forgejo UI):

```bash
ansible-playbook playbooks/web.yml \
  --vault-password-file secrets/.vault_pass \
  --limit web \
  --tags forgejo_runner \
  -e forgejo_runner_force_reregister=true
```

### AI Scrapers Blocklist

Forgejo includes a weekly cron job (`roles/forgejo/update-ai-scrapers.sh`) that updates `robots.txt` to block AI scrapers (GPTBot, ClaudeBot, etc.).

### OIDC Configuration

Forgejo is configured with:

- Group claim mapping from Authelia (`groups`)
- Admin group: `admins`
- Auto-discovery from `https://auth.jfraeys.com/.well-known/openid-configuration`

## SSH from Actions to services

If a workflow running on the `web` runner needs SSH access to the `services` host, the controller expects two separate SSH keys restricted to forced commands:

- `infra-register-stdin` (register)
- `infra-deregister` (deregister)

Public keys (installed on the `services` host via Ansible/vault):

- `SERVICE_SSH_REGISTER_PUBLIC_KEY`
- `SERVICE_SSH_DEREGISTER_PUBLIC_KEY`

Private keys (stored as Forgejo Actions secrets):

- `SERVICE_SSH_KEY_REGISTER`
- `SERVICE_SSH_KEY_DEREGISTER`

To generate/update both Actions secrets (and optionally update both public keys in vault):

Install Python deps first:

```bash
python3 -m pip install -r requirements.txt
```

```bash
python3 scripts/forgejo_set_actions_secret.py \
  --repo jfraeysd/infra-controller \
  --generate-ssh-keys \
  --update-vault-both-public-keys
```

## Deploy

Services:

```bash
ansible-playbook playbooks/services.yml --ask-vault-pass
```

Web:

```bash
ansible-playbook playbooks/web.yml --ask-vault-pass
```

## Terraform

`./setup` will export `TF_VAR_*` from `secrets/vault.yml` (prompting for vault password
if needed) and then run Terraform with a saved plan.

## Notes

- **Grafana/Prometheus/Loki**: Available as optional roles but not deployed by default (commented out in `services.yml`). Enable by uncommenting the role entries.
- Loki is exposed on `services:3100` but allowlisted in UFW to `web` only.
- Watchtower is enabled with label-based updates.
- **Traefik**: Uses file provider exclusively (Docker socket access removed). Services have static router definitions in `/opt/traefik/dynamic/base.yml`.
- **Postfix**: Relays through Postmark port 2525 (avoids ISP blocking on 587).
- **Hardening**: SSH config and unattended-upgrades managed via `hardening` role to prevent StackScript drift.

## Role layout

Services host (`services`):

- `roles/traefik` (file provider only - no Docker socket)
- `roles/postfix` (Postmark SMTP relay for transactional email)
- `roles/exporters` (node-exporter + cAdvisor)
- `roles/app` (active - DNS enabled)
- `roles/prometheus` (optional - commented out in services.yml)
- `roles/loki` (optional - commented out in services.yml)
- `roles/grafana` (optional - commented out in services.yml)
- `roles/forgejo`
- `roles/alertmanager` (uses localhost:25 Postfix relay)
- `roles/watchtower`
- `roles/hardening` (SSH hardening, unattended-upgrades)
- `roles/backups`
- `roles/fail2ban` (Docker-based fail2ban)

Web host (`web`):

- `roles/traefik`
- `roles/app_core` (optional shared Postgres/Redis)
- `roles/forgejo_runner`
- `roles/app_deployer` (CI/CD webhook and deployment automation)
- `roles/hardening` (SSH hardening, unattended-upgrades)

## App Deployment

The `app_deployer` role provides automated deployment via webhooks from Forgejo or GitHub Actions.

### Prerequisites

1. **Generate deploy token** (run once):

   ```bash
   ./scripts/gen-auth-secrets.sh  # Creates VAULT_DEPLOY_TOKEN
   # Or add to secrets/vault.yml manually
   ```

2. **Set DEPLOY_TOKEN in your app repo**:

   - **Forgejo**: Use the helper script:

     ```bash
     # OWNER and REPO are placeholders for your repository's owner and name
     ./scripts/set_deploy_token.py --owner OWNER --repo REPO
     ```

   - **GitHub**: Set `DEPLOY_TOKEN` secret via Settings > Secrets and variables > Actions

3. **Add deploy workflow to your app repo**:

   Copy the sample workflow and customize:

   ```bash
   cp roles/app_deployer/files/forgejo-deploy-workflow.yml .forgejo/workflows/deploy.yml
   # For GitHub: cp to .github/workflows/deploy.yml
   ```

   Update the workflow for your build (Go, Rust, Node.js, etc.) and app name.

### How It Works

1. **CI builds the app** and uploads binary + checksum to `deploy@web:/opt/artifacts/`
2. **CI triggers webhook** with `X-Deploy-Token` header
3. **Webhook validates token** (timing-safe comparison) and runs deployment
4. **Ansible deploys the app**:
   - Verifies artifact checksum
   - Creates app user and directories
   - Sets up systemd service
   - Keeps last 5 versions for rollback

### Manual Deployment

For manual deploys or rollbacks:

```bash
# Deploy a specific version
ssh deploy@web /opt/deploy/scripts/deploy.sh my-api abc123 prod

# Rollback to previous version
ssh deploy@web /opt/deploy/scripts/rollback.sh my-api

# The command above lists available versions; then pass one (VERSION is a placeholder):
ssh deploy@web /opt/deploy/scripts/rollback.sh my-api VERSION
```

### Security Features

- **Timing-safe token validation** prevents timing attacks
- **Artifact checksums** ensure binary integrity
- **Sudoers restricted** to only the deployment script
- **Last 5 versions kept** for quick rollback
- **Deploy user** runs as an unprivileged user per app

### Troubleshooting

```bash
# Check webhook logs
ssh web sudo journalctl -u webhook -f

# Check deploy logs
ssh web sudo cat /var/log/deploy.log

# Verify systemd service
ssh web sudo systemctl status my-api
```
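
If a deploy fails at the checksum step, the artifact can also be checked by hand. The following is a minimal sketch, assuming the checksum is a SHA-256 digest stored in `sha256sum` output format; the README does not state the algorithm or file naming, so both are assumptions, and the helper name `verify_artifact` is hypothetical.

```bash
# Hypothetical helper: compare an artifact's SHA-256 digest with the digest
# recorded in a checksum file (first field of its first line).
# Returns 0 on match, 1 on mismatch, 2 if the checksum file cannot be read.
verify_artifact() {
  local artifact="$1" checksum_file="$2"
  local expected actual
  expected="$(awk '{print $1; exit}' "$checksum_file")" || return 2
  # An unreadable artifact yields an empty digest, which never matches.
  actual="$(sha256sum "$artifact" 2>/dev/null | awk '{print $1}')"
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

For example, `verify_artifact /opt/artifacts/my-api /opt/artifacts/my-api.sha256 && echo OK` would print `OK` only when the digests match; the paths here are illustrative.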