Jeremie Fraeys dbe7b1b6b2
feat(docker): add timezone mounts to all containers for log sync
Add /etc/localtime:/etc/localtime:ro volume mount to:
- alertmanager, authelia, traefik
- exporters (node-exporter, cadvisor)
- fail2ban, lldap, postfix
- forgejo, forgejo_runner
- grafana, loki, prometheus
- watchtower, app_core (postgres, redis)

Ensures container logs use host timezone for consistent timestamps.
2026-03-06 15:13:52 -05:00

infra

Overview

This repo manages two hosts:

  • web (jfraeys.com)
  • services (services.jfraeys.com)

The routing convention is <service>.jfraeys.com: each service gets its own subdomain, and the DNS record points at the host that runs it.

Examples:

  • git.jfraeys.com -> services host (Forgejo)
  • auth.jfraeys.com -> services host (Authelia)
  • app.jfraeys.com -> services host (App)

Traefik runs on both servers and routes only the services running on that server.
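The mapping above can be written down as a small lookup table, which is handy when checking that a new record points at the right host. This is only a sketch: the `ROUTES` table and the `host_for` helper are hypothetical and not part of the repo; the hostnames are taken from the examples above.

```python
# Hypothetical lookup table built from the examples above; not part of the repo.
ROUTES = {
    "jfraeys.com": "web",
    "services.jfraeys.com": "services",
    "git.jfraeys.com": "services",   # Forgejo
    "auth.jfraeys.com": "services",  # Authelia
    "app.jfraeys.com": "services",   # App
}


def host_for(hostname: str) -> str:
    """Return the inventory host whose Traefik instance routes `hostname`."""
    key = hostname.lower().rstrip(".")  # tolerate case and a trailing dot
    try:
        return ROUTES[key]
    except KeyError:
        raise ValueError(f"no route defined for {hostname!r}") from None
```

For example, `host_for("git.jfraeys.com")` returns `"services"`, while an unknown name raises `ValueError` instead of silently falling back to a default host.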

Quickstart

This repo is intended to be driven by setup:

./setup

For options:

./setup --help

What it does:

  • Applies Terraform from terraform/
  • Writes inventory/hosts.yml and inventory/host_vars/web.yml (gitignored)
  • Runs playbooks/services.yml and playbooks/web.yml

If you want Terraform only:

./setup --no-ansible

If you want Ansible only (requires an existing inventory/hosts.yml):

./setup --ansible-only
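The flag handling above can be pictured as a function that returns the commands a run would execute. This is a sketch under assumptions, not the actual `setup` script: the `plan_steps` helper, the `tfplan` file name, the `-chdir=terraform` invocation, and treating the two flags as mutually exclusive are all hypothetical. The step that writes the inventory files is omitted.

```python
from __future__ import annotations


def plan_steps(no_ansible: bool = False, ansible_only: bool = False) -> list[list[str]]:
    """Return the commands a run would execute, in order, without running them."""
    if no_ansible and ansible_only:
        # Assumption: the two flags cannot be combined.
        raise ValueError("--no-ansible and --ansible-only are mutually exclusive")

    steps: list[list[str]] = []
    if not ansible_only:
        # Plan to a file, then apply exactly that saved plan.
        steps.append(["terraform", "-chdir=terraform", "plan", "-out=tfplan"])
        steps.append(["terraform", "-chdir=terraform", "apply", "tfplan"])
    if not no_ansible:
        for playbook in ("playbooks/services.yml", "playbooks/web.yml"):
            steps.append(["ansible-playbook", "-i", "inventory/hosts.yml", playbook])
    return steps
```

With no flags the sketch yields two Terraform commands followed by the two playbooks; each flag drops one half.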

Prereqs (local)

  • terraform
  • ansible
  • python3 (for helper scripts)
  • pip / python3 -m pip
  • SSH access to the hosts

If your SSH key is passphrase-protected, you must load it into your agent before running Ansible non-interactively (the flag below is macOS-specific):

ssh-add --apple-use-keychain ~/.ssh/id_ed25519

DNS (Cloudflare)

Create A/CNAME records that point to the correct server IP.

Active records:

  • jfraeys.com -> A record to web server IPv4
  • services.jfraeys.com -> A record to services server IPv4
  • git.jfraeys.com -> A/CNAME to services (Forgejo)
  • auth.jfraeys.com -> A/CNAME to services (Authelia)
  • app.jfraeys.com -> A/CNAME to services (App)

Commented out (unused):

  • grafana.jfraeys.com -> A/CNAME to services (Grafana - currently disabled)
  • prometheus.jfraeys.com -> A/CNAME to services (Prometheus - currently disabled)

To enable, uncomment the records in terraform/main.tf.

TLS

Traefik uses Let's Encrypt via the Cloudflare DNS-01 challenge.

You must provide both Cloudflare API tokens in your local environment when running Ansible:

  • CF_DNS_API_TOKEN
  • CF_ZONE_API_TOKEN
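A quick pre-flight check can catch a missing token before Ansible starts. The helper below is a hypothetical sketch (`missing_tokens` is not part of the repo); it only checks that the two variables named above are set and non-blank, not that the tokens are valid.

```python
import os
from typing import Iterable, Mapping

REQUIRED_TOKENS = ("CF_DNS_API_TOKEN", "CF_ZONE_API_TOKEN")


def missing_tokens(env: Mapping[str, str] = os.environ,
                   required: Iterable[str] = REQUIRED_TOKENS) -> list:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_tokens()` with no arguments inspects the current environment; an empty list means both tokens are present.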

SSO (Authelia OIDC)

Authelia is exposed at:

  • https://auth.jfraeys.com (issuer)
  • https://auth.jfraeys.com/.well-known/openid-configuration (discovery)

Grafana is configured via roles/grafana using the Generic OAuth provider.

Forgejo is configured via roles/forgejo using the Forgejo admin CLI with --provider=openidConnect and --auto-discover-url.

Note: Forgejo pages that ask for an "OpenID URI" are legacy OpenID 2.0 and are not used for OIDC.
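To confirm that the discovery endpoint above is what OIDC clients expect, the document can be fetched and checked for the metadata fields that OpenID Connect Discovery 1.0 requires. This is a minimal sketch; `discovery_problems` and `fetch_discovery` are hypothetical helpers, and `token_endpoint` is left out of the required list because the spec makes it conditional.

```python
import json
import urllib.request

ISSUER = "https://auth.jfraeys.com"

# Fields that OpenID Connect Discovery 1.0 marks as REQUIRED.
REQUIRED_FIELDS = (
    "issuer",
    "authorization_endpoint",
    "jwks_uri",
    "response_types_supported",
    "subject_types_supported",
    "id_token_signing_alg_values_supported",
)


def discovery_problems(doc: dict, issuer: str = ISSUER) -> list:
    """Return a list of problems found in a parsed discovery document."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in doc]
    if doc.get("issuer") != issuer:
        problems.append(f"issuer mismatch: {doc.get('issuer')!r} != {issuer!r}")
    return problems


def fetch_discovery(issuer: str = ISSUER, timeout: float = 10.0) -> dict:
    """Download and parse the discovery document for `issuer`."""
    url = issuer.rstrip("/") + "/.well-known/openid-configuration"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

`discovery_problems(fetch_discovery())` returns an empty list when the document is complete and the issuer matches exactly.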

Email (Postfix + Postmark)

Transactional email is delivered via Postfix relay to Postmark:

  • Sender: notifications@jfraeys.com
  • Relay: smtp.postmarkapp.com:2525
  • Auth: Server token authentication

Services using email:

  • Authelia (password resets)
  • Alertmanager (monitoring alerts)
  • Forgejo (CI/CD notifications)

DNS Records for Email

Terraform manages these Cloudflare records:

Record                    Type    Purpose
YYYYMMDDDDpm._domainkey   TXT     DKIM signature
pm-bounces                CNAME   Return-path for bounces
_dmarc                    TXT     DMARC policy

Postmark validates these during account setup.

Vault Variables

Add to secrets/vault.yml:

Email (Postfix + Postmark):

POSTFIX_RELAYHOST_USERNAME: "your-postmark-server-token"
POSTFIX_RELAYHOST_PASSWORD: "your-postmark-server-token"
AUTHELIA_SMTP_SENDER: "notifications@jfraeys.com"
AUTHELIA_SMTP_IDENTIFIER: "jfraeys.com"

Backups (Restic):

RESTIC_REPOSITORY: "s3:https://us-east-1.linodeobjects.com/mybucket/backups"
RESTIC_PASSWORD: "strong-encryption-password"
RESTIC_AWS_ACCESS_KEY_ID: "your-linode-access-key"
RESTIC_AWS_SECRET_ACCESS_KEY: "your-linode-secret-key"
# Optional:
RESTIC_AWS_DEFAULT_REGION: "us-east-1"
RESTIC_KEEP_DAILY: 7
RESTIC_KEEP_WEEKLY: 4
RESTIC_KEEP_MONTHLY: 6
INFRA_BACKUP_ONCALENDAR: "daily"  # systemd calendar spec
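The three RESTIC_KEEP_* values correspond to restic's retention flags. The sketch below shows how they could translate into a `restic forget` invocation; whether the backups role calls restic exactly this way is an assumption, and `restic_forget_args` is a hypothetical helper.

```python
def restic_forget_args(keep_daily: int = 7, keep_weekly: int = 4,
                       keep_monthly: int = 6, prune: bool = True) -> list:
    """Build a `restic forget` command line from the retention settings."""
    for name, value in (("daily", keep_daily), ("weekly", keep_weekly),
                        ("monthly", keep_monthly)):
        if value < 0:
            raise ValueError(f"keep-{name} must not be negative")
    args = [
        "restic", "forget",
        "--keep-daily", str(keep_daily),
        "--keep-weekly", str(keep_weekly),
        "--keep-monthly", str(keep_monthly),
    ]
    if prune:
        # Also delete data that is no longer referenced by any snapshot.
        args.append("--prune")
    return args
```

With the values shown above this keeps 7 daily, 4 weekly, and 6 monthly snapshots.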

Alerting (set exactly one):

# Slack option:
ALERTMANAGER_SLACK_WEBHOOK_URL: "https://hooks.slack.com/services/..."
ALERTMANAGER_SLACK_CHANNEL: "#alerts"
ALERTMANAGER_SLACK_USERNAME: "alertmanager"

# Discord option:
ALERTMANAGER_DISCORD_WEBHOOK_URL: "https://discord.com/api/webhooks/..."
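The "set exactly one" rule is easy to get wrong when editing the vault by hand. A minimal check, assuming the decrypted vault is available as a dict (the `alert_channel` helper is hypothetical):

```python
def alert_channel(vault: dict) -> str:
    """Return "slack" or "discord"; raise if both or neither webhook is set."""
    slack = bool(vault.get("ALERTMANAGER_SLACK_WEBHOOK_URL"))
    discord = bool(vault.get("ALERTMANAGER_DISCORD_WEBHOOK_URL"))
    if slack == discord:
        state = "both" if slack else "neither"
        raise ValueError(
            "set exactly one of ALERTMANAGER_SLACK_WEBHOOK_URL or "
            f"ALERTMANAGER_DISCORD_WEBHOOK_URL (found {state})"
        )
    return "slack" if slack else "discord"
```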

Secrets (Ansible Vault)

Secrets are stored in secrets/vault.yml (encrypted).

Create your vault from the template:

  • secrets/vault.example.yml -> secrets/vault.yml

Run playbooks with either:

  • --ask-vault-pass
  • or a local password file (not committed): --vault-password-file .vault_pass

Notes:

  • secrets/vault.yml is intentionally gitignored
  • inventory/hosts.yml and inventory/host_vars/web.yml are generated by setup and intentionally gitignored

Playbooks

  • playbooks/services.yml: deploy observability + forgejo on services
  • playbooks/web.yml: deploy app-side dependencies on web
  • playbooks/test_config.yml: smoke test host config and deployed stacks
  • playbooks/deploy.yml: legacy/all-in-one deploy for the services host (no tags)

Configuration split

  • Vault (secrets/vault.yml): secrets (API tokens, passwords, access keys, and sensitive Terraform TF_VAR_* values)
  • .env: non-secret configuration (still treated as sensitive), such as region/instance type and non-secret endpoints

Linode Object Storage (demo apps)

If you already have a Linode Object Storage bucket, demo apps can use it via the S3-compatible API.

Recommended env vars (see .env.example):

  • S3_BUCKET
  • S3_ENDPOINT (example: https://us-east-1.linodeobjects.com)
  • S3_REGION

Secrets (store in secrets/vault.yml):

  • S3_ACCESS_KEY_ID
  • S3_SECRET_ACCESS_KEY

Create a dedicated access key for demos and scope permissions as tightly as possible.

Grafana provisioning

Grafana is provisioned with Prometheus and Loki datasources via the Grafana provisioning mechanism (no manual UI setup required).

Note: Grafana is deployed, but its DNS records are commented out. To reach it at grafana.jfraeys.com, uncomment the records in terraform/main.tf; otherwise, access it directly via the services host IP.

Host vars

Set inventory/host_vars/web.yml:

  • public_ipv4: public IPv4 of jfraeys.com

This value is used to restrict access to Loki (services:3100) to the web host only.
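As an illustration of the Loki allowlisting, the firewall rule derived from public_ipv4 might look like the one built below. This is a sketch: the exact rule the role installs is an assumption, and `loki_allow_rule` is a hypothetical helper. The address is validated first so that a typo in host_vars fails early.

```python
import ipaddress


def loki_allow_rule(public_ipv4: str, port: int = 3100) -> list:
    """Build a UFW command that lets only `public_ipv4` reach the Loki port."""
    addr = ipaddress.IPv4Address(public_ipv4)  # raises ValueError on bad input
    return ["ufw", "allow", "proto", "tcp",
            "from", str(addr), "to", "any", "port", str(port)]
```

For a web host at 203.0.113.10 (a documentation address), this yields `ufw allow proto tcp from 203.0.113.10 to any port 3100`.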

Forgejo Actions runner (web host)

A Forgejo runner is deployed on the web host (roles/forgejo_runner).

  • Requires FORGEJO_RUNNER_REGISTRATION_TOKEN in secrets/vault.yml.
  • Uses a single self-hosted label by default.
  • The role auto re-registers the runner if labels change.

AI Scrapers Blocklist

Forgejo includes a weekly cron job (roles/forgejo/update-ai-scrapers.sh) that updates robots.txt to block AI scrapers (GPTBot, ClaudeBot, etc.).
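The robots.txt entries such a job writes follow a simple pattern: one User-agent line per crawler, each followed by a blanket Disallow. The sketch below shows that pattern only; it is not the contents of update-ai-scrapers.sh, and `robots_block` is a hypothetical helper.

```python
def robots_block(user_agents) -> str:
    """Render robots.txt rules that disallow everything for each user agent."""
    lines = []
    for agent in user_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(robots_block(["GPTBot", "ClaudeBot"]))
```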

OIDC Configuration

Forgejo is configured with:

  • Group claim mapping from Authelia (groups)
  • Admin group: admins
  • Auto-discovery from https://auth.jfraeys.com/.well-known/openid-configuration

To force the Forgejo Actions runner to re-register (e.g. after deleting the runner in the Forgejo UI):

ansible-playbook playbooks/web.yml \
  --vault-password-file secrets/.vault_pass \
  --limit web \
  --tags forgejo_runner \
  -e forgejo_runner_force_reregister=true

SSH from Actions to services

If a workflow running on the web runner needs SSH access to the services host, the controller expects two separate SSH keys, each restricted to a forced command:

  • infra-register-stdin (register)
  • infra-deregister (deregister)

Public keys (installed on the services host via Ansible/vault):

  • SERVICE_SSH_REGISTER_PUBLIC_KEY
  • SERVICE_SSH_DEREGISTER_PUBLIC_KEY

Private keys (stored as Forgejo Actions secrets):

  • SERVICE_SSH_KEY_REGISTER
  • SERVICE_SSH_KEY_DEREGISTER
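Forced commands are expressed as options in front of the key in authorized_keys. The sketch below builds such a line; the option set is an assumption about what the role installs, and `authorized_keys_line` is a hypothetical helper. The command names come from the list above.

```python
# Standard OpenSSH authorized_keys options that disable everything except the command.
RESTRICTIONS = "no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty"


def authorized_keys_line(forced_command: str, public_key: str) -> str:
    """Return an authorized_keys entry that pins `public_key` to one command."""
    if '"' in forced_command or "\n" in forced_command:
        raise ValueError("forced command must not contain quotes or newlines")
    return f'command="{forced_command}",{RESTRICTIONS} {public_key.strip()}'
```

With an entry like this, whatever command the client requests is ignored and sshd runs the forced command instead, so a leaked key can only trigger register or deregister.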

To generate/update both Actions secrets (and optionally update both public keys in vault), install the Python deps first:

python3 -m pip install -r requirements.txt

Then run the helper script:

python3 scripts/forgejo_set_actions_secret.py \
  --repo jfraeysd/infra-controller \
  --generate-ssh-keys \
  --update-vault-both-public-keys

Deploy

Services:

ansible-playbook playbooks/services.yml --ask-vault-pass

Web:

ansible-playbook playbooks/web.yml --ask-vault-pass

Terraform

./setup will export TF_VAR_* from secrets/vault.yml (prompting for vault password if needed) and then run Terraform with a saved plan.

Notes

  • Grafana/Prometheus/Loki: Available as optional roles but not deployed by default (commented out in services.yml). Enable by uncommenting the role entries.
  • Loki is exposed on services:3100 but allowlisted in UFW to web only.
  • Watchtower is enabled with label-based updates.
  • Traefik: Uses file provider exclusively (Docker socket access removed). Services have static router definitions in /opt/traefik/dynamic/base.yml.
  • Postfix: Relays through Postmark port 2525 (avoids ISP blocking on 587).
  • Hardening: SSH config and unattended-upgrades managed via hardening role to prevent StackScript drift.

Role layout

Services host (services):

  • roles/traefik (file provider only - no Docker socket)
  • roles/postfix (Postmark SMTP relay for transactional email)
  • roles/exporters (node-exporter + cAdvisor)
  • roles/app (active - DNS enabled)
  • roles/prometheus (optional - commented out in services.yml)
  • roles/loki (optional - commented out in services.yml)
  • roles/grafana (optional - commented out in services.yml)
  • roles/forgejo
  • roles/alertmanager (uses localhost:25 Postfix relay)
  • roles/watchtower
  • roles/hardening (SSH hardening, unattended-upgrades)
  • roles/backups
  • roles/fail2ban (Docker-based fail2ban)

Web host (web):

  • roles/traefik
  • roles/app_core (optional shared Postgres/Redis)
  • roles/forgejo_runner
  • roles/app_deployer (CI/CD webhook and deployment automation)
  • roles/hardening (SSH hardening, unattended-upgrades)

App Deployment

The app_deployer role provides automated deployment via webhooks from Forgejo or GitHub Actions.

Prerequisites

  1. Generate deploy token (run once):

    ./scripts/gen-auth-secrets.sh  # Creates VAULT_DEPLOY_TOKEN
    # Or add to secrets/vault.yml manually

  2. Set DEPLOY_TOKEN in your app repo:

    • Forgejo: Use the helper script:

      ./scripts/set_deploy_token.py --owner <you> --repo <app-name>

    • GitHub: Set DEPLOY_TOKEN secret via Settings > Secrets and variables > Actions

  3. Add deploy workflow to your app repo:

    Copy the sample workflow and customize:

    cp roles/app_deployer/files/forgejo-deploy-workflow.yml .forgejo/workflows/deploy.yml
    # For GitHub: cp to .github/workflows/deploy.yml

    Update the workflow for your build (Go, Rust, Node.js, etc.) and app name.

How It Works

  1. CI builds the app and uploads binary + checksum to deploy@web:/opt/artifacts/
  2. CI triggers webhook with X-Deploy-Token header
  3. Webhook validates token (timing-safe comparison) and runs deployment
  4. Ansible deploys the app:
    • Verifies artifact checksum
    • Creates app user and directories
    • Sets up systemd service
    • Keeps last 5 versions for rollback
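Steps 3 and 4 rely on two checks: a timing-safe token comparison and an artifact checksum. A minimal sketch of both follows; it assumes SHA-256 checksums (the hash is not stated above), and the helper names are hypothetical rather than taken from the webhook code.

```python
import hashlib
import hmac


def token_is_valid(presented: str, expected: str) -> bool:
    """Compare tokens with hmac.compare_digest to avoid leaking timing information."""
    return hmac.compare_digest(presented.encode(), expected.encode())


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifact_is_intact(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 matches the uploaded checksum."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A plain `==` on the token would return as soon as the first differing character is found, which is the timing signal that `compare_digest` is designed to hide.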

Manual Deployment

For manual deploys or rollbacks:

# Deploy a specific version
ssh deploy@web /opt/deploy/scripts/deploy.sh my-api abc123 prod

# Roll back to the previous version
ssh deploy@web /opt/deploy/scripts/rollback.sh my-api
# Lists available versions, then:
ssh deploy@web /opt/deploy/scripts/rollback.sh my-api <older-sha>

Security Features

  • Timing-safe token validation prevents timing attacks
  • Artifact checksums ensure binary integrity
  • Sudoers rule restricted to the deployment script only
  • Last 5 versions kept for quick rollback
  • Each app runs under its own unprivileged user
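The "last 5 versions" rule can be sketched as a pruning helper. The layout here (one directory per version under a releases directory, ordered by modification time) is an assumption for illustration; the role may track versions differently, and `versions_to_prune` is a hypothetical name.

```python
from pathlib import Path


def versions_to_prune(releases_dir: str, keep: int = 5) -> list:
    """Return version directories older than the `keep` most recent ones."""
    dirs = [p for p in Path(releases_dir).iterdir() if p.is_dir()]
    # Newest first, so everything past index `keep` is a pruning candidate.
    dirs.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return dirs[keep:]
```

The function only reports candidates; deleting them is left to the caller so that a dry run is the default behavior.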

Troubleshooting

# Check webhook logs
ssh web sudo journalctl -u webhook -f

# Check deploy logs
ssh web sudo cat /var/log/deploy.log

# Verify systemd service
ssh web sudo systemctl status my-api