infra
Overview
This repo manages two hosts:
- web (jfraeys.com)
- services (services.jfraeys.com)
The routing convention is service.server.jfraeys.com.
Examples:
- git.jfraeys.com -> services host (Forgejo)
- auth.jfraeys.com -> services host (Authelia)
- app.jfraeys.com -> services host (App)
Traefik runs on both servers and routes only the services running on that server.
Quickstart
This repo is intended to be driven by setup:
./setup
For options:
./setup --help
What it does:
- Applies Terraform from terraform/
- Writes inventory/hosts.yml and inventory/host_vars/web.yml (gitignored)
- Runs playbooks/services.yml and playbooks/web.yml
If you want Terraform only:
./setup --no-ansible
If you want Ansible only (requires an existing inventory/hosts.yml):
./setup --ansible-only
Prereqs (local)
- terraform
- ansible
- python3 (for helper scripts)
- pip / python3 -m pip
- SSH access to the hosts
If your SSH key is passphrase-protected, load it into your SSH agent before running Ansible non-interactively (the --apple-use-keychain flag below is macOS-specific):
ssh-add --apple-use-keychain ~/.ssh/id_ed25519
DNS (Cloudflare)
Create A/CNAME records that point to the correct server IP.
Active records:
- jfraeys.com -> A record to web server IPv4
- services.jfraeys.com -> A record to services server IPv4
- git.jfraeys.com -> A/CNAME to services (Forgejo)
- auth.jfraeys.com -> A/CNAME to services (Authelia)
- app.jfraeys.com -> A/CNAME to services (App)
Commented out (unused):
- grafana.jfraeys.com -> A/CNAME to services (Grafana - currently disabled)
- prometheus.jfraeys.com -> A/CNAME to services (Prometheus - currently disabled)
To enable, uncomment the records in terraform/main.tf.
TLS
Traefik uses Let’s Encrypt via Cloudflare DNS-01.
You must provide these Cloudflare API tokens in your local environment when running Ansible:
- CF_DNS_API_TOKEN
- CF_ZONE_API_TOKEN
SSO (Authelia OIDC)
Authelia is exposed at:
- https://auth.jfraeys.com (issuer)
- https://auth.jfraeys.com/.well-known/openid-configuration (discovery)
Grafana is configured via roles/grafana using the Generic OAuth provider.
Forgejo is configured via roles/forgejo using the Forgejo admin CLI with --provider=openidConnect and --auto-discover-url.
Note: Forgejo pages that ask for an "OpenID URI" are legacy OpenID 2.0 and are not used for OIDC.
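The relationship between the issuer and the discovery URL can be sketched as follows. This is a minimal illustration, not code from this repo: the function names are hypothetical, and the sample document in the usage note is a made-up fragment, not Authelia's real output. It relies only on the OIDC Discovery rule that the discovery document lives under /.well-known/openid-configuration and that its issuer field must match the issuer exactly.

```python
import json

WELL_KNOWN_SUFFIX = "/.well-known/openid-configuration"


def discovery_url(issuer: str) -> str:
    """Build the discovery URL for an issuer (hypothetical helper)."""
    return issuer.rstrip("/") + WELL_KNOWN_SUFFIX


def check_discovery_document(issuer: str, document_text: str) -> dict:
    """Parse a discovery document and confirm it names the expected issuer."""
    document = json.loads(document_text)
    if document.get("issuer") != issuer:
        raise ValueError(
            f"issuer mismatch: expected {issuer!r}, got {document.get('issuer')!r}"
        )
    return document
```

For example, discovery_url("https://auth.jfraeys.com") yields the discovery URL listed above, which is the value Forgejo's --auto-discover-url option expects.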
Email (Postfix + Postmark)
Transactional email is delivered via Postfix relay to Postmark:
- Sender: notifications@jfraeys.com
- Relay: smtp.postmarkapp.com:2525
- Auth: Server token authentication
Services using email:
- Authelia (password resets)
- Alertmanager (monitoring alerts)
- Forgejo (CI/CD notifications)
DNS Records for Email
Terraform manages these Cloudflare records:
| Record | Type | Purpose |
|---|---|---|
| YYYYMMDDDDpm._domainkey | TXT | DKIM signature |
| pm-bounces | CNAME | Return-path for bounces |
| _dmarc | TXT | DMARC policy |
Postmark validates these during account setup.
Vault Variables
Add to secrets/vault.yml:
Email (Postfix + Postmark):
POSTFIX_RELAYHOST_USERNAME: "your-postmark-server-token"
POSTFIX_RELAYHOST_PASSWORD: "your-postmark-server-token"
AUTHELIA_SMTP_SENDER: "notifications@jfraeys.com"
AUTHELIA_SMTP_IDENTIFIER: "jfraeys.com"
Backups (Restic):
RESTIC_REPOSITORY: "s3:https://us-east-1.linodeobjects.com/mybucket/backups"
RESTIC_PASSWORD: "strong-encryption-password"
RESTIC_AWS_ACCESS_KEY_ID: "your-linode-access-key"
RESTIC_AWS_SECRET_ACCESS_KEY: "your-linode-secret-key"
# Optional:
RESTIC_AWS_DEFAULT_REGION: "us-east-1"
RESTIC_KEEP_DAILY: 7
RESTIC_KEEP_WEEKLY: 4
RESTIC_KEEP_MONTHLY: 6
INFRA_BACKUP_ONCALENDAR: "daily" # systemd calendar spec
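As a rough illustration of what the retention variables mean, the sketch below maps them onto a restic forget invocation. How roles/backups actually calls restic is not shown in this README, so treat the function name and the use of --prune as assumptions; the --keep-daily, --keep-weekly and --keep-monthly flags are standard restic options.

```python
def restic_forget_args(keep_daily: int = 7, keep_weekly: int = 4, keep_monthly: int = 6) -> list:
    """Build a `restic forget` command line from the retention settings.

    Hypothetical helper: defaults mirror the example vault values above.
    """
    return [
        "restic", "forget",
        "--keep-daily", str(keep_daily),
        "--keep-weekly", str(keep_weekly),
        "--keep-monthly", str(keep_monthly),
        "--prune",  # assumption: unreferenced data is removed in the same run
    ]
```

With the example values, this keeps the most recent 7 daily, 4 weekly and 6 monthly snapshots.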
Alerting (set exactly one):
# Slack option:
ALERTMANAGER_SLACK_WEBHOOK_URL: "https://hooks.slack.com/services/..."
ALERTMANAGER_SLACK_CHANNEL: "#alerts"
ALERTMANAGER_SLACK_USERNAME: "alertmanager"
# Discord option:
ALERTMANAGER_DISCORD_WEBHOOK_URL: "https://discord.com/api/webhooks/..."
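The "set exactly one" rule can be checked before a deploy with something like the sketch below. It assumes the decrypted vault is available as a plain dict; the function name is hypothetical and this check is not part of the repo.

```python
def selected_alert_channel(vault: dict) -> str:
    """Return "slack" or "discord", or raise if zero or both are configured."""
    slack = bool(vault.get("ALERTMANAGER_SLACK_WEBHOOK_URL"))
    discord = bool(vault.get("ALERTMANAGER_DISCORD_WEBHOOK_URL"))
    if slack == discord:
        # Either both webhooks are set or neither is.
        raise ValueError("set exactly one of the Slack or Discord webhook URLs")
    return "slack" if slack else "discord"
```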
Secrets (Ansible Vault)
Secrets are stored in secrets/vault.yml (encrypted).
Create your vault from the template:
secrets/vault.example.yml -> secrets/vault.yml
Run playbooks with either:
- --ask-vault-pass
- or a local password file (not committed): --vault-password-file .vault_pass
Notes:
- secrets/vault.yml is intentionally gitignored
- inventory/hosts.yml and inventory/host_vars/web.yml are generated by setup and intentionally gitignored
Playbooks
- playbooks/services.yml: deploy observability + forgejo on services
- playbooks/web.yml: deploy app-side dependencies on web
- playbooks/test_config.yml: smoke test host config and deployed stacks
- playbooks/deploy.yml: legacy/all-in-one deploy for the services host (no tags)
Configuration split
- Vault (secrets/vault.yml): secrets (API tokens, passwords, access keys, and sensitive Terraform TF_VAR_* values)
- .env: non-secret configuration (still treated as sensitive), such as region/instance type and non-secret endpoints
Linode Object Storage (demo apps)
If you already have a Linode Object Storage bucket, demo apps can use it via the S3-compatible API.
Recommended env vars (see .env.example):
- S3_BUCKET
- S3_ENDPOINT (example: https://us-east-1.linodeobjects.com)
- S3_REGION
Secrets (store in secrets/vault.yml):
- S3_ACCESS_KEY_ID
- S3_SECRET_ACCESS_KEY
Create a dedicated access key for demos and scope permissions as tightly as possible.
Grafana provisioning
Grafana is provisioned with Prometheus and Loki datasources via the Grafana provisioning mechanism (no manual UI setup required).
Note: Grafana is deployed but its DNS records are commented out. To reach it at grafana.jfraeys.com, uncomment the records in terraform/main.tf; otherwise access it directly via the services host IP.
Host vars
Set inventory/host_vars/web.yml:
- public_ipv4: public IPv4 of jfraeys.com
This value is used to allowlist Loki (services:3100) so that only the web host can reach it.
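The effect of the allowlist can be sketched as a single UFW rule built from public_ipv4. This is an illustration only: the roles may apply the rule through Ansible rather than the ufw CLI, the function name is hypothetical, and 203.0.113.10 in the usage note is a documentation address, not a real host.

```python
import ipaddress


def loki_allow_rule(web_public_ipv4: str, port: int = 3100) -> list:
    """Build a `ufw allow` command that admits only the web host to Loki."""
    # IPv4Address raises a ValueError subclass on malformed input.
    address = ipaddress.IPv4Address(web_public_ipv4)
    return [
        "ufw", "allow",
        "from", str(address),
        "to", "any",
        "port", str(port),
        "proto", "tcp",
    ]
```

For instance, loki_allow_rule("203.0.113.10") corresponds to the command ufw allow from 203.0.113.10 to any port 3100 proto tcp.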
Forgejo Actions runner (web host)
A Forgejo runner is deployed on the web host (roles/forgejo_runner).
- Requires FORGEJO_RUNNER_REGISTRATION_TOKEN in secrets/vault.yml.
- Uses a single self-hosted label by default.
- The role auto re-registers the runner if labels change.
AI Scrapers Blocklist
Forgejo includes a weekly cron job (roles/forgejo/update-ai-scrapers.sh) that updates robots.txt to block AI scrapers (GPTBot, ClaudeBot, etc.).
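The repo's script is a shell script and its source list of crawlers is not shown here, so the following is only a sketch of the kind of robots.txt block such a job would write. The function name is hypothetical; GPTBot and ClaudeBot are the two agents named above.

```python
def robots_block(user_agents) -> str:
    """Render one 'disallow everything' stanza per crawler user agent."""
    lines = []
    for agent in user_agents:
        lines.append(f"User-agent: {agent}")
        lines.append("Disallow: /")
        lines.append("")  # blank line separates stanzas
    return "\n".join(lines)
```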
OIDC Configuration
Forgejo is configured with:
- Group claim mapping from Authelia (groups)
- Admin group: admins
- Auto-discovery from https://auth.jfraeys.com/.well-known/openid-configuration
To force the Forgejo runner to re-register (e.g. after deleting the runner in the Forgejo UI):
ansible-playbook playbooks/web.yml \
--vault-password-file secrets/.vault_pass \
--limit web \
--tags forgejo_runner \
-e forgejo_runner_force_reregister=true
SSH from Actions to services
If a workflow running on the web runner needs SSH access to the services host:
The controller expects two separate SSH keys restricted to forced commands:
- infra-register-stdin (register)
- infra-deregister (deregister)
Public keys (installed on the services host via Ansible/vault):
- SERVICE_SSH_REGISTER_PUBLIC_KEY
- SERVICE_SSH_DEREGISTER_PUBLIC_KEY
Private keys (stored as Forgejo Actions secrets):
- SERVICE_SSH_KEY_REGISTER
- SERVICE_SSH_KEY_DEREGISTER
To generate/update both Actions secrets (and optionally update both public keys in vault), install the Python deps first, then run the helper script:
python3 -m pip install -r requirements.txt
python3 scripts/forgejo_set_actions_secret.py \
--repo jfraeysd/infra-controller \
--generate-ssh-keys \
--update-vault-both-public-keys
Deploy
Services:
ansible-playbook playbooks/services.yml --ask-vault-pass
Web:
ansible-playbook playbooks/web.yml --ask-vault-pass
Terraform
./setup will export TF_VAR_* from secrets/vault.yml (prompting for vault password if needed) and then run Terraform with a saved plan.
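The export step can be pictured roughly as below. This is a sketch under two assumptions: the decrypted vault is a flat mapping, and Terraform inputs are stored under keys that already carry the TF_VAR_ prefix. The function name is hypothetical and the real setup script may work differently.

```python
def terraform_env(vault: dict, base_env: dict) -> dict:
    """Return a copy of base_env with the vault's TF_VAR_* entries added."""
    env = dict(base_env)  # never mutate the caller's environment
    for key, value in vault.items():
        if key.startswith("TF_VAR_"):
            env[key] = str(value)  # environment values must be strings
    return env
```

The resulting mapping could then be passed as the environment for terraform plan -out=<file> followed by terraform apply <file>, which is the usual way to run Terraform with a saved plan.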
Notes
- Grafana/Prometheus/Loki: Available as optional roles but not deployed by default (commented out in services.yml). Enable by uncommenting the role entries.
- Loki is exposed on services:3100 but allowlisted in UFW to web only.
- Watchtower is enabled with label-based updates.
- Traefik: Uses file provider exclusively (Docker socket access removed). Services have static router definitions in /opt/traefik/dynamic/base.yml.
- Postfix: Relays through Postmark port 2525 (avoids ISP blocking on 587).
- Hardening: SSH config and unattended-upgrades managed via hardening role to prevent StackScript drift.
Role layout
Services host (services):
- roles/traefik (file provider only - no Docker socket)
- roles/postfix (Postmark SMTP relay for transactional email)
- roles/exporters (node-exporter + cAdvisor)
- roles/app (active - DNS enabled)
- roles/prometheus (optional - commented out in services.yml)
- roles/loki (optional - commented out in services.yml)
- roles/grafana (optional - commented out in services.yml)
- roles/forgejo
- roles/alertmanager (uses localhost:25 Postfix relay)
- roles/watchtower
- roles/hardening (SSH hardening, unattended-upgrades)
- roles/backups
- roles/fail2ban (Docker-based fail2ban)
Web host (web):
- roles/traefik
- roles/app_core (optional shared Postgres/Redis)
- roles/forgejo_runner
- roles/app_deployer (CI/CD webhook and deployment automation)
- roles/hardening (SSH hardening, unattended-upgrades)
App Deployment
The app_deployer role provides automated deployment via webhooks from Forgejo or GitHub Actions.
Prerequisites
- Generate deploy token (run once):
    ./scripts/gen-auth-secrets.sh  # Creates VAULT_DEPLOY_TOKEN
    # Or add to secrets/vault.yml manually
- Set DEPLOY_TOKEN in your app repo:
  - Forgejo: Use the helper script:
      ./scripts/set_deploy_token.py --owner <you> --repo <app-name>
  - GitHub: Set DEPLOY_TOKEN secret via Settings > Secrets and variables > Actions
- Add deploy workflow to your app repo. Copy the sample workflow and customize:
    cp roles/app_deployer/files/forgejo-deploy-workflow.yml .forgejo/workflows/deploy.yml
    # For GitHub: cp to .github/workflows/deploy.yml
  Update the workflow for your build (Go, Rust, Node.js, etc.) and app name.
How It Works
- CI builds the app and uploads binary + checksum to deploy@web:/opt/artifacts/
- CI triggers webhook with X-Deploy-Token header
- Webhook validates token (timing-safe comparison) and runs deployment
- Ansible deploys the app:
  - Verifies artifact checksum
  - Creates app user and directories
  - Sets up systemd service
  - Keeps last 5 versions for rollback
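The timing-safe comparison in the webhook step can be illustrated with Python's hmac.compare_digest. The webhook's actual implementation is not shown in this README, so this is a sketch of the technique only; the function and argument names are hypothetical.

```python
import hmac


def token_is_valid(presented: str, expected: str) -> bool:
    """Compare the X-Deploy-Token header value against the configured token.

    hmac.compare_digest takes time independent of where the inputs differ,
    unlike ==, which can stop at the first mismatching character.
    """
    if not presented or not expected:
        return False  # reject missing headers and unset configuration
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

Rejecting empty values up front matters: without it, an unset token on the server would match an empty header.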
Manual Deployment
For manual deploys or rollbacks:
# Deploy a specific version
ssh deploy@web /opt/deploy/scripts/deploy.sh my-api abc123 prod
# Rollback to previous version
ssh deploy@web /opt/deploy/scripts/rollback.sh my-api
# Lists available versions, then:
ssh deploy@web /opt/deploy/scripts/rollback.sh my-api <older-sha>
Security Features
- Timing-safe token validation prevents timing attacks
- Artifact checksums ensure binary integrity
- Sudoers restricted to only deployment script
- Last 5 versions kept for quick rollback
- Each app runs as its own unprivileged user
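Checksum verification of an uploaded artifact can be sketched as below. The README does not name the hash algorithm, so SHA-256 is an assumption, as are the function names; the real check runs inside the Ansible deploy.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large binaries are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifact_matches(path: Path, expected_hex: str) -> bool:
    """Return True when the file's digest equals the published checksum."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A deploy would refuse to continue when artifact_matches returns False, since that means the binary changed between the CI build and the upload.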
Troubleshooting
# Check webhook logs
ssh web sudo journalctl -u webhook -f
# Check deploy logs
ssh web sudo cat /var/log/deploy.log
# Verify systemd service
ssh web sudo systemctl status my-api