infra/README.md
Jeremie Fraeys 9e7b51b69a
docs: document Actions SSH key setup
- Document required register/deregister SSH keys for controller workflows
- Update vault.example.yml with FORGEJO_API_TOKEN and SSH public key placeholders
2026-01-20 17:10:41 -05:00

infra

Overview

This repo manages two hosts:

  • web (jfraeys.com)
  • services (services.jfraeys.com)

The routing convention is service.server.jfraeys.com.

Examples:

  • grafana.jfraeys.com -> services host
  • git.jfraeys.com -> services host

Traefik runs on both servers and routes only the services running on that server.

Quickstart

This repo is intended to be driven by setup.sh:

./setup.sh

What it does:

  • Applies Terraform from terraform/
  • Writes inventory/hosts.yml and inventory/host_vars/web.yml (gitignored)
  • Runs playbooks/services.yml and playbooks/app.yml

If you want Terraform only:

./setup.sh --no-ansible

Prereqs (local)

  • terraform
  • ansible
  • SSH access to the hosts

If your SSH key is passphrase-protected, load it into your SSH agent before running Ansible non-interactively (the --apple-use-keychain flag is macOS-specific):

ssh-add --apple-use-keychain ~/.ssh/id_ed25519
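
To load the key only when the agent is empty, and in a way that also works outside macOS, a sketch like the following may help. `ensure_key_loaded` is a hypothetical helper, not something shipped in this repo, and it only checks whether the agent holds any identity, not whether it holds this specific key.

```shell
# Hypothetical helper (not part of this repo): load a key only if the
# agent does not already hold an identity.
ensure_key_loaded() {
  key_path="${1:-$HOME/.ssh/id_ed25519}"
  # `ssh-add -l` exits 0 only when the agent holds at least one identity.
  if ssh-add -l >/dev/null 2>&1; then
    echo "agent already has a key loaded"
  else
    # Plain ssh-add works on Linux and macOS; it prompts for the passphrase.
    ssh-add "$key_path"
  fi
}
```

Usage: `ensure_key_loaded ~/.ssh/id_ed25519` before running `./setup.sh`.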

DNS (Cloudflare)

Create A or CNAME records that resolve to the correct server.

Recommended:

  • jfraeys.com -> A record to web server IPv4
  • services.jfraeys.com -> A record to services server IPv4
  • grafana.jfraeys.com -> A/CNAME to services
  • git.jfraeys.com -> A/CNAME to services
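
Once the records exist, they can be spot-checked from a workstation. The following is a minimal sketch, assuming `dig` is installed; `check_a_record` is a hypothetical helper and 203.0.113.10 is a documentation placeholder, not a real server IP.

```shell
# Hypothetical helper (not part of this repo): compare a hostname's
# resolved IPv4 addresses against the IP you expect.
check_a_record() {
  host="${1:-}"
  expected_ip="${2:-}"
  # `dig +short` prints one answer per line; CNAME targets also appear,
  # so match the expected IP against a whole line.
  if dig +short A "$host" | grep -qxF "$expected_ip"; then
    echo "ok: $host -> $expected_ip"
  else
    echo "mismatch: $host does not resolve to $expected_ip" >&2
    return 1
  fi
}

# Example (placeholder IP):
# check_a_record grafana.jfraeys.com 203.0.113.10
```

If a record is proxied through Cloudflare, public DNS returns Cloudflare edge addresses rather than the server IP, so this check only applies to DNS-only records.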

TLS

Traefik obtains certificates from Let's Encrypt using the DNS-01 challenge via Cloudflare.

You must provide a Cloudflare API token in your local environment when running Ansible:

  • CF_DNS_API_TOKEN (preferred)
  • or TF_VAR_cloudflare_api_token
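
The precedence between the two variables can be sketched as below. This only illustrates the order described above; `resolve_cf_token` is a hypothetical helper, and how the playbooks actually read these variables is not shown in this README.

```shell
# Hypothetical helper (not part of this repo): pick the Cloudflare token
# from the environment, preferring CF_DNS_API_TOKEN.
resolve_cf_token() {
  if [ -n "${CF_DNS_API_TOKEN:-}" ]; then
    printf '%s\n' "$CF_DNS_API_TOKEN"
  elif [ -n "${TF_VAR_cloudflare_api_token:-}" ]; then
    printf '%s\n' "$TF_VAR_cloudflare_api_token"
  else
    echo "no Cloudflare API token found in the environment" >&2
    return 1
  fi
}
```

Calling `resolve_cf_token >/dev/null` before a deploy is a cheap way to fail early when neither variable is set.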

SSO (Authelia OIDC)

Authelia is exposed at:

  • https://auth.jfraeys.com (issuer)
  • https://auth.jfraeys.com/.well-known/openid-configuration (discovery)
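
OIDC clients generally require the issuer advertised in the discovery document to match the issuer URL they were configured with. A rough way to check that from a shell, assuming `curl` is available, is sketched below; `check_oidc_issuer` is a hypothetical helper and its `sed` extraction is deliberately crude.

```shell
# Hypothetical helper (not part of this repo): fetch the discovery document
# and confirm that the advertised issuer matches the expected URL.
check_oidc_issuer() {
  issuer="${1:-https://auth.jfraeys.com}"
  # -f fails on HTTP errors; -sS stays quiet but still reports failures.
  doc=$(curl -fsS "$issuer/.well-known/openid-configuration") || return 1
  # Crude extraction of the "issuer" field; a JSON-aware tool is more robust.
  found=$(printf '%s\n' "$doc" |
    sed -n 's/.*"issuer"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
    head -n 1)
  if [ "$found" = "$issuer" ]; then
    echo "issuer ok: $found"
  else
    echo "issuer mismatch: expected $issuer, got ${found:-<none>}" >&2
    return 1
  fi
}
```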

Grafana is configured via roles/grafana using the Generic OAuth provider.

Forgejo is configured via roles/forgejo using the Forgejo admin CLI with --provider=openidConnect and --auto-discover-url.

Note: Forgejo pages that ask for an "OpenID URI" are legacy OpenID 2.0 and are not used for OIDC.

Secrets (Ansible Vault)

Secrets are stored in secrets/vault.yml (encrypted).

Create your vault from the template:

  • secrets/vault.example.yml -> secrets/vault.yml

Run playbooks with either:

  • --ask-vault-pass
  • or a local password file (not committed): --vault-password-file .vault_pass
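
Creating the vault from the template might look like the following sketch. `init_vault` is a hypothetical helper, not a script in this repo; it assumes it is run from the repo root with `ansible-vault` on the PATH.

```shell
# Hypothetical helper (not part of this repo): create secrets/vault.yml from
# the template and encrypt it in place.
init_vault() {
  if [ -e secrets/vault.yml ]; then
    echo "secrets/vault.yml already exists; refusing to overwrite" >&2
    return 1
  fi
  cp secrets/vault.example.yml secrets/vault.yml || return 1
  # Prompts for a new vault password, then encrypts the file in place.
  ansible-vault encrypt secrets/vault.yml
}

# Afterwards, fill in real values without leaving plaintext on disk:
#   ansible-vault edit secrets/vault.yml
```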

Notes:

  • secrets/vault.yml is intentionally gitignored
  • inventory/hosts.yml and inventory/host_vars/web.yml are generated by setup.sh and intentionally gitignored

Playbooks

  • playbooks/services.yml: deploy observability + Forgejo on services
  • playbooks/app.yml: deploy app-side dependencies on web
  • playbooks/test_config.yml: smoke test host config and deployed stacks
  • playbooks/deploy.yml: legacy/all-in-one deploy for the services host (no tags)

Configuration split

  • Vault (secrets/vault.yml): secrets (API tokens, passwords, access keys, and sensitive Terraform TF_VAR_* values)
  • .env: non-secret configuration (still treated as sensitive), such as region/instance type and non-secret endpoints

Linode Object Storage (demo apps)

If you already have a Linode Object Storage bucket, demo apps can use it via the S3-compatible API.

Recommended env vars (see .env.example):

  • S3_BUCKET
  • S3_ENDPOINT (example: https://us-east-1.linodeobjects.com)
  • S3_REGION

Secrets (store in secrets/vault.yml):

  • S3_ACCESS_KEY_ID
  • S3_SECRET_ACCESS_KEY

Create a dedicated access key for demos and scope permissions as tightly as possible.

Grafana provisioning

Grafana is provisioned with Prometheus and Loki datasources via the Grafana provisioning mechanism (no manual UI setup required).

Host vars

Set inventory/host_vars/web.yml:

  • public_ipv4: public IPv4 of jfraeys.com

This value is used to restrict access to Loki (services:3100) to the web host only.
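
For illustration, the kind of UFW rule this implies is sketched below. The roles manage the firewall themselves and their actual tasks are not shown in this README; `loki_allow_rule` is a hypothetical helper that only prints a command and changes nothing.

```shell
# Hypothetical helper (not part of this repo): print the UFW rule that would
# admit a single source IP to Loki's port. It only prints the command.
loki_allow_rule() {
  web_ip="${1:-}"
  loki_port="${2:-3100}"
  if [ -z "$web_ip" ]; then
    echo "usage: loki_allow_rule <web_public_ipv4> [port]" >&2
    return 1
  fi
  echo "ufw allow from $web_ip to any port $loki_port proto tcp"
}

# Example (203.0.113.10 is a documentation placeholder):
# loki_allow_rule 203.0.113.10
```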

Forgejo Actions runner (web host)

A Forgejo runner is deployed on the web host (roles/forgejo_runner).

  • Requires FORGEJO_RUNNER_REGISTRATION_TOKEN in secrets/vault.yml.
  • Uses a single self-hosted label by default.
  • The role automatically re-registers the runner if its labels change.

To force re-register (e.g. after deleting the runner in Forgejo UI):

ansible-playbook playbooks/app.yml \
  --vault-password-file secrets/.vault_pass \
  --limit web \
  --tags forgejo_runner \
  -e forgejo_runner_force_reregister=true

SSH from Actions to services

If a workflow running on the web runner needs SSH access to the services host, the controller expects two separate SSH keys, each restricted to a forced command:

  • infra-register-stdin (register)
  • infra-deregister (deregister)

Public keys (installed on the services host via Ansible/vault):

  • SERVICE_SSH_REGISTER_PUBLIC_KEY
  • SERVICE_SSH_DEREGISTER_PUBLIC_KEY
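
In OpenSSH, a forced command is expressed as options in front of the public key in authorized_keys. The sketch below shows the general shape of such an entry; it assumes infra-register-stdin and infra-deregister are the command names on the services host, and `forced_command_entry` is a hypothetical helper. The entries the Ansible role actually writes may differ.

```shell
# Hypothetical helper (not part of this repo): build an authorized_keys entry
# that pins a public key to one forced command.
forced_command_entry() {
  forced_cmd="${1:-}"
  public_key="${2:-}"
  if [ -z "$forced_cmd" ] || [ -z "$public_key" ]; then
    echo "usage: forced_command_entry <command> <public_key>" >&2
    return 1
  fi
  # `restrict` disables port, agent and X11 forwarding and PTY allocation;
  # `command=` makes sshd run this command whatever the client requests.
  printf 'command="%s",restrict %s\n' "$forced_cmd" "$public_key"
}

# Example (the key material is a placeholder):
# forced_command_entry infra-register-stdin "ssh-ed25519 AAAA... register"
```

Using one key per command means a leaked register key cannot be used to deregister, and neither key yields an interactive shell.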

Private keys (stored as Forgejo Actions secrets):

  • SERVICE_SSH_KEY_REGISTER
  • SERVICE_SSH_KEY_DEREGISTER

To generate/update both Actions secrets (and optionally update both public keys in vault):

python3 scripts/forgejo_set_actions_secret.py \
  --repo jfraeysd/infra-controller \
  --generate-ssh-keys \
  --update-vault-both-public-keys

Deploy

Services:

ansible-playbook playbooks/services.yml --ask-vault-pass

Web:

ansible-playbook playbooks/app.yml --ask-vault-pass

Terraform

./setup.sh exports TF_VAR_* values from secrets/vault.yml (prompting for the vault password if needed) and then runs Terraform with a saved plan.
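
A saved-plan run generally follows the pattern sketched below. This is not the contents of setup.sh; `terraform_saved_plan` is a hypothetical helper, the plan file name tfplan is an assumption, and the TF_VAR_* exports are expected to be in the environment already.

```shell
# Hypothetical helper (not part of this repo): plan to a file, then apply
# exactly that plan so nothing changes between review and apply.
terraform_saved_plan() {
  plan_file="${1:-tfplan}"
  # -chdir runs Terraform as if started inside terraform/.
  terraform -chdir=terraform init -input=false || return 1
  # The plan file path is relative to the terraform/ directory.
  terraform -chdir=terraform plan -input=false -out="$plan_file" || return 1
  # Applying a saved plan does not prompt for confirmation.
  terraform -chdir=terraform apply -input=false "$plan_file"
}
```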

Notes

  • Loki is exposed on services:3100 but allowlisted in UFW to web only.
  • Watchtower is enabled with label-based updates.
  • Airflow/Spark are intentionally optional and can be enabled later via deploy_airflow / deploy_spark.

Role layout

Services host (services):

  • roles/traefik
  • roles/exporters (node-exporter + cAdvisor)
  • roles/prometheus
  • roles/loki
  • roles/grafana
  • roles/forgejo
  • roles/watchtower

Web host (web):

  • roles/traefik
  • roles/app_core (optional shared Postgres/Redis)
  • roles/forgejo_runner