# infra

## Overview

This repo manages two hosts:

- `web` (`jfraeys.com`)
- `services` (`services.jfraeys.com`)

The routing convention is `<service>.jfraeys.com`, with each hostname pointing at the server that runs that service.

Examples:

- `grafana.jfraeys.com` -> services host
- `git.jfraeys.com` -> services host

Traefik runs on both servers, and each instance routes only the services running on its own server.

## Quickstart

This repo is intended to be driven by `setup.sh`:

```bash
./setup.sh
```

For options:

```bash
./setup.sh --help
```

What it does:

- Applies Terraform from `terraform/`
- Writes `inventory/hosts.yml` and `inventory/host_vars/web.yml` (gitignored)
- Runs `playbooks/services.yml` and `playbooks/app.yml`

If you want Terraform only:

```bash
./setup.sh --no-ansible
```

If you want Ansible only (requires an existing `inventory/hosts.yml`):

```bash
./setup.sh --ansible-only
```

## Prereqs (local)

- `terraform`
- `ansible`
- `python3` (for helper scripts)
- `pip` / `python3 -m pip`
- SSH access to the hosts

If your SSH key is passphrase-protected, you must load it into your agent before running Ansible non-interactively. For example, on macOS:

```bash
ssh-add --apple-use-keychain ~/.ssh/id_ed25519
```

## DNS (Cloudflare)

Create A/CNAME records that point to the correct server IP.

Recommended:

- `jfraeys.com` -> A record to web server IPv4
- `services.jfraeys.com` -> A record to services server IPv4
- `grafana.jfraeys.com` -> A/CNAME to services
- `git.jfraeys.com` -> A/CNAME to services

## TLS

Traefik uses Let’s Encrypt via Cloudflare DNS-01.

You must provide a Cloudflare API token in your local environment when running Ansible:

- `CF_DNS_API_TOKEN` (preferred)
- or `TF_VAR_cloudflare_api_token`

## SSO (Authelia OIDC)

Authelia is exposed at:

- `https://auth.jfraeys.com` (issuer)
- `https://auth.jfraeys.com/.well-known/openid-configuration` (discovery)

Grafana is configured via `roles/grafana` using the Generic OAuth provider.

Forgejo is configured via `roles/forgejo` using the Forgejo admin CLI with `--provider=openidConnect` and `--auto-discover-url`.

Note: Forgejo pages that ask for an "OpenID URI" are legacy OpenID 2.0 and are not used for OIDC.

## Secrets (Ansible Vault)

Secrets are stored in `secrets/vault.yml` (encrypted).

Create your vault from the template:

- `secrets/vault.example.yml` -> `secrets/vault.yml`

Run playbooks with either:

- `--ask-vault-pass`
- or a local password file (not committed): `--vault-password-file .vault_pass`

Notes:

- `secrets/vault.yml` is intentionally gitignored
- `inventory/hosts.yml` and `inventory/host_vars/web.yml` are generated by `setup.sh` and intentionally gitignored

## Playbooks

- `playbooks/services.yml`: deploy observability + forgejo on `services`
- `playbooks/app.yml`: deploy app-side dependencies on `web`
- `playbooks/test_config.yml`: smoke test host config and deployed stacks
- `playbooks/deploy.yml`: legacy/all-in-one deploy for the services host (no tags)

## Configuration split

- Vault (`secrets/vault.yml`): secrets (API tokens, passwords, access keys, and sensitive Terraform `TF_VAR_*` values)
- `.env`: non-secret configuration (still treated as sensitive), such as region/instance type and non-secret endpoints

## Linode Object Storage (demo apps)

If you already have a Linode Object Storage bucket, demo apps can use it via the S3-compatible API.

Recommended env vars (see `.env.example`):

- `S3_BUCKET`
- `S3_ENDPOINT` (example: `https://us-east-1.linodeobjects.com`)
- `S3_REGION`

Secrets (store in `secrets/vault.yml`):

- `S3_ACCESS_KEY_ID`
- `S3_SECRET_ACCESS_KEY`

Create a dedicated access key for demos and scope permissions as tightly as possible.

## Grafana provisioning

Grafana is provisioned with Prometheus and Loki datasources via the Grafana provisioning mechanism (no manual UI setup required).

## Host vars

Set in `inventory/host_vars/web.yml`:

- `public_ipv4`: public IPv4 of `jfraeys.com`

This value is used to allowlist access to Loki (`services:3100`) so that only the web host can reach it.
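
As a rough illustration of how `public_ipv4` might feed the Loki allowlist, the sketch below prints the kind of UFW rule that would admit only the web host on port 3100. The address `203.0.113.10` is a documentation placeholder and `loki_allow_rule` is a hypothetical helper, not something in this repo. The real rule is presumably managed by the Ansible roles, so treat this as an assumption about the resulting firewall state rather than the roles' implementation.

```bash
# Hypothetical helper: prints the UFW rule for review instead of applying it.
loki_allow_rule() {
  # $1: public IPv4 of the web host (the `public_ipv4` host var)
  printf 'ufw allow from %s to any port 3100 proto tcp\n' "$1"
}

# 203.0.113.10 is a placeholder from the documentation address range.
loki_allow_rule "203.0.113.10"
```

On the `services` host, `sudo ufw status numbered` lists the active rules, which is one way to check whether an equivalent entry is present.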

## Forgejo Actions runner (web host)

A Forgejo runner is deployed on the `web` host (`roles/forgejo_runner`).

- Requires `FORGEJO_RUNNER_REGISTRATION_TOKEN` in `secrets/vault.yml`.
- Uses a single `self-hosted` label by default.
- The role automatically re-registers the runner if labels change.

To force re-registration (e.g. after deleting the runner in the Forgejo UI):

```bash
ansible-playbook playbooks/app.yml \
  --vault-password-file secrets/.vault_pass \
  --limit web \
  --tags forgejo_runner \
  -e forgejo_runner_force_reregister=true
```

## SSH from Actions to services

If a workflow running on the `web` runner needs SSH access to the `services` host, the controller expects two separate SSH keys, each restricted to a forced command:

- `infra-register-stdin` (register)
- `infra-deregister` (deregister)

Public keys (installed on the `services` host via Ansible/vault):

- `SERVICE_SSH_REGISTER_PUBLIC_KEY`
- `SERVICE_SSH_DEREGISTER_PUBLIC_KEY`

Private keys (stored as Forgejo Actions secrets):

- `SERVICE_SSH_KEY_REGISTER`
- `SERVICE_SSH_KEY_DEREGISTER`

To generate or update both Actions secrets (and optionally update both public keys in the vault), first install the Python dependencies:

```bash
python3 -m pip install -r requirements.txt
```

Then run the helper script:

```bash
python3 scripts/forgejo_set_actions_secret.py \
  --repo jfraeysd/infra-controller \
  --generate-ssh-keys \
  --update-vault-both-public-keys
```

## Deploy

Services:

```bash
ansible-playbook playbooks/services.yml --ask-vault-pass
```

Web:

```bash
ansible-playbook playbooks/app.yml --ask-vault-pass
```

## Terraform

`./setup.sh` exports `TF_VAR_*` values from `secrets/vault.yml` (prompting for the vault password if needed) and then runs Terraform with a saved plan.

## Notes

- Loki is exposed on `services:3100` but allowlisted in UFW to `web` only.
- Watchtower is enabled with label-based updates.
- Airflow/Spark are intentionally optional and can be enabled later via `deploy_airflow` / `deploy_spark`.
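
One possible way to switch on the optional stacks is to pass the toggles as extra vars. This sketch assumes `deploy_airflow` and `deploy_spark` are boolean Ansible variables read by `playbooks/services.yml`; the notes above do not say where they are defined or which playbook consumes them, so check the roles before relying on it. The JSON form of `-e` is used so the values stay real booleans rather than strings.

```bash
# Assumption: deploy_airflow and deploy_spark are boolean Ansible variables
# consumed by playbooks/services.yml (not confirmed by this README).
EXTRA_VARS='{"deploy_airflow": true, "deploy_spark": true}'

# Print the full command for review; paste the printed line into a shell to run it.
echo ansible-playbook playbooks/services.yml --ask-vault-pass -e "'${EXTRA_VARS}'"
```

If the variables turn out to live in group or host vars instead, setting them there keeps the choice persistent across runs.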

## Role layout

Services host (`services`):

- `roles/traefik`
- `roles/exporters` (node-exporter + cAdvisor)
- `roles/prometheus`
- `roles/loki`
- `roles/grafana`
- `roles/forgejo`
- `roles/watchtower`

Web host (`web`):

- `roles/traefik`
- `roles/app_core` (optional shared Postgres/Redis)
- `roles/forgejo_runner`