
Deployment + CI/CD spec — two targets, two stories

Bucket: Backend (Agent B)
Status: Reviewed 2026-05-12 (Phase B seams applied: R2 release manifest auto-update parked; Phase I Jetson updates are SSH push)
Owner: Sophia · Reviewers: Andrew, Agent A (Jetson side)
Supersedes / refines: docs/technical/deployment.md is on-site procedure; this doc is the CI/CD plumbing layer above it.

Two completely different deploy targets, often conflated:

  1. lbzfai.com — Cloudflare auto-deploys on push to main via its native GitHub integration. A scoped Cloudflare API token now lives in .env so agents can also trigger a deploy directly.
  2. The Jetson — sits on LBZF’s network in Pereira; auto-deploy story has to handle “what if the deploy bricks the box and Sophia is in California?”

The decision being made: the safe blast-radius pattern for agent-driven Cloudflare deploys, and how the Jetson updates without a Sophia-flies-to-Colombia recovery scenario.

Goals

  • An agent can deploy a website change with the minimum required token scope and a clear blast-radius preview
  • Jetson updates work without a physical visit, with a rollback path that doesn’t require Tailscale (in case the update breaks Tailscale)
  • The two deploy targets are clearly named, clearly scoped, never confused for each other
  • The CI loop covers test, build, deploy, smoke-check
  • Secrets never end up in commits or in agent transcripts

Non-goals

  • Multi-environment (staging vs prod) for the Jetson: Phase I has one Jetson; staging is “Sophia’s dev Jetson in CA before flight”
  • Blue/green deploys and canarying: overkill at our scale
  • IaC for the Cloudflare account itself (Terraform / Pulumi): the Cloudflare dashboard is fine until we have >5 routes
  • Container orchestration on the Jetson: systemd is sufficient

Target 1: lbzfai.com (Cloudflare Workers Static Assets)

  1. Push to main on github.com/sophiamann/lbzfai
  2. Cloudflare’s GitHub integration receives a webhook
  3. Cloudflare build VM clones the repo
  4. Auto-runs astro add cloudflare (build VM only — does NOT touch the repo’s astro.config.mjs)
  5. Runs npm run build → static dist/
  6. Deploys to the lbzfai-com Worker (and lbzfai.sophiainesmann.workers.dev route)
  7. Cache flush is automatic
  8. Build vars (PUBLIC_AUTH0_DOMAIN, PUBLIC_AUTH0_CLIENT_ID) come from Cloudflare dashboard → Settings → Builds → Build variables

This works and shouldn’t be changed casually.
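Because the two build variables live in the Cloudflare dashboard rather than in the repo, a local or CI build can silently run without them. A minimal preflight sketch, assuming the build is launched from a Python wrapper (the function name is hypothetical; only the two variable names come from the list above):

```python
import os
from typing import Mapping

# Names taken from the build-variable step above.
REQUIRED_BUILD_VARS = ("PUBLIC_AUTH0_DOMAIN", "PUBLIC_AUTH0_CLIENT_ID")

def missing_build_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required build variables that are unset or blank."""
    return [name for name in REQUIRED_BUILD_VARS if not env.get(name, "").strip()]
```

Calling `missing_build_vars()` before `npm run build` and aborting on a non-empty result would surface the gap early instead of producing a build with an unconfigured Auth0 client.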

A scoped Cloudflare API token is in .env. Token scope (recommended; verify in dashboard):

| Scope | Why |
| --- | --- |
| Account → Workers Scripts → Edit | publish the Worker |
| Zone → Workers Routes → Edit | add/move routes |
| Account → Workers KV Storage → Edit | (Phase II) write user table once it moves to KV/D1 |
| User → User Details → Read | API health check |
| Zone resources: lbzfai.com only | limits the blast radius |

Do NOT grant Account → Account Settings → Edit, Zone → DNS → Edit (for new zones), or User → API Tokens → Edit. Those are nuclear.
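The scope rules above can be checked mechanically once the token's permissions have been copied out of the dashboard. The sketch below assumes the permissions are typed in by hand as "Name:Level" strings. It does not call the Cloudflare API, and the string format and function name are illustrative, not Cloudflare's.

```python
# Allow and deny lists transcribed from the scope table and the warning above.
ALLOWED = {
    "Workers Scripts:Edit",
    "Workers Routes:Edit",
    "Workers KV Storage:Edit",
    "User Details:Read",
}
FORBIDDEN = {
    "Account Settings:Edit",
    "DNS:Edit",
    "API Tokens:Edit",
}

def audit_token_scopes(granted: set[str]) -> dict[str, list[str]]:
    """Classify granted permissions as forbidden, unexpected, or missing."""
    return {
        "forbidden": sorted(granted & FORBIDDEN),
        "unexpected": sorted(granted - ALLOWED - FORBIDDEN),
        "missing": sorted(ALLOWED - granted),
    }
```

A non-empty "forbidden" list is the signal to rotate the token immediately; "unexpected" entries are worth a review comment rather than an automatic failure.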

Three rules:

  1. Agents do not push to main. Agents open a PR from a worktree branch. The auto-deploy fires only after a human merges. Cloudflare’s GitHub integration is the deploy mechanism; agents are just authors.

  2. For preview deploys, agents use wrangler deploy --env preview (or an equivalent Worker route such as pr-<n>.lbzfai-com.preview). Previews are namespaced and do not touch lbzfai.com. The Cloudflare API token used by agents should be preview-scoped if possible; if Cloudflare doesn’t support a preview-only token, agents run wrangler deploy --dry-run to surface the intended change before any actual write.

  3. The .env Cloudflare token is dev-only. The .env file is listed in .gitignore; .env.example carries the key name with a placeholder value only. Token rotation is manual: rotate quarterly and after every contractor offboarding.

Smallest blast-radius deploy pattern, in order of preference:

| Action | Affected surface | Risk |
| --- | --- | --- |
| Open PR; let CI build a preview Worker | pr-N.lbzfai-com.preview.workers.dev | None to prod |
| wrangler deploy --dry-run to validate config | nothing changes | None |
| wrangler deploy --env preview | preview Worker only | None to prod |
| Push to a non-main branch | nothing deploys (GitHub integration is main-only) | None |
| Merge to main (after PR review) | lbzfai.com prod | Owned by reviewer |
| Direct wrangler deploy --env production (skipping git) | lbzfai.com prod | Forbidden for agents; emergency only (Sophia/Andrew) |
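The table above could be enforced by a small gate that an agent harness runs before executing any deploy command. This is a sketch under assumptions: the function name and the three labels are hypothetical, and only the `--dry-run` and `--env` flags shown in the table are recognized. Anything else fails closed.

```python
import shlex

def classify_deploy(command: str) -> str:
    """Return 'safe', 'preview', or 'forbidden' for a wrangler deploy command."""
    args = shlex.split(command)
    if args[:2] != ["wrangler", "deploy"]:
        raise ValueError("not a wrangler deploy command")
    if "--dry-run" in args:
        return "safe"  # validates config, writes nothing
    env = None
    for i, arg in enumerate(args):
        if arg == "--env" and i + 1 < len(args):
            env = args[i + 1]
        elif arg.startswith("--env="):
            env = arg.split("=", 1)[1]
    if env is not None and env.startswith("preview"):
        return "preview"  # namespaced away from lbzfai.com
    return "forbidden"  # no env, or a non-preview env, is treated as prod
```

A bare `wrangler deploy` is classified as forbidden on purpose: with no environment named, it would target the top-level configuration.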

Agent checklist for any Cloudflare-touching change

  • Run wrangler whoami to confirm the token is the scoped agent token, not a personal token
  • If editing wrangler.toml or astro.config.mjs, dry-run the build locally first (npm run build)
  • Open PR; do not push to main
  • Include the URL of the preview deploy in the PR description
  • Note any new Cloudflare resources (routes, secrets) so they’re caught at review

Target 2: The Jetson

The Jetson is in Pereira. The only inbound channel is Tailscale. If a deploy breaks Tailscale, recovery requires Armando or a local technical contact physically rebooting the box. This is the operational gating constraint on Jetson CI/CD.

Phase I uses Option A (SSH push) for Jetson updates. The Watchtower-style R2-manifest auto-pull (Option C) is parked to docs/design/60-parking/r2-release-manifest.md.

Reasoning:

  • Phase I has one production Jetson + one ITBA twin and one release author (Sophia). The R2 manifest design pays off when there is a fleet to coordinate or a non-author operator; neither holds in Phase I.
  • SSH push is the simplest mechanism that works and is fully reversible: ssh sophia@<jetson> "cd /opt/lbzf && git pull && ./bin/migrate.py && systemctl restart 'lbzf-*'". The unit glob is single-quoted so that systemctl, not the remote shell, expands it.
  • Rollback is git checkout <prev_sha> && systemctl restart 'lbzf-*' over the same SSH session.
  • The ITBA Jetson uses the same workflow over its own Tailscale tag (tag:itba-dev).

Trigger to thaw the R2 design: more than one production Jetson or a non-Sophia engineer regularly merging to main.
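The update and rollback commands in the reasoning above differ only in their middle steps, so a single helper can build both and keep the quoting consistent. A minimal sketch, assuming the deploy is driven from a Python script on Sophia's laptop; the function names and the example host are hypothetical, and nothing here opens a connection.

```python
import shlex
from typing import Optional

REPO_DIR = "/opt/lbzf"   # path taken from the SSH commands above
UNIT_GLOB = "lbzf-*"     # quoted below so the remote shell does not expand it

def remote_script(prev_sha: Optional[str] = None) -> str:
    """Build the remote command: update when prev_sha is None, else roll back."""
    restart = f"systemctl restart {shlex.quote(UNIT_GLOB)}"
    if prev_sha is None:
        steps = ["git pull", "./bin/migrate.py", restart]
    else:
        steps = [f"git checkout {shlex.quote(prev_sha)}", restart]
    return " && ".join([f"cd {shlex.quote(REPO_DIR)}", *steps])

def ssh_argv(host: str, script: str, user: str = "sophia") -> list[str]:
    """argv suitable for subprocess.run; the script travels as one argument."""
    return ["ssh", f"{user}@{host}", script]
```

For example, `ssh_argv("jetson.example", remote_script("abc1234"))` yields the rollback invocation; passing the list to `subprocess.run(..., check=True)` would execute it.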

Tailscale-breaking changes (still applies)


The single failure that defeats any unattended update path: a release that breaks Tailscale itself. Mitigations:

  • tailscaled is pinned and updated separately from lbzf-*.
  • The systemd unit tailscaled has Restart=always and is independent of lbzf-* units.
  • Tested in CA: Sophia does a “deliberately break Tailscale” rehearsal once before the Pereira flight to ensure the local technical contact at LBZF has the runbook (out-of-band: phone Armando, who walks LBZF staff through the reboot).
  • If Tailscale is up, roll back over SSH: ssh sophia@<jetson> "cd /opt/lbzf && git checkout <prev_sha> && systemctl restart 'lbzf-*'".
  • If Tailscale is broken, the runbook escalates to Armando → LBZF on-site reboot.

Rolling back is always faster than diagnosing; we roll back first and debug later.

| Secret | Where | Rotation |
| --- | --- | --- |
| Cloudflare API token (agent) | .env (dev), Cloudflare Worker secret (prod) | Quarterly + on offboard |
| Auth0 tenant config (public client ID, domain) | Repo + Cloudflare build vars | Never (it’s public) |
| JETSON_PROXY_SECRET (Worker → Jetson) | Cloudflare Worker secret + /etc/lbzf/proxy.token on Jetson | Quarterly |
| Internal HMAC for CV writer → API | /etc/lbzf/internal.token on Jetson | On model deploy |
| SQLite encryption key (if we add SQLCipher) | Not in Phase I | n/a |
| Camera RTSP passwords | /etc/lbzf/cameras.yaml (mode 0600) on Jetson | When cameras change |
| Tailscale auth key | Per-machine; not stored after enrollment | n/a |
| AWS credentials | Per Andrew’s “AWS day 1”; not yet provisioned | TBD |

Rule: no secret in the repo, ever. .env.example shows the keys; .env is .gitignored. Pre-commit hook scans for AKIA / sk_live_ / etc. patterns (gitleaks or hand-rolled).
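For the hand-rolled option, the scan amounts to a couple of regular expressions run over staged text. The sketch below is illustrative only: the two patterns cover the AKIA and sk_live_ prefixes named in the rule, the length bounds are assumptions, and gitleaks ships a far larger maintained rule set.

```python
import re

# Illustrative patterns; the length bounds are assumptions, not vendor specs.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "stripe_live_key": re.compile(r"\bsk_live_[0-9A-Za-z]{16,}\b"),
}

def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, rule))
    return hits
```

A pre-commit hook would feed each staged file through `scan_text` and exit non-zero on any hit.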

| Workflow | Trigger | Target | Job |
| --- | --- | --- | --- |
| web-build.yml | push to main | lbzfai.com Cloudflare | (handled by Cloudflare native integration, not GH Actions) |
| web-preview.yml (new) | PR | preview Worker | wrangler deploy --env preview-pr-<n> |
| lint.yml (new) | every push | none | ruff, mypy, eslint |
| pytest.yml (new) | push to main if app/** changed | none (test results only) | lint, mypy, pytest; does not publish artifacts |

jetson-build.yml and jetson-promote.yml are parked alongside the R2 manifest design. Phase II re-introduces them when fleet size justifies.

GH Actions secrets (Phase I):

  • CLOUDFLARE_API_TOKEN_CI — scoped only to Workers Scripts → Edit for the preview worker. No prod write, no R2 (since no manifest in Phase I).
  • CLOUDFLARE_ACCOUNT_ID — public-ish, but in secret store for tidiness.

Deploy audit trail:

  • Every Worker deploy logs a structured line {deploy_id, sha, user, ts} to Cloudflare Logs (free; 7-day retention).
  • Every SSH-driven Jetson update is a manual operation. Sophia logs the git SHA range she deployed in a deploy-log.md (or just git log --oneline <prev>..<new>) and pastes it into the project Slack/Discord channel.
  • Phase II adds structured deploy logs when the R2-manifest auto-pull thaws.
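The structured line {deploy_id, sha, user, ts} is small enough to pin down with a helper. A sketch under assumptions: JSON as the encoding, a random UUID as the deploy_id, and UTC ISO-8601 timestamps are all illustrative choices; the text above fixes only the four field names.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional

def deploy_log_line(
    sha: str,
    user: str,
    now: Optional[datetime] = None,
    deploy_id: Optional[str] = None,
) -> str:
    """Serialize one deploy record as a single JSON line."""
    ts = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    record = {
        "deploy_id": deploy_id or uuid.uuid4().hex,  # assumed ID scheme
        "sha": sha,
        "user": user,
        "ts": ts,
    }
    return json.dumps(record, sort_keys=True)
```

The same helper could produce the line Sophia pastes into deploy-log.md after a manual SSH push, which would keep the two records in one shape.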

ITBA’s Jetson uses the same SSH-push workflow as LBZF’s, scoped to the tag:itba-dev tailnet tag. Configuration that differs (camera RTSP URLs, org_id, Tailscale auth) lives in /etc/lbzf/site.yaml and is not in the git repo — it’s per-machine.

When ITBA wants to diverge (try a different CV model, etc.), they branch from main locally on their Jetson and pin to that branch.

  • GitHub Actions deploys directly to Cloudflare instead of Cloudflare’s native GitHub integration: works, but duplicates the existing wiring. Native integration handles the build VM, secrets, and route configuration; reimplementing in Actions is a regression.
  • R2 release manifest + Watchtower-style auto-pull on the Jetson — the right design at fleet scale. Parked to 60-parking/r2-release-manifest.md until fleet exists or a non-author engineer ships.
  • systemd-timer git pull on the Jetson — every commit to main ships immediately. Fast but no decoupling. Sophia explicitly opts into SSH-push instead so “merged ≠ deployed” stays true.
  • Container image deploys (Docker on Jetson) — adds Docker to the Jetson stack we don’t otherwise need. JetPack + apt + systemd is the simpler base; revisit in Phase II if we get tired of dependency drift.
  • Ansible / Salt / cloud-init for Jetson provisioning — overkill for one box. A bash script + this doc is fine.
  • Cloudflare Pages instead of Workers Static Assets — we’re already on Workers Static Assets (per CLAUDE.md); the README is stale on this. Migrating is not a Phase I priority.
  • PARKED: Cloudflare R2 cost at our scale — moot for Phase I; revisit when R2 manifest thaws.
  • OPEN: agent push permission scope — currently .env has the Cloudflare token; should an agent ever directly call wrangler against prod, or only against preview? Default: preview only. Owner: Sophia.
  • OPEN: pre-commit secret scanner. gitleaks is the standard; install it and add it to CI. Owner: Sophia.
  • OPEN: how do migrations roll back? If 0.4.7 adds a column, 0.4.6 should still work; if 0.4.7 removes a column, downgrade silently breaks. Defaulting to additive-only migrations in Phase I; document the policy.
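The additive-only policy in the last open question can be linted before a migration ships. The following is a regex sketch, not a SQL parser: the list of destructive statements is an assumption, splitting on semicolons is naive, and the table and column names in the usage are made up.

```python
import re

# Statements treated as destructive under an additive-only policy (assumed list).
DESTRUCTIVE = re.compile(
    r"\b(DROP\s+(TABLE|COLUMN|INDEX)|RENAME\s+(TO|COLUMN)|ALTER\s+COLUMN)\b",
    re.IGNORECASE,
)

def destructive_statements(sql: str) -> list[str]:
    """Return the statements in a migration that remove or rename schema."""
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    return [s for s in statements if DESTRUCTIVE.search(s)]
```

With this check in CI, a migration containing `ALTER TABLE cycles DROP COLUMN operator_id` would be flagged, while one that only adds a column or an index passes. A flagged migration is the case where rolling back to the previous release could break, so it deserves a manual review rather than an automatic merge.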

| This doc depends on | Owner bucket | What we need |
| --- | --- | --- |
| Tailscale on Jetson | Hardware (Agent A) + existing deployment.md | The always-on Tailscale daemon is the rollback escape hatch |
| Cloudflare Worker route config | This bucket (lbzfai-jetson-integration.md) | We deploy the Worker |
| systemd unit files for lbzf-dashboard, lbzf-cv-writer, lbzf-exporter | This bucket + ML (Agent D for cv-writer) | All units co-evolve. lbzf-updater ships only when the R2-manifest design thaws. |
| DB migration policy | This bucket (data-model.md) | Additive-only by default |

| This doc implies | Owner | Ask |
| --- | --- | --- |
| ML/CV writer is part of the same git repo on the Jetson | Agent D | A single git pull updates all Jetson processes |
| Frontend deploys are decoupled from backend Jetson deploys | Frontend (Agent C) | They can ship UI changes without waiting for a Jetson SSH push |

  1. SSH-push depends on Sophia being available. A 3-day Sophia outage (sickness, travel without internet) means no Jetson updates. Phase II thaws the R2-manifest auto-pull design specifically to remove this single-person bottleneck.
  2. Rollback rehearsal is mentioned but not scheduled. Without a documented dry-run on the dev Jetson, the first actual rollback in production is going to be exciting.
  3. .env Cloudflare token + agents — a clever attacker who gets read access to .env (e.g., a careless cat .env in an agent transcript) has Worker write access. The blast radius is the lbzfai.com Worker, not the AWS account, not Auth0, not the Jetson. Acceptable but worth being clear-eyed about.
  4. No security scan in CI yet (pip-audit, npm audit, gitleaks). Should be table stakes.
  5. No structured deploy audit trail in Phase I. Sophia’s hand-written “I deployed <sha> at <time>” notes are the only record. Acceptable at this scale; the R2-manifest thaw replaces this with proper structured logs.
  • Now (May 2026):
    • Document the scoped Cloudflare token in .env.example
    • Write web-preview.yml GH Action for PR previews
    • Add gitleaks pre-commit
  • Before Argentina (2026-05-15):
    • Demo a PR-driven preview deploy of lbzfai.com
    • ITBA gets the SSH-push workflow on the BA handoff Jetson
  • Before Pereira (Jul 2026):
    • SSH-push runbook tested end-to-end on the dev Jetson in CA, including a forced rollback
    • Runbook for local technical contact: “if the dashboard is down, call Armando, who phones Sophia”
  • Day-1 Pereira:
    • Deploy a known-good main SHA before flight; freeze the SHA during the first week of operation
    • First in-prod release is the bug-fix release that comes out of the deploy week
  • Phase II:
    • Thaw 60-parking/r2-release-manifest.md: build R2 bucket + manifest + lbzf-updater.service
    • Canary on ITBA’s Jetson first; promote to LBZF after 72h of green
    • Auto-rollback on richer signals (e.g. cycle_events count drops to zero for 5 min → automatic rollback)
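The Phase II rollback signal (cycle_events count at zero for 5 minutes) could be evaluated as below. This is a sketch under assumptions: samples arrive as (timestamp, count) pairs, oldest first, from some hypothetical poller, and the function only answers whether to roll back. It does not distinguish a broken release from a legitimately idle line, which a real implementation would need to handle.

```python
from datetime import datetime, timedelta

def should_auto_rollback(
    samples: list[tuple[datetime, int]],
    window: timedelta = timedelta(minutes=5),
) -> bool:
    """True when every sample in the trailing window has a count of zero."""
    if not samples:
        return False
    latest = samples[-1][0]
    cutoff = latest - window
    # Require history reaching back at least one full window before deciding.
    covers_window = samples[0][0] <= cutoff
    recent = [count for ts, count in samples if ts >= cutoff]
    return covers_window and all(count == 0 for count in recent)
```

Requiring a full window of history avoids triggering a rollback in the first minutes after a restart, when only a few zero samples exist.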

Unblocks: every other backend doc’s path to running on the Jetson; safer agent-driven website iteration; ITBA’s parallel work.