Excel export spec — `INDICADORES` drop-in
Bucket: Backend (Agent B) Status: Draft v0.1 — 2026-05-10 Owner: Sophia · Reviewers: Ronald (drop-in compatibility), Andrew, Agent E Source-of-truth files we are mirroring:
INDICADORES ABRIL.xlsx,Ref22 Slim - Angela.xlsx(“Balanceo” tab)
Context
Section titled “Context”Ronald Gonzalez’s adoption of the CV system is conditional on the Excel export looking exactly like the spreadsheet he produces today. If he has to retype rows into his existing template, the system is a net negative for him. This doc pins the export format, the cadence, the delivery channel, and the es-CO locale rules, even though we currently do not have direct read access to INDICADORES ABRIL.xlsx in the repo — assumptions are documented explicitly so they can be checked against the real file.
The decision being made: what file, what columns, what cadence, what locale, what delivery channel.
- A single Excel file per shift / per day that Ronald can drop into his workflow without re-entry
- es-CO locale throughout:
dd/mm/yyyydates, comma decimal separator, period thousand separator - Deterministic file naming and a stable storage location on the Jetson + cloud
- Reproducible from the SQLite source-of-truth — re-running the exporter for a past day yields the same output
- Both human-readable formatting (formulas, banding, freeze panes) and a parallel
.csvmirror for any downstream automation
Non-goals
Section titled “Non-goals”- Replacing Ronald’s monthly summary roll-ups (lost-time by code, presence %, monthly efficiency curves). Phase I exports the daily indicador rows; the monthly summary is computed by Ronald’s own spreadsheet — we feed its inputs.
- Live charts inside the Excel file. The dashboard provides live; the Excel is a record.
- PDF export. Out of scope; if needed, generate via LibreOffice
--convert-to pdflater. - Pushing data into Ronald’s existing spreadsheet file (no in-place mutation of an
.xlsxhe owns).
Proposed approach
Section titled “Proposed approach”File format and library
Section titled “File format and library”.xlsx(Excel 2007+) — what Ronald’s current workflow uses.- openpyxl to write — already in the v1.9 stack. Pandas only for the SQL-pull layer; we hand the DataFrame to openpyxl ourselves so we can control formatting (we cannot get banding, freeze-panes, and locale-aware number formats from
df.to_excel()alone). .csvmirror written alongside (UTF-8 with BOM, semicolon separator — see locale below) so downstream automation can ingest without an xlsx parser.
Filename and storage location
Section titled “Filename and storage location”/data/exports/ indicadores-<module>-<YYYY-MM-DD>.xlsx # daily, per module indicadores-<module>-<YYYY-MM-DD>.csv # csv mirror indicadores-<module>-<YYYY-MM>-monthly.xlsx # monthly summary (Phase II)Example: indicadores-ANGELA-2026-07-15.xlsx.
Mirrored to Google Drive at LBZF Computer Vision Project / Exports / <YYYY-MM>/ once the AWS / Drive integration is live (Phase II), and to S3 at s3://lbzfai-exports/<YYYY-MM>/ for retention.
Cadence — when does the exporter run
Section titled “Cadence — when does the exporter run”Decision: one file per (module, local-date), generated at end-of-shift plus on-demand from the dashboard.
| Cadence | Why |
|---|---|
| End of shift (automatic) | Mirrors Ronald’s current “end of day, do the Excel” routine. Triggered by a systemd timer that runs at 18:00 CO time on weekdays. |
| On-demand from dashboard | If the day’s data needs a re-run (a manual operator correction, a model re-deploy), Ronald or Sophia can re-export. |
| Per-week summary | Phase I non-goal. The day files concatenate trivially in Excel. |
| Per-shift (if LBZF runs two) | Re-run with --shift=1 and --shift=2. OPEN: confirm shift count with Ronald. |
Delivery channel — where does the file land
Section titled “Delivery channel — where does the file land”Phase I:
- Written to
/data/exports/on the Jetson (canonical). - Emailed to Ronald (
ingenierialbarton@gmail.com) and Mariana, with the file attached. SMTP via SendGrid (1k/day free tier) or AWS SES once the account is live. - Downloadable from the dashboard (
GET /api/exports/<filename>, seedashboard-api.md).
Phase II adds Google Drive upload to a Ronald-owned folder.
Email is the Phase I primary delivery because (a) Ronald’s existing inbox is the workflow he checks daily, (b) Drive integration requires OAuth into Ronald’s account and that’s not on the May 15 critical path.
OPEN: SMTP credentials and sender domain — owner: Sophia / Andrew, blocker for: end-to-end email delivery. Recommended sender: reports@lbzfai.com (configure SPF/DKIM on the lbzfai.com Cloudflare zone).
Locale (es-CO) — applied everywhere, not just the export
Section titled “Locale (es-CO) — applied everywhere, not just the export”- Dates:
dd/mm/yyyy(e.g.,15/07/2026) - Datetimes (where shown):
dd/mm/yyyy HH:MM(24-hour) - Decimal separator:
,(comma) - Thousand separator:
.(period) - Percentage:
45,7%(not45.7%) - Excel number format strings:
"#.##0,00"for numbers,"0,0%"for percentages,"dd/mm/yyyy"for dates - CSV: semicolon delimiter (not comma) because the comma is the decimal separator in es-CO. UTF-8 with BOM so Excel opens it correctly.
- Negative numbers in efficiency columns rendered red via conditional formatting (Ronald’s spreadsheet does this).
Sheet layout (assumed — confirm against actual file)
Section titled “Sheet layout (assumed — confirm against actual file)”All structural details below are assumptions based on what a typical Latin-American industrial-engineer’s daily indicadores sheet contains. They WILL need correction once the real
INDICADORES ABRIL.xlsxis parsed. Flag the diff for Ronald.
Sheet 1: Indicadores (per-cycle detail, one row per cycle_event)
| Col | Header (Spanish) | Source | Format | Notes |
|---|---|---|---|---|
| A | fecha | cycle_events.start_ts → CO local date | dd/mm/yyyy | |
| B | hora_inicio | cycle_events.start_ts → CO local time | HH:MM:SS | |
| C | hora_fin | cycle_events.end_ts → CO local time | HH:MM:SS | blank if in_progress |
| D | módulo | modules.display_name | text | |
| E | puesto | workstations.display_name | text | |
| F | operación | workstations.operation | text | matches Spanish vocab in operations.md |
| G | operario | operators.display_name | text | blank in Phase I if operator capture deferred |
| H | OP | cycle_events.ref_op_order | text | optional in Phase I |
| I | SAM (min) | standard_times.sam_minutes joined on operation | #.##0,00 | |
| J | tiempo std (s) | standard_times.sec_per_cycle | #.##0,0 | derived from SAM × 60 |
| K | tiempo real (s) | cycle_events.duration_seconds | #.##0,0 | |
| L | eficiencia | tiempo_std / tiempo_real | 0,0% | conditional format: <80% red, 80–95% yellow, >95% green |
| M | piezas | always 1 in Phase I | 0 | one cycle = one piece; aggregated on Sheet 2 |
| N | máquina | standard_times.machine_code | text | MPP / MREV / MOJA / MBOT / PLCH |
| O | confianza CV | cycle_events.confidence | 0,00 | so Ronald can see when CV was uncertain |
| P | modelo CV | cv_model_versions.name + version | text | |
| Q | observaciones | cycle_events.notes | text | manual override reason, etc. |
Header row frozen (Window > Freeze top row). Autofilter on row 1.
Sheet 2: Resumen Módulo (per-day per-module summary, mirrors INDICADORES ABRIL.xlsx’s monthly headers)
| Col | Header | Source / Formula |
|---|---|---|
| A | fecha | local date |
| B | módulo | |
| C | jornada (min) | from shifts.end_ts - shifts.start_ts |
| D | #operarios | distinct cycle_events.operator_id for the day OR seeded modules.operator_count (Phase I default 22 for ANGELA) |
| E | OP | distinct cycle_events.ref_op_order for the day |
| F | SAM/garment (min) | standard_times total for the reference (24,16 for Ref22) |
| G | unidades | distinct completed cycles at the last station in the sequence (i.e., DOBLAR Y PRESENTAR CAMISA ML) |
| H | total minutos productivos | sum(cycle_events.duration_seconds) / 60 |
| I | eficiencia módulo | (unidades × SAM/garment) / (jornada × #operarios) |
| J | rendimiento módulo | unidades / capacidad_objetivo_horaria * 100 |
Column names and formulas need to be diffed against INDICADORES ABRIL.xlsx once it’s parseable. The columns above are the v1.9 doc’s stated list — see docs/factory/current-state.md.
Sheet 3: Tiempos Estándar (read-only mirror of standard_times)
So the file is self-contained for an auditor — they don’t need a second spreadsheet to interpret it.
| Col | Header | Source |
|---|---|---|
| A | referencia | standard_times.reference |
| B | operación | standard_times.operation |
| C | SAM (min) | standard_times.sam_minutes |
| D | máquina | standard_times.machine_code |
| E | fuente | standard_times.source (default tiempo std manual) |
Sheet 4 (Phase II placeholder): Tiempo Perdido
For the 22-code C001..C022 taxonomy. Blank in Phase I.
Implementation sketch
Section titled “Implementation sketch”from datetime import datefrom zoneinfo import ZoneInfoimport pandas as pdfrom openpyxl import Workbookfrom openpyxl.styles import PatternFill, Font, NamedStylefrom openpyxl.utils import get_column_letter
CO = ZoneInfo("America/Bogota")
def export_day(db_path: str, module_code: str, target_date: date, out_dir: str) -> str: rows = _query_cycles(db_path, module_code, target_date) # Sheet 1 summary = _summarize(rows, module_code, target_date) # Sheet 2 std_times = _query_std_times(db_path, module_code) # Sheet 3
wb = Workbook() _write_indicadores(wb.active, rows) _write_resumen(wb.create_sheet("Resumen Módulo"), summary) _write_std_times(wb.create_sheet("Tiempos Estándar"), std_times) wb.create_sheet("Tiempo Perdido") # Phase II placeholder
out = f"{out_dir}/indicadores-{module_code}-{target_date.isoformat()}.xlsx" wb.save(out) _write_csv_mirror(rows, out.replace(".xlsx", ".csv")) return outExcel-write functions own the number-format strings, the conditional-format rules, the freeze pane, and the column widths. pandas.DataFrame.to_excel() is not used as the final writer — we want full control.
Determinism
Section titled “Determinism”Re-running export_day(db, "ANGELA", 2026-07-15) must produce byte-identical output if the underlying data is unchanged. This means:
- Stable row ordering:
ORDER BY workstation_id, start_ts - No
datetime.now()in cell values (only in a “generated_at” footer cell, which is fine) - openpyxl’s
Workbook.save()is deterministic except for the modified-at metadata; we strip / pin it viawb.properties.modified = wb.properties.created = datetime(2026,1,1)so file hashes match across re-runs
This matters for the paper: reviewers ask for the exact export file used in figures.
Sample row (illustrative)
Section titled “Sample row (illustrative)”fecha hora_inicio hora_fin módulo puesto operación operario OP SAM (min) tiempo std (s) tiempo real (s) eficiencia piezas máquina confianza CV modelo CV observaciones15/07/2026 07:14:22 07:15:08 Angela Puesto 1 PESPUNTAR CUELLO Operario A OP-2026-0712 0,38 22,8 46,1 49,5% 1 MPP 0,87 yolov8n-coco/8.1.3015/07/2026 07:15:35 07:16:09 Angela Perrillas PRENDER PERILLA Operario B OP-2026-0712 0,44 26,4 34,2 77,2% 1 MPP 0,91 yolov8n-coco/8.1.30Note the comma decimal separator and dd/mm/yyyy. This is what Ronald sees.
Confidentiality marker
Section titled “Confidentiality marker”The footer of each sheet writes Confidencial — Uso Interno per the LBZF Needs Definition. Sheet headers do not embed operator names in the file path (so screenshots of the filename in a project status report don’t leak names).
Alternatives considered
Section titled “Alternatives considered”- CSV-only. Cheap to write, hostile to Ronald’s existing workflow. Rejected.
- Google Sheets API direct writes (skip the .xlsx file). Couples us to Ronald’s Drive auth on day one; defers cleanly to Phase II. Rejected for Phase I.
- Live Excel “linked” file (Excel opens and refreshes from a SQL connection) — fragile, requires Excel + ODBC on Ronald’s machine. Rejected.
pandas.to_excel()only — no formatting control, no freeze pane, no conditional format. The 30-line openpyxl wrapper buys Ronald the visual parity. Rejected.- Per-cycle PDF receipts — interesting for IEEE paper figures, not for Ronald’s workflow. Defer.
Open questions
Section titled “Open questions”- OPEN: actual column set in
INDICADORES ABRIL.xlsx— owner: Ronald / Armando, blocker for: every column header above being right. Action: pull a copy of the April file into the repo asdocs/references/sample-indicadores.xlsx(gitignored if it contains operator names; committed as anonymized CSV mirror). - OPEN: shift count — owner: Ronald. Phase I assumes single shift, M-F.
- OPEN: email sender and SMTP credentials — owner: Andrew / Sophia.
- OPEN: who receives the email automatically every day? Ronald + Mariana is the safe default; add Sophia / Andrew as BCC? Owner: Mariana.
- OPEN: how should the exporter behave if a cycle is still
in_progressat export time — drop it, include it withend_tsblank, or wait? Default: include withend_tsblank and an “(en curso)” observation. Owner: Ronald. - OPEN: monthly summary — Phase II per scope, but Ronald may ask immediately. Defer.
- OPEN: locale of weekday names (e.g., “lunes 15/07/2026”) — Spanish, but do we need it? Ronald’s existing sheet doesn’t, per the v1.9 doc.
Cross-bucket dependencies
Section titled “Cross-bucket dependencies”| This doc depends on | Owner bucket | What we need |
|---|---|---|
| Standard-times seed | This bucket (data-model.md) | Without standard_times populated, columns I/J are blank |
| Operator capture decision | This bucket + Agent E | Column G blank or not |
| Email sending infra | This bucket (deployment-and-cicd.md) | SES / SendGrid / SMTP credentials |
| Drive integration (Phase II) | Frontend or new sub-bucket | OAuth into Ronald’s account |
| This doc implies | Owner bucket | Ask |
|---|---|---|
Auth0 user identity flows into audit_log when Ronald downloads an export | Frontend (Agent C) + this bucket | Pass identity through |
ML/CV writer populates cycle_events.confidence, detection_count, cv_model_version_id | ML (Agent D) | Otherwise columns O/P are NULL |
What’s weak in this doc
Section titled “What’s weak in this doc”- The single biggest assumption is the column set in
INDICADORES ABRIL.xlsx— we cannot guarantee drop-in until we open the file. If even one column header differs, Ronald has to copy-paste between sheets and we lose his adoption. - No locale-correctness test in CI. A late-introduced bug (e.g., openpyxl version that defaults to US format) would silently break Ronald’s workflow. Need a golden-file test: fixture DB → exporter → byte-diff against a checked-in golden xlsx.
- Email delivery is the brittlest part of the pipeline. SMTP from the Jetson is fragile (Colombia ISPs often block port 25); we should use SES/SendGrid HTTP API, but that means an outbound network dependency, which means a Tailscale-vs-public-internet question we haven’t answered.
- No charts on Sheet 2. Ronald’s existing sheet may rely on hand-drawn charts; if so we’ll need to add an openpyxl chart object or accept a degraded version-1.
- Deterministic / paper-reproducible export is asserted but not tested. Even with metadata pinning, openpyxl can drift across minor versions and break byte-equality.
- “Resumen Módulo” formulas are educated guesses. The
eficienciaformula in particular needs to be verified against Ronald’s actual computation — there are several reasonable formulas (unidades × SAM / (jornada × operarios), vs.total_minutos_productivos / (jornada × operarios)) and the wrong choice will produce numbers that disagree with his existing reports.
Rollout
Section titled “Rollout”- Now (May 2026): stub exporter that runs against the seed DB and produces a one-row demo file with hard-coded data, to validate locale formatting visually.
- Before Argentina (2026-05-15): show ITBA the demo
.xlsxso they can mirror format in their own twin. - Before Pereira (Jul 2026): real exporter against a 1-week run on the dev Jetson; diff against an actual
INDICADORES ABRIL.xlsxpage that Armando obtains; iterate until Ronald says “this is the file I would have made.” - Day-1 of Pereira deploy: end-of-shift cron + email sending live; Sophia BCC’s herself for the first week.
- Phase II: Drive upload; monthly roll-up; lost-time sheet populated by automated detectors.