
Reproducibility & artifacts

Bucket: technical/ml (Agent D) · Status: draft · Owner: Sophia Mann · Phase: I (paper-grade) · Last updated: 2026-05-10

For the IEEE submission (CASE / Access / T-ASE / IROS targets), reproducibility is increasingly an explicit reviewer criterion. IEEE Access in particular asks authors to make data and code available “to the extent permitted by the institution / data provider.” A paper whose reviewer can re-run the analysis script against the released dataset and see the paper’s numbers come out is far more likely to be accepted; one whose reviewer cannot risks rejection at R2.

LBZF data is classified Confidencial — Uso Interno (per the Needs Definition). Operator identifiers are linked to named individuals via Ref22 Slim - Angela.xlsx and INDICADORES ABRIL.xlsx. So the reproducibility plan cannot be “publish everything.” It must be: publish enough that the methodology is reproducible without compromising the operators or LBZF.

This doc specifies the artifact set, the open-source / data-release posture, and the storage / version discipline.

Goals:

  • G1: Every figure/table in the paper can be regenerated from a small, named set of artifacts that lives in a known location.
  • G2: A third-party reviewer can run an evaluation script that exercises at least the model-side reproducibility, against a public sample.
  • G3: Operator identifiability is removable at the artifact level (face-blurring is a step in the pipeline, not a manual figure edit).
  • G4: LBZF retains control over its own footage; nothing exits without Mariana’s sign-off.

Non-goals:

  • Long-term archive of all 4TB rolling video (this is operational, not artifact).
  • Public release of operator-identifiable production data.
  • Open-sourcing the dashboard/landing-page web stack (Agent A’s call).
  • Hosting infrastructure for downloadable artifacts beyond what’s already in scope (S3, GitHub, Drive).

The full set a paper-reviewer should be able to access (modulo LBZF approval per category):

| Artifact | Format | Public? | Location |
| --- | --- | --- | --- |
| Code: detector + state machine | Python package, lbzf-cv/ | YES | GitHub sophiamann/lbzfai-cv (separate repo from the landing page) |
| Code: calibration tool | Python + JS in same repo | YES | same |
| Code: training pipeline | Same repo, training/ subdir | YES | same |
| Code: validation analysis script (validate_v1.py) | Same repo | YES | same |
| Model: yolov8n_lbzf_v0.pt (fine-tuned, if Stage B runs) | PyTorch checkpoint | MAYBE (needs LBZF approval) | If yes: GitHub LFS or HuggingFace Hub |
| Model: yolov8n.engine (TensorRT) | TensorRT engine | NO (non-portable; rebuild on user’s Jetson) | (doc only) |
| Dataset: full Phase I training set | COCO-format JSON + frames | NO (operator-identifiable) | LBZF-owned S3 |
| Dataset: 100-frame “public sample” | COCO-format + frames, face-blurred + manual review | MAYBE (LBZF sign-off) | HuggingFace Datasets or Zenodo |
| Dataset: split CSV (dataset_split_v0.csv) | CSV, frame-id only (no images) | YES | GitHub |
| Validation: raw cycle_events.csv (anonymized) | CSV | MAYBE (aggregated only) | OSF or Zenodo |
| Validation: per-shift manual_observations.csv (Ronald stopwatch) | CSV | MAYBE (aggregated only) | OSF or Zenodo |
| Validation: paired Bland-Altman input data | CSV | YES (aggregated, no operator IDs) | OSF or Zenodo |
| Telemetry: health-score timeseries for validation window | CSV | YES | same |
| Paper figures source files (notebooks / matplotlib scripts) | .ipynb / .py | YES | same |
| Design docs (this directory) | Markdown | YES | this repo |

The artifact set is intentionally conservative on raw data and liberal on code and aggregates. A reviewer can:

  • Read every design decision (markdown in this repo).
  • Inspect the model architecture and re-run the training pipeline (code).
  • Re-run the analysis pipeline against the released aggregate-level CSVs to confirm the paper’s statistical conclusions (analysis script).
  • Run the model on their own footage or, if LBZF signs off on its release, on the public-sample dataset.

What a reviewer cannot do:

  • Re-train the model on the full LBZF dataset (data is LBZF-owned).
  • Audit operator-level performance (deliberately anonymized).
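
The aggregate-level re-analysis described above can be illustrated with the paired Bland-Altman input data. The following is a minimal sketch, not the contents of validate_v1.py: the column names (`cv_cycle_s`, `manual_cycle_s`), the `shift_id` key, and the sample rows are hypothetical, and the limits of agreement use the conventional bias ± 1.96·SD form.

```python
import csv
import io
import statistics


def load_pairs(csv_text):
    """Parse (cv, manual) cycle-time pairs from aggregate CSV text.

    Column names are hypothetical; the released CSV may differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(r["cv_cycle_s"]), float(r["manual_cycle_s"])) for r in reader]


def bland_altman(pairs):
    """Return (bias, lower_loa, upper_loa) for paired measurements."""
    diffs = [cv - manual for cv, manual in pairs]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Synthetic rows for illustration only; no operator identifiers are needed.
SAMPLE = """shift_id,cv_cycle_s,manual_cycle_s
s1,41.0,40.0
s2,39.0,40.0
s3,42.0,40.0
s4,40.0,40.0
"""

pairs = load_pairs(SAMPLE)
bias, lo, hi = bland_altman(pairs)
print(f"bias={bias:.3f} s, limits of agreement=[{lo:.3f}, {hi:.3f}] s")
```

Because the input carries only per-shift pairs, a reviewer can confirm the agreement statistics without any access to operator-level data.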

Each model version is recorded with the following fields:

| Field | Convention | Example |
| --- | --- | --- |
| model_name | yolov8n_lbzf_vN (N = 0, 1, …) or yolov8n_coco_pretrained (off-shelf baseline) | yolov8n_lbzf_v0 |
| model_sha256 | SHA-256 of the .pt file | a3b9... |
| trained_on | dataset hash, see below | lbzf_person_v0 |
| trained_at | UTC timestamp of training run | 2026-06-12T14:33:00Z |
| git_sha | commit SHA of the training repo at training time | f4e5... |
| framework_version | ultralytics==8.x.y, torch==2.z.w | (pinned in requirements.txt) |
| tensorrt_engine_jetpack | JetPack version the .engine was built against (TRT engines are non-portable) | 6.0-r36.3 |
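
A sketch of how a training run might emit such a record. The helper names (`sha256_file`, `build_model_record`) and the stand-in checkpoint bytes are illustrative assumptions, not part of the existing pipeline; only the field names and the naming convention come from the table above.

```python
import hashlib
import re
import tempfile
from pathlib import Path

# Naming convention from the table: yolov8n_lbzf_vN or the off-shelf baseline.
MODEL_NAME_RE = re.compile(r"^(yolov8n_lbzf_v\d+|yolov8n_coco_pretrained)$")


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_model_record(checkpoint, model_name, trained_on, trained_at, git_sha):
    """Assemble the version record; reject names outside the convention."""
    if not MODEL_NAME_RE.match(model_name):
        raise ValueError(f"model_name {model_name!r} violates the naming convention")
    return {
        "model_name": model_name,
        "model_sha256": sha256_file(checkpoint),
        "trained_on": trained_on,
        "trained_at": trained_at,
        "git_sha": git_sha,
    }


# Demo with a stand-in file; a real run would point at the actual .pt checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "yolov8n_lbzf_v0.pt"
    ckpt.write_bytes(b"stand-in bytes, not a real checkpoint")
    record = build_model_record(
        ckpt,
        model_name="yolov8n_lbzf_v0",
        trained_on="lbzf_person_v0",
        trained_at="2026-06-12T14:33:00Z",
        git_sha="f4e5...",  # placeholder copied from the table, not a real SHA
    )

print(record["model_name"], record["model_sha256"][:12])
```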

Each dataset version is a directory with:

  • images/ (frames, anonymized if released)
  • annotations.coco.json
  • dataset_split_vN.csv — frame_id, split (train/val/test), source_video, recorded_at
  • manifest.json — hash of every image + annotation
  • LICENSE — for the public sample; the private full set has an LBZF data-use agreement (Agent E)

Hash the manifest. dataset_id = sha256(manifest.json). This is what model.trained_on references.
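
One way the manifest and `dataset_id` could be produced is sketched below. It assumes the manifest is a JSON object mapping relative paths to SHA-256 digests and that it is serialized with sorted keys so the bytes are stable; the helper names and the stand-in files are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_bytes(data):
    return hashlib.sha256(data).hexdigest()


def build_manifest(dataset_dir):
    """Map each image and the annotation file to its SHA-256 digest."""
    root = Path(dataset_dir)
    tracked = sorted((root / "images").rglob("*")) + [root / "annotations.coco.json"]
    return {
        p.relative_to(root).as_posix(): sha256_bytes(p.read_bytes())
        for p in tracked
        if p.is_file()
    }


def write_manifest(dataset_dir):
    """Write manifest.json and return dataset_id = sha256(manifest.json)."""
    manifest = build_manifest(dataset_dir)
    # Sorted keys and fixed separators keep the serialized bytes deterministic.
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    (Path(dataset_dir) / "manifest.json").write_bytes(payload)
    return sha256_bytes(payload)


# Demo on a throwaway directory with stand-in content.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "images" / "frame_000001.jpg").write_bytes(b"stand-in image bytes")
    (root / "annotations.coco.json").write_text('{"images": [], "annotations": []}')
    first_id = write_manifest(root)
    second_id = write_manifest(root)  # nothing changed
    (root / "images" / "frame_000001.jpg").write_bytes(b"edited image bytes")
    changed_id = write_manifest(root)  # one frame changed

print("stable:", first_id == second_id, "| sensitive to edits:", first_id != changed_id)
```

The deterministic serialization matters: if key order or whitespace varied between runs, the same dataset would produce different `dataset_id` values.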

Every row in cycle_events carries model_version, roi_version (per roi-calibration.md), and git_sha of the inference code. So any historical analysis can name the exact configuration that produced any given event.
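
As an illustration of why the provenance columns are useful, an analysis can check whether a window of events mixes configurations before computing any statistic. This is a sketch against a hypothetical CSV export; the `event_id` column and all values are invented for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical export of cycle_events restricted to the provenance columns.
SAMPLE = """event_id,model_version,roi_version,git_sha
1,yolov8n_lbzf_v0,roi_v1,f4e5aaa
2,yolov8n_lbzf_v0,roi_v1,f4e5aaa
3,yolov8n_lbzf_v0,roi_v2,f4e5aaa
"""


def configs_in(csv_text):
    """Count events per (model_version, roi_version, git_sha) triple."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        (row["model_version"], row["roi_version"], row["git_sha"]) for row in reader
    )


counts = configs_in(SAMPLE)
for config, n in counts.items():
    print(config, n)
if len(counts) > 1:
    print("warning: analysis window mixes configurations")
```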

| Tier | What | Where | Who |
| --- | --- | --- | --- |
| Tier 0: public | Code, design docs, dataset split CSV, aggregate CSVs, public-sample dataset (if approved), figures source. | GitHub sophiamann/lbzfai-cv; HuggingFace Hub for model + sample dataset (if approved); Zenodo / OSF for aggregate CSVs (durable DOIs for paper citation). | Anyone |
| Tier 1: restricted | Full training dataset, full validation manual_observations, raw video clips used for figures. | LBZF-owned S3 bucket (or LBZF Google Drive). Access via signed-URL for reviewers under NDA, or via Mariana-approved researcher list. | LBZF + named collaborators (Sophia, Andrew, ITBA, designated reviewers) |
| Tier 2: private | 4TB rolling video buffer; per-frame logs; operator-identity coding keys; consent forms. | Jetson NVMe (locally); not exported. | LBZF only |

GitHub LFS for model checkpoints (Tier 0). Avoid putting large files in the main repo. The training-pipeline repo is sophiamann/lbzfai-cv (or whatever Sophia + Andrew settle on; the landing-page repo is intentionally separate per project hygiene).

Open-source posture (the LBZF coordination question)


Coordinate with Agent E. This doc proposes the following default, to be confirmed with Mariana before any public release:

  • Code (Tier 0): permissive license (MIT or Apache 2.0). No business reason to restrict the CV pipeline code; it’s a reference implementation of a small-factory deployment. Andrew’s “open-source ambition” framing (docs/overview/project.md) supports this.
  • Public-sample dataset (~100 frames, face-blurred): CC-BY-4.0 or CDLA-Permissive-2.0 with an explicit “this dataset is for research only; do not attempt to re-identify operators” clause. Requires Mariana sign-off.
  • Full dataset: NOT released. LBZF-owned. A reviewer-NDA path exists but is gated on Mariana.
  • Fine-tuned model checkpoint: released IF Stage B fine-tuning happens AND the model does not memorize identifiable operators (verify via membership inference attack on a sample subset before release). Default conservative: not released without check.
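
The release gate on the checkpoint can be expressed as a small check on the attack's output. The sketch below assumes the membership-inference attack has already produced one score per frame (higher meaning "more likely seen in training"); the attack itself is not shown, the scores are invented, and the 0.6 AUC threshold is the one proposed later in this doc.

```python
AUC_RELEASE_THRESHOLD = 0.6  # proposed in this doc: above this, do not release


def attack_auc(member_scores, nonmember_scores):
    """AUC of the attack: P(member score > non-member score), ties count 0.5."""
    if not member_scores or not nonmember_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def release_allowed(member_scores, nonmember_scores, threshold=AUC_RELEASE_THRESHOLD):
    """True only if the attack cannot separate training frames beyond the threshold."""
    return attack_auc(member_scores, nonmember_scores) <= threshold


# Invented scores: an attack that separates the two groups well.
members = [0.91, 0.85, 0.78, 0.66]
nonmembers = [0.35, 0.42, 0.58, 0.70]
auc = attack_auc(members, nonmembers)
print(f"attack AUC = {auc:.4f}, release allowed: {release_allowed(members, nonmembers)}")
```

An AUC near 0.5 means the attack does no better than chance at telling training frames from held-out frames; the pairwise form used here is quadratic in sample size, which is fine for the 50 + 50 frames the procedure calls for.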

A reviewer who wants to reproduce the paper’s claims should be able to:

```bash
# 1. Clone the repo and check out the tag created at paper submission
git clone https://github.com/sophiamann/lbzfai-cv
cd lbzfai-cv
git checkout paper-v1

# 2. Set up the environment (versions are pinned in requirements.txt)
pip install -r requirements.txt

# 3. Pull the public sample dataset (if released); downloads from HuggingFace
make data-sample

# 4. Re-run the validation analysis against the aggregate CSVs
python validate_v1.py --inputs data/aggregate/ --output results/

# 5. Compare to the paper's results; -r recurses into subdirectories
diff -r results/ paper_results/   # no output means every file matches
```

This is the test: if diff is empty, the paper’s numbers are reproducible from the released artifacts. Any divergence is a paper bug.
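
The same check could also run from Python, for example in CI, with a report that says which files diverge. This is a sketch under the assumption that results are compared byte-for-byte; the file names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def relative_files(root):
    """Set of file paths under root, relative to it."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_trees(actual, expected):
    """Report files that are missing, unexpected, or differ byte-for-byte."""
    actual, expected = Path(actual), Path(expected)
    a, e = relative_files(actual), relative_files(expected)
    return {
        "missing": sorted(e - a),
        "unexpected": sorted(a - e),
        "different": sorted(
            name
            for name in a & e
            if (actual / name).read_bytes() != (expected / name).read_bytes()
        ),
    }


# Demo: identical shared file, plus one file present only in paper_results.
with tempfile.TemporaryDirectory() as tmp:
    results, paper = Path(tmp) / "results", Path(tmp) / "paper_results"
    for directory in (results, paper):
        directory.mkdir()
        (directory / "bland_altman.csv").write_text("bias,lower,upper\n0.5,-2.0,3.0\n")
    (paper / "table2.csv").write_text("metric,value\n")
    report = compare_trees(results, paper)

print(report)
reproducible = not any(report.values())
```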

| Alt | Why rejected |
| --- | --- |
| Release everything publicly | Violates LBZF confidentiality; jeopardizes operator consent; legally fraught under Colombian Law 1581/2012. |
| Release nothing publicly | Disqualifies the paper from competitive IEEE venues; defeats the project’s reference-implementation framing. |
| Release only code, no data | Acceptable but weaker: reviewers can’t independently verify any data-dependent claim. The aggregate-CSV-and-public-sample middle path is better. |
| Use Roboflow Public for the dataset | Same Roboflow-public-license issue as training-and-finetuning.md §B.1: incompatible with LBZF confidentiality without re-licensing. |
| HuggingFace gated dataset (request access) | Reasonable for the full dataset if LBZF allows. Default for now: full dataset not released even gated; revisit. |
| Self-host artifact storage on the lbzfai.com Cloudflare worker | Cute but Cloudflare R2/Workers are not durable-DOI venues. Zenodo / OSF / HuggingFace are the right homes for paper-cited artifacts. |

  • OPEN[Mariana via Agent E, by 2026-08-01]: Approve the open-source posture. Specifically: (a) public release of code, (b) public release of a face-blurred 100-frame sample, (c) NDA path for the full dataset for reviewers.
  • OPEN[Andrew, by 2026-06-01]: License choice for the code — MIT vs Apache 2.0. (Apache 2.0 has the patent grant which is mildly relevant if this is ever commercialized by LBZF.)
  • OPEN[Sophia, by 2026-06-15]: Reserve a Zenodo DOI for the aggregate-data archive before paper submission so the paper can cite a durable URL.
  • OPEN[ITBA, by 2026-06-15]: ITBA’s twin install may add its own dataset and metrics. Coordinate whether ITBA’s data is in the same Zenodo deposit or separate. Co-authorship implications.
  • OPEN[Sophia, by 2026-09-01]: Membership-inference attack on the fine-tuned model — does the model leak operator identity? If yes, do not release the checkpoint.
  • OPEN[Agent C]: TensorRT engine versioning is JetPack-pinned; what’s the right “minimum-supported JetPack version” claim for the paper?
  • Agent A (frontend): footer of the dashboard surfaces model_version + roi_version + commit SHA so any user can ask “which version produced this number.” Five-line UI change but essential for reproducibility.
  • Agent B (backend): cycle_events.model_version, cycle_events.roi_version, cycle_events.code_sha columns. Already requested in cycle-event-detection.md and roi-calibration.md.
  • Agent C (hardware): TensorRT engine is non-portable; the artifact set documents how to rebuild it on a Jetson but does not include the binary. Confirm with Agent C the rebuild recipe is robust enough that a reviewer can follow it.
  • Agent E (business/legal): LBZF approval; dataset license drafting; reviewer-NDA template.
  1. The “public sample” is conditional on Mariana’s approval and the approval has not been requested. Without it, the paper falls back to “code-only reproducibility” which is weaker. The ask has to be teed up well before paper submission; the conversation with Mariana should happen before the Pereira trip, not after.
  2. No formal data-management plan (DMP). Funder DMPs (NSF, EU) are not strictly required here (no funder), but IEEE Access reviewers increasingly look for one. A 1-page DMP would close the gap.
  3. Membership-inference checks on the fine-tuned model are listed as a release gate but no procedure is specified. Real procedure: hold out 50 known-training and 50 known-not-training operator frames, run a shadow-model-based inference attack; if AUC > 0.6, do not release. This needs to be a step in the training pipeline.
  4. No story for retracting an artifact once released. If LBZF revokes consent or an operator withdraws after publication, what’s the takedown procedure? HuggingFace and Zenodo both support takedowns but the process is manual; not specified.
  5. GitHub LFS for model checkpoints has a free-tier cap of 1 GB storage and 1 GB/month bandwidth. YOLOv8n’s .pt is ~6 MB, which fits comfortably, but if Phase II’s bigger models go in the same repo, the cap bites. Plan to host model artifacts on HuggingFace Hub from day 1 to avoid the issue.
  6. Reproducibility-via-diff assumes deterministic analysis. Bootstrap CIs depend on RNG seed; the analysis script must pin seeds and the validation doc must say which seed. A reviewer running with a different seed sees ε-level disagreement and may flag it. Specify the canonical seed.
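
The seed-pinning point in item 6 can be made concrete with a percentile bootstrap that owns its random number generator. This is a sketch, not the analysis in validate_v1.py: the seed value, the resample count, and the sample data are all placeholders.

```python
import random
import statistics

CANONICAL_SEED = 20260915  # placeholder; the validation doc should name the real seed


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=CANONICAL_SEED):
    """Percentile bootstrap CI for the mean, deterministic for a given seed."""
    rng = random.Random(seed)  # local RNG, independent of global random state
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Invented paired differences (seconds) for illustration.
diffs = [1.0, -1.0, 2.0, 0.0, 0.5, -0.5, 1.5, 0.2]

ci_first = bootstrap_ci(diffs)
ci_second = bootstrap_ci(diffs)          # same seed: identical interval
ci_other = bootstrap_ci(diffs, seed=1)   # different seed: may differ slightly

print("same seed reproduces:", ci_first == ci_second)
print("canonical:", ci_first, "| other seed:", ci_other)
```

Using a local `random.Random(seed)` instead of seeding the global generator means other code that draws random numbers cannot shift the resamples, so the written results stay byte-identical across runs.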

| Date | Gate |
| --- | --- |
| 2026-05-25 | sophiamann/lbzfai-cv repo created with skeleton + LICENSE. |
| 2026-06-01 | model_version / roi_version columns in DB; inference code logs both per cycle. |
| 2026-06-15 | First version of validate_v1.py running against synthetic data. |
| 2026-07-15 | First version of validate_v1.py running against real Pereira data. |
| 2026-08-01 | Mariana approval ask for the public-sample dataset; Zenodo DOI reserved. |
| 2026-09-01 | Paper-v1 tag in repo; artifact set frozen. |
| 2026-09-15 | Paper submitted. |

  • Data availability section of the paper points to the Tier-0 artifacts and to the reviewer-NDA path for Tier-1.
  • Reproducibility checklist (some venues require it; if so, this doc is the checklist).
  • Methods cite the model_version / roi_version / dataset_id triple — makes the experimental setup unambiguous.
  • A footnote: “Code at https://github.com/sophiamann/lbzfai-cv, tagged paper-v1.”
  • An acknowledgement: “Data made available under the terms of an institutional research agreement with Louis Barton Zona Franca SA.”