GaugeAnything — Promptable Quantitative Inspection for Industrial Micro-Vision

Detection ≠ Measurement

Finding it is easy. Reading it in millimetres is the hard part.

Foundation models tell you what and where. The quantity that drives the field decision — crack width, part dimension, count, condition grade — is a different problem. Our central finding is a decomposition: mask for where, signal for how wide. A binary mask quantizes a gradual intensity edge and destroys sub-pixel evidence; reading width from the raw signal keeps it.

Width from mask geometry

144–186 µm

→

Width from the signal (ours)

23.2 µm

Median crack-width error vs. manually measured physical ground truth (krkCMd). The information was in the image all along.

Architecture

A frozen perception front-end, an explicit metrology core

A frozen SAM 3 backbone (wrapped in a synonym prompt-set ensemble) proposes instances; the metrology core converts pixels to audited physical units. Only SAM 3 and four small heads hold learned parameters — none touch the scale arithmetic.

GaugeAnything pipeline: image + prompt → SAM 3 (frozen) → metrology core (scale resolvers, regime router, signal width, robustness & audit) → Inspection Atoms — image + prompt → SAM 3 (frozen) → metrology core → Inspection Atom (m, c, s, g, u).

One model, many scenes

Explore the use cases

One promptable measurement layer across physical quantities, materials, and scenes — each audited against real ground truth. Pick a use case to see the live inference and the before → after.

Real inference clips and audited evaluation assets across twenty scenes.

Use case:

→

Benchmarks

Audited against physical ground truth

Val/test separation, held-out sources, multi-seed where applicable, empty-GT scored separately. Lower is better for error; higher for IoU/AUC.

Crack width — mask vs. signal

krkCMd, manual physical width (µm). Lower is better.

mask geometry

144–186

signal, gated

39.9

signal, median

23.2

authors' DLM

11.1

Crack segmentation, zero-shot

CrackSeg9k, crack-only IoU, 3 seeds. Higher is better.

frangi

0.115

adaptive

0.181

SAM 3

0.442

2.44× the best classical baseline — and it also wins detection.

Industrial part metrology

T-LESS, CAD + pose ground truth (mm). Relative error.

Setting	median	worst
Single-view plane scale	2.10%	36.8%
Multi-view consensus	0.49%	11.5%
SAM 3 promptable (per view)	2.5%	—

Diffuse defects & dynamic scenes

Magnetic-Tile (AUC ↑) · TUM / ADT (relative error ↓).

Track	result
Fray defect (hand-crafted → PaDiM-lite)	0.68 → 0.79
Blowhole / crack / break (PaDiM-lite)	0.94–0.99
TUM handheld checkerboard	1.06%
ADT egocentric 3-D (oracle gate)	8.7%

Rigor

We attacked our own results first

The traps and the negatives are surfaced before a reviewer finds them.

Measurement gap

Best mask ≠ best measurement

SAM 3 wins segmentation mIoU yet its 62.9% width error is worse than classical adaptive (43.5%). Segmentation rank does not equal measurement rank.

Honest negative

We publish what fails

Dense counting under-counts touching parts; a matting head that won 20× synthetically failed to transfer until the synthesis was made realistic. Reported, not hidden.

Calibrated uncertainty

It says "not measurable"

Per-source conformal intervals keep 90% coverage; gated frames are reported as not-measurable rather than guessed — coverage is always shown next to accuracy.

Reference

Cite

@misc{gaugeanything2026,
  title  = {GaugeAnything: Promptable Quantitative Inspection for Industrial Micro-Vision},
  author = {Joo, Hyunwoo},
  year   = {2026},
  url    = {https://github.com/falcons-eyes/GaugeAnything}
}

Full method, experiments, and the architecture diagram are in the paper. Code is on GitHub; trained task heads on Hugging Face.