GaugeAnything

Promptable measurement for industrial micro-vision โ€” masks in, millimetres out.

SAM 3 backbone physical ground truth audited benchmarks honest negatives
Prompt an object โ†’ get millimetres, micrometres, counts, and a condition grade. Not just where โ€” the output is a physical quantity.
Detection โ‰  Measurement

Finding it is easy. Reading it in millimetres is the hard part.

Foundation models tell you what and where. The quantity that drives the field decision โ€” crack width, part dimension, count, condition grade โ€” is a different problem. Our central finding is a decomposition: mask for where, signal for how wide. A binary mask quantizes a gradual intensity edge and destroys sub-pixel evidence; reading width from the raw signal keeps it.

Width from mask geometry
144โ€“186 ยตm
โ†’
Width from the signal (ours)
23.2 ยตm

Median crack-width error vs. manually measured physical ground truth (krkCMd). The information was in the image all along.

Architecture

A frozen perception front-end, an explicit metrology core

A frozen SAM 3 backbone (wrapped in a synonym prompt-set ensemble) proposes instances; the metrology core converts pixels to audited physical units. Only SAM 3 and four small heads hold learned parameters โ€” none touch the scale arithmetic.

GaugeAnything pipeline: image + prompt โ†’ SAM 3 (frozen) โ†’ metrology core (scale resolvers, regime router, signal width, robustness & audit) โ†’ Inspection Atoms
image + prompt โ†’ SAM 3 (frozen) โ†’ metrology core โ†’ Inspection Atom (m, c, s, g, u).
One model, many scenes

Explore the use cases

One promptable measurement layer across physical quantities, materials, and scenes โ€” each audited against real ground truth. Pick a use case to see the live inference and the before โ†’ after.

Real inference clips and audited evaluation assets across twenty scenes.

โ†’
Benchmarks

Audited against physical ground truth

Val/test separation, held-out sources, multi-seed where applicable, empty-GT scored separately. Lower is better for error; higher for IoU/AUC.

Crack width โ€” mask vs. signal

krkCMd, manual physical width (ยตm). Lower is better.
mask geometry
144โ€“186
signal, gated
39.9
signal, median
23.2
authors' DLM
11.1

Crack segmentation, zero-shot

CrackSeg9k, crack-only IoU, 3 seeds. Higher is better.
frangi
0.115
adaptive
0.181
SAM 3
0.442
2.44ร— the best classical baseline โ€” and it also wins detection.

Industrial part metrology

T-LESS, CAD + pose ground truth (mm). Relative error.
Settingmedianworst
Single-view plane scale2.10%36.8%
Multi-view consensus0.49%11.5%
SAM 3 promptable (per view)2.5%โ€”

Diffuse defects & dynamic scenes

Magnetic-Tile (AUC โ†‘) ยท TUM / ADT (relative error โ†“).
Trackresult
Fray defect (hand-crafted โ†’ PaDiM-lite)0.68 โ†’ 0.79
Blowhole / crack / break (PaDiM-lite)0.94โ€“0.99
TUM handheld checkerboard1.06%
ADT egocentric 3-D (oracle gate)8.7%
Rigor

We attacked our own results first

The traps and the negatives are surfaced before a reviewer finds them.

Measurement gap
Best mask โ‰  best measurement

SAM 3 wins segmentation mIoU yet its 62.9% width error is worse than classical adaptive (43.5%). Segmentation rank does not equal measurement rank.

Honest negative
We publish what fails

Dense counting under-counts touching parts; a matting head that won 20ร— synthetically failed to transfer until the synthesis was made realistic. Reported, not hidden.

Calibrated uncertainty
It says "not measurable"

Per-source conformal intervals keep 90% coverage; gated frames are reported as not-measurable rather than guessed โ€” coverage is always shown next to accuracy.

Reference

Cite

@misc{gaugeanything2026,
  title  = {GaugeAnything: Promptable Quantitative Inspection for Industrial Micro-Vision},
  author = {Joo, Hyunwoo},
  year   = {2026},
  url    = {https://github.com/falcons-eyes/GaugeAnything}
}

Full method, experiments, and the architecture diagram are in the paper. Code is on GitHub; trained task heads on Hugging Face.