Promptable measurement for industrial micro-vision โ masks in, millimetres out.
Foundation models tell you what and where. The quantity that drives the field decision โ crack width, part dimension, count, condition grade โ is a different problem. Our central finding is a decomposition: mask for where, signal for how wide. A binary mask quantizes a gradual intensity edge and destroys sub-pixel evidence; reading width from the raw signal keeps it.
Median crack-width error vs. manually measured physical ground truth (krkCMd). The information was in the image all along.
A frozen SAM 3 backbone (wrapped in a synonym prompt-set ensemble) proposes instances; the metrology core converts pixels to audited physical units. Only SAM 3 and four small heads hold learned parameters โ none touch the scale arithmetic.
One promptable measurement layer across physical quantities, materials, and scenes โ each audited against real ground truth. Pick a use case to see the live inference and the before โ after.
Val/test separation, held-out sources, multi-seed where applicable, empty-GT scored separately. Lower is better for error; higher for IoU/AUC.
| Setting | median | worst |
|---|---|---|
| Single-view plane scale | 2.10% | 36.8% |
| Multi-view consensus | 0.49% | 11.5% |
| SAM 3 promptable (per view) | 2.5% | โ |
| Track | result |
|---|---|
| Fray defect (hand-crafted โ PaDiM-lite) | 0.68 โ 0.79 |
| Blowhole / crack / break (PaDiM-lite) | 0.94โ0.99 |
| TUM handheld checkerboard | 1.06% |
| ADT egocentric 3-D (oracle gate) | 8.7% |
The traps and the negatives are surfaced before a reviewer finds them.
SAM 3 wins segmentation mIoU yet its 62.9% width error is worse than classical adaptive (43.5%). Segmentation rank does not equal measurement rank.
Dense counting under-counts touching parts; a matting head that won 20ร synthetically failed to transfer until the synthesis was made realistic. Reported, not hidden.
Per-source conformal intervals keep 90% coverage; gated frames are reported as not-measurable rather than guessed โ coverage is always shown next to accuracy.
@misc{gaugeanything2026,
title = {GaugeAnything: Promptable Quantitative Inspection for Industrial Micro-Vision},
author = {Joo, Hyunwoo},
year = {2026},
url = {https://github.com/falcons-eyes/GaugeAnything}
}
Full method, experiments, and the architecture diagram are in the paper. Code is on GitHub; trained task heads on Hugging Face.