// Short technical notes from active engagements. Anonymized when needed. We publish what we'd want to read.
Eval sets without at least 20% painful edge cases give you false confidence. A short note on why we deliberately overweight failure modes.
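A minimal sketch of what we mean by overweighting: reserve a fixed share of the eval set for known failure modes instead of sampling uniformly. The triple layout and the `is_edge_case` flag are illustrative assumptions, not from the note itself.

```python
import random

def build_eval_set(examples, n, edge_case_fraction=0.2, seed=0):
    """Sample an eval set that reserves a fixed share for edge cases.

    `examples` is a list of (input, label, is_edge_case) triples --
    the field layout is illustrative. Uniform sampling would give
    edge cases their base rate; here we pin them at a floor.
    """
    rng = random.Random(seed)
    edge = [e for e in examples if e[2]]
    easy = [e for e in examples if not e[2]]
    n_edge = min(len(edge), max(1, round(n * edge_case_fraction)))
    sample = rng.sample(edge, n_edge) + rng.sample(easy, n - n_edge)
    rng.shuffle(sample)
    return sample
```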
BM25 + dense retrieval beats either alone — but only above a corpus size threshold. Here's where the line is, with numbers from three recent engagements.
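One common way to combine the two retrievers is reciprocal rank fusion; a minimal sketch, assuming each retriever returns a ranked list of document ids (the note doesn't specify the fusion method, and `k=60` is the conventional default, not a number from our engagements).

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion over ranked doc-id lists (best first),
    e.g. one list from BM25 and one from a dense retriever.

    Each list contributes 1 / (k + rank) per document; documents
    ranked highly by either retriever float to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```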
Most "internal SEO tools" fail because they target a workflow that's actually 4 different workflows. A short framework for scoping copilot projects.
You don't always have ground-truth labels in production. Here's how we run drift detection on outputs alone, and where it breaks.
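One label-free signal is the shape of the output score distribution itself; a minimal sketch using the population stability index between a reference window and a current window. The 10-bin choice and the usual "PSI > 0.2 means drift" reading are heuristics we're assuming here, not numbers from the note.

```python
import numpy as np

def psi(reference, current, bins=10, eps=1e-6):
    """Population stability index between two 1-D score samples.

    Bin edges come from quantiles of the reference window, so each
    reference bin holds ~1/bins of the mass; `eps` guards against
    log(0) in empty bins. Needs no ground-truth labels.
    """
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range scores
    ref_frac = np.histogram(reference, edges)[0] / len(reference) + eps
    cur_frac = np.histogram(current, edges)[0] / len(current) + eps
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))
```

The obvious failure mode, per the teaser: output distributions can stay stable while the input-output relationship drifts underneath them.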
Three real numbers from a recent invoice-processing engagement. The OCR is rarely the bottleneck.
A 92%-accurate classifier with bad calibration is worse than an 87%-accurate one with good calibration. Why, and what to do about it.
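Calibration here is usually measured with expected calibration error; a minimal sketch with the conventional 10 equal-width confidence bins (the note itself doesn't commit to a metric or bin count).

```python
import numpy as np

def expected_calibration_error(confidences, correct, bins=10):
    """ECE: |accuracy - mean confidence| per confidence bin,
    weighted by the fraction of samples in the bin.

    `confidences` are predicted probabilities in (0, 1];
    `correct` is 0/1 per prediction. Low ECE means the model's
    stated confidence tracks its actual hit rate.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return float(ece)
```

A 92%-accurate model that says "0.99" on everything racks up a large ECE; the 87% model whose confidences match its hit rates lets you route low-confidence cases to a human, which is usually what matters in production.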
One technical note per month, in your inbox. No promotions. Read by ~3,800 ML practitioners.
→ get in touch