Now

Updated April 30, 2026

Current focus, writing pipeline, and availability.

Updated periodically. If you're reading this more than six months after the date above, assume it's stale and email me to confirm.

Current focus

Evaluation harnesses for agentic pipelines — calibrating LLM-as-judge against human labels.
Hybrid retrieval + reranker setups for production RAG systems.
Model routing via LiteLLM for cost-per-successful-completion tracking.

Writing pipeline

A minimum viable LLM evaluation harness — the version you can build in an afternoon.
Hybrid search + rerankers — the techniques that move RAG quality numbers.
Production reliability for LLM apps — a working checklist.

Availability

Open to selective engagements with founders and engineering teams.
Best fit: 4–12 week scoped projects with clear eval-backed outcomes.
Reach out via email — replies within 48 hours.

Not doing

Generic AI consulting or strategy decks.
Inference-internals work (vLLM, CUDA, kernel-level optimisation).
Long open-ended retainers without scoped deliverables.