Current focus, writing pipeline, and availability.
Updated periodically. If you're reading this six months from now, assume it's stale and email me to confirm.
Current focus
- Evaluation harnesses for agentic pipelines — calibrating LLM-as-judge against human labels.
- Hybrid retrieval + reranker setups for production RAG systems.
- Model routing via LiteLLM, with cost-per-successful-completion tracking.
Writing pipeline
- A minimum viable LLM evaluation harness — the version you can build in an afternoon.
- Hybrid search + rerankers — the techniques that move RAG quality numbers.
- Production reliability for LLM apps — a working checklist.
Availability
- Open to selective engagements with founders and engineering teams.
- Best fit: 4–12 week scoped projects with clear eval-backed outcomes.
- Reach out via email; I reply within 48 hours.
Not doing
- Generic AI consulting or strategy decks.
- Inference-internals work (vLLM, CUDA, kernel-level optimisation).
- Long open-ended retainers without scoped deliverables.