Independent AI engineer

AI engineering for production systems.

I work with founders and engineering teams shipping LLM-powered products. The focus is reliability: evaluations, retrieval, agent workflows, and the application-layer engineering that takes a demo to a system you can put in front of paying users.

What I work on

Production AI, four angles.

Evaluations

Custom eval harnesses, error analysis, LLM-as-judge calibrated against human labels. The discipline that turns demos into systems you can stake a roadmap on.

Retrieval (RAG)

Hybrid search, reranking, query routing, synthetic eval data. Retrieval that holds up on inputs you have not seen yet.

Agent workflows

State machines and graphs for long-running tasks. Context engineering for agents that do not drift on day three.

The boring leverage

Prompt caching, model routing, structured outputs, observability. The invisible engineering that pushes production cost down and reliability up.

Writing

Recent entries

Subscribe via RSS

Get in touch

Have a system that needs to actually work?

Selective engagements with founders and engineering teams building LLM-powered products. Reach out if you'd like to talk about what you're shipping.

work.shivamsharma@zohomail.in