Independent AI engineer
AI engineering for production systems.
I work with founders and engineering teams shipping LLM-powered products. The focus is reliability: evaluations, retrieval, agent workflows, and the application-layer engineering that takes a product from demo to something you can put in front of paying users.
What I work on
Production AI, four angles.
Evaluations
Custom eval harnesses, error analysis, LLM-as-judge calibrated against human labels. The discipline that turns demos into systems you can stake a roadmap on.
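A minimal sketch of what that looks like in practice. The EvalCase fields, the judge prompt, and the call_model helper are illustrative placeholders rather than a fixed recipe; the point is that the judge's agreement with human labels gets measured before the judge is trusted on unlabeled traffic.

```python
from dataclasses import dataclass

def call_model(prompt: str) -> str:
    """Placeholder for whatever model client you use (hypothetical helper)."""
    raise NotImplementedError

@dataclass
class EvalCase:
    prompt: str
    reference: str     # what a good answer must cover, written by a human
    human_label: bool  # human pass/fail verdict, used to calibrate the judge

def llm_judge(prompt: str, answer: str, reference: str) -> bool:
    """Ask a judge model for a pass/fail verdict against the reference."""
    verdict = call_model(
        f"Question: {prompt}\nReference: {reference}\nAnswer: {answer}\n"
        "Reply with exactly PASS or FAIL."
    )
    return verdict.strip().upper().startswith("PASS")

def judge_agreement(cases: list[EvalCase], answers: list[str]) -> float:
    """Fraction of cases where the judge matches the human label.

    Only lean on the judge for unlabeled traffic once this number clears
    your bar."""
    hits = sum(
        llm_judge(c.prompt, a, c.reference) == c.human_label
        for c, a in zip(cases, answers)
    )
    return hits / len(cases)
```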
Retrieval (RAG)
Hybrid search, reranking, query routing, synthetic eval data. Retrieval that holds up on inputs you have not seen yet.
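One concrete example of the hybrid side: reciprocal rank fusion is a common way to merge keyword and vector rankings before a reranker sees them. The document ids below are made up, and k=60 is the conventional constant.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document ids (e.g. BM25 and vector search results)
    into a single ranking by summing 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from a keyword index and a vector index.
bm25_hits   = ["doc_7", "doc_2", "doc_9"]
vector_hits = ["doc_2", "doc_4", "doc_7"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits])[:3])
```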
Agent workflows
State machines and graphs for long-running tasks. Context engineering for agents that do not drift on day three.
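A stripped-down sketch of the state-machine shape, with placeholder transitions where real planning, tool calls, and review would go. The state names and step budget are assumptions for illustration.

```python
from enum import Enum, auto

class State(Enum):
    PLAN = auto()
    ACT = auto()
    REVIEW = auto()
    DONE = auto()

def run_agent(task: str, max_steps: int = 20) -> str:
    """Drive the loop through explicit states so every transition is
    observable, bounded, and easy to log."""
    state, transcript = State.PLAN, []
    for _ in range(max_steps):
        if state is State.PLAN:
            transcript.append(f"plan: break down {task!r}")  # planning model call goes here
            state = State.ACT
        elif state is State.ACT:
            transcript.append("act: tool call and result")   # tool execution goes here
            state = State.REVIEW
        elif state is State.REVIEW:
            # a real review step decides whether to loop back to PLAN or finish
            state = State.DONE
        elif state is State.DONE:
            return "\n".join(transcript)
    return "stopped: step budget exhausted"
```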
The boring leverage
Prompt caching, model routing, structured outputs, observability. The invisible engineering that moves production cost and reliability in the right direction.
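Two small examples of what that looks like in code: a routing rule that keeps easy traffic on a cheap model, and a guard that only accepts structured output it can actually parse. The model identifiers, length cutoff, and required keys are placeholders.

```python
import json
from typing import Optional

def route_model(prompt: str, needs_reasoning: bool) -> str:
    """Send easy traffic to the cheap model and hard traffic to the strong one.
    The model names and length threshold here are placeholders."""
    if needs_reasoning or len(prompt) > 4000:
        return "strong-model"
    return "cheap-model"

def parse_structured(raw: str, required_keys: set[str]) -> Optional[dict]:
    """Accept a structured output only if it is valid JSON containing every
    required key; return None to signal a retry or fallback."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not required_keys <= data.keys():
        return None
    return data

# Usage: retry (or fall back to the stronger model) until the guard passes.
# result = parse_structured(raw_response, {"summary", "confidence"})
```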
Writing
Recent entries
Build the eval harness first
An evaluation harness is not exotic infrastructure for mature LLM applications. It is the first thing to build, before retrieval, before agent loops, before any optimisation. The minimum viable version fits in an afternoon.
RAG beyond embeddings: the techniques that move quality numbers
Vector search alone is not enough for production RAG. The techniques that consistently improve retrieval quality — hybrid search, rerankers, query routing, synthetic eval data — and when each one is worth the engineering cost.
Reliability is the pitch
The dominant pain in production LLM applications is reliability — systems that produce trustworthy answers consistently. Cost optimisation is a downstream consequence of doing reliability well, not a competing pitch.
Get in touch
Have a system that needs to actually work?
Selective engagements with founders and engineering teams building LLM-powered products. Reach out if you'd like to talk about what you're shipping.
work.shivamsharma@zohomail.in →