Reliability is the pitch
The dominant pain in production LLM applications is reliability — systems that produce trustworthy answers consistently. Cost optimisation is a downstream consequence of doing reliability well, not a competing pitch.
There is a temptation, when working on the application layer of LLM systems, to lead with cost. Prompt caching, routing simpler subtasks to cheaper models, smaller models with structured outputs: all of these are real techniques that produce real bill reductions. Forty percent cost cuts are not unusual.
Cost is not the pitch. Reliability is. This post is about why.
The conversation buyers are actually having
In production deployments of LLM applications, the questions that come up are some flavour of: the thing breaks on inputs that worked yesterday. The agent gets stuck in a loop. The retrieval pulls the right document but the answer cites the wrong section. The vendor’s model updated last week and now half the test suite is failing. The PoC was magic and the rollout has been a slow embarrassment.
Cost is downstream of all of that. If a system is unreliable, lowering the per-call cost is a category error: the budget goes to producing wrong answers more efficiently. The conversation that goes anywhere is “can this thing produce a trustworthy answer often enough that I would put my name on it?” Cost shows up in that conversation, but as a consequence: “and by the way, here is what we did to also bring per-call cost down 40%.”
That second sentence is a much better sentence than the first.
Three reasons cost-as-pitch is structurally weak
One. Cost is something the buyer can already estimate. They have the bill. They know roughly what it is. They are, at most, mildly annoyed about it. Opening with something they already understand is a hard sell.
Two. Cost optimisation is mostly invisible when it works. The system gets cheaper, the bill drops, the buyer notices nothing, and the case study reads “look at this lower number” — which doesn’t move anyone reading it.
Three, and this is the structural one: cost-as-pitch puts you in the procurement bucket. You are a vendor optimising a line item. Reliability-as-pitch puts you in the “help me sleep at night” bucket. The two buckets have different decision-makers, different sales cycles, and different price tags.
The reframe
The pitch is not cost. The pitch is production survival. Cost work is part of the package because cheaper iteration funds more iteration, which funds better evaluations, which funds higher reliability. But cost shows up in the case study at the end, not in the headline at the top.
Old: “I help reduce LLM costs.”
New: “I help your AI system survive contact with real users. Lower bills are how we measure part of it.”
The first sentence describes a knob. The second describes an outcome.
What this implies for technique selection
If reliability is the goal and cost is the byproduct, the order of operations changes:
- Evaluations first. You cannot improve reliability you cannot measure.
- Retrieval depth. Most reliability problems are upstream context problems.
- Structured outputs. Schema-constrained outputs eliminate an entire class of brittleness.
- Model routing. Once you can measure reliability per task, route the easy tasks to cheaper models — this is where cost reduction actually shows up, measured against the eval suite, not against vibes.
- Prompt caching. Mechanical, last-mile. Real money saved, no quality cost when applied correctly.
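The routing step above can be sketched in a few lines. This is a minimal, hypothetical illustration, not a production router: the “models” are stand-in functions rather than real LLM API calls, and the names (`EvalCase`, `choose_route`, `cheap_model`, `expensive_model`) are invented for the sketch. The point is the shape of the decision: a task class only gets the cheap route if the cheap model passes that task’s eval cases.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    """One eval: a prompt plus a predicate over the model's answer."""
    prompt: str
    passes: Callable[[str], bool]

def cheap_model(prompt: str) -> str:
    # Stand-in for a small/cheap model: only handles the arithmetic prompt.
    return "4" if prompt == "2+2?" else "unsure"

def expensive_model(prompt: str) -> str:
    # Stand-in for a large/expensive model: handles everything in this toy suite.
    return {"2+2?": "4", "capital of France?": "Paris"}.get(prompt, "unsure")

def choose_route(cases: list[EvalCase]) -> Callable[[str], str]:
    """Route a task class to the cheap model only if it passes every eval case;
    otherwise fall back to the expensive model. Cost reduction is claimed only
    where the eval suite proves quality is preserved."""
    if all(case.passes(cheap_model(case.prompt)) for case in cases):
        return cheap_model
    return expensive_model

math_cases = [EvalCase("2+2?", lambda answer: answer == "4")]
geo_cases = [EvalCase("capital of France?", lambda answer: answer == "Paris")]

math_route = choose_route(math_cases)  # cheap model passes, so it keeps the task
geo_route = choose_route(geo_cases)    # cheap model fails, so it falls back
```

Note the order of dependencies: the router is only as trustworthy as the eval cases it consults, which is exactly why evaluations sit first in the list and routing sits after them.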
Notice that cost-direct techniques are last. This is not because they don’t work; they do. It is because they are leverage you apply after you can prove the system still passes its eval suite.
What to take from this
If you are building or selling AI engineering services, lead with reliability. Lead with the system that survives production. Document cost reductions in the case study; do not lead with them. The buyers who care about cost-as-headline are not the buyers you want; the buyers who care about reliability will care about cost too, in the right order.
If you are buying AI engineering services and someone is leading their pitch with cost, ask them what the eval pass rate of the system is before and after their work. The answer will tell you whether they are optimising the cost of producing wrong answers or the cost of producing right ones.