You're Not Adopting GenAI. You're Coping With It.

Same input. Same output. Every time.

For decades, this pattern has been our mantra. We used words like idempotent and deterministic to describe it. This behavior isn’t incidental; it sits at the core of what we consider good software.

The expectation isn’t naïve. It’s discipline. It’s how reliable systems are built, tested, scaled, and trusted. Determinism is what lets teams debug confidently, automate safely, and hold engineering accountable. When something changes unexpectedly, you investigate. You don’t shrug. You don’t accept “close enough.” You fix it.

Most of us have spent our entire careers building on this foundation. Query a database twice with the same parameters and you expect identical results. Call an API with the same payload and you expect the same response. Deterministic behavior isn't even a feature; it's the contract.

Regression suites exist because the contract can be proven. Same input. Same output. This is engineering discipline.
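The contract is concrete enough to assert in a test. A minimal sketch, with an illustrative function and values rather than anything from a real system:

```python
def apply_discount(price: float, percent: float) -> float:
    """Deterministic: the same inputs always produce the same output."""
    return round(price * (1 - percent / 100), 2)

# A regression suite can assert exact equality,
# because the contract guarantees it.
assert apply_discount(100.0, 15.0) == 85.0

# Calling twice with the same inputs always agrees. Always.
assert apply_discount(19.99, 10.0) == apply_discount(19.99, 10.0)
```

Exact-equality assertions like these only make sense because the function promises repetition. That promise is the thing GenAI declines to make.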

And it’s non-negotiable. Systems violating the contract don’t ship. Engineers producing flaky outputs get PRs rejected. Leaders tolerating inconsistent behavior are on the receiving end of incident tickets.

This is why modern software works so well.


So when support reports the RAG system is producing inconsistent guidance for the same customer question, the reaction is predictable.

Same customer query. Varying outputs: structure reorganized, supporting details changed, emphasis shifted. Close enough to recognize the topic. Off enough to report as buggy.

This is how disciplined organizations are supposed to react. Unpredictable outputs trigger investigation. Something feels wrong. The system is behaving erratically. It gets escalated. Engineering is asked to take a look.

The instinct comes from decades of deterministic software. Unreliable behavior means something broke. Reliable systems repeat their answers. This reflex has kept production systems stable for years.

And yet nothing is actually broken here.

The RAG system is behaving under a different contract. One producing plausible variation instead of identical repetition. Same question, defensible answers with shifting emphasis and framing. Probabilistic systems don’t honor deterministic contracts.
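What that different contract looks like can be sketched with a toy stand-in for token sampling. The answers below are invented for illustration; a real model samples at the token level, but the effect on callers is the same:

```python
import random

# Toy stand-in for sampled generation: several defensible phrasings
# of the same answer, any of which the model might emit.
ANSWERS = [
    "Reset your API key under Settings > Security.",
    "Go to Settings, open Security, and regenerate the API key.",
    "You can rotate the key from the Security tab in Settings.",
]

def generate(query: str) -> str:
    # Same input, sampled output: any phrasing can come back.
    return random.choice(ANSWERS)

# Twenty identical queries will very likely yield more than one
# distinct output -- variation, not breakage.
outputs = {generate("How do I reset my API key?") for _ in range(20)}
```

Every element of `outputs` is a defensible answer. An exact-equality regression test fails here by design, not by defect.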

The collision starts when outputs are evaluated using deterministic standards.

The Collision

When GenAI tooling is rolled out, the familiarity comes along for the ride. New licenses are procured, usage tracked, and early wins celebrated.

But the paradigm shift goes unrecognized. So support, sales, and operations do what they've always done.

Support escalates: customers are getting inconsistent answers to the same question. Yesterday’s explanation doesn’t match today’s. The knowledge base used to be consistent. Now it depends on what, exactly? Timing? Phrasing? The phase of the moon?

Sales feels it next. The proposal generator produces different justifications for the same deal structure. A client asks a clarifying question and gets a slightly different explanation than the one they heard yesterday. Now credibility is at risk.

Operations flags it too. The document summarizer highlights different takeaways from the same quarterly report. Which version goes into the board deck? Do we run it three times and pick one? What does “best” even mean when the system won’t commit?

These are not technical complaints. They are operational breakdowns.

The business runs on repeatable processes. GenAI introduces variation by design. Engineering applies deterministic discipline. The new tooling refuses deterministic contracts.

Repeatable processes don’t absorb plausible variation. They fracture. And when every team reports the same friction, the question leadership asks is always the same.

“Why can’t this be made consistent?”

The Coping Mechanisms

When the paradigm doesn’t shift, organizations cope. The patterns are predictable.

Engineering absorbs unspoken expectations first. Support defines success one way. Sales defines it differently. Operations has a third interpretation. None of these definitions are written down. None are reconciled. Engineering builds to expectations that shift by department, by stakeholder, by mood. When outputs don’t match what someone imagined, engineering absorbs the blame. The variation isn’t the bug. The lack of shared understanding is.

But engineering can’t absorb forever. So the coping spreads.

Prompts get rewritten to chase consistency. Three sentences become three paragraphs. Instructions are added, reordered, expanded to “lock in” behavior. Each revision feels like progress because the output changes. Whether it improves is a different question. The prompt grows longer, more fragile, harder to reason about. Confidence erodes quietly.

Token budgets inflate next. “We have unlimited tokens” becomes justification for the cycle. Outputs vary, so teams respond with more context, more examples, more retries. The assumption is simple: more tokens should mean better answers. Cost rises. Clarity does not.

And evaluation stays stuck in the old paradigm. Multiple outputs get compared side by side. Differences in wording, tone, emphasis become the focus. Conversations drift toward which answer is “correct” instead of whether any answer is useful. Variation becomes the problem to solve.

The system never passes a test it was never designed to take.

What the Paradigm Shift Looks Like

Here’s the shift, stated plainly.

Traditional software rewards determinism. Same input. Same output. Deviations are defects to be eliminated.

GenAI does not work that way. It produces plausible outputs within a range. Variation is expected. Consistency comes from framing and evaluation, not from forcing identical responses. When the mental model changes, the behavior around the system changes with it.

When you think this… …the shift is:

"It gave different answers; something's broken."
→ Nothing broke. Expect a range of defensible answers, not one canonical response.

"I need to rewrite this prompt until it gives the same results."
→ The prompt isn't the control surface. Define what success looks like separately; tune for usefulness, not sameness.

"We need more tokens to make the outputs more reliable."
→ More tokens won't force consistency. Curate context deliberately; spend on precision, not volume.

"Which of these three outputs is the correct one?"
→ Maybe all three. Evaluate against requirements, not against each other.

"Engineering needs to fix why this keeps changing."
→ Engineering builds to outcomes, not expectations. Align on acceptable variation first; they'll deliver to that.

Prompts Stop Being the Control Surface

When the mental model is correct, prompts stabilize early. They stop growing with every unexpected output. Instructions are no longer treated as enforcement mechanisms, but as framing tools. Success is defined outside the prompt, so variation in phrasing no longer triggers rewrites. The prompt stops being the thing teams fight over.

Tokens Become a Design Choice, Not a Crutch

With the right model, adding tokens is intentional. Context is curated, not inflated. Retries are purposeful, not reflexive. Teams can explain why additional context is necessary and when it is not. Token usage becomes part of system design instead of a substitute for clarity.
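Curated context can be as simple as ranking candidate chunks and keeping only the top few. A sketch using a deliberately naive keyword-overlap score (a real retriever would use embeddings); the chunks and query terms are illustrative:

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_context(chunks: list[str], query_terms: list[str],
                  max_chunks: int = 3) -> list[str]:
    """Keep the few chunks most relevant to the query,
    instead of stuffing everything into the prompt."""
    query = tokens(" ".join(query_terms))
    ranked = sorted(chunks, key=lambda c: len(tokens(c) & query),
                    reverse=True)
    return ranked[:max_chunks]

chunks = [
    "Refund policy: refunds within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "To request a refund, contact support with your order id.",
    "The cafeteria menu changes weekly.",
]

# Only the two refund-related chunks make the prompt; the rest stay out.
context = build_context(chunks, ["refund", "purchase"], max_chunks=2)
```

The point is not the scoring function; it's that token spend is a decision made per chunk, with a reason, rather than a reflex.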

Evaluation Shifts from Sameness to Usefulness

Outputs are no longer judged against each other line by line. Evaluation moves to outcomes. Multiple answers can be acceptable if they satisfy the same objective. Differences in tone or emphasis are tolerated as long as the result works. The question stops being “Which one is correct?” and becomes “Did this do what we needed?”
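Outcome-based evaluation can be written down as checks against requirements rather than string equality. The criteria below are hypothetical, but the shape is the point: two differently worded answers can both pass.

```python
def meets_requirements(answer: str) -> bool:
    """Outcome-based check: does the answer satisfy the brief,
    regardless of exact wording? Criteria are illustrative."""
    text = answer.lower()
    mentions_refund = "refund" in text
    cites_window = "30 day" in text or "30-day" in text
    short_enough = len(answer.split()) <= 60
    return mentions_refund and cites_window and short_enough

a = "You can request a refund within the 30-day window from purchase."
b = "Refunds are available for 30 days after purchase; contact support."

# Different wording, same outcome: both are acceptable.
assert meets_requirements(a) and meets_requirements(b)
```

Compare the two answers against each other and you'll argue about phrasing forever. Compare each against the requirements and the argument ends.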

Expectations Get Defined, Not Assumed

Success criteria are established early and evolve as understanding improves. What “good output” means gets articulated—including acceptable variation. Engineering builds to defined outcomes instead of inferred intent. When outputs vary within agreed parameters, that’s expected behavior, not an escalation. Alignment happens through iteration, not comprehensive upfront specification.

None of this requires new tooling. It requires a different contract.

GenAI does not replace discipline. It relocates it. The discipline moves from enforcing sameness to defining bounds, from controlling outputs to agreeing on outcomes. Once that happens, the coping mechanisms disappear on their own.

Direct Message to the Executive

If you mandated GenAI adoption, you need to hold yourself to account.

You didn’t just approve a tool. You changed the contract your organization has operated under for decades. And you did it without saying so out loud.

You rolled GenAI out the way you’ve rolled out every other system. Licenses procured. Usage tracked. Early wins celebrated. On paper, it looked disciplined.

And that’s the problem.

GenAI is not deterministic software with new features. It’s a probabilistic system operating under a different contract. You deployed probabilistic tooling using deterministic playbooks. No explicit acknowledgment of the shift. No shared definition of what “good output” means when variation is expected. No guidance on how to evaluate systems that don’t guarantee repetition.

That frustration you’re seeing is not a tooling failure. It’s a leadership gap.

Here’s the paradox: you mandated adoption. Engineering deployed it. What about the paradigm shift?

You probably assigned someone. And they delivered your mandate: adoption metrics, use cases, governance. The visible work.

The shift never made the mandate. Who’s left holding the bag?

When you deploy probabilistic systems under deterministic contracts, every downstream team is set up to fail. Support escalates behavior that looks broken. Sales loses confidence in outputs that won’t repeat verbatim. Operations stalls trying to pick the “right” answer. Engineering absorbs the blame for a mismatch it didn’t create.

None of this is surprising. It’s the predictable outcome of introducing a new paradigm without the shift ever making the mandate.

You don’t fix this by buying better models. You don’t fix it by adding guardrails. You don’t fix it by asking engineering to stabilize behavior that is working as designed.

You fix it by doing the part only leadership can do.

Own the shift. Explicitly. Publicly. Repeatedly.

Be clear about where consistency is required and where variation is acceptable. Be clear about how outputs should be evaluated. And be clear that engineering is not responsible for enforcing guarantees that were never defined.

Until that happens, your organization will keep coping. Prompts will grow. Tokens will inflate. Reviews will stall. Confidence will erode quietly. And you’ll keep hearing the same question framed a dozen different ways.

“Why can’t this be made consistent?”

Because consistency was never the promise. You just never told anyone.

If you want GenAI to work, stop treating it like software you already understand. Change the contract first. Everything else follows.

// Pragmatic GenAI. May contain traces of paradigm shifts