Scaling Haskell in production: Lessons from the trenches

Most engineers assume that a two-million-line Haskell codebase is a recipe for disaster. They imagine a team of PhDs arguing over monad transformers while the business burns. At Mercury, we’ve spent years proving that assumption wrong. We’ve scaled to hundreds of thousands of businesses and billions in transaction volume using Haskell, and we’ve done it with a team where most people learned the language on the job.

If you’re building a system that touches money, you don’t have the luxury of choosing a language based on its aesthetic appeal. You choose it because it keeps the business alive. Here is how we actually manage scaling Haskell in production without losing our minds or our uptime.

Reliability is about adaptive capacity

Most teams approach reliability like a game of whack-a-mole. They write tests for every edge case and hope they’ve cataloged every possible failure mode. That’s necessary, but it’s not sufficient. If you only focus on preventing failure, you develop a massive blind spot: you become great at documenting how things break, but you have no idea why they actually work.

We view reliability as the presence of adaptive capacity. It’s about building systems that degrade gracefully when the database slows down or a service hits a wall. When you have a codebase this large, your architecture must make the right thing easy and the wrong thing difficult. If a new hire can’t look at a module and understand its intent, you haven’t built a system; you’ve built a ticking time bomb.
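As a minimal sketch of what degrading gracefully can look like in code, the snippet below puts a time budget on a slow dependency and serves a fallback value when the budget is exceeded. It uses `timeout` from `System.Timeout` in `base`. The names `Freshness`, `withFallback`, `slowBalanceQuery`, and `fastBalanceQuery` are hypothetical and not taken from any real codebase. It only covers slowness; a dependency that throws would need separate handling.

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- | Result of a read that is allowed to degrade.
data Freshness a
  = Live a      -- ^ answered by the primary source within the budget
  | Degraded a  -- ^ primary was too slow; serving the fallback value
  deriving (Eq, Show)

-- | Run an action with a time budget in microseconds. If the action does
-- not finish in time, return the fallback instead of blocking the caller.
withFallback :: Int -> a -> IO a -> IO (Freshness a)
withFallback budgetMicros fallback action = do
  result <- timeout budgetMicros action
  pure (maybe (Degraded fallback) Live result)

-- | Hypothetical stand-in for a database query that has slowed down.
slowBalanceQuery :: IO Integer
slowBalanceQuery = threadDelay 500000 >> pure 1200

-- | Hypothetical stand-in for a healthy database query.
fastBalanceQuery :: IO Integer
fastBalanceQuery = pure 1200
```

Because the return type distinguishes `Live` from `Degraded`, callers have to decide what to do with stale data instead of discovering it during an incident.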

The type system as institutional memory

In a company growing at 2x per year, your team is effectively "organizationally ancient." At any given moment, about half your coworkers have been at the company for less than a year. This means institutional knowledge is constantly being diluted or leaking out the door.

This is where Haskell shines, but not for the reasons you read about in textbooks. I don’t care about the purity of the code; I care about the compiler acting as a disciplined, unyielding documentation engine. When you encode business logic into the type system, you’re writing down institutional knowledge in a form that survives the departure of the original author.
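One common way to write a business rule down in the types is a newtype with a smart constructor. The sketch below is illustrative only: `TransferAmount`, `AmountError`, `transferLimitCents`, and the limit value are assumptions invented for the example. In a real module, the export list would hide the `TransferAmount` constructor so that `mkTransferAmount` is the only way to build one.

```haskell
-- | An amount in whole cents. The constructor is meant to stay private to
-- its module, so every value has passed 'mkTransferAmount'.
newtype TransferAmount = TransferAmount Integer
  deriving (Eq, Ord, Show)

data AmountError
  = NotPositive
  | ExceedsLimit Integer  -- ^ carries the limit that was exceeded
  deriving (Eq, Show)

-- | Hypothetical per-transfer limit, in cents.
transferLimitCents :: Integer
transferLimitCents = 25000000

-- | The rule lives here once, instead of in every caller's head.
mkTransferAmount :: Integer -> Either AmountError TransferAmount
mkTransferAmount cents
  | cents <= 0 = Left NotPositive
  | cents > transferLimitCents = Left (ExceedsLimit transferLimitCents)
  | otherwise = Right (TransferAmount cents)

-- Distinct wrappers so the two account arguments cannot be swapped silently.
newtype SourceAccount = SourceAccount String deriving (Eq, Show)
newtype DestinationAccount = DestinationAccount String deriving (Eq, Show)

describeTransfer :: SourceAccount -> DestinationAccount -> TransferAmount -> String
describeTransfer (SourceAccount from) (DestinationAccount to) (TransferAmount cents) =
  "move " ++ show cents ++ " cents from " ++ from ++ " to " ++ to
```

A new hire reading the signature of `describeTransfer` learns that the amount has already been validated and which account is which, without needing to find the person who wrote it.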

[Figure: how type systems act as documentation for engineering teams]

Here is how we keep the system understandable:

API-driven constraints: We pack operational knowledge directly into our APIs so that misuse is caught at compile time, not at 4 AM during an on-call incident.
Boundary enforcement: We put dangerous machinery behind tight, well-defined boundaries to limit the blast radius of any single change.
Early design audits: We treat stability engineering as a partnership. We ask about idempotency and rollback strategies before a single line of code is written, not during a post-mortem.
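The practices above can be sketched with a phantom type that tracks review state, plus an API that will not accept a submission without an idempotency key. This is a toy model under stated assumptions: `Payout`, `Review`, `IdempotencyKey`, `Ledger`, and the approval rule are all hypothetical, and the ledger is a plain list rather than a real datastore.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

-- | Review states, promoted to the type level by DataKinds.
data Review = Pending | Approved

-- | A payout tagged with its review state. The tag exists only at compile time.
data Payout (r :: Review) = Payout
  { payoutId :: String
  , payoutCents :: Integer
  } deriving (Eq, Show)

newtype IdempotencyKey = IdempotencyKey String
  deriving (Eq, Show)

-- | Keys already processed, paired with the payout id they produced.
type Ledger = [(IdempotencyKey, String)]

-- | New payouts always start out pending.
draft :: String -> Integer -> Payout 'Pending
draft = Payout

-- | The only way to obtain an approved payout. Hypothetical rule: a
-- non-empty reviewer name must be supplied.
approve :: String -> Payout 'Pending -> Maybe (Payout 'Approved)
approve reviewer (Payout i c)
  | null reviewer = Nothing
  | otherwise = Just (Payout i c)

-- | Accepts only approved payouts and requires an idempotency key.
-- Replaying a key that was already processed leaves the ledger unchanged.
submit :: IdempotencyKey -> Payout 'Approved -> Ledger -> Ledger
submit key payout ledger =
  case lookup key ledger of
    Just _ -> ledger
    Nothing -> (key, payoutId payout) : ledger
```

With these definitions, `submit key (draft "p-1" 5000) []` is rejected by the type checker because a `Payout 'Pending` is not a `Payout 'Approved`. The mistake surfaces at compile time, and the dangerous operation sits behind one narrow function.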

Why does Haskell work at scale?

The secret isn't some magical compiler optimization. It’s that Haskell forces you to confront the reality of your system design early. If you don't have clear answers for how a service handles failure, the compiler won't let you hide behind vague abstractions.
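To make that concrete, here is a small sketch in which failure modes are part of a function's return type and every one of them must be handled. The compiler only enforces this when failures are modeled as data and the incomplete-patterns warning is treated as an error, which the pragma below assumes. `LedgerFailure`, `NextStep`, and `decide` are hypothetical names chosen for illustration.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns -Werror=incomplete-patterns #-}

-- | Every way the (hypothetical) ledger call can fail, written down as data.
data LedgerFailure
  = Timeout
  | Rejected String   -- ^ carries the reason given by the ledger
  | DuplicateRequest
  deriving (Eq, Show)

-- | What the caller does next.
data NextStep
  = Proceed
  | RetryLater
  | EscalateToHuman String
  | TreatAsSuccess
  deriving (Eq, Show)

-- | Adding a constructor to 'LedgerFailure' without extending this
-- function becomes a compile error under the pragma above.
decide :: Either LedgerFailure () -> NextStep
decide result = case result of
  Right () -> Proceed
  Left Timeout -> RetryLater
  Left (Rejected reason) -> EscalateToHuman reason
  Left DuplicateRequest -> TreatAsSuccess
```

The design question "what happens when this call times out?" has to be answered in `decide` before the code compiles, not after the first incident.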

If you’re struggling with a growing codebase, stop obsessing over "correctness" and start obsessing over "understandability." Can your team read the code? Does the system absorb variation? If you aren't asking these questions, you’re just waiting for the next incident.

Scaling Haskell in production requires shifting your mindset from being the "quality police" to being an enabler of resilient design. Try this approach on your next feature launch and share what you find in the comments.

Written by Admin