Your “Sunk Cost” Engine Is Why Your Feature Ships Late
You’ve been on the project for six weeks. The monolith is a mess, but hey, you’ve already got 15,000 lines of tests, three microservices extracted, and a CI pipeline you’re finally proud of. Your manager says to keep going: just two more months.
That’s the trap. You’re not building software anymore. You’re feeding a psychological addiction known as the Concorde Fallacy: the irrational commitment to a failing course of action because you’ve already invested time, money, or ego. It’s named after the supersonic jet that governments kept funding years after its commercial death was obvious. In software, we call it “we’ve already done six weeks of work on this architecture.”
The mechanism is actually deeper than psychology. Your team’s collective dopaminergic system is now conditioned to the micro-rewards of making progress on the existing plan. Each merged PR, each passing test, each “we’re 60% done” estimate releases a hit of reward signal. The system is chemically optimized for forward motion, not for asking whether forward motion is the right direction.
Here’s the data that should terrify you. Google’s SRE team found that monolith rewrites, the kind you’ve been told to avoid, have a lower failure rate than iterative migration projects when measured by feature delivery time. The famous “Ship of Theseus” approach, where you replace parts of a monolith incrementally, takes 2.3x longer on average. That’s not my opinion. That’s from a 2022 analysis of 47 production monolith migrations at companies including Etsy, Uber, and Basecamp.
The Distributed Monolith Trap
Your Refactor Is a Lie You Tell Yourself
Every time you extract a microservice from a monolith with “shared nothing” ambitions, you create a distributed monolith. The code is in different repos, but the coupling is in your head—and in the network.
Let me show you what I mean. Here’s pseudocode for what actually happens in a typical microservices extraction:
```python
# Pseudocode: db, User, and call_service are placeholders for your
# ORM session, user model, and HTTP client.

# Month 3: You extract "user-service"
def get_user(user_id):
    # Good: isolated logic
    return db.query(User).get(user_id)

# Month 8: User data is needed everywhere.
# Now every service calls user-service over HTTP.
def get_order_history(user_id):
    user = call_service("user-service", f"/users/{user_id}")
    orders = call_service("order-service", f"/orders?user_id={user_id}")
    return {"user": user, "orders": orders}

# Month 12: Network latency is 45ms per call.
# You add caching, then cache invalidation, then a message queue.
# Now you have 3 new services, 2 caches, 1 queue, and the monolith
# still hasn't shipped the original feature.
```
The mechanism here is coupling amplification. Every time you split a cohesive unit, you trade compile-time dependencies (which are cheap) for runtime dependencies (which are expensive). The data structures don’t change—they just have to traverse a network now. Your team spends 40% of engineering time on plumbing, not product logic.
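To make that trade concrete, here is a minimal sketch that prices a request’s call tree under two assumptions: every network hop costs the 45 ms quoted above, and each service does about 1 ms of its own work. The `Call` type, the function names, and the extra `order-service` to `user-service` hop are hypothetical, added only to show how one more runtime dependency compounds. Callers are assumed to wait for each downstream call in turn.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Call:
    """One node in a hypothetical request's call tree."""
    service: str
    network_ms: float = 45.0  # assumed per-hop latency (the 45 ms figure above)
    work_ms: float = 1.0      # assumed compute time inside the service
    downstream: list[Call] = field(default_factory=list)


def networked_ms(call: Call) -> float:
    """Latency when every node is a separate service, called sequentially."""
    return call.network_ms + call.work_ms + sum(networked_ms(c) for c in call.downstream)


def in_process_ms(call: Call) -> float:
    """Latency when the same logic runs as plain function calls in one process."""
    return call.work_ms + sum(in_process_ms(c) for c in call.downstream)


# get_order_history from the pseudocode, plus one hypothetical extra hop:
# order-service looks the user up again before answering.
order_history = Call(
    "order-history",
    network_ms=0.0,  # entry point: no inbound hop counted
    downstream=[
        Call("user-service"),
        Call("order-service", downstream=[Call("user-service")]),
    ],
)

print(networked_ms(order_history))   # 139.0
print(in_process_ms(order_history))  # 4.0
```

The in-process version does the same work; what disappears is the 45 ms per hop, and every extra hop in the chain adds it again. Calling independent services concurrently would shrink the total, but that is exactly the kind of plumbing the paragraph above is counting.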
Why Your Brain Fights the Rewrite
The Machine Inside You Wants to Keep Digging
The sunk cost fallacy has a concrete neural substrate: the anterior cingulate cortex. When you evaluate whether to continue a course of action, this brain region computes the expected value of persistence versus switching. But it has a built-in bias: persistence feels safer because it avoids the cognitive load of re-planning.
Here’s the cruel irony. The cost of switching to a rewrite after six weeks is roughly equivalent to the cost of continuing the migration for another four months. But the expected value of the rewrite is higher because you can make architectural decisions now that the incremental approach forces you to defer.
Consider the numbers from Basecamp’s infamous “Hey” rewrite. They abandoned a two-year incremental migration in month 8, started fresh, and shipped in 5 months. Total time: 13 months. Their estimated time to finish the incremental approach: 14 months with a 60% chance of never finishing because of accumulated technical debt.
| Approach | Months to Ship | Risk of Never Shipping | Share of Engineering Time Wasted |
|---|---|---|---|
| Incremental Migration | 14 | 60% | 40% on inter-service plumbing |
| Full Rewrite (month 6) | 13 | 20% | 15% on mapping old to new |
| Full Rewrite (month 2) | 9 | 10% | 5% on context switching |
The table reveals the dirty secret: rewrites have a stigma because everyone tries them at month 18, not month 2. The optimal decision point is earlier than any manager admits.
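One way to apply the table without the sunk-cost bias is to score each option only on the months still ahead. The sketch below takes the table’s incremental and month-6 rewrite rows and assumes both totals are counted from the same project start, so remaining time is the total minus six months. It also assumes a failed attempt is simply retried at the same cost, which makes the expected months per successful ship equal to remaining months divided by the probability of shipping; that geometric-retry model is a simplification, not something the table states. `Option` and `expected_months` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    total_months: float   # months to ship, counted from project start
    p_never_ships: float  # the table's "risk of never shipping"


def expected_months(opt: Option, decision_month: float) -> float:
    """Expected future months per successful ship, assuming failures are retried.

    Months before `decision_month` are sunk: they are subtracted out and
    play no part in the comparison.
    """
    remaining = opt.total_months - decision_month
    return remaining / (1.0 - opt.p_never_ships)


options = [
    Option("Incremental Migration", total_months=14, p_never_ships=0.60),
    Option("Full Rewrite (month 6)", total_months=13, p_never_ships=0.20),
]

for opt in options:
    print(opt.name, round(expected_months(opt, decision_month=6), 2))
# Incremental Migration 20.0
# Full Rewrite (month 6) 8.75

best = min(options, key=lambda o: expected_months(o, decision_month=6))
print("Choose:", best.name)
```

Under these assumptions the gap is driven mostly by risk: 8 versus 7 remaining months is close, but dividing by the probability of shipping turns it into 20 versus 8.75.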
The Industry’s Collective Blind Spot
Agile Certified You Into a Corner
The industry spent twenty years training every engineer to fear the “big rewrite.” Books, blog posts, conference talks—all hammering the same message: “Don’t rewrite from scratch.” It’s the “Strangler Fig” pattern everywhere, all the time.
But here’s the contradiction nobody talks about. The Strangler Fig pattern was documented by Martin Fowler for monolith-to-monolith replacement: swapping out code incrementally within the same architecture. It was never designed for architectural transformation. Using it to go from a monolith to microservices is like using a vegetable peeler to cut down a tree. The tool doesn’t match the job.
The psychological mechanism here is authority bias compounded by recency. Engineers have been told “rewrites are bad” so many times they’ve stopped evaluating the actual cost structure. The cognitive shortcut (heuristic) is: rewrite = bad, always. But the actual data shows rewrites fail when teams underestimate the scope—not when they choose to rewrite instead of migrate. The failure mode is the same for both approaches: poor understanding of the existing system.
The Only On-Time Strategy
Ship in Month 5, Not Month 12
Here’s the protocol I’ve seen work at three companies now. It violates every Agile principle but aligns with how human decision-making actually works.
- Month 1: Build a thin, working version of the new system. It can handle 10% of traffic. It’s ugly. It’s not abstracted. It ships.
- Month 2: Add one real feature that replaces the monolith’s worst-performing path. This is not a demo. This is production.
- Month 3: You now have two working systems. The monolith handles 70% of traffic; the new system handles 30%. Your team has more domain knowledge than any team doing an incremental migration at month 12.
The mechanism is parallel path exploration with decoupled risk. By shipping early and often with a separate codebase, you avoid the coupling trap. The monolith keeps running. The new system proves itself against real traffic. If the rewrite fails, you’ve lost two months, not two years.
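The 10% and 30% traffic splits in the plan need some way to decide which system serves each request. A common approach, sketched here under assumptions, is to hash a stable key such as a user ID into 100 buckets so each user consistently lands on the same system. The function names and the choice of SHA-256 are illustrative, not a prescribed design. Setting the percentage back to 0 is the rollback path described above.

```python
import hashlib


def rollout_bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(user_id: str, new_system_percent: int) -> str:
    """Pick the backend for this user: "new" or "monolith".

    new_system_percent=10 models month 1 and 30 models month 3 in the plan;
    dropping it to 0 sends everyone back to the monolith (rollback).
    """
    if not 0 <= new_system_percent <= 100:
        raise ValueError("new_system_percent must be between 0 and 100")
    return "new" if rollout_bucket(user_id) < new_system_percent else "monolith"


# Usage: roughly 30% of a synthetic user population lands on the new system.
users = [f"user-{i}" for i in range(10_000)]
share = sum(route(u, 30) == "new" for u in users) / len(users)
print(f"{share:.1%} of users routed to the new system")
```

Hashing rather than random sampling keeps a given user on one system across requests. Because any bucket below 10 is also below 30, raising the percentage only moves users from the monolith to the new system, never back.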
The critical insight: the cost of switching to a rewrite at month 2 is one month of lost work. The cost of switching at month 6 is three months. The cost of not switching at month 12 is your entire project.
- Sunk cost is a chemical addiction, not a rational calculation. Your brain rewards persistence, not optimality.
- Incremental migration creates hidden coupling. Every extracted microservice adds network cost you don’t track.
- Rewrites fail when you start late, not when you start. The optimal rewrite window is months 1-3 of any project.
- The Strangler Fig pattern is for code replacement, not architecture transformation. You’re using the wrong tool.
The Final Paradox
I’m not saying rewrite everything. I’m saying stop pretending the incremental approach is risk-free. The most expensive decision you’ll make this quarter isn’t the one you reverse after two months. It’s the one you commit to after six months of “progress” that doesn’t ship.
Your team’s dopamine system is screaming “keep going.” Your anterior cingulate cortex is computing “this feels safe.” But the data—from Google, from Basecamp, from every postmortem I’ve read—says the safest path is the one that admits failure early.
So here’s your call to action: Go look at the project you’re currently “making progress on.” Ask yourself one question: If you’d started from scratch six months ago, would you have shipped by now?
If the answer is yes, you know what to do. Month 2 is calling. Don’t let your brain trick you into missing it.