Your JSON Obsession Is Costing You 3x
It sounded so clean. “Just store everything as JSON.” Ship early, iterate fast, never write a migration again. The tradeoffs were buried deeper than the cargo cult docs.
Every team I talk to has run the same experiment. Start with MongoDB. Love the flexibility. Then hit 10 gigabytes and watch your queries start crawling. What began as a shortcut becomes a three-lane traffic jam on every aggregation pipeline.
Here’s the irony your favorite conference talk skipped. The database built to shun rigid schemas can be worse at serving schema-flexible data than the one purpose-built for rigid tables. Production storage data now confirms what the loudest NoSQL advocates never told you. For ninety percent of flexible workloads under 10 gigabytes, PostgreSQL’s binary JSONB eats MongoDB’s lunch on storage efficiency and query latency. The document tax is real, and it’s draining your infrastructure budget.
The 10GB Performance Cliff
MongoDB flies for the first few gigabytes. Empty caches, small collections, prototype velocity. Then something shifts around 10GB. The working set exceeds RAM. Disk reads multiply. Lock contention appears.
The dirty secret is BSON overhead. Every value in a BSON document carries a type byte and its full field name, and strings and nested documents add length prefixes and null terminators on top. That flexible schema you wanted? You pay for it in bytes: the same data can take 2x to 3x the space it needs as JSONB. More bytes on disk means fewer records per page, fatter indexes, slower scans.
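To see where those bytes go, here is a minimal sketch that adds up the encoded size of a document using the layout rules from the public BSON spec. It covers only the basic JSON-compatible types, the function names are mine, and it says nothing about JSONB's size or about on-disk compression. Treat it as a way to reason about per-field overhead, not as a benchmark.

```python
import json


def cstring_size(name: str) -> int:
    # Field names are stored as UTF-8 bytes plus a null terminator.
    return len(name.encode("utf-8")) + 1


def bson_value_size(value) -> int:
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return 1
    if value is None:
        return 0
    if isinstance(value, int):
        # int32 when it fits, otherwise int64.
        return 4 if -2**31 <= value < 2**31 else 8
    if isinstance(value, float):
        return 8
    if isinstance(value, str):
        # int32 length prefix + UTF-8 bytes + null terminator.
        return 4 + len(value.encode("utf-8")) + 1
    if isinstance(value, dict):
        return bson_document_size(value)
    if isinstance(value, list):
        # Arrays are encoded as documents keyed "0", "1", "2", ...
        return bson_document_size({str(i): v for i, v in enumerate(value)})
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


def bson_document_size(doc: dict) -> int:
    # Each element: 1 type byte + field name + value.
    body = sum(1 + cstring_size(k) + bson_value_size(v) for k, v in doc.items())
    # int32 total length + elements + trailing null byte.
    return 4 + body + 1


def compact_json_size(doc: dict) -> int:
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


# The BSON spec's own example: {"hello": "world"} encodes to 22 bytes.
print(bson_document_size({"hello": "world"}))
```

Run `bson_document_size` and `compact_json_size` over a sample of your own documents before trusting anyone's ratio, including mine. The gap depends heavily on how many short fields and small arrays your documents carry.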
I watched a team store the same million-row product catalog in both systems. PostgreSQL’s JSONB column ate 4.2GB. MongoDB needed 11.8GB for the same documents. That’s not a storage difference. That’s a tax on flexibility you pay every single query.
Indexing Realities Nobody Debates
The narrative says document databases index everything automatically. The reality is messier. MongoDB automatically indexes only the _id field. Every other index, single-field or compound, has to be created explicitly, and compound indexes need careful planning, same as in any relational system. But the query planner doesn’t get the same workout.
PostgreSQL’s JSONB offers GIN indexes that invert the entire document. You can index every key, nested path, or array element in one shot. More importantly, you can combine JSONB indexes with conventional B-tree indexes on the same table. Hybrid workloads that mix structured and unstructured data run on a single storage engine instead of two.
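If "invert the entire document" sounds abstract, a toy model helps. The sketch below maps every (path, value) pair to the set of documents that contain it, then answers containment queries by intersecting those sets. It is only loosely inspired by the idea behind a GIN index. It is not how PostgreSQL implements one, and the class and function names are hypothetical.

```python
import json
from collections import defaultdict


def flatten(value, prefix=()):
    """Yield (path, encoded_scalar) pairs for every leaf in a JSON-like value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, prefix + (key,))
    elif isinstance(value, list):
        # Array positions are ignored so any element can match.
        for item in value:
            yield from flatten(item, prefix)
    else:
        # json.dumps keeps true, 1, and "1" distinct as index keys.
        yield prefix, json.dumps(value)


class ToyInvertedIndex:
    def __init__(self):
        # (path, encoded_scalar) -> set of document ids
        self.postings = defaultdict(set)

    def add(self, doc_id, doc):
        for entry in flatten(doc):
            self.postings[entry].add(doc_id)

    def containing(self, pattern):
        """Return ids of documents holding every leaf that appears in pattern."""
        entries = list(flatten(pattern))
        if not entries:
            return set()
        result = None
        for entry in entries:
            ids = self.postings.get(entry, set())
            result = set(ids) if result is None else result & ids
        return result


index = ToyInvertedIndex()
index.add(1, {"brand": "acme", "tags": ["red", "cotton"], "dims": {"w": 10}})
index.add(2, {"brand": "acme", "tags": ["blue"]})
index.add(3, {"brand": "zenith", "tags": ["red"]})

print(index.containing({"brand": "acme", "tags": ["red"]}))
```

One index, any key, any nesting depth. A real engine would still recheck candidate rows against the full document, because a flattened index like this one cannot tell which array element a nested value came from.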
The benchmark data from production environments tells a consistent story. For workloads under 10GB with mixed query patterns, PostgreSQL JSONB response times average 40-60% lower than the MongoDB equivalents. Not edge-case tuning. Standard configurations.
Storage overhead and query latency: the two things “schema flexibility” was supposed to optimize. Both favor the relational incumbent.
The Real Cost of Going Polyglot
Here’s the math teams refuse to do. Running two databases doubles your operational surface area. Two backup strategies. Two monitoring stacks. Two configuration management playbooks. Two sets of connection pool tuning.
The typical polyglot persistence setup supporting a 10GB JSON workload looks like this:
- MongoDB cluster (3 nodes, replica set)
- PostgreSQL instance (1 primary, 1 replica)
- ETL pipeline shuttling data between them
- Two separate schema migration frameworks
- Double the alerting rules for connection saturation
Multiply by the number of teams managing their own persistence. Context-switching between query APIs alone costs a senior engineer roughly 20% of their productivity per additional database. Your “JSON for everything” architecture isn’t flexible. It quietly assumes operational complexity is a solved problem.
The Shifting Cost Curve
The argument for document databases has always been about scale. “When you have terabytes, schemas constrain you.” But the data shows most flexible-schema workloads never cross 10GB. They’re internal tools. Customer catalogs under a million SKUs. Event logs retained for 90 days.
PostgreSQL, with JSONB at the center, has been quietly absorbing nearly every valid use case for standalone document stores below the multi-terabyte threshold. Vector search through the pgvector extension. Full-text indexing on JSON fields. Expression and partial indexes on specific document paths. The feature gap that justified MongoDB five years ago has narrowed to near-zero for the workloads most teams actually run.
The contrarian take isn’t that documents are bad. It’s that relational and document paradigms aren’t competing. The winning architecture is one database engine that supports both without the operational tax of a separate system.
Run both databases side by side for a month. Measure actual storage per document, query latency at P50 and P95, and the human time spent maintaining each system. When you see the real numbers, the “JSON for everything” mantra crumbles. Ten gigabytes of flexibility isn’t worth a 3x document tax.
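If you run that experiment, keep the arithmetic boring and identical for both systems. Here is a small sketch of the summary step, assuming you have already collected per-query latencies in milliseconds plus a total on-disk size and document count for each database. The function names are hypothetical, the percentile uses the simple nearest-rank definition, and the sample numbers are placeholders rather than measurements.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, which avoids float rounding surprises.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(latencies_ms, total_bytes, doc_count):
    """Reduce one system's raw measurements to the three numbers worth comparing."""
    if doc_count <= 0:
        raise ValueError("doc_count must be positive")
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "bytes_per_doc": total_bytes / doc_count,
    }


# Placeholder inputs only. Substitute your own measurements.
placeholder_latencies = [12, 15, 11, 40, 13, 14, 90, 12, 16, 13]
print(summarize(placeholder_latencies, total_bytes=5_000_000, doc_count=10_000))
```

Feed both systems the same queries over the same documents, and compare P95 before P50. The tail is where a working set that no longer fits in RAM shows up first.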
The Pragmatist’s Path Forward
Kill the purity war. Standardize on PostgreSQL JSONB for workloads under 10GB. Reserve MongoDB for the few cases that genuinely need global scale, automatic sharding, or multi-document transactions that span shards. Your infrastructure costs can drop 30-50%. Your team will ship faster on one query language instead of two. The document wasn’t the problem. The dogma was.