Over the past year, it has been hard to open an RSS reader in the tech world without encountering an article about NoSQL databases. MongoDB, CouchDB, Cassandra and Redis each promise something different, but they all share one idea: the relational model is not the only way. As a team that has spent the last decade working with Oracle and PostgreSQL, we decided to investigate whether NoSQL is hype or a genuine paradigm shift.
Why NoSQL databases were created
The term NoSQL took on its current meaning in 2009, at a meetup in San Francisco, but the idea of non-relational storage is much older. The main driver was simple: web companies like Facebook, Google and Amazon had to process such volumes of data and such numbers of requests that a single relational database server could no longer keep up. However powerful Oracle is, when you need horizontal scaling across hundreds of servers, the relational model with its JOINs, transactions and normalization becomes a bottleneck.
Google published the Bigtable paper in 2006 and Amazon published the Dynamo paper in 2007. These works inspired an entire generation of open-source databases: Cassandra was born at Facebook, HBase implements the Bigtable model on Hadoop, and MongoDB came with a document model that is intuitive for web developers. Each of these databases makes a trade-off: it gives up part of the ACID guarantees in exchange for scalability, flexibility or performance.
MongoDB’s document model
MongoDB is a document database, meaning it stores data as JSON-like documents (the internal format is BSON — Binary JSON). Each document can have a different structure; there is no need to define a schema in advance. This is a fundamental difference from relational databases, where you must create a table with fixed columns before you can insert data into it.
For web developers this model is natural. When your application works with JSON objects (and in 2011 almost every web application does), you can store them directly in the database without mapping to a relational schema. No ORM, no impedance mismatch. A document in MongoDB can contain nested objects and arrays, enabling you to model complex data structures in a single document instead of multiple joined tables.
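To make the idea of nested objects and arrays concrete, here is a minimal sketch that uses a plain Python dict in place of a stored document. The blog post, its field names and its values are invented for illustration; no MongoDB driver is involved.

```python
import json

# Hypothetical blog post document; every field name here is illustrative.
post = {
    "_id": 1,
    "title": "Is NoSQL hype?",
    "author": {"name": "Jana", "email": "jana@example.com"},  # nested object
    "tags": ["nosql", "mongodb"],                              # array of values
    "comments": [                                              # array of nested objects
        {"user": "petr", "text": "Nice overview."},
        {"user": "eva", "text": "What about transactions?"},
    ],
}

# The structure survives a JSON round trip unchanged, so the application
# object and the stored object can have the same shape.
restored = json.loads(json.dumps(post))

# Nested fields are reached directly, without joining other tables.
comment_authors = [c["user"] for c in restored["comments"]]
```

In a relational schema the same data would typically be spread over tables for posts, authors, tags and comments.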
When MongoDB makes sense
MongoDB excels in situations where the schema is not known in advance or changes frequently. A typical example is catalogue systems — each product category can have different attributes. In a relational database you’d be dealing with the EAV pattern (Entity-Attribute-Value) or having dozens of nullable columns. In MongoDB, each product simply has exactly the attributes it needs.
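The catalogue case can be sketched in a few lines. The products and the `find` helper below are hypothetical and only mimic the behaviour of matching documents by field equality; they are not a MongoDB API.

```python
# Hypothetical catalogue: each product carries only the attributes that apply to it.
products = [
    {"sku": "TV-1", "category": "tv", "diagonal_in": 42, "resolution": "1080p"},
    {"sku": "BK-1", "category": "book", "author": "A. Author", "pages": 320},
    {"sku": "SH-1", "category": "shoes", "size_eu": 43, "color": "black"},
]

def find(docs, **criteria):
    """Return documents whose fields equal every given criterion.

    A document that lacks one of the fields simply does not match.
    """
    missing = object()  # sentinel so a stored None is not confused with "absent"
    return [
        d for d in docs
        if all(d.get(key, missing) == value for key, value in criteria.items())
    ]

black_shoes = find(products, category="shoes", color="black")
large_tvs = find(products, diagonal_in=42)
```

No product needs a placeholder for attributes that belong to another category, which is the point of contrast with nullable columns or EAV.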
Another great use case is logging and analytics. Logs have a variable structure and their volume grows quickly. MongoDB handles high write throughput, and capped collections offer automatic rotation of old data. For aggregation over logs we use MapReduce, which is less convenient than SQL GROUP BY but can run in parallel across shards when the data volume is large.
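The two mechanisms mentioned here, a fixed-size collection that drops its oldest entries and a map/reduce style count, can be imitated in plain Python. This is only an analogy under simplifying assumptions: a `deque` with `maxlen` stands in for a capped collection, and the log entries are made up.

```python
from collections import Counter, deque

# A bounded deque stands in for a capped collection: once it is full,
# appending a new entry silently discards the oldest one.
log = deque(maxlen=3)
for entry in [
    {"level": "INFO", "msg": "started"},
    {"level": "ERROR", "msg": "disk full"},
    {"level": "INFO", "msg": "retrying"},
    {"level": "ERROR", "msg": "disk full"},
]:
    log.append(entry)

def map_level(doc):
    """Map step: emit one (key, value) pair per log entry."""
    return doc["level"], 1

def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

counts_by_level = reduce_counts(map_level(doc) for doc in log)
```

The first entry ("started") has already been rotated out by the time the count runs, so only the three most recent entries are aggregated.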
Content management systems are another natural fit. Articles, pages, comments — each content type can have a different structure and nested components. MongoDB allows storing an entire page as a single document including metadata, tags and comments. Reading it is then a single query instead of a complex JOIN across five tables.
When MongoDB does NOT make sense
And now the important part: when you should NOT use NoSQL. If your data has strong relational ties and you need consistent transactions across multiple entities, a relational database is still the better choice. A banking system is the classic example: a money transfer must be atomic (deduct from one account and add to another in a single transaction), and you can’t do that reliably with MongoDB. At the time of writing, MongoDB does not have multi-document transactions; only operations on a single document are atomic.
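For comparison, this is roughly what the atomic transfer looks like on the relational side. The sketch uses Python's built-in `sqlite3` module as a stand-in for Oracle or PostgreSQL; the table, account names and amounts are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # If this debit violates the CHECK constraint, the credit above is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))

ok = transfer(conn, "alice", "bob", 30)         # succeeds
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
```

The credit is deliberately applied first so that the failing debit shows the rollback: after the rejected transfer, neither account has changed.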
Reporting and ad-hoc queries are another weakness. SQL is an incredibly expressive language for querying data. The MongoDB query language is more limited — complex aggregations require MapReduce, which is slow and difficult to debug. If your analysts need to write new queries over data daily, Oracle with SQL Developer will still be more productive than the MongoDB shell.
Also beware of data duplication. In the relational model, a customer’s address is in one place and all orders reference it. In MongoDB you might embed the address in each order — which is fast to read, but when the customer changes their address you need to update all documents. This is the fundamental trade-off of the document model.
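The cost of embedding can be shown with a small sketch. The orders, the field names and the `change_address` helper are hypothetical; plain Python lists and dicts stand in for a collection.

```python
# Embedded design: every order carries its own copy of the customer's address.
orders = [
    {"_id": 1, "customer_id": 7, "address": {"city": "Prague", "street": "Old St 1"}},
    {"_id": 2, "customer_id": 7, "address": {"city": "Prague", "street": "Old St 1"}},
    {"_id": 3, "customer_id": 9, "address": {"city": "Brno", "street": "Main St 5"}},
]

def change_address(orders, customer_id, new_address):
    """Rewrite the embedded address in every order of one customer.

    Returns the number of documents touched; with a referenced address
    this would be a single update in one place.
    """
    touched = 0
    for order in orders:
        if order["customer_id"] == customer_id:
            order["address"] = dict(new_address)  # each order keeps its own copy
            touched += 1
    return touched

updated = change_address(orders, 7, {"city": "Prague", "street": "New St 2"})
```

The number of writes grows with the number of orders, and a failure halfway through would leave some orders with the old address.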
Scaling: sharding and replica sets
One of the main advantages of MongoDB is native support for horizontal scaling. Sharding distributes data across multiple servers according to a shard key — for example, by customer ID. Each shard contains a subset of the data and the MongoDB router (mongos) automatically routes queries to the correct shard. Adding a new shard is relatively straightforward and MongoDB automatically rebalances the data.
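Routing by shard key can be pictured as a lookup in a table of key ranges. The following is a simplified sketch, not how mongos is implemented: the range boundaries, shard names and function names are all assumptions.

```python
from bisect import bisect_right

# Hypothetical layout: customer IDs split into three contiguous ranges.
# UPPER_BOUNDS[i] is the first key that no longer belongs to SHARDS[i].
UPPER_BOUNDS = [1000, 2000]
SHARDS = ["shard-a", "shard-b", "shard-c"]

def route(customer_id):
    """Pick the shard whose key range contains the given shard key."""
    return SHARDS[bisect_right(UPPER_BOUNDS, customer_id)]

def shards_for_query(customer_id=None):
    """A query that includes the shard key hits one shard; any other query hits all."""
    return [route(customer_id)] if customer_id is not None else list(SHARDS)

one_shard = shards_for_query(1500)
all_shards = shards_for_query()
```

This also illustrates why the choice of shard key matters: queries that do not include it have to be sent to every shard.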
Replica sets provide high availability. Each shard has a primary node and one or more secondary nodes that replicate data asynchronously. If the primary node fails, a secondary node is automatically elected as the new primary. This is significantly simpler than setting up Oracle RAC or PostgreSQL streaming replication.
But be warned: asynchronous replication means that during a failover you can lose writes that the old primary accepted but had not yet replicated. MongoDB offers write concern settings where you can require acknowledgment from a majority of nodes, but then every write waits for replication before it returns, which reduces performance. It’s always a trade-off between consistency and performance.
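The difference between a write acknowledged by the primary alone and one acknowledged by a majority can be simulated with a toy model. The `ReplicaSet` class and its methods below are invented for this sketch and do not correspond to any driver API.

```python
class ReplicaSet:
    """Toy model: one primary plus secondaries that may lag behind it."""

    def __init__(self, secondaries):
        self.primary = []
        self.secondaries = [[] for _ in range(secondaries)]

    def majority(self):
        """Secondaries that must confirm, besides the primary, to reach a majority."""
        return (1 + len(self.secondaries)) // 2

    def write(self, doc, wait_for=0):
        """Store doc on the primary and on `wait_for` secondaries before returning.

        Secondaries beyond `wait_for` would catch up later; here they never do,
        which models a primary that crashes right after acknowledging.
        """
        self.primary.append(doc)
        for node in self.secondaries[:wait_for]:
            node.append(doc)

    def failover(self):
        """Return the data of the most up-to-date secondary, the new primary."""
        return list(max(self.secondaries, key=len))


rs = ReplicaSet(secondaries=2)
rs.write({"n": 1}, wait_for=rs.majority())  # confirmed by 2 of 3 nodes
rs.write({"n": 2})                          # confirmed by the primary only
after_failover = rs.failover()              # the write with n=2 is gone
```

The majority write survives the failover because at least one secondary already holds it; the primary-only write does not.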
Our experiences from a pilot project
We decided to try MongoDB on an internal project — a system for managing the configuration of our servers. Each server has a different set of services, various parameters and a change history. In the relational model this would have been five tables with many nullable columns. In MongoDB each server is a single document with exactly the attributes it needs.
The first impression was great: development went fast, we didn’t have to deal with schema migrations and queries were intuitive. Problems came when we needed to search for servers by a combination of attributes. MongoDB needs an index on every field you want to search efficiently, and a compound index only helps queries that use a leading subset of its fields, so arbitrary attribute combinations are hard to cover. The second problem was the absence of JOINs: when we wanted to display servers with information about their datacenter, we had to make two queries and join the data in the application.
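Joining in the application, as described above, amounts to fetching both result sets and merging them in memory. A minimal sketch follows, with plain lists standing in for the results of the two queries; the collections and field names are hypothetical and not taken from our actual system.

```python
# Two hypothetical collections; names and fields are illustrative only.
datacenters = [
    {"_id": "dc1", "city": "Prague"},
    {"_id": "dc2", "city": "Brno"},
]
servers = [
    {"_id": "web01", "datacenter_id": "dc1", "services": ["nginx"]},
    {"_id": "db01", "datacenter_id": "dc2", "services": ["postgresql"], "raid": "10"},
]

def servers_with_datacenter(servers, datacenters):
    """Join in application code: index one result set, then look up per server."""
    by_id = {dc["_id"]: dc for dc in datacenters}  # result of the first query
    return [
        {**srv, "datacenter": by_id.get(srv["datacenter_id"])}  # merge per server
        for srv in servers
    ]

joined = servers_with_datacenter(servers, datacenters)
```

Building the lookup dict once keeps the merge linear in the number of servers instead of issuing one datacenter query per server.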
On the other hand, adding a new attribute to the server configuration was trivial — no ALTER TABLE, no migration, we simply started storing the new attribute. For a system that is evolving quickly, this flexibility is invaluable.
Comparing NoSQL databases
MongoDB is not the only NoSQL database and each has its niche. CouchDB is also a document database, but uses an HTTP API and has built-in conflict resolution for offline-first applications. Cassandra is a wide-column (column-family) store optimized for extremely high write throughput, which makes it a good fit for logging and IoT data. Redis is an in-memory key-value store that is great for caching and session management. HBase runs on Hadoop and is designed for analytical workloads over petabytes of data.
Choosing the right database depends on your use case. There is no universal answer. Often the best solution is polyglot persistence — using multiple databases in a single application, each for what it does best. Relational databases for transactions, MongoDB for flexible documents, Redis for cache, Elasticsearch for full-text search.
The CAP theorem and reality
When we talk about NoSQL, we must mention the CAP theorem. Eric Brewer formulated the conjecture, later proven formally, that a distributed system can satisfy at most two of three properties: Consistency, Availability and Partition tolerance. In practice, network partitions cannot be ruled out, so during a network problem you must choose: either keep answering and risk returning stale data (AP system) or be unavailable until the network recovers (CP system).
MongoDB with its default settings behaves as a CP system: when the primary node is unavailable, writes to the affected data are not possible until a new primary is elected. Cassandra is an AP system: it always responds but may return stale data. Traditional relational databases on a single server sidestep the question, because there is no network between nodes that could be partitioned.
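The CP versus AP choice can be reduced to a single decision made by a node that has lost contact with part of the cluster. The sketch below is a deliberately simplified illustration; the `Replica` class, the `mode` strings and the function name are assumptions and do not model any particular database.

```python
class Replica:
    """A node holding a local copy of one value."""

    def __init__(self, value):
        self.value = value

def read_during_partition(local, sees_majority, mode):
    """Answer a read on a node that is cut off from part of the cluster.

    mode "CP": refuse unless this node can still reach a majority.
    mode "AP": always answer from the local copy, which may be stale.
    """
    if mode == "CP" and not sees_majority:
        raise RuntimeError("unavailable: no majority reachable")
    return local.value

# This node still holds "v1"; assume the other side of the partition
# has already moved on to a newer value.
isolated = Replica("v1")
ap_answer = read_during_partition(isolated, sees_majority=False, mode="AP")

try:
    read_during_partition(isolated, sees_majority=False, mode="CP")
    cp_available = True
except RuntimeError:
    cp_available = False
```

The AP read returns an answer that may be out of date, while the CP read gives up availability to avoid that.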
Conclusion
NoSQL databases are not a replacement for relational databases — they are a complement. MongoDB is a great choice for flexible schemas, high write throughput and horizontal scaling. But for transactions, complex queries and strongly relational data we stay with PostgreSQL and Oracle. The future lies in polyglot persistence — the right database for the right problem.