AntidisestablishmentSQLism: an introduction to NoSQL

Posted on March 30, 2010
 

Talking about the NoSQL movement is like talking about a “NoC++” movement: people not using C++ have very little else in common with each other.

For instance, some people say that NoSQL is obviously all about ACID (atomic, consistent, isolated, durable) vs. BASE (basically available, soft-state, eventually consistent). Other people say it’s all about scalability, whereas others say it’s all about distributing data, but not necessarily horizontal scaling. Some people say it’s not about scale at all, but about getting away from modelling data as rows in a table.

It’s a recent “movement,” but databases that have been around for decades are claiming to be NoSQL along with the ones hacked together in the last few months. Databases that are no more complex than Memcached are grouped with those that could power every aspect of a complex website on their own. There are databases ready for production deployment and others that are still in beta or alpha, or that haven’t even been released.

The people themselves also vary widely: some of the developers are brilliant geeks who know scaling inside and out. Others are DBA-types who have been dealing with MySQL issues for years. Others are conceited jerks who think that, because they read the Dynamo/Bigtable/PNUTS paper, they’re the only ones who understand the issues.

As I have read the Dynamo/Bigtable/PNUTS paper, here are the main issues:

Horizontal scaling

This is a biggie but, as people who like relational databases always point out, you can scale relational databases. It’s just very difficult and sacrifices a lot of the features that make them useful to start out with. Some NoSQL databases just scale much more easily and naturally.

Multi-master

Again, you can run MySQL as multi-master, but it’s a pain. Many NoSQL databases list multi-master as a feature, but a lot of programmers who think they want multi-master probably won’t like the reality of it. A quick quiz: does your heart sing at the thought of programming conflict resolution logic for every possible stale data scenario? Then multi-master may be for you. (I’m not a big fan of multi-master; someone else may want to defend this one.)
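To give a feel for what that conflict resolution logic looks like, here is a minimal sketch in Python. It is not taken from any particular database: the names Version, descends and resolve are made up for illustration, and it assumes each master tags its writes with a per-node counter (a simple vector clock) and that the value is a set which can be merged by union. Real data rarely merges that conveniently, which is rather the point.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    """One copy of a record as seen by one master (hypothetical structure)."""
    value: set
    clock: dict = field(default_factory=dict)  # node name -> write counter


def descends(a, b):
    """True if clock `a` has seen every write that clock `b` has seen."""
    return all(a.get(node, 0) >= n for node, n in b.items())


def resolve(x, y):
    """Pick the newer version, or merge when neither one is newer."""
    if descends(x.clock, y.clock):
        return x  # x already includes everything in y
    if descends(y.clock, x.clock):
        return y  # y already includes everything in x
    # Concurrent writes on different masters: the application has to decide.
    # Here we assume a set union is acceptable, which is an assumption
    # about the data, not a general rule.
    nodes = x.clock.keys() | y.clock.keys()
    merged_clock = {n: max(x.clock.get(n, 0), y.clock.get(n, 0)) for n in nodes}
    return Version(x.value | y.value, merged_clock)


# A shopping list edited on two masters, m1 and m2.
stale = Version({"milk"}, {"m1": 1})
on_m1 = Version({"milk", "eggs"}, {"m1": 2})
on_m2 = Version({"milk", "tea"}, {"m1": 1, "m2": 1})

newer = resolve(stale, on_m1)    # on_m1 simply supersedes stale
merged = resolve(on_m1, on_m2)   # neither is newer, so they get merged
```

Note that the union quietly brings back anything one side had deleted. Handling that case, and every other one like it, is the part that stops being fun.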

Data representation

Rows can model a lot, but unless you’re programming accounting apps, they’re unlikely to be a natural fit for your data. More often than not, data is much more interesting and complex than a row (such as a graph of friends, a tree of links, or overlapping timelines of events). Sometimes, data is very simple (e.g., a cookie used to look up a session). Sometimes it changes over time. Many NoSQL databases give you a more flexible way of modelling data.
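As a rough illustration of the mismatch, the sketch below takes a small tree of comments, stored the way a document-style database might hold it, and flattens it into the rows a relational table would want. Everything here is hypothetical: the post structure, the field names, and the comment_rows helper are invented for the example and do not reflect any specific database’s format.

```python
from itertools import count

# A nested "document": replies live inside the comment they answer.
post = {
    "title": "An introduction to NoSQL",
    "comments": [
        {"author": "ann", "text": "First!", "replies": [
            {"author": "bob", "text": "Congratulations."},
        ]},
        {"author": "cy", "text": "What about joins?"},
    ],
}


def comment_rows(comments, parent_id=None, ids=None):
    """Flatten a comment tree into (id, parent_id, author, text) rows."""
    ids = ids if ids is not None else count(1)  # shared id generator
    rows = []
    for comment in comments:
        comment_id = next(ids)
        rows.append((comment_id, parent_id, comment["author"], comment["text"]))
        # Replies become separate rows that point back at their parent.
        rows.extend(comment_rows(comment.get("replies", []), comment_id, ids))
    return rows


rows = comment_rows(post["comments"])
```

The nested form can be read or written as a single unit, while the row form needs generated ids, parent pointers, and a recursive query or several round trips to rebuild the tree.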

Speed

Relational databases are pretty heavyweight, with all sorts of data safety guarantees that are unnecessary overhead for a lot of applications. Some NoSQL databases can store things as fast as Memcached (although not when they flush to disk… they are limited by physical laws, despite what their developers might claim). If you have something like page statistics or logging, using a relational database might just be slowing you down.

At this point, NoSQL databases are huddled together against the overwhelming presence of relational databases. However, NoSQL is an unnatural and, in many cases, uneasy alliance.

Photo credit: Great Protest Signs by didbygraham


About the author—Marco is the keeper of keys and Chief Garbage Collector at Blue Parabola, php|architect's parent company. He can be found on Twitter as @mtabini.