If you’ve ever studied Hamlet in high school, you might be familiar with his famous monologue:
“To SQL, or NoSQL? That is the query. Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous tables, or to take arms against a sea of unstructured data, and, by using structured schema, define them?”
Ok, so we may have paraphrased Shakespeare’s classic just a little – but you get the idea.
But if you’re thinking of embracing a document data model, what are the advantages? And what are the 4 best NoSQL databases available?
Traditionally, relational databases queried with the Structured Query Language, or SQL, have been the most popular and common type of database. They rose to popularity in the 1970s, at a time when storage was extremely expensive – but then again, so were computers. Software engineers needed a way to normalize their databases to reduce data duplication and more efficiently use what little storage they had.
Eventually, technology outgrew the SQL database. A new type, dubbed NoSQL, was born. Now, this term has a double meaning, either “non-SQL” or “not only SQL” – subtly different. But either way, NoSQL databases store data in a format other than relational tables so both terms fall under the same umbrella.
The NoSQL database made an appearance as the cost of data storage per megabyte started to plummet. Technology was advancing, and drives were increasing in capacity while dropping in price. There was a shift – the primary cost of software development wasn’t storage anymore; it was the developers themselves. This shift filtered down into the way databases work: the focus moved from reducing data duplication to optimizing developer productivity, which is a big part of why NoSQL is used today.
SQL vs NoSQL:
Just so you get an idea, there are five major differences between these two database setups that we need to address first before really sinking our teeth into the details:
- SQL is relational and provides access to data points that are related to each other, while NoSQL is non-relational, meaning it’s more flexible and stores data in non-tabular form.
- SQL has a predefined schema and uses the Structured Query Language, whereas NoSQL uses a dynamic schema for organizing its unstructured data.
- SQL is vertically scalable by adding more resources like a more powerful processor, or additional memory, but NoSQL is designed to be horizontally scalable by adding more machines to the server group.
- SQL is the older, table-based approach, while NoSQL can be document, key-value, graph, or wide-column oriented to store data.
- SQL is the better choice for multi-row transactions, but NoSQL is better for unstructured data such as documents or JavaScript Object Notation (JSON).
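To make the differences above a little more concrete, here is a minimal sketch using only Python's standard library. The table names, fields and sample data are made up for illustration, and the plain dictionary merely stands in for a document store; it is not any particular NoSQL product's API.

```python
import json
import sqlite3

# Relational model: the schema is declared up front and related rows
# live in separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ophelia')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "quill"), (1, "ink")],
)

# A JOIN reassembles the related rows at query time.
rows = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY o.id"
).fetchall()

# Document model (illustrative stand-in): the same data is one
# self-describing record with the orders nested inside it.
document = {"name": "Ophelia", "orders": [{"item": "quill"}, {"item": "ink"}]}
# A new field can be added to this one document without a schema migration.
document["loyalty_tier"] = "gold"

print(rows)
print(json.dumps(document))
```

The relational version avoids duplicating the customer's name across orders, while the document version keeps everything an application needs in one place.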
Why NoSQL:
NoSQL databases address a number of challenges facing modern companies in how both their employees and their customers access data.
More customers are going online, so databases have to support anywhere from thousands all the way up to millions of users concurrently. NoSQL does this with a smile, maintaining high performance while ensuring access 24 hours a day, 7 days a week.
As time goes on, the internet is connecting everything together in one giant web. NoSQL allows for continuous streams of real-time data to be broadcast from the servers. It also supports different data structures, while supporting hardware and software updates that generate different data sets.
We mentioned storage earlier, and big data is big and will only get bigger. Horizontal scaling in the form of adding more machines to the pool is a relatively cheap way to ensure that as customer demand increases, the amount of data that can be stored and accessed grows with it. The smaller infrastructure costs also translate into a faster time to market.
The genie is out of the bottle – everyone has a mobile device now, so a NoSQL database’s ability to synchronize mobile data with remote databases in the form of cloud storage is invaluable. Everything is on the cloud now, but the benefit of NoSQL is that it can support multiple mobile platforms and operating systems with just a single backend – super handy.
The Tale of The Tape:
So which NoSQL database reigns supreme? And how do our top 4 NoSQL choices measure up against each other, in terms of their various advantages and drawbacks?
Apache Cassandra:
Otherwise simply known as CassDB, Apache Cassandra was originally a Facebook-only product before being made open source in the late 2000s. It has a proven track record as a high-performance NoSQL database that is also very fault tolerant.
A failed node is generally not enough to shut down the system, as data is replicated across multiple nodes that share the workload. The database operates by making all nodes equal peers without any master/slave relationship.
Horizontal scaling is a breeze, as new machines can be added into the pool while applications are running, meaning zero interruptions to the service.
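One way to picture equal peers, replication and adding machines on the fly is a hash ring. The sketch below is a toy model and not Cassandra's actual implementation: the `Ring` class, its method names, the node names and the use of MD5 with one position per node are all assumptions made for illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is used here only to spread keys)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy peer-to-peer ring: no master node, and each key lives on several nodes."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self.tokens = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        # A new machine simply takes a position on the ring.
        bisect.insort(self.tokens, (token(node), node))

    def replicas(self, key):
        """Walk clockwise from the key's position and collect the next nodes."""
        start = bisect.bisect(self.tokens, (token(key), ""))
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]

    def read_from(self, key, down=()):
        """Return the first live replica for the key, or None if all are down."""
        return next((n for n in self.replicas(key) if n not in down), None)


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:42")
print(owners)                                        # three distinct nodes
print(ring.read_from("user:42", down={owners[0]}))   # falls back to the next replica
ring.add_node("node-e")                              # joins without a restart
```

Because every key has several owners, losing one node only means a read is served by the next replica in the list.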
The cons with going the CassDB route are that excessive requests and data reads can really slow down the transaction time per user, and poor latency does creep into the system. It’s a system built for fast writes, not fast reads. Because data is replicated between the nodes, duplication can become a problem, and because Cassandra runs on the Java Virtual Machine, garbage collection has to be tuned so that its pauses don’t add to that latency.
Apache HBase:
If you need to read and write huge chunks of data, then Apache HBase is the logical choice. HBase is also open source and tailor-made to manage billions of rows and columns, keeping rows sorted by key across clusters of commodity hardware. It’s modeled on Google’s Bigtable – a distributed storage system created specifically for structured data.
Scalability, automatic sharding of tables, fairly powerful and speedy read and write capability, and built-in support guarding against server failure are some of the big pluses of going with a system like this. The client interface is extremely simple and user friendly, and can be integrated with Hive to operate much like a SQL database – an option for database administrators who are more familiar with SQL-style queries.
The negatives are no transaction support, no handling of JOINs (combining columns from two or more tables by using the common values between them), and no built-in security authentication. An annoying trait is that HBase is indexed and sorted only according to key. The database itself can also be extremely memory hungry, and troubleshooting any issues that arise can be a problem.
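The consequence of being indexed and sorted only by key can be sketched in a few lines. This is a toy in-memory model, not the HBase client API: the `SortedRowStore` class, its methods and the sample row keys are hypothetical names chosen for illustration.

```python
import bisect


class SortedRowStore:
    """Toy table whose rows are kept sorted by row key, the only index available."""

    def __init__(self):
        self._keys = []   # row keys in sorted order
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan(self, start, stop):
        """Yield rows with start <= key < stop; cheap because keys are sorted."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]

    def find_by_value(self, column, value):
        """No secondary index: a search by value has to visit every row."""
        return [k for k in self._keys if self._rows[k].get(column) == value]


store = SortedRowStore()
store.put("user#003", "name", "Ophelia")
store.put("user#001", "name", "Hamlet")
store.put("order#001", "item", "quill")
store.put("user#002", "name", "Horatio")

users = list(store.scan("user#", "user#~"))      # range scan over one key prefix
hits = store.find_by_value("name", "Horatio")    # full pass over the table
```

Lookups and range scans by key are fast, which is why row key design matters so much in this style of database; anything else means scanning.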
MongoDB:
Designed to be used in the cloud, MongoDB stores data in JSON-like (JavaScript Object Notation) documents, making it a more flexible fit than traditional row and column type databases for data without a fixed schema.
It supports a large number of search methods including geographical, text and graph, and also provides strong security for the client by using firewalls, encryption and TLS/SSL verification.
Best of all, MongoDB can create visualizations to share data and connect to other SQL databases using the MySQL protocol.
The Community Edition is free to use and its source code is publicly available, and it is so user-friendly that the system doesn’t really need a database administrator – making it a great choice for businesses that are experiencing rapid growth or have a lot of unstructured data with no clear schema definitions. Think mobile app development, real-time analytics, and online content management systems.
The downside is that over time, MongoDB will accumulate a larger data size physically on the drives. Although it’s powerful, it can be comparatively slow for some workloads when weighed up against other NoSQL databases.
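As a rough illustration of what storing documents with no clear schema definition looks like in practice, the sketch below filters a list of plain Python dictionaries. The `matches` and `find` helpers and the sample articles are hypothetical and written for this post; they are not the MongoDB driver's API, only a picture of the idea.

```python
def matches(document, query):
    """True if every field in the query is present in the document with that value."""
    return all(
        field in document and document[field] == expected
        for field, expected in query.items()
    )


def find(collection, query):
    """Return the documents that satisfy an equality query."""
    return [doc for doc in collection if matches(doc, query)]


# Documents in one collection do not have to share the same fields.
articles = [
    {"title": "To SQL or NoSQL", "tags": ["databases"], "author": {"name": "Will"}},
    {"title": "Golang vs Python", "views": 1200},
    {"title": "Erlang", "views": 1200, "draft": True},
]

popular = find(articles, {"views": 1200})
drafts = find(articles, {"views": 1200, "draft": True})
print([doc["title"] for doc in popular])
print([doc["title"] for doc in drafts])
```

Documents that lack a queried field are simply skipped, so new fields can appear over time without touching older records.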
Neo4j:
A little different from the others, Neo4j is a graph-based database that excels not only in handling the data itself – but also the relationships between groups of data. As soon as anything is stored, connections are made so that the next time it’s accessed, the task is completed faster than arguably any other NoSQL database in existence.
Each piece of data is stored with direct pointers to every other data record it’s connected with, and queries are written in Cypher, Neo4j’s query language. It’s this process that is the powerhouse of the database, making relationship queries much faster and simpler to write than in a table-based system – no need to worry about JOINs.
To boot, Neo4j also provides official drivers for common programming languages like Java, .NET, JavaScript, Python and Go.
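The direct pointers between connected records can be pictured with a small in-memory graph. This is only a sketch of the idea, not how Neo4j stores data internally: the `Node` class, the `reachable` function and the relationship names are all made up for illustration.

```python
from collections import deque


class Node:
    """A record that holds direct references to the records it is connected to."""

    def __init__(self, name):
        self.name = name
        self.relationships = []  # list of (relationship_type, target Node)

    def relate(self, rel_type, other):
        self.relationships.append((rel_type, other))


def reachable(start, rel_type, max_hops):
    """Names reachable from start by following rel_type links, up to max_hops."""
    seen = {start.name}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        # Following a relationship is a pointer hop, not a table lookup or JOIN.
        for kind, target in node.relationships:
            if kind == rel_type and target.name not in seen:
                seen.add(target.name)
                found.append(target.name)
                queue.append((target, hops + 1))
    return found


hamlet, horatio = Node("Hamlet"), Node("Horatio")
marcellus, bernardo, ophelia = Node("Marcellus"), Node("Bernardo"), Node("Ophelia")
hamlet.relate("KNOWS", horatio)
hamlet.relate("LOVES", ophelia)
horatio.relate("KNOWS", marcellus)
marcellus.relate("KNOWS", bernardo)

print(reachable(hamlet, "KNOWS", 2))  # ['Horatio', 'Marcellus']
```

A "friends of friends" question like this one would need a self-JOIN per hop in a table-based system, whereas here each hop just follows references already stored on the record.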
Neo4j hits a snag when it comes to horizontal scaling. As you can’t shard, you have to have your whole dataset on just one server. The only way to scale is vertically by increasing processor power or adding memory, similar to a monolithic architecture.
And The Winner Is…
There really is no clear “winner” here. One NoSQL isn’t necessarily better than another NoSQL in all areas, and each database performs best in the field of expertise it was designed to operate in.
Which of the 4 you choose – Apache Cassandra, Apache HBase, MongoDB or Neo4j – will ultimately come down to what your requirements are.
Career Stats
Are you looking for a career in big data management? NoSQL databases have matured and grown into such a wide array of frameworks that NoSQL is a feather that every tech professional has to have in their cap in order to get a look in on the more lucrative job roles and salaries.
The average salary of a professional skilled in working with NoSQL databases is $117,982. Now, compare that to a Software Developer at $89,671.
The Healthcare Services sector pays the highest average salary out of any industry, a very impressive $135,948. Information Technology Consulting is where your NoSQL experience is valued the least, with an average salary of $85,929.
Next Steps
We hope this blog helped you learn more about the 4 best NoSQL databases!
If you enjoyed this blog, you may also enjoy our other blogs, like Golang vs Python or Erlang and 5 Things You Need to Know About It.
Kofi Group is proud to be a source of knowledge and insight into the startup software engineering world and offers a multitude of resources to help you learn more, improve your career, and help startups hire the best talent. If you are interested in learning more about what we do and how we can help you then get in touch or watch our YouTube videos for additional information.