First off, if you are not subscribed, please subscribe so you don’t miss a thing! It is free, and it is good!
In software projects we all work with data; this data might be small, big, easy to comprehend, or complex, but in any case, we have to find a way to store and extract that data in the best way possible.
Everything depends on the use case, a use case is a functional requirement of a system and describes what the system is supposed to do in a specific situation.
There are a lot of different types of databases, but there are 3 that dominate the market, those are Relational Databases, NoSQL Databases, and Time-series Databases.
Everything here is about data, what the data is, where is it coming from, what are you going to do with it, who needs access to it, and in what format, there are a lot of factors to consider.
So, let’s start.
Relational Databases
This is the most typical database out there. When you have keys in your data that you need to match with other data, say, account information from your customers, you have their IDs, another dataset with their orders, in the case that you sell stuff to them or an inventory system where you have all your devices, and then you need to join this data with details such as model, IP address, location, owner, etc. In this type of database, you have multiple tables and relationships between them, it is like a chain, and all the data there is constrained. The data in this type of DB is queried using a special language called SQL or Structured Query Language. Another key factor of this type of DB is the ACID (Atomicity, Consistency, Integrity, Durability), which means that they support “Transactions” and constraints, maintaining the integrity of the data when adding, removing, and updating the data. Data in this type of DB is commonly stored as columns and rows, just as you see data in Excel!
Fun Fact: A lot of people still call it SEQUEL instead of S-Q-L, and this is because in its origins it was called SEQUEL, as in Structured English Query Language, overtime SQL became the standard name.
SQL is an easy-to-read, intuitive language, with common words such as “select”, “insert”, “from” and lots of other keys, an easy example of SQL is:
select name from users
In this case, select tells the database that we want to fetch or find something, name is actually what we want to receive, and we are going to pull such data “from” the table called users, pretty easy right?
Use Case: Managing structured data for applications, such as inventory systems, where relations between data points are crucial.
Database: MySQL, PostgreSQL, SQLite, Oracle ($), etc.
NoSQL Databases
As its name implies, no SQL is used here, we are talking about a completely different animal.
Imagine that SQL is like a nice drawer, where you have folders for different things, they are nicely labeled and you can get stuff out of there quickly, it is neat! Well, then imagine you also have another drawer with a bag, where you throw letters, random notes, markers? (Document store) Also, you have key chains, where each key has a specific tag (key-value store), and maybe a little board full of photos showing who's friends with whom (Graph database).
In short, NoSQL gives you additional ways to store data instead of the traditional row/column way, it let’s you store documents, key-value pairs, hierarchical relationships, and topological structures, like … how you relate to your friends’ friends in a social network.
Document Use Case: Storing, retrieving, and managing document-oriented information, such as content management systems or blogging platforms.
Key-Value Use Case: Storing session information for web applications, caching, or configurations.
Graph Use Case: Representing and querying data in graph formats, is ideal for use cases like social networks where understanding the relationships between entities is crucial.
Fun Fact: There is a concept called the “Six Degrees of Separation”, which means that any two people on earth are, on average, about six acquaintance links apart from each other. So, that means that you know someone who knows someone... and by the time you've said that six times, you could potentially be connected to anyone else in the world! Pretty crazy.
Now, given the new social networks, the degree of separation is getting smaller, to even 3 or 4 acquaintance links apart, according to Facebook.
Database Examples:
Document: MongoDB, CouchDB.
Key-Value: Redis, Riak.
Graph: Neo4j, ArangoDB
Time-Series Databases
These are specially designed databases to store time-stamped data, you will probably use this type of database to track something. Over time, this data will be useful for trending purposes. This type of database is highly optimized for high-speed IO, so the ingestion and extraction happen extremely fast, and data collection can happen as fast as per second (as opposed to per minute, hour, etc.)
Use Case: Storing and querying sequences of data points indexed by time, like heart rate.
Use Case 2: Storing CPU, memory, and disk usage from a server every minute, or thousands of servers!
Database examples:
InfluxDB, TimescaleDB, RRD
Those are the traditional databases or most common ones, let’s move to special cases.
Big data databases
There are also different options for big data, which depend a lot on your data as well. Traditionally, your concern is storage and data processing, for that reason the most common big data solution is a warehousing system, but there are other options depending on your use case.
Warehousing systems:
Systems such as Amazon Redshift, or Google BigQuery allow you to aggregate data from different sources, and transform it in a consistent format. Also, its distributed nature will ease the query and processing.
Hive backed by HDFS, this is a distributed SQL-like system that will let you treat your HDFS data as if it were a database, but it is not a database in the traditional sense. Its distributed nature allows it to process massive amounts of data pretty quickly.
Time series databases
InfluxDB for example, is a prime example of a TSDB able to process big data (as long as your data is timestamped)
NoSQL
Databases such as MongoDB or Couchbase can process vast amounts of data or both, structured and non-structured data.
As you see, this is not a once-size-fits-all kind of thing, big data requires some special analysis to determine what route to take.
The next ones are really on the “niche” kind of databases, so I will provide only a brief description since their use is really specific.
Distributed Databases
Use Case: Databases that are spread across multiple physical locations, either for load balancing, fault tolerance, or geographic distribution.
Example: CockroachDB, Apache Ignite.
Multimodel Databases
Use Case: Databases that support multiple data models, such as document, key-value, relational, and graph, within a single, integrated backend. This is useful when your application needs varied data storage models.
Example: OrientDB, ArangoDB (also serves as a graph DB).
NewSQL Databases
Use Case: Databases that aim to provide the scalability of NoSQL databases with the transactional consistency and relational model of traditional RDBMS.
Example: VoltDB, CockroachDB.
Search Engines
Use Case: Optimized for search operations and often provide features like full-text search, distributed search, etc.
Example: Elasticsearch, Apache Solr.
Wide-Column Stores
Use Case: Suitable for querying large datasets with a vast amount of columns, often used in big data processing.
Example: Apache Cassandra, ScyllaDB.
Event Sourcing Databases
Use Case: Databases that store events rather than "state," allowing systems to reconstruct the state of an object by replaying events.
Example: Event Store.
There are a lot more databases not mentioned here, I am listing some of the key players as well as some of their key details, shoot me a message if you want me to elaborate more on any of those DBs!
Happy querying!
Cheers!