Last Updated: May 13, 2026 at 12:00
Modern Database Types Explained: SQL, NoSQL, Search, Time-Series, Vector & More
How to Choose the Right Database by Understanding the Trade-offs Every System Is Designed Around
Every database is built to make one kind of access fast while making others more expensive. This guide explains the major database families — SQL, NoSQL, search, time-series, vector, and more — through a simple mental model based on shape, scale, access patterns, and consistency. Instead of memorising features, you learn how to reason about why each system exists and where it naturally breaks. Once you see these trade-offs clearly, database selection becomes a structured decision rather than guesswork.

A Single Meta Insight Before You Start
Every database is a system that makes one type of access extremely fast by making others expensive. Understanding which access pattern a database optimises for tells you more than any feature list ever could.
Four Questions That Narrow the Field
Every database type in this guide is an answer to a specific combination of four questions. Get these right and the choice almost makes itself.
Shape. What does your data look like? Rows and columns? Nested JSON documents? A web of interconnected things? A stream of measurements over time?
Scale. Will your data fit on one machine, or do you need to spread it across many? This covers storage, read throughput, and write throughput together.
Access patterns. How will you read the data? Single-record lookup? Google-like search over messy text? Aggregating millions of rows into a single number? Traversing friend-of-friend chains for fraud ring detection? This question is the strongest differentiator of the four — different access patterns demand fundamentally different storage layouts.
Consistency. If two users read the same record at almost the same moment, do both need to see the most recent version? Or is data that is a few milliseconds stale acceptable?
Keep these questions in mind as you read. They will do more to guide your choice than any feature comparison.
Relational Databases (SQL)
Use when correctness matters more than raw scale — payments, inventory, banking, bookings, or anything involving critical state.
Relational databases such as PostgreSQL, MySQL, and SQL Server store data in structured tables made of rows and columns. Tables relate to each other through shared identifiers: an orders table links to a customers table through a customer_id, and an order_items table links to orders through an order_id.
The tables themselves are not the point. The point is what the database enforces on top of them — and there are three layers of enforcement working together.
The first is schema enforcement. The database defines exactly what shape data must take, and rejects anything that violates it. You cannot insert an order with a negative quantity if you constrain against it, or a string where a date is expected. The structure is guaranteed at the storage level, not left to application code.
The second is referential integrity. Foreign key constraints ensure relationships between tables stay valid at all times. If you try to delete a customer who still has open orders, the database can block it or cascade the deletion automatically. Orphaned records — orders pointing at customers that no longer exist — cannot occur silently.
The third is ACID transactions. When multiple operations must succeed or fail together, the database treats them as a single atomic unit. A payment that debits one account and credits another either completes fully or rolls back entirely, with no partial state. Consistency ensures that every committed transaction leaves the data satisfying all declared constraints. Isolation ensures that two simultaneous transactions cannot corrupt each other's results, even when they touch the same records. Durability ensures that once a transaction is confirmed, it survives a crash a millisecond later.
Together these three layers are what engineers mean by correctness: the database enforces rules your application code never has to.
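The three layers can be seen in miniature with SQLite from Python's standard library. This is a minimal sketch, not a production schema: the table names, the CHECK rules, and the helper functions `rejected` and `transfer` are illustrative assumptions, and SQLite stands in for any relational engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    );
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (1, 1, 2);
    INSERT INTO accounts VALUES (1, 100), (2, 0);
""")

def rejected(sql, params=()):
    """Return True if the database refuses the statement (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

# Layer 1: schema enforcement. The CHECK constraint refuses a negative quantity.
negative_quantity_rejected = rejected("INSERT INTO orders VALUES (2, 1, -5)")

# Layer 2: referential integrity. Customer 1 still has an order, so the delete is blocked.
delete_blocked = rejected("DELETE FROM customers WHERE id = 1")

def transfer(src, dst, amount):
    """Credit and debit as one atomic unit; return True only if both were committed."""
    try:
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

# Layer 3: atomicity. The debit would overdraw account 1, so the credit is rolled back too.
overdraft_committed = transfer(1, 2, 500)
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(negative_quantity_rejected, delete_blocked, overdraft_committed, balances)
```

The credit is deliberately issued before the failing debit so that the rollback has something to undo: after the failed transfer, neither account has changed.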
Data shape. Rigid schemas with predefined columns and relationships. Every row in a table has the same set of columns, and the database enforces that references between tables stay valid.
Scale. Relational databases are naturally strongest when deployed on a single machine. The reason is structural: the guarantees that make them valuable (atomic transactions, referential integrity, isolation between concurrent writes) all require coordination, and coordination gets expensive when data is spread across many machines. Every distributed transaction must confirm that all participating nodes agree before committing, which turns network latency and node failures into threats to correctness, not just performance. This is a fundamental tension, not a tooling problem. Modern distributed SQL systems such as CockroachDB and Spanner do not remove it; they manage it, preserving the guarantees across nodes at the cost of added latency and operational complexity.
Access patterns. Complex queries, joins, aggregations, reporting, and transactional reads and writes across related datasets.
Consistency. Strong ACID transaction guarantees, where correctness is prioritised over serving potentially stale data.
When to use. Relational databases are ideal when the schema is relatively stable, relationships matter, and inconsistencies are unacceptable — e-commerce orders, financial ledgers, user accounts, payroll systems, and inventory management are classic examples.
Document Stores (NoSQL)
Use when your data is naturally JSON-shaped, your schema changes frequently, or you need to scale across many machines without complex relational coordination.
Document databases such as MongoDB and Couchbase store data as self-contained JSON-like documents rather than rigid rows and columns. Each document can have different fields, nested objects, or arrays without requiring a schema to be defined upfront. A news platform, for example, might store each article as a single document containing a title, an author object, a tags array, media embeds, and body content. One article may include live updates while another contains a corrections section; each document carries only the fields it needs. Because documents are largely independent, they are easier to partition and distribute across many servers, making horizontal scaling significantly simpler than with tightly relational systems.
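The news-platform example can be sketched with plain Python dictionaries. This is an illustration of the data model only, assuming invented article contents and two hypothetical helpers, `get_path` and `find`; a real document store would answer the same kind of query from an index rather than a scan.

```python
# Two articles in the same collection, each carrying only the fields it needs.
articles = [
    {
        "title": "Budget vote",
        "author": {"name": "R. Patel", "desk": "politics"},
        "tags": ["economy", "parliament"],
        "live_updates": ["09:00 Vote opens"],
    },
    {
        "title": "Storm warning",
        "author": {"name": "L. Chen", "desk": "weather"},
        "tags": ["weather"],
        "corrections": ["Fixed wind speed"],
    },
]

def get_path(doc, path):
    """Follow a dotted path such as 'author.desk'; return None if any part is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(collection, path, value):
    """Return documents whose field at `path` equals `value` or, for arrays, contains it."""
    results = []
    for doc in collection:
        found = get_path(doc, path)
        if found == value or (isinstance(found, list) and value in found):
            results.append(doc)
    return results

weather_desk = [doc["title"] for doc in find(articles, "author.desk", "weather")]
economy_tagged = [doc["title"] for doc in find(articles, "tags", "economy")]
print(weather_desk, economy_tagged)
```

Querying a nested field (`author.desk`) or an array (`tags`) works without either article declaring a schema, and a missing field is simply absent rather than null-filled.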
Data shape. Flexible JSON documents with nested structures and optional fields. There is no fixed schema. A user without a phone number simply does not have that field — the document fits its own shape rather than a predetermined mould.
The key distinction from key-value stores. A document store is designed around querying inside the value — filtering by fields, indexing nested properties, and running rich queries are first-class operations. Key-value stores are designed around the key — retrieval by exact key is the primary operation, and any querying beyond that is secondary capability bolted on top. The difference is what the system is optimised for, not what it can technically do.
Scale. Document stores are designed for high write throughput across many machines. Because each document is self-contained, writes are isolated — the database does not need to update multiple related tables, enforce cross-table constraints, or coordinate locks across records. This independence is what makes horizontal scaling tractable: data can be split across many nodes through sharding, and each node can handle its writes without tight coordination with the others. It is a fundamentally different scaling story from relational databases, where cross-table consistency requirements make distribution expensive.
Access patterns. Fast single-document reads and writes, flexible filtering and indexing, but weaker support for complex joins and multi-document relational queries.
Consistency. Many document stores offer tunable consistency models ranging from strong to eventual, depending on performance and availability requirements.
When to use. Document stores are ideal when the application primarily works with self-contained entities — user profiles, product catalogues, content management systems, event metadata, mobile app backends, or personalisation systems — and when schema flexibility and operational scalability matter more than strict relational guarantees.
When to avoid. Avoid document stores when relationships between records are central to the workload, because the relationship model weakens significantly. If you store a user ID inside a post document, the database does not verify that the user exists; that becomes your application's responsibility. Complex joins across collections are slow or require multiple round trips.
Wide-Column Stores
Use when you need massive, sustained write throughput across many nodes and your query patterns are known in advance.
Wide-column stores such as Apache Cassandra, Apache HBase, and Google Bigtable organise data into tables with rows and columns, but unlike relational databases, rows are sparse: each row stores only the columns it actually has values for. In HBase and Bigtable there is no fixed list of columns at the table level (only column families are declared); each row effectively carries its own schema, with column names stored alongside values. Cassandra's CQL does declare columns upfront, but rows remain sparse in storage.
The real design principle is that access patterns come first. In relational systems, you model your data and then write queries against it. In wide-column systems, you start with the queries and design your tables around them. Data is structured specifically to make those queries fast, even if that means duplicating or reshaping it in multiple tables. A query that was not anticipated during schema design is often inefficient or impractical to serve.
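Query-first design can be sketched in a few lines. The example below assumes a single anticipated query ("latest readings for one sensor on one day") and builds a table around it; the table name, the key layout, and all function names are hypothetical, and an in-memory dictionary stands in for partitions spread across nodes.

```python
from bisect import insort
from collections import defaultdict

# Partition key: (sensor_id, day). Clustering key: timestamp.
# Each partition is a list kept sorted by timestamp.
readings_by_sensor_day = defaultdict(list)

def write_reading(sensor_id, day, timestamp, value):
    # insort keeps the partition ordered, so reads never need to sort
    insort(readings_by_sensor_day[(sensor_id, day)], (timestamp, value))

def latest_readings(sensor_id, day, limit):
    """The anticipated query: one partition lookup, then the newest rows."""
    partition = readings_by_sensor_day.get((sensor_id, day), [])
    return partition[::-1][:limit]

def readings_above(threshold):
    """An unanticipated query: no partition key, so every partition is scanned."""
    return [
        (key, timestamp, value)
        for key, rows in readings_by_sensor_day.items()
        for timestamp, value in rows
        if value > threshold
    ]

write_reading("s1", "2026-05-13", 20, 20.0)
write_reading("s1", "2026-05-13", 10, 19.5)
write_reading("s1", "2026-05-13", 30, 21.5)
write_reading("s2", "2026-05-13", 15, 35.0)

print(latest_readings("s1", "2026-05-13", 2))
print(readings_above(30.0))
```

`latest_readings` touches exactly one partition however large the dataset grows, while `readings_above` has to visit all of them. Serving the second query efficiently would mean writing the same data into a second table keyed for it.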
Data shape. Sparse, distributed rows with dynamic and potentially very large column sets. Each row can be completely different from every other.
Scale. Designed for extreme horizontal scale across many nodes, often handling terabytes to petabytes of data.
Access patterns. Highly predictable and query-driven. Fast lookups when aligned with partition keys, but limited support for ad-hoc querying or joins.
Consistency. Typically tunable, allowing trade-offs between consistency and availability depending on the workload.
When to use. Wide-column databases are best suited for event streams, time-series data, IoT telemetry, logging pipelines, messaging systems, and large-scale activity tracking — anywhere writes are continuous, datasets are enormous, and access patterns are well defined in advance.
When to avoid. They are less suitable when the shape of your questions changes frequently or when exploratory querying is required.
Key-Value Stores
Use when you need extremely fast lookups by a single key, typically for caching, session storage, rate limiting, or ephemeral application state.
Key-value stores represent the simplest possible database model: a key maps directly to a value, and the primary operations are SET and GET. The key is the only guaranteed fast path — retrieval by exact key is what the system is built and optimised around.
Beyond that, systems diverge significantly. Memcached treats the value as a true opaque blob: fast storage and retrieval, nothing more. Redis exposes rich data structures such as sorted sets, hashes, and streams, along with pub/sub messaging, making it useful for rate limiting, leaderboards, and real-time messaging well beyond simple caching. DynamoDB supports secondary indexes and attribute-level filtering, blurring the line with document stores. What unites them is the design priority: key lookup is always fast, and everything else is secondary capability that varies by implementation.
Data shape. A key mapped to a value. The value can be as simple as a string or counter, or as rich as a sorted set, stream, or nested document depending on the system. The key is the fastest and most natural access path; some implementations, such as DynamoDB, also support filtering, secondary indexes, and attribute-level queries on top of that.
Scale. Extremely high throughput. In-memory systems like Redis are constrained by RAM but achieve sub-millisecond latency. Distributed systems like DynamoDB scale horizontally by partitioning data across many nodes.
Access patterns. Ultra-fast point lookups by exact key. Secondary querying and filtering are available in some systems but are not the primary design target.
Consistency. Varies by implementation. Typically strong per-key operations, with tunable trade-offs in distributed deployments.
When to use. Caching, session management, feature flags, user preferences, counters, and real-time lookups where the query is "fetch this specific item". Commonly deployed as a high-speed layer in front of a primary database, absorbing repetitive reads and reducing load on slower systems.
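The "high-speed layer in front of a primary database" pattern can be sketched as a read-through cache. This is a minimal in-process illustration, assuming a hypothetical `TTLCache` class and `read_through` helper; a real deployment would put a networked store such as Redis or Memcached in the cache role.

```python
import time

class TTLCache:
    """A tiny key-value store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._items = {}     # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)

def read_through(cache, key, load_from_primary):
    """Serve from the cache when possible; otherwise load once and remember the result.

    Simplification: a stored value of None is indistinguishable from a miss.
    """
    value = cache.get(key)
    if value is None:
        value = load_from_primary(key)
        cache.set(key, value)
    return value

# Usage with a fake clock and a stand-in for the slow primary database.
now = [0.0]
primary_calls = []

def slow_lookup(key):
    primary_calls.append(key)
    return key.upper()

cache = TTLCache(ttl_seconds=10, clock=lambda: now[0])
first = read_through(cache, "user:1", slow_lookup)   # miss: hits the primary
second = read_through(cache, "user:1", slow_lookup)  # hit: primary not touched
now[0] = 11.0                                        # entry has expired
third = read_through(cache, "user:1", slow_lookup)   # miss again
print(first, second, third, len(primary_calls))
```

Three reads reach the primary only twice; the shorter the gap between repeated reads relative to the TTL, the more load the cache absorbs.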
Graph Databases
Use when relationships between entities are the primary thing you are querying — not just attributes or metadata.
Graph databases such as Neo4j, Amazon Neptune, and TigerGraph model data as nodes and edges. Nodes represent entities like users, accounts, devices, or products, while edges represent relationships such as FOLLOWS, OWNS, TRANSFERRED_TO, or CONNECTED_TO. Both nodes and edges can carry properties, allowing the relationship itself to hold meaningful data.
Why graph databases excel at traversals. In a native graph database such as Neo4j, moving from one node to a neighbour across an edge is a direct pointer dereference: the cost depends on the local structure around the node, not on how many total nodes exist. In a relational database, the same operation requires a join that becomes more expensive as tables grow. At ten hops deep, the relational engine may be exploring billions of combinations through repeated joins, while the graph database follows pointers from one node to the next. This efficiency holds within a single partition; in distributed graph deployments, traversals that cross partition boundaries can introduce meaningful overhead, so data modelling decisions about partitioning matter significantly at scale.
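A multi-hop traversal can be sketched as a breadth-first search over an adjacency list. The account names and the `within_hops` function are illustrative assumptions; the point is that each step only looks at the current node's own neighbours, never at the graph as a whole.

```python
from collections import deque

# Hypothetical TRANSFERRED_TO edges between accounts.
transferred_to = {
    "acct_a": ["acct_b", "acct_c"],
    "acct_b": ["acct_d"],
    "acct_c": ["acct_d"],
    "acct_d": ["acct_e"],
    "acct_e": [],
}

def within_hops(graph, start, max_hops):
    """Return {node: hop_count} for every node reachable from start in at most max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in distances:  # first visit is the shortest path in BFS
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]
    return distances

print(within_hops(transferred_to, "acct_a", 2))
```

Raising `max_hops` from 2 to 3 adds `acct_e` to the result by following one more edge from `acct_d`; the work done is proportional to the edges actually visited.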
Data shape. Nodes and edges with properties. The graph is the primary data model — connections are first-class citizens, not just foreign keys.
Scale. Can scale vertically or horizontally depending on implementation, though cross-partition traversals can introduce overhead in distributed setups.
Access patterns. Optimised for multi-hop traversals, neighbourhood queries, and pathfinding. Not suited for large aggregations or tabular analytics.
Consistency. Often supports strong consistency in single-node deployments, with tunable models in distributed systems.
When to use. Graph databases are best suited for highly connected domains such as social networks, recommendation systems, fraud detection, identity graphs, dependency mapping, and network analysis — where the core question is "how is this connected to that?"
When to avoid. They are less suitable when queries are mostly about independent records, aggregations, or simple lookups. In those cases, relational or key-value systems are typically simpler and more efficient.
Time-Series Databases
Use for any data that arrives as a continuous stream of timestamped measurements — metrics, sensor readings, financial ticks, or application telemetry.
Time-series databases such as InfluxDB, Prometheus, TimescaleDB, and Amazon Timestream are designed specifically for time-ordered data. Each record consists of a timestamp, a measured value, and a set of tags such as host, region, or service. The fundamental assumption is that data is continuously appended, and almost every query is anchored in time — recent windows, historical trends, or rate-based aggregations.
Time-series databases achieve performance by leaning heavily on time-based assumptions. Data is written in chronological order, stored in time-partitioned chunks, and compressed using patterns that exploit temporal locality. Because adjacent values in a metric stream tend to be similar, compression ratios are high. Because queries almost always target recent or bounded time ranges, storage engines can skip entire blocks of irrelevant data.
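Both ideas, compressing similar adjacent values and skipping irrelevant time blocks, can be sketched in plain Python. Delta encoding is used here as one simple example of exploiting temporal locality, and the one-hour chunk size and all function names are assumptions; real engines use more elaborate encodings and chunk layouts.

```python
def delta_encode(values):
    """Store the first value, then only each difference from the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    values, total = [], 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values

CHUNK_SECONDS = 3600  # assumed chunk width: one hour

def build_chunks(points):
    """Group (timestamp, value) points into chunks keyed by the chunk's start time."""
    chunks = {}
    for timestamp, value in points:
        start = timestamp - timestamp % CHUNK_SECONDS
        chunks.setdefault(start, []).append((timestamp, value))
    return chunks

def query_range(chunks, start, end):
    """Return points with start <= timestamp < end, plus how many chunks were opened."""
    hits, opened = [], 0
    for chunk_start in sorted(chunks):
        if chunk_start + CHUNK_SECONDS <= start or chunk_start >= end:
            continue  # the whole chunk lies outside the window, so it is never read
        opened += 1
        hits.extend(p for p in chunks[chunk_start] if start <= p[0] < end)
    return hits, opened

encoded = delta_encode([1000, 1002, 1001, 1005])
points = [(0, 10), (1800, 11), (3600, 12), (7200, 13), (9000, 14)]
chunks = build_chunks(points)
recent, chunks_opened = query_range(chunks, 7200, 9500)
print(encoded, recent, chunks_opened)
```

The encoded stream holds small numbers (2, -1, 4) instead of four-digit readings, which is what makes it compress well, and the range query opens one chunk out of three.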
Data shape. Timestamped measurements with optional dimensional tags and numeric values. Each point is essentially a triple of (time, value, tags).
Scale. Designed for extremely high ingestion rates and efficient long-term storage of large historical datasets.
Access patterns. Time-window queries, aggregations (average, max, percentile), rate calculations, and downsampling. Individual point lookups are uncommon.
Consistency. Varies by system — monitoring-focused setups may prioritise availability, while others support stronger guarantees depending on configuration.
When to use. Time-series databases are best suited for observability systems, IoT telemetry, financial market data, application performance monitoring, and any workload where the central question is "how is this metric changing over time?"
When to avoid. They are less suitable when the data model is primarily relational, heavily transactional, or requires complex cross-entity relationships rather than time-based analysis.
Search Databases
Use when users need to find information by relevance rather than exact matches — product search, document search, autocomplete, or log exploration.
Search databases such as Elasticsearch, OpenSearch, and Apache Solr are designed for one primary capability: retrieving the most relevant documents from large text-heavy datasets. Unlike relational or key-value systems that rely on exact lookups, search engines are built around intent-based retrieval — finding results even when the query does not match the data exactly.
The core mechanism is the inverted index, similar to the index at the back of a book. Instead of scanning every document, the system maps each term to the documents that contain it. On top of this structure, search engines apply relevance scoring, typically based on term frequency, rarity across the dataset, and positional importance — for example, whether a term appears in a title or body. The result is not just matching documents, but a ranked list ordered by likely relevance.
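A toy inverted index with relevance ranking might look like the sketch below. It scores by term frequency and rarity only, using a simple TF-IDF-style formula chosen for illustration; positional importance is left out, and the scoring used by production engines differs. The documents and function names are made up.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to {doc_id: how often the term occurs in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index

def search(index, total_docs, query):
    """Return doc ids ranked by summed (term frequency x rarity), best first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        rarity = math.log(1 + total_docs / len(postings))  # rarer terms weigh more
        for doc_id, count in postings.items():
            scores[doc_id] += count * rarity
    # Ties are broken by doc id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

docs = {
    1: "red running shoes",
    2: "blue running shoes",
    3: "trail running jacket",
    4: "red wine glasses",
}
index = build_index(docs)
results = search(index, len(docs), "red running")
print(results)
```

Document 1 ranks first because it matches both terms, and document 4 outranks 2 and 3 because "red" appears in fewer documents than "running". No document is scanned at query time; only the postings for the two query terms are read.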
Data shape. Documents — typically JSON records with fields like title, body, tags, and metadata. When a document is written, the search engine breaks each text field into individual terms and builds an index mapping every term back to the documents that contain it. At query time the engine looks up terms directly rather than scanning documents, which is what makes search fast at scale.
Scale. Horizontally scalable systems capable of indexing and searching very large document collections.
Access patterns. Full-text search, fuzzy matching, autocomplete, filtering, faceting, and relevance-ranked retrieval. Not optimised for joins or transactional queries.
Consistency. Typically near-real-time indexing — newly written data becomes searchable after a short delay due to index refresh cycles, prioritising search performance over immediate visibility.
When to use. Search databases are best suited for product search, document retrieval, log analysis, knowledge bases, and any system where users express intent rather than precise identifiers.
When to avoid. They are less suitable for transactional workloads, relational querying, or systems requiring strict consistency and structured joins.
Columnar Databases
Use for analytics — scanning and aggregating large volumes of data to answer business questions. Not for live transactional workloads.
Most traditional databases store data row by row: all columns for a record are written together on disk. This works well when you frequently read or update a single record. But it becomes inefficient when you want to analyse a single field across millions of rows, because the database still has to read entire rows from disk just to extract a few columns.
Columnar databases such as ClickHouse, Amazon Redshift, Google BigQuery, and Snowflake fix this by turning the layout on its side. Instead of storing all columns for a row together, they store all values for each column together — every sales_amount in one place, every country in another. A query that needs three columns reads three column segments and nothing else.
This has a compounding benefit: values within a single column tend to be similar — amounts near each other, countries repeating — so they compress extremely well. Less data on disk means even less data to read. Combined with engines that process those column segments in parallel across many machines, aggregations over billions of rows become fast enough to run interactively.
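The layout change and the compression effect can be sketched with ordinary lists. The sample rows are invented, and run-length encoding is used as one simple example of a scheme that benefits from repeated values; real columnar engines combine several encodings.

```python
rows = [
    {"order_id": 1, "country": "DE", "sales_amount": 120},
    {"order_id": 2, "country": "DE", "sales_amount": 80},
    {"order_id": 3, "country": "DE", "sales_amount": 95},
    {"order_id": 4, "country": "FR", "sales_amount": 200},
    {"order_id": 5, "country": "FR", "sales_amount": 150},
]

def to_columns(rows):
    """Turn a list of row dicts into one list per column (assumes every row has the same keys)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]

columns = to_columns(rows)
total_sales = sum(columns["sales_amount"])         # reads one column, ignores the rest
country_runs = run_length_encode(columns["country"])
print(total_sales, country_runs)
```

The aggregation touches only the `sales_amount` list, and the five country values shrink to two runs. With rows stored together, the same sum would have to read every field of every row.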
Data shape. Tabular structure (rows and columns), but physically stored by column rather than by row.
Scale. Designed for distributed analytical workloads up to petabyte scale, often with storage and compute separated so that each can scale independently.
Access patterns. Large scans, aggregations, grouping, and filtering across subsets of columns. Not optimised for single-record lookups or frequent updates.
Consistency. Strong consistency within analytical queries, with systems typically optimised for append-heavy ingestion patterns.
When to use. Columnar databases are best suited for data warehousing, BI dashboards, cohort analysis, and large-scale event analytics where the goal is to compute insights rather than retrieve individual records.
When to avoid. They are less suitable for transactional systems or user-facing applications where low-latency per-record access and frequent updates are required.
Vector Databases
Use when you need to find things by meaning rather than structure — such as documents about the same topic, images that look similar, or products a user is likely to prefer.
Traditional databases retrieve data based on exact matches, ranges, keywords, or relationships. Vector databases solve a different problem entirely: finding items that are semantically similar, even when they share no obvious words or structure.
This is achieved using embeddings. A machine learning model converts content — text, images, audio, or user behaviour — into a high-dimensional vector: an array of numbers that represents its meaning. Items with similar meaning are placed close together in this vector space. For example, "dog" and "puppy" produce nearby vectors, as do "running shoes" and "athletic footwear", even though the wording is different.
Vector databases such as Pinecone, Weaviate, Milvus, and Chroma are built to store these vectors and search across millions of them in milliseconds. The search is not exact — instead of finding a perfect match, the database finds the closest ones, which is both faster at scale and more useful in practice. Close enough in vector space means similar in meaning.
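"Close in vector space" can be made concrete with cosine similarity. The three-dimensional vectors below are invented for illustration (real embeddings come from a model and have hundreds or thousands of dimensions), and the search is an exact brute-force scan; a vector database would normally replace that scan with an approximate index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings; the values are assumptions, not output from a real model.
embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.85, 0.2, 0.05],
    "running shoes": [0.1, 0.9, 0.2],
    "athletic footwear": [0.15, 0.85, 0.3],
}

def nearest(query_vector, k):
    """Return the k stored items most similar to query_vector, best first."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(query_vector, embeddings[name]),
        reverse=True,
    )
    return ranked[:k]

print(nearest(embeddings["dog"], 2))
print(nearest(embeddings["running shoes"], 2))
```

Each item's closest neighbour (after itself) is the one with similar meaning, even though the strings share no words. The brute-force scan compares the query against every stored vector, which is the cost that approximate indexes are designed to avoid.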
For many teams, a dedicated vector database is not the first step. PostgreSQL (through the pgvector extension), MongoDB (through Atlas Vector Search), and Elasticsearch all now offer vector search, and for moderate scale that is often sufficient. A standalone vector database becomes the right choice when similarity search is the dominant workload, when you are managing hundreds of millions of vectors, or when you need fine-grained control over indexing and search performance that general-purpose systems cannot provide.
Data shape. Each record stores three things: the original content (a product description, an article, an image), a machine-generated numerical representation of its meaning, and any metadata you want to filter on such as date or category. You query with content; the database matches by meaning.
Scale. Designed for efficient similarity search across millions to billions of vectors, typically using approximate nearest-neighbour indexing to trade a tiny amount of precision for massive speed gains.
Access patterns. Similarity search, semantic retrieval, recommendations, clustering, and deduplication.
Consistency. Typically near-real-time indexing. Newly added embeddings become searchable shortly after ingestion.
When to use. Vector databases are best suited for AI-driven applications such as semantic search, retrieval-augmented generation (RAG), recommendation systems, image and audio similarity search, and intelligent assistants.
When to avoid. They are less suitable for transactional systems, relational queries, or structured workloads where exact matches and consistency are more important than semantic similarity.
NewSQL
Use when you need both strong ACID guarantees and horizontal scalability — and neither can be compromised.
NewSQL databases such as CockroachDB, Google Cloud Spanner, and YugabyteDB were created to remove the traditional trade-off between relational consistency and distributed scale. They provide a familiar SQL interface — tables, joins, constraints, and transactions — while internally distributing data across many nodes and regions.
These systems behave like a traditional relational database from the application's perspective, but internally coordinate transactions across a distributed cluster using consensus protocols such as Raft or Paxos. This allows the system to maintain strong consistency even when data is partitioned across different machines or data centres.
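The rule underneath protocols such as Raft and Paxos is that a write counts as committed only once a majority of replicas have acknowledged it. The sketch below shows just that arithmetic, with hypothetical function names and made-up acknowledgement times; it is not an implementation of either protocol, which also handle leader election, log repair, and much else.

```python
def majority(replica_count):
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1

def is_committed(acks, replica_count):
    """A write is committed once a majority of replicas have acknowledged it."""
    return acks >= majority(replica_count)

def tolerated_failures(replica_count):
    """How many replicas can be down while a majority is still reachable."""
    return replica_count - majority(replica_count)

def commit_latency_ms(ack_times_ms):
    """Time until a majority has acknowledged: the majority-th fastest ack.

    Simplification: the list includes the leader's own (near-zero) ack.
    """
    needed = majority(len(ack_times_ms))
    return sorted(ack_times_ms)[needed - 1]

# Five replicas: the leader, one nearby node, and three in remote regions (assumed timings).
ack_times = [0, 2, 70, 140, 150]
print(majority(5), tolerated_failures(5), commit_latency_ms(ack_times))
```

With five replicas the cluster survives two failures, but every commit waits for the third-fastest acknowledgement, here 70 ms because the majority must include a remote region. That wait is the latency cost of distributed consensus.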
The operational cost is real and worth stating plainly. NewSQL systems are genuinely harder to run than either a managed PostgreSQL instance or a simple NoSQL service. Distributed consensus adds latency, cross-region transactions add complexity, and debugging failures across a distributed cluster requires expertise most teams build slowly. The question to ask honestly is whether your workload actually requires both global distribution and ACID guarantees simultaneously — because if it does not, a conventional relational database will be simpler, cheaper, and easier to operate.
Data shape. Relational tables with schemas, constraints, and joins, but physically distributed across multiple nodes.
Scale. Horizontally scalable across clusters and regions while preserving relational semantics.
Access patterns. Full SQL support including joins, aggregations, and multi-row transactions across distributed data.
Consistency. Strong global ACID guarantees achieved through distributed consensus and transaction coordination.
When to use. NewSQL systems are best suited for global applications that require both scale and correctness — financial systems, SaaS platforms, order management systems, and multi-region applications where data integrity must be preserved under high load.
When to avoid. They are less suitable for workloads that do not require distributed transactions, or where specialised systems provide better performance and simplicity. Starting with PostgreSQL and migrating later is almost always the right call for teams that are not already at global scale.
Event Streams (Kafka / Kinesis)
Use when multiple services need to react to the same business events independently, or when you need a complete, replayable history of everything that has happened in your system.
Event streaming platforms such as Apache Kafka and Amazon Kinesis model data fundamentally differently from traditional databases. Instead of storing the current state of a system, they store an ordered, append-only log of immutable events. Each event represents something that happened — an order was placed, a payment was confirmed, an item was shipped. The current state is not stored directly; it is derived by replaying these events in sequence.
This shifts the entire design model from "what is true right now?" to "what has happened over time?" Because events are immutable and only appended, the log preserves a complete history for as long as its retention settings allow; both Kafka and Kinesis expire old events after a configurable retention period. Within that window, any point in time can be reconstructed by replaying events up to a given offset, making the event log both a transport mechanism and a durable source of truth.
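Deriving state by replay can be sketched as a fold over the log. The event types, the status mapping, and the `replay` function are illustrative assumptions, and a Python list stands in for a single partition of a real log.

```python
# An append-only log: offsets only ever increase, and events are never edited.
events = [
    {"offset": 0, "type": "order_placed", "order_id": "o1"},
    {"offset": 1, "type": "payment_confirmed", "order_id": "o1"},
    {"offset": 2, "type": "order_placed", "order_id": "o2"},
    {"offset": 3, "type": "item_shipped", "order_id": "o1"},
]

# Hypothetical mapping from "what happened" to the status it leaves an order in.
STATUS_AFTER = {
    "order_placed": "placed",
    "payment_confirmed": "paid",
    "item_shipped": "shipped",
}

def replay(log, up_to_offset):
    """Fold every event with offset <= up_to_offset into {order_id: status}."""
    state = {}
    for event in log:
        if event["offset"] > up_to_offset:
            break  # the log is ordered, so nothing later can qualify
        state[event["order_id"]] = STATUS_AFTER[event["type"]]
    return state

state_at_1 = replay(events, 1)
state_now = replay(events, 3)
print(state_at_1, state_now)
```

Replaying to offset 1 shows the system as it was before the second order existed; replaying to offset 3 gives the current state. The log itself never stores either view.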
Data shape. An ordered, immutable log of timestamped events, partitioned by a key such as user_id or order_id. Each event is a small record containing a type, a timestamp, and a payload.
Scale. Designed for extremely high throughput, often processing millions of events per second across distributed brokers.
Access patterns. Sequential consumption of events — either real-time streaming or replay from a specific offset. Not designed for random queries or joins.
Consistency. Strong ordering within partitions, with delivery guarantees ranging from at-least-once to exactly-once with additional configuration.
When to use. Event streaming systems are best suited for real-time data pipelines, microservice communication, audit logging, financial event tracking, and analytics systems where multiple consumers must react to the same sequence of changes.
When to avoid. They are less suitable for direct state storage or ad-hoc querying, since reconstructing state requires processing event history or maintaining separate materialised views.
Object Storage (S3 / Blob Storage)
Use when the data is large, unstructured, and not meaningfully queryable — and the system only needs to store and retrieve it as a whole.
Object storage systems such as Amazon S3, Google Cloud Storage, and Azure Blob Storage are designed for storing large binary or file-based data: images, videos, audio, PDFs, backups, logs, and machine learning datasets. Unlike databases, they do not interpret or index the internal structure of the data. Each object is stored as an opaque blob and retrieved using a unique key or URL, usually as a whole.
The key design principle is separation of metadata and data. Another database stores structured information — such as an object ID, ownership, timestamps, and a pointer to the object's location — while the object store holds the actual bytes. This separation matters in practice: storing large binary payloads directly inside a relational database causes row bloat, inflates backup sizes, slows down replication, and degrades performance for every query on that table — including queries that have nothing to do with the binary content. Keeping blobs in object storage and storing only a reference in the database avoids all of this.
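The split between metadata and bytes can be sketched with SQLite holding the structured record and a dictionary standing in for the bucket. The table layout, the key format, and the `upload` and `download` helpers are hypothetical; a real system would call an object store's API where the dictionary is used.

```python
import sqlite3
import uuid

object_store = {}  # stand-in for a bucket: object key -> raw bytes

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        owner TEXT NOT NULL,
        content_type TEXT NOT NULL,
        size_bytes INTEGER NOT NULL,
        object_key TEXT NOT NULL UNIQUE   -- pointer to the bytes, not the bytes themselves
    )
""")

def upload(owner, content_type, data):
    """Put the bytes in the object store, then record only a small reference row."""
    object_key = f"uploads/{uuid.uuid4()}"
    object_store[object_key] = data
    with db:
        cursor = db.execute(
            "INSERT INTO documents (owner, content_type, size_bytes, object_key) "
            "VALUES (?, ?, ?, ?)",
            (owner, content_type, len(data), object_key),
        )
    return cursor.lastrowid

def download(document_id):
    """Look up the pointer in the database, then fetch the bytes by key."""
    row = db.execute(
        "SELECT object_key FROM documents WHERE id = ?", (document_id,)
    ).fetchone()
    if row is None:
        raise KeyError(document_id)
    return object_store[row[0]]

payload = b"%PDF-1.7 placeholder bytes"
document_id = upload("ada", "application/pdf", payload)
owner, size = db.execute(
    "SELECT owner, size_bytes FROM documents WHERE id = ?", (document_id,)
).fetchone()
print(owner, size, download(document_id) == payload)
```

Queries about ownership or size read a row of a few dozen bytes and never touch the payload, so the table stays small however large the stored files become.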
Data shape. Opaque binary objects stored with unique keys and optional metadata. The storage system never looks inside the bytes.
Scale. Effectively unlimited horizontal scale, designed for very large datasets spanning petabytes to exabytes.
Access patterns. Whole-object PUT, GET, and DELETE operations, plus byte-range reads of part of an object. Without additional services like S3 Select or metadata indexing layers, there is no querying inside objects.
Consistency. Modern object stores generally provide strong read-after-write consistency for new objects and updates, with high durability guarantees.
When to use. Object storage is best suited for media files, backups, archives, data lakes, logs, ML datasets, and any system where data is consumed as complete objects rather than queried by internal structure.
When to avoid. It is less suitable for transactional systems, low-latency structured queries, or workloads requiring indexing, relationships, or partial updates within the data.
How Deep Is the Adoption of Various Database Types?
Understanding how widely each database type is actually used helps calibrate how much community support, tooling, operational knowledge, and hiring pool you can expect.
Relational databases (SQL) remain the dominant foundation of the industry by a wide margin. The 2024 Stack Overflow Developer Survey found PostgreSQL used by 49% of developers, MySQL by around 41%, and SQL Server by roughly 25% — making relational databases comfortably the most common starting point across organisation types and sizes. The 2026 OpenLogic State of Open Source Report corroborates this, showing PostgreSQL adopted by 44% of organisations surveyed and MySQL by 52%. For the vast majority of teams, a relational database is either the primary store or a critical component of the stack.
Key-value stores have achieved extraordinary reach. The same OpenLogic survey placed Redis at 46% adoption — making it the third most widely deployed open source database overall, ahead of PostgreSQL for the first time. Redis is the archetypal complement to a relational primary store: almost every production system of meaningful scale runs one. It is worth noting that Redis changed its licence in 2024, prompting significant enterprise evaluation of alternatives such as the Linux Foundation-backed Valkey, which maintains full API compatibility. The churn is ongoing but Redis/Valkey as a category remains nearly ubiquitous.
Document stores are the dominant NoSQL category and the most common entry point when teams move beyond pure relational architectures. Industry surveys consistently show document-oriented systems representing around 46% of NoSQL deployments. MongoDB is the most widely recognised name; roughly 40% of organisations use a multi-database approach that includes at least one NoSQL store, and MongoDB is the most likely candidate for that role. Adoption skews toward teams with rapidly evolving schemas, mobile and web backends, and content-heavy applications.
Search databases appear across a wide range of production stacks, typically alongside a primary relational or document store. Elasticsearch and its open-source fork OpenSearch are the dominant implementations. The OpenLogic survey shows them deployed alongside relational and caching layers as one of the most common combination patterns in modern architectures. Adoption is high among e-commerce platforms, SaaS products with content search, and any team running observability pipelines.
Event streaming (Kafka / Kinesis) is well established but skews toward larger organisations and more complex architectures. Around 48% of Kafka users work for companies with more than 500 employees, according to the 2026 OpenLogic report, though the adoption split across company sizes was notably even — suggesting streaming infrastructure is no longer exclusively an enterprise concern. Kafka appears most commonly in microservice architectures, financial platforms, and organisations running real-time analytics pipelines.
Columnar databases are standard infrastructure for any organisation doing meaningful analytics. BigQuery, Redshift, Snowflake, and ClickHouse are widely deployed, with adoption driven heavily by the shift from on-premise data warehouses to cloud-managed analytical services. Adoption is essentially universal among companies with a dedicated data or analytics team; the question is typically which product, not whether.
Time-series databases have strong adoption in engineering-heavy organisations — particularly those running observability stacks (Prometheus is deployed in the majority of Kubernetes environments) and IoT or industrial sensor workloads. Outside of those contexts, time-series is often handled by a general-purpose system rather than a specialist store.
Wide-column stores (Cassandra, HBase, Bigtable) are concentrated in large-scale deployments — typically high-traffic consumer applications, telecommunications, financial services, and IoT platforms generating continuous high-volume writes. They represent a smaller share of overall deployments than relational or document stores, but within their target use cases they are the established choice.
Graph databases remain the most specialised category outside of vector databases. Neo4j is the most widely deployed, but graph databases are typically adopted to solve a specific well-defined problem — fraud detection, identity resolution, recommendation graphs — rather than as general-purpose infrastructure. Adoption is growing, particularly in financial services and platforms with rich social or network data, but the category accounts for a smaller share of deployments than any other type covered here.
NewSQL (CockroachDB, Spanner, YugabyteDB) is largely confined to organisations with specific requirements for both global distribution and ACID guarantees. Google Spanner underpins several of Google's own large-scale services; CockroachDB has significant traction in financial services. For most organisations, the operational complexity does not justify adoption unless the use case genuinely demands it.
Vector databases are the fastest-growing category and among the most discussed, though adoption figures should be read in the context of how recently the category became mainstream. The global vector database market was valued at around $2.5 billion in 2025 and is growing rapidly, driven almost entirely by enterprise AI adoption — specifically retrieval-augmented generation (RAG), which now features in over half of enterprise AI implementations according to Menlo Ventures' 2024 generative AI report. Many teams are meeting their initial vector search needs through extensions in existing systems (pgvector, MongoDB Atlas Vector Search, Elasticsearch's vector capabilities) rather than standalone vector databases. Dedicated vector databases are increasingly the choice for AI-native applications at scale.
Object storage is essentially universal. Amazon S3 alone stores trillions of objects and serves the majority of internet-scale applications. Any organisation using a cloud provider for production workloads is almost certainly using object storage — the question has long since shifted from adoption to configuration and cost management.
How Real Systems Look in Practice
Real systems almost never rely on a single type of database. A single application might need strong consistency for payments, relevance ranking for product search, sub-millisecond latency for sessions, and analytical aggregation for business reporting. No single database satisfies all of these simultaneously.
This leads to polyglot persistence — using multiple databases, each optimised for a specific access pattern. A real e-commerce platform might use PostgreSQL for orders, payments, and inventory where ACID guarantees are essential; Elasticsearch alongside it for product search with relevance ranking and faceted filtering; Redis for sessions and shopping carts where sub-millisecond reads matter; BigQuery for overnight business intelligence reports; S3 for product images; and Kafka tying it together, carrying events between services.
Each choice exists because a specific force in the system demanded it. But polyglot persistence has real costs. Every additional database is another system to monitor, back up, secure, and maintain. Data in multiple stores needs to stay consistent — if a product price updates in PostgreSQL but the sync to Elasticsearch fails, users see wrong prices in search results. Debugging issues that span multiple databases is harder. Onboarding new engineers is more complex.
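The price-sync failure described above can be made concrete with a small simulation. In this sketch, plain dictionaries stand in for PostgreSQL (the source of truth) and Elasticsearch (the search index); `update_price`, `find_drift`, and `repair` are hypothetical helper names, and the `sync_ok` flag is an assumption used to simulate a failed sync. A periodic reconciliation pass like this is only one possible way to detect and repair drift.

```python
from typing import Dict, List


def update_price(
    primary: Dict[str, float],
    search_index: Dict[str, float],
    product_id: str,
    price: float,
    sync_ok: bool,
) -> None:
    """Write to the primary store first, then try to mirror the change to search."""
    primary[product_id] = price
    if sync_ok:
        search_index[product_id] = price
    # When sync_ok is False the index silently keeps the old price.


def find_drift(primary: Dict[str, float], search_index: Dict[str, float]) -> List[str]:
    """Return ids of products in the primary whose indexed price is missing or different."""
    return sorted(
        product_id
        for product_id, price in primary.items()
        if search_index.get(product_id) != price
    )


def repair(
    primary: Dict[str, float], search_index: Dict[str, float], product_ids: List[str]
) -> None:
    """Copy the primary's value over the index for each drifted product."""
    for product_id in product_ids:
        search_index[product_id] = primary[product_id]


primary = {"sku-1": 10.0, "sku-2": 25.0}
search_index = dict(primary)

update_price(primary, search_index, "sku-1", 12.0, sync_ok=False)  # simulated failure
drift = find_drift(primary, search_index)
print("drifted:", drift)

repair(primary, search_index, drift)
print("after repair:", find_drift(primary, search_index))
```

Even this toy version adds a second code path, a comparison job, and a repair step, which is the kind of ongoing cost each extra store brings.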
The right approach, especially for teams early in their journey, is to start with the simplest thing that could work — usually a relational database — and introduce specialised stores only when a specific, measurable pain makes the cost worth paying.
Putting It All Together
The four questions are your compass.
Shape tells you whether your data is tabular (relational), document-shaped (document store), connected (graph), measured over time (time-series), or semantically rich (vector).
Scale tells you whether a single machine is enough or whether you need horizontal distribution — and which categories even support that.
Access patterns tell you whether you need single-record lookup (key-value or relational), Google-like search over messy text (search database), friend-of-friend traversal or fraud ring detection (graph), time-window aggregation over live metrics (time-series), large-scale offline aggregation (columnar), or similarity-by-meaning retrieval (vector).
Consistency tells you whether you need ACID guarantees (relational or NewSQL) or whether eventual consistency is acceptable (most NoSQL systems).
These four questions, answered honestly, will narrow you to one or two database families. The rest — which specific product, which version, which hosting option — is secondary.
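As a rough illustration, the mapping from answers to database families can be written as a lookup. This is a deliberately simplified sketch: the table keys, the function name `suggest_families`, and the rule "relational on one machine, NewSQL when distributed" are assumptions distilled from this guide, not a definitive selection algorithm.

```python
from typing import Dict, List

# Shape answers mapped to the family this guide associates with each.
SHAPE_TO_FAMILY: Dict[str, str] = {
    "tabular": "relational",
    "document": "document store",
    "connected": "graph",
    "time-stamped": "time-series",
    "semantic": "vector",
}

# Access-pattern answers mapped to candidate families.
ACCESS_TO_FAMILIES: Dict[str, List[str]] = {
    "single-record lookup": ["key-value", "relational"],
    "text search": ["search"],
    "relationship traversal": ["graph"],
    "time-window aggregation": ["time-series"],
    "offline aggregation": ["columnar"],
    "similarity retrieval": ["vector"],
}


def suggest_families(
    shape: str,
    access_pattern: str,
    needs_acid: bool = False,
    needs_horizontal_scale: bool = False,
) -> List[str]:
    """Return candidate families, most strongly indicated first."""
    candidates = [SHAPE_TO_FAMILY[shape]]
    for family in ACCESS_TO_FAMILIES[access_pattern]:
        if family not in candidates:
            candidates.append(family)
    if needs_acid:
        # Simplifying assumption: ACID points to relational, or NewSQL when distributed.
        acid_family = "NewSQL" if needs_horizontal_scale else "relational"
        if acid_family in candidates:
            candidates.remove(acid_family)
        candidates.insert(0, acid_family)
    return candidates


print(suggest_families("tabular", "single-record lookup", needs_acid=True))
print(suggest_families("document", "text search"))
```

When the function returns more than one family, that reflects the polyglot pattern described earlier: a primary store chosen for shape and consistency, plus a specialised store for the access pattern.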
One final note: the boundaries between categories are less rigid than this guide implies. PostgreSQL handles JSON documents and full-text search natively and adds vector similarity through the pgvector extension. MongoDB has added time-series collections and vector search. The clean separations here are conceptual tools, not permanent walls. The important distinction is this: databases are converging, but workloads are not. A system that can do many things is not the same as a system optimised for your specific thing. The trade-offs that gave rise to each category remain real even as individual products expand, so knowing what a category was originally built for still tells you what its products are tuned for.
About N Sharma
Lead Architect at StackAndSystem
N Sharma is a technologist with over 28 years of experience in software engineering, system architecture, and technology consulting. He holds a Bachelor’s degree in Engineering, a DBF, and an MBA. His work focuses on research-driven technology education: explaining software architecture, system design, and development practices through structured tutorials designed to help engineers build reliable, scalable systems.
Disclaimer
This article is for educational purposes only. AI-powered generative tools were used to format the text and improve language flow. While we strive for accuracy, this content may contain errors or omissions and should be independently verified.
