We are in the middle of Module 3, and the previous lessons have been working through the different storage shapes you might reach for. Relational stores. Document stores. Key-value caches. This lesson is about the family that, more than any other, was built to answer a single question: what do you do when one machine is no longer enough, and you do not want to spend the rest of your career babysitting a sharded Postgres cluster? The answer the industry came up with, between 2006 and 2010, is the wide-column database. Cassandra, ScyllaDB, and BigTable are the three names worth knowing.
The deal these databases offer is straightforward and uncomfortable in equal measure. You get horizontal scale that is, in practice, unbounded. You add nodes, you get more capacity, the system rebalances. You also get a schema that is welded to your queries: you cannot ask a question you did not design for, and the cost of asking the wrong question is a full-cluster scan that takes minutes and ruins everyone’s afternoon. The trade is real. This lesson is about understanding it well enough to know when it is worth taking.
The data model
A wide-column row is not the row you know from Postgres. It has a structure that takes a moment to internalise, and once you have it, the rest of the model falls into place.
Every row lives inside a partition, identified by a partition key. The partition key is hashed, and the hash determines which node in the cluster owns this partition. All rows with the same partition key live on the same node (and on its replicas). This is the unit of distribution.
Inside a partition, rows are sorted by a clustering key. The clustering key is one or more columns. Rows with the same partition key but different clustering keys are stored next to each other on disk, in the order the clustering key defines. This is the unit of locality: a single read can pull a contiguous slice of rows from a partition very cheaply.
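After the partition key and the clustering key, a row has its columns: ordinary fields, set per row, with no requirement that two rows in the same table populate the same set of columns. One row can have name, age, email. Another row in the same table can have name, last_login, device. How free this is depends on the system: BigTable and HBase let a row introduce new column names within a column family at write time, while Cassandra’s CQL declares the table’s columns up front and lets each row leave any of them unset. The “wide” in wide-column comes from this: the column set per row can be arbitrarily wide, and rows do not need to agree on it.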
Conceptually, a partition looks like a sub-table dedicated to one entity, with the clustering key as its primary key, holding however many rows that entity has accumulated. A user’s event log. A device’s sensor readings. A tenant’s billing records. The partition holds the slice. The cluster holds the partitions.
flowchart LR
Q[Query: events for user 42, last 24h] --> H[hash user_id 42]
H --> N[Node owning partition]
N --> P[Partition: user_id=42]
P --> R1[ts=2025-12-10T10:00:00 event=login]
P --> R2[ts=2025-12-10T10:05:12 event=view]
P --> R3[ts=2025-12-10T10:08:44 event=click]
P --> R4[...]
The query lands on one node, hits one partition, and reads a slice. That is the shape every wide-column workload is trying to be.
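To make the partition-then-slice shape concrete, here is a minimal in-memory sketch in Python. It is a toy model, not how any of these databases is implemented: the `Table` class, its `insert` and `slice` methods, and the sample data are hypothetical names chosen for illustration.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta


class Table:
    """Toy model: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        # Each partition is a list of (clustering_key, columns) in sorted order.
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, columns):
        rows = self._partitions[partition_key]
        keys = [ck for ck, _ in rows]
        position = bisect.bisect_right(keys, clustering_key)
        rows.insert(position, (clustering_key, columns))

    def slice(self, partition_key, start, end):
        """Rows of one partition with start <= clustering key < end."""
        rows = self._partitions.get(partition_key, [])
        keys = [ck for ck, _ in rows]
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return rows[lo:hi]  # contiguous, already in clustering order


events = Table()
events.insert(42, datetime(2025, 12, 10, 10, 0, 0), {"event": "login"})
events.insert(42, datetime(2025, 12, 10, 10, 8, 44), {"event": "click"})
events.insert(42, datetime(2025, 12, 10, 10, 5, 12), {"event": "view"})
events.insert(7, datetime(2025, 12, 10, 10, 1, 0), {"event": "login"})

now = datetime(2025, 12, 10, 10, 10, 0)
recent = events.slice(42, now - timedelta(minutes=6), now)
print([cols["event"] for _, cols in recent])  # ['view', 'click']
```

The read touches exactly one partition and returns a contiguous run of rows. Nothing in the sketch can answer a question that does not start from a partition key without walking every partition, which is the constraint the rest of this lesson keeps returning to.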
Where this came from
The two foundational papers came out of Google and Amazon within a year of each other. Google published the BigTable paper in 2006, describing the storage system behind Search, Gmail, and Maps. Amazon published the Dynamo paper in 2007, describing the storage system behind the shopping cart. Both papers said, in different vocabularies, the same thing: relational databases do not scale horizontally without enormous pain, and here is a different shape that does.
The open-source descendants split along the two lineages. Cassandra (Apache, 2008, originally written at Facebook) took Dynamo’s masterless replication model and BigTable’s data model, and packaged them as a single system. HBase (Apache, 2008) was a more direct BigTable clone built on top of Hadoop’s HDFS, popular in the Hadoop ecosystem of the early 2010s. ScyllaDB (2015) is a C++ rewrite of Cassandra, drop-in compatible with Cassandra clients, with much higher per-node throughput. BigTable itself is available as a managed service on Google Cloud and is the version still running inside Google.
The story of the last decade is roughly: HBase has faded as the Hadoop world has faded; Cassandra is still the default name people reach for; ScyllaDB has eaten a non-trivial share of new wide-column workloads because the operational story is simpler at the same throughput; BigTable is the natural pick if you are already on GCP and do not want to run anything yourself.
Cassandra: the masterless model
Cassandra’s defining design decision is that there is no leader. Every node in the cluster is equal. Reads and writes can hit any node, which then forwards to the replicas that own the partition. Replication is configured per keyspace: replication factor 3 means each partition lives on three nodes, chosen by a consistent-hashing ring.
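A rough sketch of how a consistent-hashing ring can pick replicas, in Python. This is a simplification under stated assumptions: one token per node, MD5 as the hash, and replicas chosen as the next nodes clockwise. Production Cassandra uses a different partitioner and many virtual tokens per node, and the `Ring` class, `token` function, and node names here are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is an arbitrary choice for this sketch)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed the number of nodes")
        self.replication_factor = replication_factor
        # One token per node, sorted by position on the ring.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """Walk clockwise from the key's token and take the next RF nodes."""
        start = bisect.bisect_left(self._tokens, token(partition_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("user:42")
print(owners)  # three distinct nodes; which ones depends on the hash
```

Because placement is a pure function of the key and the ring, any node can compute the owners for any partition, which is what lets every node accept every request.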
Consistency is tunable per query. When you write, you specify how many replicas must acknowledge the write before the client is told it succeeded: ONE, QUORUM, ALL. When you read, you specify how many replicas must respond: same options. The combination determines the consistency you get. QUORUM for both read and write gives you strong consistency on the partition, at the cost of latency. ONE for both gives you eventual consistency, fast and weak. Most production workloads pick QUORUM for both and treat the system as strongly consistent for individual partitions.
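The reasoning behind "QUORUM for both is strong" is an overlap argument: if the number of replicas that acknowledged the write plus the number consulted on the read exceeds the replication factor, at least one replica consulted on the read has the latest write. A small Python sketch of that arithmetic follows; the function names and the `LEVELS` table are hypothetical, and only the three levels named above are modelled.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


# How many replicas each consistency level waits for, given the replication factor.
LEVELS = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def read_overlaps_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    w = LEVELS[write_level](replication_factor)
    r = LEVELS[read_level](replication_factor)
    return r + w > replication_factor


print(read_overlaps_write("QUORUM", "QUORUM", 3))  # True  (2 + 2 > 3)
print(read_overlaps_write("ONE", "ONE", 3))        # False (1 + 1 <= 3)
print(read_overlaps_write("ONE", "ALL", 3))        # True  (1 + 3 > 3)
```

The last line shows that QUORUM on both sides is not the only overlapping combination; it is the one that spreads the latency cost evenly between reads and writes.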
The price of all this is operational. A Cassandra cluster has compaction, which is the process of merging the on-disk sorted files (SSTables) that writes accumulate into. Compaction tuning is a real job. Repair, which reconciles divergent replicas, is another real job. Adding and removing nodes is straightforward in principle and full of surprises in practice. The operational reputation of Cassandra is not undeserved, and it is the reason teams new to wide-column today usually start with ScyllaDB.
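As an illustration of what compaction does at its core, the Python sketch below merges several sorted runs and keeps only the newest write for each key. It assumes each run is a list of hypothetical `(key, write_timestamp, value)` tuples sorted by key, and it deliberately ignores deletes (tombstones), compaction strategies, and everything else that makes real compaction a tuning problem.

```python
import heapq


def compact(*sstables):
    """Merge sorted runs of (key, write_timestamp, value), newest write per key wins."""
    merged = []
    # heapq.merge yields tuples in (key, write_timestamp, value) order, so for a
    # repeated key the later-yielded tuple always carries the newer timestamp.
    for key, ts, value in heapq.merge(*sstables):
        if merged and merged[-1][0] == key:
            merged[-1] = (key, ts, value)  # replace the older version
        else:
            merged.append((key, ts, value))
    return merged


older = [("a", 1, "x"), ("b", 1, "y")]
newer = [("a", 2, "x2"), ("c", 2, "z")]
print(compact(older, newer))
# [('a', 2, 'x2'), ('b', 1, 'y'), ('c', 2, 'z')]
```

Until a merge like this runs, a read for key `a` has to consult both files and reconcile them, which is why falling behind on compaction shows up as read latency.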
ScyllaDB: same data model, less pain
ScyllaDB started from the observation that Cassandra is a JVM application doing a lot of low-level work that the JVM is bad at. The rewrite in C++, with a thread-per-core architecture and a userspace network stack, gives, by ScyllaDB’s own benchmarks, up to an order of magnitude more throughput per node. On those numbers, a Cassandra cluster of fifty nodes can become a ScyllaDB cluster of five.
The wire protocol and the query language are the same. Cassandra clients talk to ScyllaDB without changes. The data model is identical. The operational story is dramatically simpler because you have fewer nodes to manage and the per-node performance is predictable.
For a new wide-column workload at scale, ScyllaDB is, in 2026, often the right answer. The Cassandra ecosystem is broader, but the day-to-day experience of running Scylla is enough better that the broader ecosystem is rarely worth the operational tax.
BigTable: the managed option
If you are on Google Cloud and you want a wide-column store with no operational burden, BigTable is the answer. It is the version Google itself runs internally. The data model is the same shape (row key, column families, columns, cell versions), with the local quirk that BigTable’s “row key” is what Cassandra would call a partition key plus clustering key concatenated. The query API is more limited than Cassandra’s CQL, but for the use cases BigTable is designed for (high-throughput key-range scans), the API is sufficient.
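The "concatenated row key" idea can be sketched in a few lines of Python. BigTable keeps rows sorted lexicographically by row key, so a key laid out as entity first, timestamp second turns "one user's events in a time window" into a key-range scan. The `#` separator, the zero-padding width, and the `row_key` and `range_scan` helpers are assumptions made for this illustration, not part of any BigTable client API.

```python
import bisect


def row_key(user_id: int, ts_iso: str) -> str:
    # Zero-pad the id so that lexicographic order matches numeric order.
    return f"user#{user_id:010d}#{ts_iso}"


def range_scan(sorted_keys, start_key, end_key):
    """Keys with start_key <= key < end_key, found by binary search on a sorted list."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, end_key)
    return sorted_keys[lo:hi]


keys = sorted([
    row_key(42, "2025-12-10T10:00:00"),
    row_key(42, "2025-12-10T10:05:12"),
    row_key(42, "2025-12-10T10:08:44"),
    row_key(7, "2025-12-10T10:01:00"),
])

hits = range_scan(
    keys,
    row_key(42, "2025-12-10T10:00:00"),
    row_key(42, "2025-12-10T10:06:00"),
)
print(hits)
# ['user#0000000042#2025-12-10T10:00:00', 'user#0000000042#2025-12-10T10:05:12']
```

The prefix plays the role of Cassandra's partition key and the suffix plays the role of the clustering key; the difference is that the application builds and parses the combined key itself.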
The pitch is simple: you do not run nodes, you do not tune compaction, you do not worry about replication. You pay for nodes by the hour and capacity scales by adding nodes through an API call. For a team that does not want to be in the database-operations business, this is a fair deal.
The constraint that defines the family
Every wide-column database imposes the same fundamental constraint, and it is the thing you must internalise before you commit to one: you must design your schema for the queries you intend to run.
There are no joins. There are no ad-hoc queries on non-key fields. If you find yourself wanting to ask “give me all rows where last_login > 2025-01-01” without naming a partition, the answer is a full-cluster scan, and that holds even if last_login is a clustering key, because a clustering key only orders rows inside a partition. The cluster will do it, eventually, slowly, at a cost that ruins your latency for everyone else.
The data-modeling exercise for a wide-column database is therefore not “what does my data look like.” It is “what queries will I run.” You enumerate the read patterns. For each one, you design a partition key and a clustering key that make the query a fast lookup. If two queries need the same data accessed by different keys, you store the data twice, once for each access pattern. Denormalisation is not a smell. It is the model.
A worked example. Suppose you have an events table for a SaaS application. The queries you need:
- Recent events for a given user (partition by user_id, cluster by timestamp DESC).
- Recent events for a given tenant (partition by tenant_id, cluster by timestamp DESC).
- All events of a given type in the last hour, across all tenants.
The first two are natural fits. Each gets its own table, with its own partition key, and writes go to both tables on every event. The third is awkward: there is no good partition key. The pragmatic answer is usually to denormalise into a third table partitioned by (event_type, time_bucket), where time_bucket is something like 2025-12-10T10 (one bucket per hour). Now query 3 is a partition lookup on the relevant buckets, and the cost is paid at write time by writing to a third table.
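The fan-out described above can be sketched in Python with three dictionaries standing in for the three tables. Everything here is illustrative: the table names, `record_event`, `events_of_type_last_hour`, and the row layout are hypothetical, and the lists are not kept in clustering order the way a real table would keep them.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def time_bucket(ts: datetime) -> str:
    """One bucket per hour, e.g. '2025-12-10T10'."""
    return ts.strftime("%Y-%m-%dT%H")


# One "table" per access pattern, each keyed by its own partition key.
events_by_user = defaultdict(list)
events_by_tenant = defaultdict(list)
events_by_type = defaultdict(list)


def record_event(user_id, tenant_id, event_type, ts):
    """Write the same event three times, once per query it must serve."""
    row = {"user_id": user_id, "tenant_id": tenant_id, "type": event_type, "ts": ts}
    events_by_user[user_id].append(row)
    events_by_tenant[tenant_id].append(row)
    events_by_type[(event_type, time_bucket(ts))].append(row)


def events_of_type_last_hour(event_type, now):
    """Query 3: read the hour buckets the window touches, then trim to the window."""
    cutoff = now - timedelta(hours=1)
    buckets = sorted({time_bucket(cutoff), time_bucket(now)})
    rows = []
    for bucket in buckets:
        for row in events_by_type.get((event_type, bucket), []):
            if cutoff <= row["ts"] <= now:
                rows.append(row)
    return rows


record_event(42, "acme", "login", datetime(2025, 12, 10, 9, 20))
record_event(42, "acme", "login", datetime(2025, 12, 10, 9, 45))
record_event(7, "globex", "login", datetime(2025, 12, 10, 10, 5))
record_event(7, "globex", "click", datetime(2025, 12, 10, 10, 10))

found = events_of_type_last_hour("login", datetime(2025, 12, 10, 10, 30))
print([row["ts"].strftime("%H:%M") for row in found])  # ['09:45', '10:05']
```

A one-hour window touches at most two hourly buckets, so the read cost stays bounded no matter how many tenants exist. The cost moves to the write path, where every event is stored three times.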
This shape, of designing schemas backwards from queries and writing each event to multiple tables, is what wide-column workloads look like in practice. It is uncomfortable for engineers used to relational normalisation. It is also the price of the scale.
When to use it, when not to
Wide-column wins when:
- You have time-series data at extreme scale: event logs, application metrics, IoT sensor data. Partition by entity, cluster by timestamp, and the access pattern lines up perfectly with the storage layout.
- You have user data partitioned by user_id at a scale where Postgres breaks. The query is always “all rows for this user,” the partition is always small enough to fit in memory, and the cluster scales linearly with the number of users.
- Your access pattern is “always know the key” at a billion-row scale. Lookups by primary key, no surprises, no joins.
Wide-column loses when:
- You need ad-hoc analytics or arbitrary filtering. Use a columnar OLAP store (lesson 24, ClickHouse) or a search index (lesson 25, Elasticsearch) for that.
- You need joins. Use Postgres.
- You need OLTP-style relational integrity (foreign keys, multi-row transactions across tables). Use Postgres.
The honest summary is that wide-column is a specialist tool. When the access pattern fits, nothing else scales as gracefully. When it does not fit, every other choice is better.
A teaser
The most-told case study in this space is Discord’s storage journey. They started on MongoDB, hit its limits in late 2015, migrated to Cassandra (a move they wrote up in 2017), hit Cassandra’s operational limits by 2022, and migrated to ScyllaDB. The migration story has detail worth chewing on (the data model evolved across the migrations, the operational pain was real, the throughput gains were dramatic). We will go through it carefully in lesson 32 of this course, when we cover real migration stories. For now, the relevant lesson is that wide-column is a real production answer for messaging-scale data, and the choice between Cassandra and ScyllaDB is not academic.
Citations and further reading
- Apache Cassandra documentation, https://cassandra.apache.org/doc/latest/ (retrieved 2026-05-01). The reference for the data model, CQL, and operational topics.
- Fay Chang et al., “Bigtable: A Distributed Storage System for Structured Data” (Google, OSDI 2006), https://research.google/pubs/pub27898/ (retrieved 2026-05-01). The original BigTable paper.
- Giuseppe DeCandia et al., “Dynamo: Amazon’s Highly Available Key-value Store” (Amazon, SOSP 2007), https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf (retrieved 2026-05-01). The other foundational paper for this family.
- ScyllaDB documentation, https://docs.scylladb.com/ (retrieved 2026-05-01). Coverage of the architecture and the Cassandra-compat API.
- Discord Engineering, “How Discord Stores Trillions of Messages”, https://discord.com/blog/how-discord-stores-trillions-of-messages (retrieved 2026-05-01). The migration to ScyllaDB, with the data-modeling decisions that came with it.