Data & System Architecture, from the ground up Lesson 32 / 80

Real case: Discord's MongoDB to Cassandra to ScyllaDB journey

How Discord's message storage went from MongoDB to Cassandra to ScyllaDB over ten years, what each migration cost, and what the lessons are for everyone else.

This is the first lesson of the course that is, end to end, a single case study. The reason for breaking the format is that the abstractions of Module 4 (sharding, replication, hot keys, fan-out) are easier to internalise when you watch them play out in a real production system over ten years. Discord’s message-storage journey is the cleanest public example I know. The company has written about each transition in detail on its engineering blog, and the three eras of the system map directly onto the lessons of this module.

Discord is a chat platform. By 2023 it had hundreds of millions of monthly users and a message store holding trillions of rows. The system has been rewritten on top of three different databases since 2015: MongoDB first, then Cassandra, then ScyllaDB. Each migration was driven by hitting an operational ceiling on the previous database. The data model, after the second migration, has stayed the same.

The workload, stated precisely

Before the eras: what is Discord asking the database to do?

The dominant query is “give me the most recent N messages in this channel, optionally before timestamp T, for pagination as the user scrolls back through history”. Almost every read in the system is this query. Some reads also include filters: messages from a specific user, messages with a specific reaction, messages matching a search. The dominant write is “insert a new message into this channel”. Edits and deletes are vastly less common than inserts.

The data has two natural keys. Messages are scoped to channels: a server’s #general, a private DM between two users, a group DM. Within a channel, messages are ordered by time. The natural unit of locality is “all messages in a channel, in time order”. The natural unit of distribution is the channel.

The traffic is heavily skewed. A few channels are gigantic (popular servers’ announcement channels, viral DMs during world events) and most channels are tiny (a DM between two users that has fifty messages total). Any storage layer for this workload has to handle both extremes without falling over on either one.

That is the problem statement. Now the eras.

2015 to 2016: MongoDB

Discord launched in 2015 with MongoDB as the message store, for the pragmatic startup reason: MongoDB was easy to operate at small scale, the schema-less document model fit the loose shape of a chat message, and the team did not yet know exactly what queries they would need. MongoDB let them ship.

The breakage came when the working set stopped fitting in RAM. MongoDB’s read performance is excellent when the indexes and the hot data fit in memory. When the dataset grows past the RAM ceiling, every read that misses the cache hits disk, and the latency tail blows up. By 2017 Discord was at around a hundred million messages, and read latency on chat history (the dominant query) was getting bad enough that users noticed.

The 2017 blog post is worth reading in full because it is honest about what specifically went wrong. The IO pattern of “recent messages in this channel” was being served by an index lookup followed by random reads against the on-disk message documents, and the random reads were the problem. Caching the index was easy; caching all the messages was not. A second problem: MongoDB’s replica-set model had operational sharp edges at the scale they were operating. The decision to migrate was driven equally by the latency problem and by operational fatigue.

2016: the move to Cassandra

The migration target was Cassandra, and the data model was redesigned from scratch around the dominant query. This is the part of the story that is easiest to learn from, because the data-modelling exercise is exactly the one lesson 20 described.

The query is “messages in this channel, ordered by time”. The Cassandra data model that turns this into a fast operation is:

  • Partition key: channel_id (technically a composite of channel_id and a time bucket, to bound partition size). All messages in a channel live in the same partition.
  • Clustering key: message timestamp (or message ID, which embeds a timestamp). Messages in a partition are stored on disk in time order.

With this layout, “recent messages in channel X” is a single partition lookup followed by a contiguous read of the most recent rows. Cassandra does this kind of read very fast, because the partition is on a single node, the rows are contiguous on disk, and the access pattern matches the LSM-tree storage exactly. No random IO, no joins, no fan-out: one shard, sequential read, done.

The composite partition key with a time bucket addresses the “some channels are huge” problem. Without bucketing, a single channel’s partition would grow without bound, eventually causing the problems lesson 28 described (hot partition, slow compactions, painful repair). With bucketing (for example, partition key = (channel_id, year_month) or similar), each partition holds at most a month of one channel’s messages, which keeps partitions bounded regardless of channel size. The price is that “give me the last N messages” might span two buckets near a month boundary, which the application handles by querying the current bucket and falling back to the previous bucket if needed.

The migration itself took months. Discord’s blog describes a dual-write phase where new messages were written to both MongoDB and Cassandra simultaneously, while a backfill job copied historical messages from Mongo into Cassandra. Once the backfill caught up, reads were switched to Cassandra, and after a verification window MongoDB was decommissioned. This is the standard playbook for database migrations, and it is the right one. Anything more aggressive (a single big-bang cutover) would have been a disaster at the scale they were operating.

For the next six years, Cassandra was the message store. The cluster grew from twelve nodes initially to one hundred and seventy-seven nodes by 2022, holding trillions of messages. The data model did not change. Adding capacity meant adding nodes, which Cassandra handled, with operational pain that Discord wrote about in detail.

2022 to 2023: the move to ScyllaDB

The 2022 blog post on the ScyllaDB migration is the more recent and more interesting of the two. The headline of the post is the cost: Discord migrated from one hundred and seventy-seven Cassandra nodes to seventy-two ScyllaDB nodes, with better latency and lower cost.

The reasons for migrating were not data-model problems. The data model was working. The reasons were operational, and they line up exactly with the things lesson 20 said about Cassandra.

JVM overhead and latency variability. Cassandra is a JVM application. Garbage collection pauses cause periodic latency spikes that show up in the p99 and p99.9 tails. At Discord’s scale, even a small fraction of slow reads is a lot of slow reads in absolute terms. The team had spent significant effort tuning GC, and the latency tail was still problematic.

Compaction pain. Cassandra’s compaction process merges the on-disk sorted files (SSTables) that writes accumulate into. Compaction is necessary to keep reads fast and to reclaim space from deleted rows, but it consumes IO and CPU on the same nodes that are serving reads. At Discord’s scale, compaction was an ongoing source of operational toil: tuning compaction strategies, scheduling compactions, dealing with compaction backlogs, monitoring compaction-induced latency spikes.

Operational fatigue. Repair (the process that reconciles divergent replicas) was painful at one hundred and seventy-seven nodes. Adding nodes was painful. Removing nodes was painful. The team had become very good at running Cassandra, and running it was still a substantial fraction of their time.

ScyllaDB is the C++ rewrite of Cassandra. Same wire protocol, same query language, same data model, completely different implementation. The thread-per-core architecture, userspace networking, and absence of a JVM mean that ScyllaDB delivers an order of magnitude more throughput per node, with much smaller and more predictable latency tails. For a team running an existing Cassandra workload, ScyllaDB is the closest thing to a free lunch that exists in this space: same client code, same data layout, same operational shape, much better numbers.

The migration was, again, a multi-month engineering exercise. Discord wrote a custom data-migration service to copy the trillions of messages from Cassandra to ScyllaDB while production traffic continued. The service used dual-writes for new messages, a streaming backfill for historical messages, and per-channel verification before the cutover. The cutover was incremental: traffic was shifted channel-by-channel, monitored, and rolled back if anything misbehaved. After a few months, the entire workload was on ScyllaDB.

The numbers from the public post: one hundred and seventy-seven Cassandra nodes became seventy-two ScyllaDB nodes, a roughly 60 percent reduction. Read latency at the tail dropped substantially. The infrastructure cost on this workload was meaningfully lower (Discord publicly framed the migration as a major cost-saving move). The operational pain associated with compaction, GC, and node management was substantially reduced, though not eliminated: ScyllaDB is still a wide-column database, and the same operational primitives apply.

flowchart LR
    subgraph E1[2015-2016: MongoDB era]
      M[(MongoDB)]
      M -->|RAM ceiling, IO pattern| L1[Latency degradation]
    end
    subgraph E2[2016-2022: Cassandra era]
      C[(Cassandra, 12 to 177 nodes)]
      C -->|JVM, compaction, ops| L2[Operational ceiling]
    end
    subgraph E3[2022-2026: ScyllaDB era]
      S[(ScyllaDB, 72 nodes)]
      S --> L3[Lower cost, better tail]
    end
    L1 --> E2
    L2 --> E3

The data model that survived from era 2 to era 3 is the same: channel as partition (with time-bucketing for size control), message timestamp as clustering key. The migration changed the implementation of the database underneath, not the shape of the data. This is the single most important fact about the migration and the most generalisable lesson.

What the journey teaches

Five lessons, in roughly the order Discord encountered them.

Choose the data model around your dominant query pattern. The reason Cassandra worked is that the partition-per-channel, clustered-by-timestamp layout makes the dominant query a single-partition sequential read. The same model is what made the ScyllaDB migration possible without a re-architecture: the data shape was already right for a wide-column store, and ScyllaDB is a wide-column store. If the original Cassandra schema had been wrong (say, partition by user) the move to ScyllaDB would have required a data-model change, which is much harder than a database swap. Lesson 20 said wide-column databases earn their keep when the schema matches the query. Discord is the textbook case.

Wide-column databases earn their keep at scale. They are operationally heavy. Compaction, repair, node management, capacity planning: none of it is fun. But at Discord’s scale the alternatives are also heavy and additionally do not scale. MongoDB hit the ceiling at a hundred million messages; Cassandra bought them six years and several thousand times the data.

Migration is engineering, not magic. Both Discord migrations took months and used the same playbook: design the new data model first, dual-write, backfill historical data while live traffic continues, verify, cut over incrementally, monitor, decommission. Lesson 29 covered the patterns in the abstract; the Discord migrations are exactly that pattern at industrial scale. There is no shortcut.

The right rewrite of an old technology can be transformative. ScyllaDB is the case study. Same protocol, same query language, same data model, completely different implementation, an order of magnitude better in the parts that matter. The migration was nearly free in application changes and dramatic in cost and latency. This is possible because Cassandra’s wire protocol is documented and stable. Closed-protocol systems never get the equivalent rewrite.

Operational fatigue is a real signal. The Cassandra cluster was technically working when Discord migrated away from it. The data model was right, the cluster was scaling, the application was serving traffic. The migration was driven by the cumulative weight of operational pain and the realisation that a rewrite of the same database in a different language could lift much of it. When a team spends a substantial fraction of its time running the database rather than building the product, the database is too expensive even if it is not failing. The cost shows up in the engineering org chart, not in the latency dashboard.

What this means for systems that are not Discord

You almost certainly do not have a chat workload at Discord’s scale. The relevant lessons are therefore not “use Cassandra” or “migrate to ScyllaDB”. They are the meta-lessons.

If your access pattern is “recent items in a stream, by time, scoped to an entity”, the wide-column data model (entity as partition, timestamp as clustering key) is the right shape, regardless of whether the store underneath is Cassandra, ScyllaDB, BigTable, DynamoDB, or even Postgres with sharded tables at smaller scale.

If you are picking a database for a new product, optimise for the dominant query and accept that the secondary queries will be ugly. Discord’s design accepts that “find a single message by ID” is awkward in exchange for making the dominant operation extremely fast. The trade is correct because the secondary operation is rare.

If you are running a database that works but hurts to operate, migration is a real option. It is also expensive: months of engineering. Migrate when the cumulative operational cost is genuinely larger than the migration cost, and when the target is mature enough to bet on. Being an early adopter is a different and usually worse risk profile.

If you are running a single Postgres instance with ten million rows: stay there. The case study does not say “switch to Cassandra”. It says “design your data layout for your dominant query, and choose a database that fits the layout”. For ten million rows on a user-centric workload, that database is almost certainly Postgres.

Module 4 closes here

Module 4 opened with replication, walked through partitioning, sharding strategies, hot keys and rebalancing, cross-shard queries, and ends here on a case study that ties the abstractions to a real production system over a decade. The patterns are the same regardless of which database you picked in Module 3, and they are the foundation for everything Module 5 builds on.

Module 5 turns to processing: how data flows through a system once it is stored. Batch processing first (the descendants of MapReduce, the modern data-warehouse pipelines), then stream processing (Kafka, Flink, the real-time side), then the convergence of the two. The storage layer is the floor. Processing is the building.

Citations and further reading

  • Discord Engineering, “How Discord Stores Trillions of Messages”, 2023, https://discord.com/blog/how-discord-stores-trillions-of-messages (retrieved 2026-05-01). The detailed account of the ScyllaDB migration, including node counts, the data-migration service, the cutover process, and the cost and latency outcomes.
  • Discord Engineering, “How Discord Stores Billions of Messages”, 2017, https://discord.com/blog/how-discord-stores-billions-of-messages (retrieved 2026-05-01). The earlier post on the MongoDB-to-Cassandra migration, including the failure modes that drove the move and the Cassandra schema design.
  • Apache Cassandra documentation, https://cassandra.apache.org/doc/latest/ (retrieved 2026-05-01). Reference for the data model, compaction strategies, and operational topics that the case study touches on.
  • ScyllaDB documentation, https://docs.scylladb.com/ (retrieved 2026-05-01). Reference for the architecture and the Cassandra-compatible API that made the migration possible.
  • “Designing Data-Intensive Applications” (Martin Kleppmann, O’Reilly, 2017), chapters 5 and 6. The standard reference for replication and partitioning, with the concepts the case study illustrates.
Search