How Discord Manages Trillions of Messages in 2024

BY Mark Howell · 29 September 2024 · 5 MINS READ

Today in Edworking News we want to talk about how Discord stores trillions of messages. In 2017, Discord wrote a blog post on how it stored billions of messages. The post shared the journey of how the company started out using MongoDB but migrated its data to Cassandra because it was looking for a database that was scalable, fault-tolerant, and relatively low maintenance. Discord knew it would be growing, and it did. The team wanted a database that grew alongside the product, hoping that its maintenance needs would not grow alongside its storage needs. Unfortunately, that turned out not to be the case: the Cassandra cluster exhibited serious performance issues that required increasing amounts of effort just to maintain, not improve. Almost six years later, Discord has changed a lot, and how it stores messages has changed as well.

### Our Cassandra Troubles

Discord stored its messages in a database called cassandra-messages. As its name suggests, it ran Cassandra, and it stored messages. In 2017, the cluster ran 12 Cassandra nodes, storing billions of messages. By the beginning of 2022, it had 177 nodes with trillions of messages. Along the way it became a high-toil system: the on-call team was frequently paged for issues with the database, latency was unpredictable, and maintenance operations were becoming too expensive to run.
Image Description: A visual representation of a Cassandra cluster with nodes and data flow.

### Changing Our Architecture
The messages cluster was not Discord's only Cassandra database. The company had several other clusters, each exhibiting similar faults. The team was intrigued by ScyllaDB, a Cassandra-compatible database written in C++. Its promise of better performance, faster repairs, and stronger workload isolation via its shard-per-core architecture was appealing. By 2020, Discord had migrated every database but one to ScyllaDB. The last one? Its old friend, cassandra-messages.

### Data Services

With Cassandra, Discord struggled with hot partitions. High traffic to a given partition resulted in unbounded concurrency, leading to cascading latency. To control the amount of concurrent traffic to hot partitions, the team wrote intermediary services, called data services, that sit between the API monolith and the database clusters. They chose Rust for the data services because of its speed and safety features. Rust's libraries were a good match for the team's needs, and its emphasis on safety made it easier to write safe, concurrent code.
Image Description: A diagram showing the flow of data services between API and database clusters.
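
One way to picture how a data service can bound concurrency on a hot partition is a per-partition cap on in-flight queries. The sketch below is written in Python rather than Rust for brevity, and it is an illustration under stated assumptions, not Discord's implementation: `PartitionLimiter`, `fake_query`, the key `"channel-1"`, and the limit of 2 are all hypothetical, and `asyncio.sleep` stands in for a real database call.

```python
import asyncio


class PartitionLimiter:
    """Caps the number of in-flight queries per partition key (hypothetical helper)."""

    def __init__(self, max_concurrent: int) -> None:
        self._max = max_concurrent
        self._semaphores = {}  # partition key -> asyncio.Semaphore

    def _semaphore_for(self, partition_key: str) -> asyncio.Semaphore:
        # Create the semaphore lazily the first time a partition is seen.
        sem = self._semaphores.get(partition_key)
        if sem is None:
            sem = asyncio.Semaphore(self._max)
            self._semaphores[partition_key] = sem
        return sem

    async def run(self, partition_key: str, query):
        # Callers beyond the cap wait here instead of piling onto the database.
        async with self._semaphore_for(partition_key):
            return await query()


async def demo():
    limiter = PartitionLimiter(max_concurrent=2)
    in_flight = 0
    peak = 0

    async def fake_query():
        # Stand-in for a database read; records how many run at once.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return "row"

    # Ten simultaneous requests for the same (hot) partition.
    results = await asyncio.gather(
        *(limiter.run("channel-1", fake_query) for _ in range(10))
    )
    return peak, len(results)
```

Calling `asyncio.run(demo())` returns `(2, 10)`: all ten requests complete, but no more than two reach the partition at the same time. A production service would also need timeouts and eviction of idle semaphores, which this sketch leaves out.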

### A Very Big Migration
The migration requirements were straightforward: migrate trillions of messages with no downtime, and do it quickly. Discord provisioned a new ScyllaDB cluster using its super-disk storage topology. The initial plan was to dual-write new data to Cassandra and ScyllaDB and migrate historical data behind it. The team then decided to rewrite the data migrator in Rust, which significantly sped up the migration. The new migrator moved data at speeds of up to 3.2 million messages per second.
Image Description: A graph showing the migration speed and completion status.
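
The dual-write-then-backfill plan described above can be sketched in a few lines. This is a toy model under assumptions, not Discord's migrator: the `Store`, `DualWriter`, and `backfill` names are hypothetical, in-memory dictionaries stand in for Cassandra and ScyllaDB, and the "skip rows already present" rule is a simplification of how a real database would reconcile writes.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """In-memory stand-in for a message table keyed by message id."""
    rows: dict = field(default_factory=dict)

    def write(self, message_id: int, body: str) -> None:
        self.rows[message_id] = body


class DualWriter:
    """Sends every new write to both the old and the new store."""

    def __init__(self, old: Store, new: Store) -> None:
        self.old = old
        self.new = new

    def write(self, message_id: int, body: str) -> None:
        self.old.write(message_id, body)
        self.new.write(message_id, body)


def backfill(old: Store, new: Store, batch_size: int = 2) -> int:
    """Copy historical rows in batches; return how many rows were copied."""
    copied = 0
    ids = sorted(old.rows)
    for start in range(0, len(ids), batch_size):
        for message_id in ids[start:start + batch_size]:
            # Skip rows the dual-writer already placed in the new store,
            # so a fresher live write is never overwritten by older data.
            if message_id not in new.rows:
                new.write(message_id, old.rows[message_id])
                copied += 1
    return copied


def demo():
    old = Store({1: "hi", 2: "hello", 3: "hey"})  # historical data
    new = Store()
    writer = DualWriter(old, new)
    writer.write(4, "new message")      # live traffic during the migration
    writer.write(2, "hello (edited)")   # live edit of a historical row
    copied = backfill(old, new)
    return copied, old.rows == new.rows
```

Here `demo()` returns `(2, True)`: the backfill copies only the two rows that live traffic had not already written, and both stores end up identical, so reads can be switched over without downtime.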

### Several Months Later…
Discord switched its messages database over in May 2022, and it has been a quiet, well-behaved database since then. The cluster shrank from 177 Cassandra nodes to just 72 ScyllaDB nodes. Tail latencies also improved drastically. For example, fetching historical messages had a p99 of 40 to 125 ms on Cassandra, while ScyllaDB holds a steady 15 ms p99. This performance improvement has unlocked new product use cases.
At the end of 2022, during the World Cup, the team observed that goals scored showed up as traffic spikes in the monitoring graphs. This was a testament to the system's robustness, as it handled the increased traffic without breaking a sweat.
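
The p99 figures quoted above are tail latencies: the value that 99% of requests come in at or below. A minimal sketch of the nearest-rank calculation follows. The `percentile` helper and the sample latencies are invented for illustration and are not Discord's data.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer form of ceil(n * pct / 100), which avoids float rounding.
    rank = (len(ordered) * pct + 99) // 100
    return ordered[rank - 1]


# Made-up data: 980 fast requests and 20 slow ones, in milliseconds.
latencies_ms = [10] * 980 + [120] * 20

p50 = percentile(latencies_ms, 50)            # 10
p99 = percentile(latencies_ms, 99)            # 120
mean = sum(latencies_ms) / len(latencies_ms)  # about 12.2
```

In this made-up sample the median and the mean both look healthy, while the p99 exposes the slow 2% of requests. That is why tail latency is the figure to watch for a workload like message history.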

Remember these 3 key ideas for your startup:

  1. Scalability and Maintenance: When choosing a database, make sure it can scale with your growth while keeping maintenance needs manageable. Migrating to a more efficient database, as Discord did with ScyllaDB, can significantly reduce operational toil and improve performance.

  2. Data Services and Concurrency: Intermediary data services can help manage high traffic and reduce database load. A language like Rust can provide the speed and safety needed for such tasks.

  3. Efficient Migration: A well-planned migration strategy can save time and resources. Rewriting critical components in a performant language can drastically speed up the process, as seen with Discord's Rust-based data migrator.




For more details, see the original source.

About the Author: Mark Howell

Mark Howell is a talented content writer for Edworking's blog, consistently producing high-quality articles on a daily basis. As a Sales Representative, he brings a unique perspective to his writing, providing valuable insights and actionable advice for readers in the education industry. With a keen eye for detail and a passion for sharing knowledge, Mark is an indispensable member of the Edworking team. His expertise in task management ensures that he is always on top of his assignments and meets strict deadlines. Furthermore, Mark's skills in project management enable him to collaborate effectively with colleagues, contributing to the team's overall success and growth. As a reliable and diligent professional, Mark Howell continues to elevate Edworking's blog and brand with his well-researched and engaging content.
