Evaluating Distributed Databases for Large-Scale Social Network Applications: TAO, TAO Bench, and TiDB

Time: September 28, 6 PM Pacific Daylight Time. Doors will open at 5:30 PM.

Location: Sky Computing Lab, EECS Science Division, Soda Hall, UC Berkeley (435 Soda Hall, Berkeley, CA 94720). Street parking should be available near Soda Hall. Enter the building through the Le Roy Avenue entrance; the lab is on the 4th floor. Someone will be at the building entrance and at the lab to let people in.


In this meetup, Aaron will first describe the evolution of TAO, the social graph data store at Meta. This system serves billions of requests per second with high availability to support Meta’s family of applications. Though TAO was designed to be eventually consistent, it has since added stronger guarantees, including read-your-writes consistency and transactions, to support evolving use cases and application needs.

Then, Audrey will give an overview of TAOBench, a benchmark for large-scale social networks. It fills a gap in representative benchmarks by providing workloads based on Meta’s request patterns. Audrey will introduce notable features of production workloads that the benchmark captures and discuss the impact of TAOBench both at Meta and on various databases, including TiDB.

Finally, Yang will explain the reasons behind TiDB’s strong performance on TAOBench, as well as in hundreds of production environments. He will discuss how TiDB overcomes issues such as operation skew to provide strong, predictable performance through load balancing, write flow control, and throttling of background activities.


Aaron Kabcenell, PhD
Research Scientist, Meta

Aaron Kabcenell is a Research Scientist in the Systems and Infra organization at Meta. He works on distributed consistency, providing transactions and strengthening consistency guarantees for the social graph. Prior to joining Meta, he received his PhD from Harvard University, designing novel hybrid quantum systems with diamond color centers and mechanical resonators. He also worked with the Harvard Data Systems Lab on designing cost-optimized NoSQL storage solutions.

Audrey Cheng
PhD Student, University of California, Berkeley

Audrey is a third-year PhD student at UC Berkeley in the Sky Computing Lab. She is advised by Ion Stoica and Natacha Crooks. Her research focuses on transaction processing for database systems, and, in particular, the challenges of providing stronger safety and correctness guarantees at large scale. She is supported in part by a National Science Foundation Graduate Research Fellowship, a Meta PhD Research Fellowship, and a Berkeley Chancellor’s Fellowship.

Yang Zhang
Software Engineer, PingCAP

Yang Zhang is a software engineer on the TiKV storage team. He mainly focuses on performance and resource efficiency and has also worked on tools to recover clusters from large-scale node failures. Prior to PingCAP, he worked on Google Spanner, focusing on stability and privacy issues.