Publish date: Apr 11, 2023
Duration: 27 min
Case details
Odin is Uber's platform for deploying and managing all of its stateful workloads, such as MySQL, Schemaless, Redis, ZooKeeper, Kafka, and HDFS, globally. It manages roughly 100,000 hosts with exabytes of storage as a single cluster that spans multiple geographical regions and availability zones across Uber's own data centers, AWS, and GCP. Databases are dockerized and co-located on hosts, using intelligent placement to optimize utilization and failure-domain anti-affinity to maximize efficiency and reliability. Our mission is to run all of Uber's storage solutions at scale, with high availability, low cost, and a high level of automation. All changes are automated, including kernel upgrades, host-failure handling, and storage cluster expansion. The presentation will lay out the principles on which we built the platform and touch on the challenges we faced along the way.
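To make the idea of placement with failure-domain anti-affinity concrete, here is a minimal sketch in Python. It is not Odin's actual algorithm, which the abstract does not describe. It assumes a simplified model in which each host belongs to exactly one zone, capacity is a single disk figure, and no two replicas of the same storage cluster may share a zone. The `Host` type, the `place` function, and the cluster name `mysql-trips` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    zone: str       # failure domain (assumption: one zone per host)
    free_gb: int    # remaining disk capacity
    clusters: set = field(default_factory=set)  # clusters with a replica here


def place(hosts, cluster, size_gb):
    """Pick a host for one replica of `cluster`, or return None if none fits."""
    # Anti-affinity: exclude every zone that already holds a replica.
    used_zones = {h.zone for h in hosts if cluster in h.clusters}
    candidates = [
        h for h in hosts
        if h.free_gb >= size_gb and h.zone not in used_zones
    ]
    if not candidates:
        return None
    # Best fit: choose the host left with the least free space, so hosts
    # are packed tightly. The name breaks ties deterministically.
    best = min(candidates, key=lambda h: (h.free_gb - size_gb, h.name))
    best.free_gb -= size_gb
    best.clusters.add(cluster)
    return best


hosts = [
    Host("h1", "zone-a", 500),
    Host("h2", "zone-a", 300),
    Host("h3", "zone-b", 400),
    Host("h4", "zone-c", 1000),
]

# Request four 250 GB replicas; only three zones exist, so the fourth fails.
placed = [place(hosts, "mysql-trips", 250) for _ in range(4)]
names = [h.name if h else None for h in placed]
print(names)  # ['h2', 'h3', 'h4', None]
```

The fourth request is rejected rather than doubled up in an existing zone, which shows the trade-off such a scheduler has to make: a strict anti-affinity rule protects against zone failures at the cost of refusing placements that raw capacity alone would allow.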