THE FUTURE IS HERE

UBER: Big Data Infrastructure and Machine Learning Platform

Talk 1: Uber’s Big Data Platform: 100+ Petabytes with Minute Latency
This talk will reflect on the challenges faced in scaling Uber’s Big Data Platform to ingest, store, and serve 100+ PB of data with minute-level latency while efficiently utilizing our hardware. We will provide a behind-the-scenes look at the current data technology landscape at Uber, including various open-source technologies (e.g., Hadoop, Spark, Hive, Presto, Kafka, Avro) as well as in-house solutions that we have since open-sourced, such as Hudi and Marmaray. We’ll dive into the technical aspects of how our ingestion platform was re-architected to bring in 10+ trillion events and 100+ TB of new data per day at minute-level latency, how our storage platform was scaled to reliably store 100+ PB of data in the data lake, and how our processing platform was designed to efficiently serve millions of queries and jobs per day while processing 1+ PB per day. You’ll leave the talk with greater insight into how data powers every Uber experience, and with ideas for re-envisioning your own data platform to be more extensible and scalable.

Speaker: Reza Shiftehfar (Uber)
Reza Shiftehfar currently leads Uber’s Hadoop Platform team. His team helps build and grow Uber’s reliable and scalable Big Data platform, which serves petabytes of data using technologies such as Apache Hadoop, Apache Hive, Apache Kafka, Apache Spark, and Presto. Reza is one of the founding engineers of Uber’s data team and helped scale Uber’s data platform from a few terabytes to over 100 petabytes while reducing data latency from 24+ hours to minutes. Reza holds a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign.

Talk 2: Michelangelo PyML – Uber’s Platform for Rapid Python ML Model Development

Uber aims to leverage machine learning (ML) in product development and in the day-to-day management of our business. In pursuit of this goal, hundreds of data scientists, engineers, product managers, and researchers work on ML solutions across the company. This talk will cover a brief history of Uber’s machine learning platform, Michelangelo. We will take a closer look at the model life cycle (prototyping, validation, and productionization) and at the importance of a frictionless experience at each stage of this process. Finally, we will focus on PyML, a new extension of Michelangelo that enables faster Python ML model development and seamless integration with Uber’s production infrastructure.

Speaker: Stepan Bedratiuk (Uber)
Stepan Bedratiuk is a lead engineer on Michelangelo’s PyML team. His work has focused on scaling model deployment pipelines and model serving services. Before joining the ML platform team, Stepan worked on Uber’s data platform team, where he helped unify and scale the data access layer. Stepan holds a B.S. and an M.S. in Applied Mathematics from the Taras Shevchenko National University of Kyiv, Ukraine.