Key Points
References
Reference description with linked URLs | Notes |
---|---|
gq>> Uber system design - incentives, logistics | |
Key Concepts
Uber elevate - Uber flight concepts white paper
Uber real time data infrastructure article
Uber-paper_ Real-time Data Infrastructure at Uber – Distributed Computing Musings.pdf. link
Uber-paper_ Real-time Data Infrastructure at Uber – Distributed Computing Musings.pdf. file
Uber might seem simple at first glance, but it does a great job of hiding complexity in order to provide a great user experience. Achieving this requires processing huge volumes of data in real time and making decisions based on that data. Time is also of the essence when making these decisions, as they affect customers who are using the application at that very moment. Use cases such as fraud detection and surge pricing calculation require processing petabytes of data in a scalable manner. In addition to the processing itself, the system needs to be extensible to accommodate future use cases.
This data comprises both client-side events and system logs from microservices operating within the Uber application. Real-time data is also generated from the change-logs of production databases, where live transactions are being processed. This data is processed to cover a large set of use cases, which at a high level fall into three categories:
- Messaging platform
- Stream processing
- Online analytical processing
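To make the stream processing category concrete, here is a minimal sketch of a tumbling-window count, the kind of aggregation a surge pricing signal might start from. The event shape (timestamp, zone id), the function name and the 60-second window are assumptions for illustration, not Uber's actual schema or implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (epoch_seconds, zone_id). Not Uber's schema.
Event = Tuple[int, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int = 60
) -> Dict[int, Counter]:
    """Count events per zone in fixed, non-overlapping time windows."""
    windows: Dict[int, Counter] = defaultdict(Counter)
    for ts, zone in events:
        # Align each timestamp to the start of the window that contains it.
        window_start = ts - (ts % window_seconds)
        windows[window_start][zone] += 1
    return dict(windows)


# Five ride requests spread over three one-minute windows.
events = [(0, "A"), (10, "A"), (59, "B"), (60, "A"), (125, "B")]
counts = tumbling_window_counts(events)
```

A real stream processor would also have to handle late and out-of-order events and evict closed windows. This sketch keeps every window in memory and ignores both concerns.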
Data challenges
- FACTUR3DT.IO for big data, multiple sources, formats
- Multiple use cases - detail data models for ML/AI, event streams, smart state change management, version management of schema, function changes, smart queries, extensible models
- Logical models and flows can simplify complex analytics to simple queries
- Need to version manage schemas, functions with data changes
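The schema version management point can be sketched as a small in-memory registry that refuses incompatible versions. Everything here is hypothetical (the `Schema` and `SchemaRegistry` names, the field sets), and the compatibility rule is a deliberately simplified assumption: a new version is accepted only if it does not require any field the previous version did not already require.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Schema:
    """Hypothetical schema: just sets of required and optional field names."""
    required: frozenset
    optional: frozenset = frozenset()


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    # Simplified rule (an assumption): data written with the old schema stays
    # readable if the new schema requires nothing the old one did not require.
    return new.required <= old.required


class SchemaRegistry:
    """In-memory version history per subject; version numbers start at 1."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Schema]] = {}

    def register(self, subject: str, schema: Schema) -> int:
        history = self._versions.setdefault(subject, [])
        if history and not is_backward_compatible(history[-1], schema):
            raise ValueError(f"incompatible schema change for {subject!r}")
        history.append(schema)
        return len(history)


registry = SchemaRegistry()
v1 = registry.register("trip_event", Schema(frozenset({"trip_id", "ts"})))
# Adding an optional field is accepted as version 2.
v2 = registry.register(
    "trip_event", Schema(frozenset({"trip_id", "ts"}), frozenset({"fare"}))
)
```

Registering a third version that makes a new field such as `driver_id` required would raise `ValueError` under this rule.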
Logical data infrastructure model (similar to my AI, operations analytics model)
Data management requirements - serves multiple use cases
- Standard FACTUR3DT.io capabilities for each use case
- Data source discovery, mapping, integration, services definition, normalization, event management, data age
- Performant, Available (memory > binary log > table spaces), (transaction life cycle - difference between commit and finalization points)
- Sharding for performance, security
- Encrypted: in-flight, at-rest, in-process (enclaves)
- logical service model to simplify data consumption use cases
- low-cost, adaptable, extensible infrastructure
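As a rough illustration of the sharding requirement above, the sketch below maps a record key to a shard with a stable hash. The function name, the key format and the shard count are assumptions. `hashlib` is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would send the same key to different shards across restarts.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical keys; the same key always lands on the same shard.
assignments = {key: shard_for(key, 8) for key in ("rider-1", "rider-2", "driver-7")}
```

Plain modulo sharding moves most keys when the shard count changes. A scheme such as consistent hashing is a common alternative when shards are added or removed often.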
Data Layers Abstracted
User Infrastructure Data Tools integrate data from multiple sources and process by use case priority
Kafka is key for integrating data processing and events as transaction messages
At Uber, Kafka is responsible for transferring streaming data to both batch and real-time processing systems. The use cases range from sending events from the driver and rider apps to the underlying analytics platform, to streaming database change-logs to subscribers that perform computation based on these events.
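The fan-out described above, one stream feeding both batch and real-time systems, relies on each consumer group tracking its own position in the log. The sketch below is an in-memory stand-in for that idea, not the Kafka client API. `TopicLog`, the group names and the event fields are all hypothetical.

```python
from typing import Any, Dict, List


class TopicLog:
    """Append-only log; each consumer group keeps its own read offset."""

    def __init__(self) -> None:
        self._records: List[Any] = []
        self._offsets: Dict[str, int] = {}

    def produce(self, record: Any) -> int:
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 10) -> List[Any]:
        """Return the next unread records for a group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


topic = TopicLog()
topic.produce({"type": "ride_requested", "rider": "r1"})
topic.produce({"type": "db_changelog", "table": "trips", "op": "insert"})

# Two independent consumers read the same records at their own pace.
realtime_batch = topic.poll("realtime-analytics")
batch_first = topic.poll("batch-ingest", max_records=1)
batch_rest = topic.poll("batch-ingest")
```

Because offsets are per group, a slow batch consumer does not hold back the real-time one, and neither removes records from the log for the other.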
Potential Value Opportunities
Potential Challenges
Candidate Solutions
Step-by-step guide for Example
sample code block