Apache Beam
Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including Apache Flink, Apache Spark, Google Cloud Dataflow and Hazelcast Jet.
Key Points
- Data pipelines connect data sources to data targets and transform the data in between, in batches or as event streams
- Beam provides a high-level, portable data pipeline processing model that sits on top of data service runtimes
- Beam input events arrive over time and can come from Spark, Flink, Kafka and more
- Beam processing is done via an SDK: SQL, Java, Python, Go and ??
- Beam can apply many transformations, both from existing libraries and from new application logic
- Beam lets you implement batch and streaming data processing jobs that run on any supported execution engine (runner)
References
Reference description with linked URLs | Notes |
---|---|
YouTube: Apache Beam overview 2019 | Beam overview 2019 |
https://github.com/apache/beam | Beam GitHub repository |
https://beam.apache.org/documentation/ | Beam docs |
https://beam.apache.org/documentation/runtime/model/ | Beam execution model |
https://beam.apache.org/documentation/resources/learning-resources/ | Beam learning resources - interactive examples *** |
https://beam.apache.org/documentation/programming-guide/ | Beam Programming Guide |
Apache Big Data projects | |
https://www.educba.com/my-courses/dashboard/ | Data engineering education site: educba |
https://beam.apache.org/get-started/quickstart-java/ | Java quickstart |
https://beam.apache.org/documentation/dsls/sql/walkthrough/ | Beam SQL walkthrough |
https://beam.apache.org/get-started/quickstart-py/ | Python quickstart |
Key Concepts
Apache Beam Overview 2
YouTube: Apache Beam overview 2019
Big Data - variety, volume, velocity, variance
Which data framework to use?
Beam Vision
Beam processing details
Parallel Do (ParDo) functions
Per-key aggregations
Event-time windowing output
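The three processing ideas above can be sketched in plain Python to show what each step does to the data. This is not Beam SDK code: the helper names (`par_do`, `assign_window`, `combine_per_key`, `windowed_totals`), the sample events, and the 60-second window size are all assumptions made for illustration. In the Beam Python SDK the corresponding building blocks are roughly `beam.ParDo`, `beam.CombinePerKey`, and `beam.WindowInto` with fixed windows.

```python
from collections import defaultdict

# Hypothetical input: (user, event_time_seconds, score) tuples.
# They are deliberately out of order, as events in a stream often are.
EVENTS = [
    ("alice", 5, 3),
    ("bob", 12, 4),
    ("alice", 61, 2),
    ("bob", 58, 1),
    ("alice", 30, 5),
]

WINDOW_SIZE = 60  # seconds; fixed (tumbling) windows, an arbitrary choice


def par_do(fn, elements):
    """Element-wise step: fn may yield zero, one, or many outputs per input."""
    for element in elements:
        yield from fn(element)


def assign_window(event):
    """Key each score by (window_start, user) using the event's own timestamp."""
    user, event_time, score = event
    if score <= 0:
        return  # zero outputs: non-positive scores are dropped
    window_start = (event_time // WINDOW_SIZE) * WINDOW_SIZE
    yield (window_start, user), score


def combine_per_key(combine_fn, keyed_elements):
    """Per-key aggregation: group values by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in keyed_elements:
        groups[key].append(value)
    return {key: combine_fn(values) for key, values in groups.items()}


def windowed_totals(events):
    """Sum scores per user within each event-time window."""
    return combine_per_key(sum, par_do(assign_window, events))


if __name__ == "__main__":
    for (window_start, user), total in sorted(windowed_totals(EVENTS).items()):
        print(f"window starting at {window_start}s, {user}: {total}")
```

Because windows are assigned from the event timestamp rather than arrival order, the late-arriving `("alice", 30, 5)` event still lands in the first window. A real runner additionally has to decide when a window's result is complete enough to emit, which this sketch leaves out.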
Roadmap
Where we are in March 2019
Links
Apache Beam - Java Quickstart
https://beam.apache.org/get-started/quickstart-java/
Apache Beam - Python Quickstart
https://beam.apache.org/get-started/quickstart-py/
Potential Value Opportunities
Potential Challenges
Candidate Solutions
Step-by-step guide for Example