Apache Beam

Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including Apache Flink, Apache Spark, Google Cloud Dataflow and Hazelcast Jet.

Key Points

  1. Data pipelines connect data sources to data targets and transform the data in between, either in batches or as event streams
  2. Beam provides a high-level, portable programming model for data pipelines that runs on several execution engines (runners)
  3. Beam input data can be bounded (e.g., files) or unbounded, such as Kafka topics whose events arrive over time; Spark and Flink are runners that execute pipelines, not input sources
  4. Beam processing is written with an SDK or DSL: SQL, Java, Python, Go and ??
  5. Beam pipelines can combine transforms from existing libraries with new application logic
  6. The same batch or streaming job is implemented once and can run on any supported execution engine
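The batch-versus-streaming point above can be illustrated with a minimal plain-Python sketch. It is not the Beam API: the hypothetical `run_pipeline` helper just chains transform functions lazily over an iterable, so one pipeline definition handles both a finite list (a bounded, batch-style source) and a generator that yields records over time (an unbounded, streaming-style source). In the Beam Python SDK the comparable chaining is written with `beam.Pipeline()` and the `|` operator, and the chosen runner decides where the work executes.

```python
from typing import Callable, Iterable, Iterator

# A "transform" here is any function from an iterable of elements to an iterator.
# Hypothetical stand-in for Beam's PTransform; not part of any Beam SDK.
Transform = Callable[[Iterable], Iterator]


def run_pipeline(source: Iterable, *transforms: Transform) -> Iterator:
    """Lazily chain transforms over a source (hypothetical helper)."""
    stream: Iterable = source
    for transform in transforms:
        stream = transform(stream)
    return iter(stream)


def parse(lines: Iterable[str]) -> Iterator[int]:
    # Convert each text record to an integer, skipping blank lines.
    for line in lines:
        if line.strip():
            yield int(line)


def keep_large(values: Iterable[int]) -> Iterator[int]:
    # Filter step: keep only values of at least 10.
    return (v for v in values if v >= 10)


# Bounded (batch) source: all records are available up front.
batch_source = ["5", "12", "", "30"]
print(list(run_pipeline(batch_source, parse, keep_large)))  # [12, 30]


def event_stream() -> Iterator[str]:
    # Streaming-style source: records are produced one at a time.
    # A real unbounded stream would not end; this one stops after three events.
    for record in ["11", "3", "42"]:
        yield record


# The same pipeline definition processes events as they arrive.
for result in run_pipeline(event_stream(), parse, keep_large):
    print(result)  # 11, then 42
```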


References

Reference (description with linked URL) | Notes
YouTube: Apache Beam overview 2019 | Beam overview 2019
https://github.com/apache/beam | Beam GitHub
https://beam.apache.org/documentation/ | Beam docs
https://beam.apache.org/documentation/runtime/model/ | Beam execution model
https://beam.apache.org/documentation/resources/learning-resources/ | Beam learning resources - interactive examples ***
https://beam.apache.org/documentation/programming-guide/ | Beam Programming Guide
https://projects.apache.org/projects.html?category | Apache Big Data projects
https://www.educba.com/my-courses/dashboard/ | Data engineering education site: EDUCBA
https://beam.apache.org/get-started/quickstart-java/ | Java quickstart
https://beam.apache.org/documentation/dsls/sql/walkthrough/ | Beam SQL walkthrough
https://beam.apache.org/get-started/quickstart-py/ | Python quickstart









Key Concepts



Apache Beam Overview 2

YouTube: Apache Beam overview 2019


Big Data - variety, volume, velocity, variance


Which data framework to use?


Beam Vision


Beam processing details

Parallel Do (ParDo) functions
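A ParDo applies a user function to every element independently, and that function may emit zero, one, or many outputs per input; because elements do not depend on one another, a runner can split the input into bundles and process them on different workers. The plain-Python sketch below models that idea with a thread pool. The names `split_words` and `par_do` and the bundle size are hypothetical; in the Beam Python SDK the counterpart is a `beam.DoFn` subclass applied with `beam.ParDo` (or the `beam.FlatMap` shorthand).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Callable, Iterable, Iterator, List, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


def split_words(line: str) -> Iterator[str]:
    # Per-element function: one input line may emit zero or more words.
    for word in line.lower().split():
        yield word


def par_do(
    elements: List[In],
    fn: Callable[[In], Iterable[Out]],
    bundle_size: int = 2,
    workers: int = 4,
) -> List[Out]:
    """Hypothetical ParDo model: run fn over bundles in parallel, then flatten."""
    # Split the input into bundles, as a runner might before distributing work.
    bundles = [elements[i:i + bundle_size] for i in range(0, len(elements), bundle_size)]

    def process_bundle(bundle: List[In]) -> List[Out]:
        # Each bundle is processed independently; no shared state is needed.
        return [out for element in bundle for out in fn(element)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields bundle results in input order, so output order is stable
        # here; a real runner makes no ordering guarantee across elements.
        results = pool.map(process_bundle, bundles)
        return list(chain.from_iterable(results))


lines = ["Beam runs on Flink", "", "Beam runs on Spark"]
print(par_do(lines, split_words))
# ['beam', 'runs', 'on', 'flink', 'beam', 'runs', 'on', 'spark']
```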

Per-key aggregations
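Per-key aggregation first groups the values that share a key and then combines each group into a single result. Beam's Python SDK exposes this as `beam.GroupByKey` and `beam.CombinePerKey`; the stdlib sketch below only models the semantics, and the helper names `group_by_key` and `combine_per_key` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")
R = TypeVar("R")


def group_by_key(pairs: Iterable[Tuple[K, V]]) -> Dict[K, List[V]]:
    # Shuffle step: collect every value that shares a key.
    groups: Dict[K, List[V]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def combine_per_key(
    pairs: Iterable[Tuple[K, V]], combine: Callable[[List[V]], R]
) -> Dict[K, R]:
    # Aggregate each group into a single value, e.g. with sum, max or len.
    return {key: combine(values) for key, values in group_by_key(pairs).items()}


clicks = [("home", 1), ("cart", 1), ("home", 1), ("checkout", 1), ("home", 1)]
print(combine_per_key(clicks, sum))  # {'home': 3, 'cart': 1, 'checkout': 1}
```

A real runner can apply a combining function partially on each worker before the shuffle, which is one reason `CombinePerKey` generally scales better than `GroupByKey` followed by a separate aggregation; this sketch does not model that optimization.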

Event-time windowed output
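Event-time windowing assigns each element to a window based on when the event occurred, not when it reached the pipeline, so a record that arrives late still lands in the window it belongs to. The sketch below models fixed (tumbling) windows followed by a per-key sum in plain Python; the 60-second window size and the `windowed_sums` helper are illustrative assumptions. In the Beam Python SDK the comparable step is `beam.WindowInto(beam.window.FixedWindows(60))`, and deciding when a window's result is emitted is handled by watermarks and triggers, which this sketch does not model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (event_time_seconds, key, value); arrival order may differ from event-time order.
Event = Tuple[int, str, int]


def window_start(event_time: int, size: int) -> int:
    # Fixed windows are aligned to multiples of `size` seconds: [start, start + size).
    return event_time - (event_time % size)


def windowed_sums(events: Iterable[Event], size: int = 60) -> Dict[Tuple[int, str], int]:
    """Hypothetical helper: sum values per (window start, key) using event time."""
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    for event_time, key, value in events:
        totals[(window_start(event_time, size), key)] += value
    return dict(totals)


arrivals = [
    (5, "sensor-a", 2),
    (65, "sensor-a", 4),
    (30, "sensor-a", 1),  # arrives late, but belongs to the first window by event time
    (70, "sensor-b", 7),
]
for (start, key), total in sorted(windowed_sums(arrivals).items()):
    print(f"[{start}, {start + 60}) {key}: {total}")
# [0, 60) sensor-a: 3
# [60, 120) sensor-a: 4
# [60, 120) sensor-b: 7
```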

Roadmap

Where we are in March 2019

Links


Apache Beam - Java Quickstart

https://beam.apache.org/get-started/quickstart-java/


Apache Beam - Python Quickstart

https://beam.apache.org/get-started/quickstart-py/







Potential Value Opportunities



Potential Challenges



Candidate Solutions



Step-by-step guide for Example






Recommended Next Steps