Key Points
- Data pipelines connect and transform data from sources to targets, either in batches or as event streams
- Beam provides a high-level, portable pipeline programming model that runs on top of multiple data processing runtimes (runners)
- Beam pipelines execute on runners such as Spark, Flink, and Google Cloud Dataflow, and can consume bounded or unbounded inputs (for example, Kafka events that arrive over time)
- Beam pipelines are written with an SDK: Java, Python, Go, or SQL, among others
- Beam can apply many transformations, drawn from existing libraries or written as new application logic (see the minimal pipeline sketch after this list)
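
A minimal sketch of the Beam pipeline model using the Java SDK: read from a source, apply a transform, write to a target. The class name, file paths, and the uppercase transform are illustrative assumptions, not taken from the notes above; by default this runs on the local direct runner.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class MinimalPipeline {
  public static void main(String[] args) {
    // Pipeline options come from the command line (runner, paths, etc.).
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    p.apply("ReadSource", TextIO.read().from("input.txt"))        // data source (placeholder path)
     .apply("Transform", MapElements
         .into(TypeDescriptors.strings())
         .via((String line) -> line.toUpperCase()))               // application logic
     .apply("WriteTarget", TextIO.write().to("output"));          // data target (placeholder prefix)

    p.run().waitUntilFinish();
  }
}
```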
References
| Reference description with linked URLs | Notes |
|---|---|
| Apache Big Data projects | |
| https://www.educba.com/my-courses/dashboard/ | Data engineering education site: educba |
Key Concepts
Apache Beam Overview 2
YouTube: Apache Beam overview (2019)
Big Data - volume, velocity, variety, veracity
Which data framework to use?
Beam Vision
Beam processing details
ParDo (parallel do) functions (see the ParDo sketch after this outline)
Per-key aggregations (see the ParDo sketch after this outline)
Event-time windowing of output (see the windowing sketch after this outline)
Roadmap
Where we are in March 2019
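
A minimal ParDo and per-key aggregation sketch with the Java SDK, assuming a small in-memory input built with Create.of; the sample values and transform names are illustrative.

```java
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class ParDoAndPerKey {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Small in-memory input standing in for a real source.
    PCollection<String> lines =
        p.apply(Create.of(Arrays.asList("a b", "a c", "b c")));

    PCollection<KV<String, Long>> counts =
        lines
            // ParDo: apply a DoFn to every element in parallel, emitting one word per output.
            .apply("SplitWords", ParDo.of(new DoFn<String, String>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                for (String w : c.element().split(" ")) {
                  c.output(w);
                }
              }
            }))
            // Per-key aggregation: group identical words and count occurrences.
            .apply("CountPerWord", Count.perElement());

    p.run().waitUntilFinish();
  }
}
```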
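
A minimal event-time windowing sketch with the Java SDK. The timestamps, window size, and key name are illustrative assumptions; each element is assigned to a fixed one-minute window by its event time, so per-key counts are emitted per window rather than once for the whole collection.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarLongCoder;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TimestampedValue;
import org.joda.time.Duration;
import org.joda.time.Instant;

public class EventTimeWindowing {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Events carrying explicit event-time timestamps (as they might arrive from a stream).
    PCollection<KV<String, Long>> events = p.apply(Create.timestamped(
            TimestampedValue.of(KV.of("clicks", 1L), new Instant(0L)),
            TimestampedValue.of(KV.of("clicks", 1L), new Instant(30_000L)),
            TimestampedValue.of(KV.of("clicks", 1L), new Instant(90_000L)))
        .withCoder(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())));

    events
        // Assign each element to a 1-minute fixed window based on its event time.
        .apply(Window.<KV<String, Long>>into(FixedWindows.of(Duration.standardMinutes(1))))
        // Per-key count within each window: the first window sees 2 events, the second sees 1.
        .apply(Count.perKey());

    p.run().waitUntilFinish();
  }
}
```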
Links
Apache Beam - Java Quickstart
https://beam.apache.org/get-started/quickstart-java/
Potential Value Opportunities
Potential Challenges
Candidate Solutions
Step-by-step guide for Example
sample code block
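
A possible filler for the sample code block placeholder: an end-to-end word count in the style of the Java Quickstart linked above. The input/output paths and the word-splitting regex are assumptions; a different runner (Spark, Flink, Dataflow) can be selected through the pipeline options, given the corresponding runner dependency.

```java
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;

public class SampleWordCount {
  public static void main(String[] args) {
    // Step 1: build pipeline options from command-line args (runner, paths, etc.).
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    p.apply("ReadLines", TextIO.read().from("input.txt"))
        // Step 2: split each line into words.
        .apply("ExtractWords", FlatMapElements
            .into(TypeDescriptors.strings())
            .via((String line) -> Arrays.asList(line.split("[^\\p{L}]+"))))
        // Step 3: per-element count -> (word, occurrences).
        .apply("CountWords", Count.perElement())
        // Step 4: format each result as a line of text.
        .apply("FormatResults", MapElements
            .into(TypeDescriptors.strings())
            .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
        // Step 5: write to the output target.
        .apply("WriteCounts", TextIO.write().to("wordcounts"));

    p.run().waitUntilFinish();
  }
}
```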