Table of Contents |
---|
...
Key Concepts
Data Use Cases & Decision Match Data Processing Flows
Functional Data Layers Architecture
...
However, if Spark runs on YARN alongside other shared services, performance might degrade, and RAM overhead or memory leaks can result. For this reason, for batch-processing use cases, Hadoop has been found to be the more efficient system.
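To make the memory overhead point concrete, here is a rough sketch of how much memory a single Spark executor asks YARN for. It assumes Spark's documented defaults for `spark.executor.memoryOverhead` (10% of executor memory, with a 384 MiB floor); the function name is hypothetical, and YARN's rounding up to its minimum allocation size is not modelled.

```python
def yarn_container_request_mib(executor_memory_mib: int,
                               overhead_factor: float = 0.10,
                               min_overhead_mib: int = 384) -> int:
    """Approximate memory (MiB) one Spark executor requests from YARN.

    Assumption: default overhead = max(384 MiB, 10% of executor memory).
    YARN may round the result up further; that step is left out here.
    """
    overhead = max(min_overhead_mib, int(executor_memory_mib * overhead_factor))
    return executor_memory_mib + overhead


if __name__ == "__main__":
    for heap in (2048, 4096, 16384):
        print(heap, "MiB heap ->", yarn_container_request_mib(heap), "MiB requested")
```

On a shared cluster this extra per-executor allowance is what competes with other services for node memory, so it is worth budgeting for it explicitly.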
...
Improve RAJG data virtualization layers with data consumption methods
https://www.linkedin.com/posts/giorgiotorre1234rajkgrover_howdata-manydatamanagement-api-architecture-styles-do-you-banking-activity-70590723883400642567221140607228911617-TM9UhgeR?utm_source=share&utm_medium=member_desktop
add consumption models from sources, lakes
batch, transaction request, events, streams
add column for MDM, governance
Data Services Methods
Architecture styles define how different components of an application programming interface (API) interact with one another.
...
https://drive.google.com/open?id=1ryNqjIE3LCY6Jxik_4V-urf2aCp9QLr0
Data Architecture vs Enterprise Architecture
https://drive.google.com/open?id=1O5q-cppIU3UZrFCJyQmHRWzROHTvKl-b
2024-07-eb-big-book-of-data-engineering-3rd-edition.pdf
Guidance and Best Practices
Databricks Assistant Tips and Tricks for Data Engineers
Applying Software Development and DevOps Best Practices to Delta Live Table Pipelines
Unity Catalog Governance in Action: Monitoring, Reporting and Lineage
Scalable Spark Structured Streaming for REST API Destinations
A Data Engineer’s Guide to Optimized Streaming With Protobuf and Delta Live Tables
Design Patterns for Batch Processing in Financial Services
How to Set Up Your First Federated Lakehouse
Orchestrating Data Analytics With Databricks Workflows
Schema Management and Drift Scenarios via Databricks Auto Loader
From Idea to Code: Building With the Databricks SDK for Python
Data Architecture Strategy
...
data-architecture-build-strategy-dataversitydatastrategyburbankfeb2018-180227040559.pdf file
Enterprise vs Embedded Databases
Enterprise Databases > client-server model for multiple clients and shared data, transactions
can support SQL, NoSQL or both
MySQL, Postgres, Mongo, more
Neo4j - graph db
vector dbs - for AI
time series db - InfluxDB or mysql jem
Embedded Databases > a single app accesses the database; external access options (JDBC etc.)? usually no
can support SQL, NoSQL or both
SQLite, CouchDB
jdbc drivers - cdata
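As a small illustration of the embedded model, the sketch below uses SQLite through Python's standard `sqlite3` module. The database lives inside the application process, with no server to connect to. The table and function names are made up for the example.

```python
import sqlite3


def average_by_sensor() -> list:
    """Create an in-process SQLite database, load a few rows, and query them."""
    # ":memory:" keeps the whole database inside this process; a file path
    # would persist it to disk, still without any separate server.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
        conn.executemany(
            "INSERT INTO readings (sensor, value) VALUES (?, ?)",
            [("a", 1.5), ("a", 2.5), ("b", 4.0)],
        )
        conn.commit()
        cursor = conn.execute(
            "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(average_by_sensor())
```

An enterprise database would instead be reached over the network with a client driver, so that many applications can share the same data and transactions.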
Data Lake Platform Concepts
khub.Data Services - Candidate Solutions
DWH > Data Lake > Lake House > Data Mesh concepts
...
data-delta-lake-up-&-running_er2.pdf (O'Reilly ebook)
...
https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_a_WebSocket_server_in_Java
[Writing WebSocket servers](/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers)
...
disributed-db-sharding-strategies1.pdf
This article looks at four data sharding strategies for distributed SQL: algorithmic, range, linear hash, and consistent hash.
Data sharding helps in scalability and geo-distribution by horizontally partitioning data. A SQL table is decomposed into multiple sets of rows according to a specific sharding strategy. Each of these sets of rows is called a shard. These shards are distributed across multiple server nodes (containers, VMs, bare-metal) in a shared-nothing architecture. This ensures that the shards do not get bottlenecked by the compute, storage, and networking resources available at a single node. High availability is achieved by replicating each shard across multiple nodes. However, the application interacts with a SQL table as one logical unit and remains agnostic to the physical placement of the shards. In this section, we will outline the pros, cons, and our practical learnings from the sharding strategies adopted by these databases.
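A minimal sketch of three of these strategies follows: algorithmic (hash modulo N), range, and consistent hash. It is illustrative only and is not taken from the article; the node names, split points, and the choice of 64 virtual nodes per server are assumptions.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def modulo_shard(key: str, num_shards: int) -> int:
    """Algorithmic sharding: shard index is hash(key) mod N."""
    return stable_hash(key) % num_shards


def range_shard(key: str, split_points: list) -> int:
    """Range sharding: sorted split points cut the key space into contiguous shards."""
    return bisect.bisect_right(split_points, key)


class ConsistentHashRing:
    """Consistent hashing: each node owns several points on a hash ring."""

    def __init__(self, nodes, vnodes: int = 64):
        ring = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in ring]
        self._owners = [node for _, node in ring]

    def node_for(self, key: str) -> str:
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._points)
        return self._owners[idx]


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]
    before = ConsistentHashRing(["node-a", "node-b", "node-c"])
    after = ConsistentHashRing(["node-a", "node-b", "node-c", "node-d"])
    moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
    print("range shard for 'kiwi':", range_shard("kiwi", ["h", "p"]))
    print("keys moved after adding node-d:", moved, "of", len(keys))
```

The design difference shows up when the cluster grows: with hash modulo N, changing N remaps most keys, whereas on the ring only keys that now fall to the new node move.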
Data Driven Organization Maturity Levels
...