Table of Contents

Reference description with linked URLs	Notes
Data Governance Concepts

http://www.as400pro.com/tipListInq.php?cat=iSJava	Some AS/400 data-related articles, including ones by Jim Mason, and more; TechTarget has since removed the links
https://www.roseindia.net/ https://www.roseindia.net/jdbc/jdbc-mysql/	roseindia.net
ISO 8000-150 Data Governance Standard

Data Architecture
https://www.slideshare.net/Dataversity/das-slides-enterprise-architecture-vs-data-architecture	Data Architecture vs Enterprise Architecture
https://www.slideshare.net/lmartins_us/enterprise-data-architecture-deliverables	Enterprise Data Architecture Deliverables
https://www.slideshare.net/Dataversity/data-architecture-strategies-building-an-enterprise-data-strategy-where-to-start	Enterprise Data Strategy
https://www.slideshare.net/Dataversity/data-architecture-the-foundation-for-enterprise-architecture-and-governance	Data Architecture - Foundation for Enterprise Architecture

https://www.slideshare.net/Dataversity/data-architecture-strategies-artificial-intelligence-realworld-applications-for-your-organization	Data Architecture Strategies for AI
https://www.slideshare.net/Dataversity/data-architecture-best-practices-for-todays-rapidly-changing-data-landscape	Data Architecture Best Practices
Data-Virtualization-for-Dummies.pdf	Data Virtualization for Dummies
https://www.datacamp.com/community/blog/data-infrastructure-tools	Sample Data Infrastructure - datacamp *
https://www.slideshare.net/Dataversity/data-lake-architecture-modern-strategies-approaches	Data Lake Architecture Strategies
https://www.informatica.com/resources.asset.cd3434c8d2aae44c6071d19d9077ca60.pdf data-lake-concepts-2019-resources.asset.cd3434c8d2aae44c6071d19d9077ca60.pdf	Data Lake Design Principles - Informatica
data-lakes-Six-Guiding-Principles-for-Effective-Data-Lake-Pipelines.pdf
https://dzone.com/articles/four-data-sharding-strategies-for-distributed-sql disributed-db-sharding-strategies1.pdf	Data sharding strategies compared


snowflake-data-db-cloud-service-2333957-solution-brief-snowflake.pdf


Data Modeling
Sustainable data architecture concepts	Basics of data concepts for blockchain *




Data Management
SFTP (file transfer over Secure Shell) vs FTPS (FTP over TLS/SSL) explained: spiceworks.com-SFTP vs FTPS Understanding the 8 Key Differences.pdf file




Data Governance
https://profisee.com/data-governance-what-why-how-who/ data-governance-profisee.com-Data Governance What Why How Who 15 Best Practices.pdf	Data Governance Concepts & Tools
Blockchain Data Compliance Services	Data Compliance
Sichern-data-compliance-whitepaper-short-version.201906.docx
DMX - Blockchain and Data Compliance Services.v2.pptx


Data Services Open Solutions
Ubuntu	https://docs.ubuntu.com/ https://help.ubuntu.com/stable/ubuntu-help/
Ubuntu Server docs	https://ubuntu.com/server/docs
Ubuntu Multipass	https://multipass.run/docs Multipass is a tool to generate cloud-style Ubuntu VMs quickly on Linux, macOS, and Windows.
Docker	https://docs.docker.com/


Run multiple JDKs on macOS	https://medium.com/@manvendrapsingh/installing-many-jdk-versions-on-macos-dfc177bc8c2b https://wiki.classe.cornell.edu/Computing/InstallingMultipleVersionsOfJavaOnMac https://gist.github.com/gramcha/81dcec3f1e4ce8cffd7f248d3e2a42a7 Manually install a JDK version from
Open JDK	https://openjdk.org/
Open JDK 19	https://jdk.java.net/19/
Open JDK 11	https://jdk.java.net/archive/
Apache Tomcat	https://tomcat.apache.org/ https://tomcat.apache.org/tomcat-10.1-doc/index.html
Oracle MySQL	https://dev.mysql.com/doc/
Postgres	https://www.postgresql.org/docs/ https://www.postgresql.org/files/documentation/pdf/15/postgresql-15-US.pdf PostgreSQL v15 manual PDF link
CouchDB	https://docs.couchdb.org/en/stable/
SQLite	https://www.sqlite.org/docs.html
Derby Java DB	https://db.apache.org/derby/manuals/
JHipster Lite	https://github.com/jhipster/jhipster-lite https://hub.docker.com/r/jhipster/jhipster-lite JHipster Lite image
Grails	https://grails.org/documentation.html https://views.grails.org/latest/ https://cs4760.csl.mtu.edu/2019/assignments/cs4760-assignments/programming-assignments/1-building-your-first-app/ https://groovy-lang.org/documentation.html
Spark	https://github.com/apache/spark Apache-Spark-Beginners-Guide-2023-Ebook_8-Steps-V2.pdf link
Apache EventMesh	EventMesh is a new generation serverless event middleware for building distributed event-driven applications.
IPFS	https://docs.ipfs.tech/
Kafka	https://kafka.apache.org/documentation/ https://www.confluent.io/resources/online-talk/fundamentals-for-apache-kafka-2-part-series/ video
Hyperledger	https://hyperledger-fabric.readthedocs.io/en/release-2.5/
Messaging	ActiveMQ - topics and queues, broadcast (pub/sub) models; ActiveMQ Artemis - docs URL
Apache ServiceMix - ESB bundling ActiveMQ, Camel, ...	https://servicemix.apache.org/docs/7.x/index.html heavyweight framework
Apache Camel - data source connections	https://camel.apache.org/docs/
JDBC Drivers	https://www.geeksforgeeks.org/jdbc-drivers/ https://www.ibm.com/docs/en/i/7.4?topic=jdbc-types-drivers https://en.wikipedia.org/wiki/JDBC_driver
BIRT (Business Intelligence and Reporting Tools) - long-established open-source reporting solution for analytics workbooks	https://eclipse.github.io/birt-website/ https://www.eclipse.org/community/eclipse_newsletter/2015/september/article3.php
Grafana - open-source data visualization on many data sources	https://grafana.com/docs/ https://grafana.com/docs/grafana/latest/introduction/
DBeaver - open-source DB client tool	https://dbeaver.io/
Web CMS list from wikipedia	https://en.wikipedia.org/wiki/List_of_content_management_systems
OpenCms	http://www.opencms.org/en/ https://en.wikipedia.org/wiki/OpenCms http://www.opencms.org/en/news/230425-opencms-v1500.html https://documentation.opencms.org/central/
XWiki	https://www.xwiki.org/xwiki/bin/view/Main/WebHome https://www.xwiki.org/xwiki/bin/view/Documentation/UserGuide/Features/SecondGenerationWiki/ https://www.xwiki.org/xwiki/bin/view/Documentation/DevGuide/ https://www.xwiki.org/xwiki/bin/view/Documentation/ https://dev.xwiki.org/xwiki/bin/view/Community/SupportStrategy/DatabaseSupportStrategy https://en.wikipedia.org/wiki/XWiki
JSPWiki	https://jspwiki.apache.org/ https://jspwiki-wiki.apache.org/Wiki.jsp?page=Documentation https://jspwiki-wiki.apache.org/Wiki.jsp?page=Getting%20Started https://jspwiki-wiki.apache.org/Wiki.jsp?page=ContributedPlugins#section-ContributedPlugins-ContributedPluginsPriorToV2.9.x
Drupal	https://www.drupal.org/
Eclipse	https://www.eclipse.org/documentation/
VS Code	https://code.visualstudio.com/docs
IntelliJ	https://www.jetbrains.com/help/idea/getting-started.html



Data Services Commercial Solutions
GCP
AWS Aurora
AWS Redshift
Azure
Snowflake
Tealium

Cloud data warehouse solutions
https://www.scnsoft.com/analytics/data-warehouse/cloud cloud-dwh-2022-Top 6 Cloud Data Warehouse Solutions.pdf file	Top 6 cloud data warehouse solutions (2022)

Key Concepts

Data Use Cases & Decision Match Data Processing Flows

Functional Data Layers Architecture

...

However, if Spark runs on YARN alongside other shared services, performance might degrade and its RAM overhead can cause memory pressure. For this reason, for batch-processing use cases, Hadoop MapReduce has been found to be the more efficient system in some setups.

...

Improve RAJG data virtualization layers with data consumption methods

https://www.linkedin.com/posts/giorgiotorre1234rajkgrover_howdata-manydatamanagement-api-architecture-styles-do-you-banking-activity-70590723883400642567221140607228911617-TM9UhgeR?utm_source=share&utm_medium=member_desktop

Add data consumption models from sources and lakes:

batch, transaction requests, events, streams

add a column for MDM and governance
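The four consumption models listed above (batch, transaction request, events, streams) differ mainly in who drives the read and how much data moves per call. The following is a minimal in-memory sketch, assuming a single key/value source that stands in for a source system or lake table. The class `ConsumptionModels` and all its method names are hypothetical and illustrative only; they are not taken from any product.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical in-memory source showing four ways a consumer can read the same data.
// Class and method names are illustrative only, not taken from any product.
class ConsumptionModels {
    private final Map<String, String> rows = new TreeMap<>();             // stands in for a source or lake table
    private final List<Consumer<String>> subscribers = new ArrayList<>(); // registered event listeners

    // Writes a row, then pushes a change event ("key=value") to every subscriber.
    void put(String key, String value) {
        rows.put(key, value);
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(key + "=" + value);
        }
    }

    // 1. Batch: the consumer pulls a full snapshot in one call (e.g. a scheduled load).
    List<String> readBatch() {
        return new ArrayList<>(rows.values());
    }

    // 2. Transaction request: request/response lookup of a single record by key.
    Optional<String> request(String key) {
        return Optional.ofNullable(rows.get(key));
    }

    // 3. Events: the consumer registers once and is notified of each later change.
    void subscribe(Consumer<String> listener) {
        subscribers.add(listener);
    }

    // 4. Streams: the consumer processes records incrementally through a pipeline.
    //    A real stream (e.g. a Kafka topic) is unbounded; java.util.stream here is finite.
    Stream<String> stream() {
        return rows.values().stream();
    }
}
```

In this sketch, events are push-based and must be subscribed before the writes they should see. Batch and request are pull-based. That push/pull distinction is one reasonable candidate for the extra MDM and governance column, for example to record which model each consumer is allowed to use.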

Data Services Methods

https://www.linkedin.com/posts/giorgiotorre1234_how-many-api-architecture-styles-do-you-activity-7059072388340064256-TM9U?utm_source=share&utm_medium=member_desktop

Architecture styles define how different components of an application programming interface (API) interact with one another.

...

https://drive.google.com/open?id=1ryNqjIE3LCY6Jxik_4V-urf2aCp9QLr0

Data Architecture vs Enterprise Architecture

https://drive.google.com/open?id=1O5q-cppIU3UZrFCJyQmHRWzROHTvKl-b

2024-07-eb-big-book-of-data-engineering-3rd-edition.pdf

Guidance and Best Practices
Databricks Assistant Tips and Tricks for Data Engineers
Applying Software Development and DevOps Best Practices to Delta Live Table Pipelines
Unity Catalog Governance in Action: Monitoring, Reporting and Lineage
Scalable Spark Structured Streaming for REST API Destinations
A Data Engineer’s Guide to Optimized Streaming With Protobuf and Delta Live Tables
Design Patterns for Batch Processing in Financial Services
How to Set Up Your First Federated Lakehouse
Orchestrating Data Analytics With Databricks Workflows
Schema Management and Drift Scenarios via Databricks Auto Loader
From Idea to Code: Building With the Databricks SDK for Python

Data Architecture Strategy

...

data-architecture-build-strategy-dataversitydatastrategyburbankfeb2018-180227040559.pdf file

Enterprise vs Embedded Databases

Enterprise Databases > client-server model for multiple clients, shared data, and transactions

can support SQL, NoSQL or both

MySQL, Postgres, MongoDB, and more

Neo4j - graph DB

vector DBs - for AI

time-series DB - InfluxDB or mysql jem

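Embedded Databases > a single app accesses the database in-process; external access options (JDBC, etc.)? Usually no

can support SQL, NoSQL or both

SQLite (CouchDB, by contrast, normally runs as a server accessed over HTTP, so it fits the client-server model)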

SQLite alternatives

JDBC drivers - CData

CouchDB JDBC driver - CData
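From Java, the client-server vs embedded split above often shows up directly in the JDBC URL. Network drivers name a host and port (`jdbc:postgresql://host:5432/db`, `jdbc:mysql://host:3306/db`). Embedded drivers name a local file or in-memory database (`jdbc:sqlite:/path/app.db`, `jdbc:derby:demo;create=true`). Apache Derby supports both modes, and its network client uses `jdbc:derby://host:1527/demo`. The sketch below is only a reading aid built on that convention. The class `JdbcUrlKind` is hypothetical and is not part of any JDBC API.

```java
import java.util.Locale;

// Hypothetical heuristic: classify a JDBC URL as client-server or embedded by its shape.
// Network drivers usually put "//host:port" right after the sub-protocol;
// embedded drivers usually put a file path or in-memory name there instead.
class JdbcUrlKind {
    enum Kind { CLIENT_SERVER, EMBEDDED }

    static Kind classify(String jdbcUrl) {
        if (!jdbcUrl.toLowerCase(Locale.ROOT).startsWith("jdbc:")) {
            throw new IllegalArgumentException("not a JDBC URL: " + jdbcUrl);
        }
        String rest = jdbcUrl.substring("jdbc:".length());   // e.g. "postgresql://localhost:5432/app"
        int colon = rest.indexOf(':');                         // end of the sub-protocol name
        if (colon < 0) {
            throw new IllegalArgumentException("missing sub-protocol: " + jdbcUrl);
        }
        String target = rest.substring(colon + 1);             // what the driver connects to
        return target.startsWith("//") ? Kind.CLIENT_SERVER : Kind.EMBEDDED;
    }
}
```

This is a heuristic, not a rule. Some client-server drivers use other URL shapes (for example Oracle's thin driver, `jdbc:oracle:thin:@...`), and the sketch would misclassify those as embedded. Check each driver's documentation, including any CData driver, for its actual URL format.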

Data Lake Platform Concepts

khub.Data Services - Candidate Solutions

DWH > Data Lake > Lake House > Data Mesh concepts

...

data-delta-lake-up-&-running_er2.pdf link - O'Reilly ebook

...

https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_a_WebSocket_server_in_Java

Writing WebSocket servers: https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers

https://github.com/mdn/content/blob/main/files/en-us/web/api/websockets_api/writing_a_websocket_server_in_java/index.md?plain=1
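The core of the server-side opening handshake described in the MDN Java guide is one computation. The server appends the fixed GUID from RFC 6455 to the client's `Sec-WebSocket-Key`, takes the SHA-1 digest, and returns it Base64-encoded as `Sec-WebSocket-Accept`. Below is a minimal JDK-only sketch of that step. The class and method names are hypothetical, while the GUID and the sample key/accept pair are the ones given in RFC 6455. It does not read the HTTP request or handle frames after the upgrade.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper: builds the server's reply to a WebSocket upgrade request (RFC 6455).
class WebSocketAccept {
    // Fixed GUID defined by RFC 6455; every server appends it to the client's key.
    private static final String MAGIC_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Sec-WebSocket-Accept = Base64(SHA-1(Sec-WebSocket-Key + GUID))
    static String acceptKey(String secWebSocketKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(
                (secWebSocketKey.trim() + MAGIC_GUID).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // The 101 response the server writes back before switching to WebSocket frames.
    static String handshakeResponse(String secWebSocketKey) {
        return "HTTP/1.1 101 Switching Protocols\r\n"
            + "Connection: Upgrade\r\n"
            + "Upgrade: websocket\r\n"
            + "Sec-WebSocket-Accept: " + acceptKey(secWebSocketKey) + "\r\n\r\n";
    }
}
```

With the RFC's sample key `dGhlIHNhbXBsZSBub25jZQ==`, `acceptKey` yields `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. In a server, the response string would be written as UTF-8 bytes to the accepted socket's output stream.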

...

disributed-db-sharding-strategies1.pdf

This article looks at four data sharding strategies for distributed SQL: algorithmic, range, linear hash, and consistent hash sharding.
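Two of these strategies can be contrasted by how a row key is routed to a shard. Algorithmic sharding computes the shard from the key alone, while range sharding looks the key up in a sorted table of range boundaries. The sketch below covers only those two. It assumes string keys, and the class name, shard counts and boundaries are hypothetical illustrations, not the scheme of any particular database.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical router contrasting algorithmic (hash mod N) and range sharding.
class ShardRouting {
    // Algorithmic sharding: shard = hash(key) mod N, computed with no lookup table.
    // Keys spread evenly, but changing N remaps most keys to a different shard.
    static int algorithmicShard(String key, int shardCount) {
        return Math.floorMod(key.hashCode(), shardCount); // floorMod keeps the result non-negative
    }

    // Range sharding: each shard owns a contiguous key range, keyed by its inclusive lower bound.
    // Range scans touch few shards, but sequential keys can pile onto one "hot" shard.
    private final TreeMap<String, Integer> lowerBounds = new TreeMap<>();

    ShardRouting(Map<String, Integer> lowerBoundToShard) {
        lowerBounds.putAll(lowerBoundToShard);
    }

    int rangeShard(String key) {
        Map.Entry<String, Integer> owner = lowerBounds.floorEntry(key); // greatest bound <= key
        if (owner == null) {
            throw new IllegalArgumentException("no range covers key: " + key);
        }
        return owner.getValue();
    }
}
```

For example, with boundaries `{"": 0, "h": 1, "p": 2}`, the key `kiwi` lands on shard 1. Splitting a hot range only requires adding one boundary entry, which is one reason range sharding pairs well with SQL range scans.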

Data sharding helps in scalability and geo-distribution by horizontally partitioning data. A SQL table is decomposed into multiple sets of rows according to a specific sharding strategy. Each of these sets of rows is called a shard. These shards are distributed across multiple server nodes (containers, VMs, bare-metal) in a shared-nothing architecture. This ensures that the shards do not get bottlenecked by the compute, storage, and networking resources available at a single node. High availability is achieved by replicating each shard across multiple nodes. However, the application interacts with a SQL table as one logical unit and remains agnostic to the physical placement of the shards. In this section, we will outline the pros, cons, and our practical learnings from the sharding strategies adopted by these databases.
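Consistent hashing addresses both the placement and the replication points made above. Nodes and keys are hashed onto the same ring, and a key belongs to the first node clockwise from its position. Walking further clockwise gives distinct replica nodes. Removing a node moves only the keys it owned, so the application can keep treating the table as one logical unit while placement changes underneath. This is a minimal sketch of the generic textbook ring, not the scheme used by any specific database. The class name, the 64-bit prefix of MD5 as the ring hash, and the virtual-node count are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical consistent-hash ring with virtual nodes and replica selection.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // ring position -> physical node
    private final int virtualNodes;                             // ring points per physical node

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Several points per node even out the share of the ring each node owns.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Up to `replicas` distinct nodes clockwise from the key: first = primary, rest = replicas.
    List<String> nodesFor(String key, int replicas) {
        Set<String> chosen = new LinkedHashSet<>();
        long h = hash(key);
        // Clockwise order: positions >= h, then wrap around to the start of the ring.
        List<Collection<String>> clockwise =
            List.of(ring.tailMap(h, true).values(), ring.headMap(h, false).values());
        for (Collection<String> arc : clockwise) {
            for (String node : arc) {
                if (chosen.size() == replicas) {
                    return new ArrayList<>(chosen);
                }
                chosen.add(node); // duplicates (other virtual points of a chosen node) are ignored
            }
        }
        return new ArrayList<>(chosen);
    }

    // First 8 bytes of MD5 as a ring position; used only to spread values, not for security.
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every Java SE platform
        }
    }
}
```

The lookup walks the ring in order, which is fine for a sketch. A production ring would typically cache node placement rather than traverse map views on every call.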

Data Driven Organization Maturity Levels

...

