Table of Contents

...

Reference description with linked URLs

Notes


Data Governance Concepts




http://www.as400pro.com/tipListInq.php?cat=iSJava

Some AS/400 data-related articles, including articles by or featuring Jim Mason, and more

TechTarget has since removed these links.

https://www.roseindia.net/

https://www.roseindia.net/jdbc/jdbc-mysql/

roseindia.net


ISO 8000-150 Data Quality Management Framework (data governance standard)



Data Architecture


https://www.slideshare.net/Dataversity/das-slides-enterprise-architecture-vs-data-architecture

Data Architecture vs Enterprise Architecture

https://www.slideshare.net/lmartins_us/enterprise-data-architecture-deliverables

Enterprise Data Architecture Deliverables

https://www.slideshare.net/Dataversity/data-architecture-strategies-building-an-enterprise-data-strategy-where-to-start

Enterprise Data Strategy

https://www.slideshare.net/Dataversity/data-architecture-the-foundation-for-enterprise-architecture-and-governance

Data Architecture - Foundation for Enterprise Architecture



https://www.slideshare.net/Dataversity/data-architecture-strategies-artificial-intelligence-realworld-applications-for-your-organization

Data Architecture Strategies for AI

https://www.slideshare.net/Dataversity/data-architecture-best-practices-for-todays-rapidly-changing-data-landscape

Data Architecture Best Practices

Data-Virtualization-for-Dummies.pdf

Data Virtualization for Dummies

https://www.datacamp.com/community/blog/data-infrastructure-tools

Sample Data Infrastructure - datacamp *

https://www.slideshare.net/Dataversity/data-lake-architecture-modern-strategies-approaches

Data Lake Architecture Strategies

https://www.informatica.com/resources.asset.cd3434c8d2aae44c6071d19d9077ca60.pdf

data-lake-concepts-2019-resources.asset.cd3434c8d2aae44c6071d19d9077ca60.pdf

Data Lake Design Principles - Informatica

data-lakes-Six-Guiding-Principles-for-Effective-Data-Lake-Pipelines.pdf


https://dzone.com/articles/four-data-sharding-strategies-for-distributed-sql

disributed-db-sharding-strategies1.pdf

Data Sharding Strategies compared





snowflake-data-db-cloud-service-2333957-solution-brief-snowflake.pdf






Data Modeling


sustainable data architecture concepts 

basics on data concepts for blockchain *









Data Management


SFTP (SSH File Transfer Protocol) vs. FTPS (FTP over SSL/TLS) explained.

spiceworks.com-SFTP vs FTPS Understanding the 8 Key Differences.pdf file










Data Governance


https://profisee.com/data-governance-what-why-how-who/

data-governance-profisee.com-Data Governance What Why How Who 15 Best Practices.pdf

Data Governance Concepts & Tools

Blockchain Data Compliance Services

Data Compliance

Sichern-data-compliance-whitepaper-short-version.201906.docx


DMX - Blockchain and Data Compliance Services.v2.pptx






Data Services Open Solutions


Ubuntu

https://docs.ubuntu.com/

https://help.ubuntu.com/stable/ubuntu-help/

Ubuntu Server docs

https://ubuntu.com/server/docs

Ubuntu Multipass

https://multipass.run/docs
Multipass is a tool to generate cloud-style Ubuntu VMs quickly on Linux, macOS, and Windows.

Docker

https://docs.docker.com/

Run multiple JDKs on macOS

https://medium.com/@manvendrapsingh/installing-many-jdk-versions-on-macos-dfc177bc8c2b

https://wiki.classe.cornell.edu/Computing/InstallingMultipleVersionsOfJavaOnMac

https://gist.github.com/gramcha/81dcec3f1e4ce8cffd7f248d3e2a42a7
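After switching between installed JDKs, it helps to confirm which one actually runs your code. One common approach on macOS (an assumption here, not taken from the linked pages) is to list JDKs with `/usr/libexec/java_home -V` and select one with `export JAVA_HOME=$(/usr/libexec/java_home -v 11)`. The small class below, with the hypothetical name `WhichJdk`, is a minimal sketch that reports the running JVM's feature version and `java.home`:

```java
// Hypothetical helper: reports which JDK is actually running this code.
class WhichJdk {
    // Feature (major) release of the running JVM, e.g. 11 or 19.
    // Runtime.version() needs Java 9+, and feature() needs Java 10+.
    static int featureVersion() {
        return Runtime.version().feature();
    }

    // Installation directory of the running JVM (the effective JAVA_HOME).
    static String javaHome() {
        return System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println("Java " + Runtime.version() + " (feature " + featureVersion() + ")");
        System.out.println("java.home = " + javaHome());
    }
}
```

With Java 11 or later, `java WhichJdk.java` runs the file directly without a separate compile step, which makes it quick to check each JDK in turn.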

Manually install a JDK version from:

OpenJDK

https://openjdk.org/

OpenJDK 19

https://jdk.java.net/19/

OpenJDK 11 (archive)

https://jdk.java.net/archive/

Apache Tomcat

https://tomcat.apache.org/

https://tomcat.apache.org/tomcat-10.1-doc/index.html

Oracle MySQL

https://dev.mysql.com/doc/

Postgres

https://www.postgresql.org/docs/

https://www.postgresql.org/files/documentation/pdf/15/postgresql-15-US.pdf

PostgreSQL v15 manual (PDF)

CouchDB

https://docs.couchdb.org/en/stable/

SQLite

https://www.sqlite.org/docs.html

Apache Derby (Java DB)

https://db.apache.org/derby/manuals/

JHipster Lite

https://github.com/jhipster/jhipster-lite

JHipster Lite Docker image: https://hub.docker.com/r/jhipster/jhipster-lite

Grails

https://grails.org/documentation.html
https://views.grails.org/latest/

https://cs4760.csl.mtu.edu/2019/assignments/cs4760-assignments/programming-assignments/1-building-your-first-app/

https://groovy-lang.org/documentation.html

Spark

https://github.com/apache/spark

Apache-Spark-Beginners-Guide-2023-Ebook_8-Steps-V2.pdf

Apache EventMesh

EventMesh is a new-generation serverless event middleware for building distributed, event-driven applications.

IPFS

https://docs.ipfs.tech/

Kafka

https://kafka.apache.org/documentation/

https://www.confluent.io/resources/online-talk/fundamentals-for-apache-kafka-2-part-series/ (video)

Hyperledger

https://hyperledger-fabric.readthedocs.io/en/release-2.5/

Messaging

ActiveMQ - queues and topics: point-to-point and broadcast (pub/sub) models
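The difference between a queue and a topic comes down to delivery semantics. On a point-to-point queue each message goes to exactly one consumer, while on a pub/sub topic every subscriber receives its own copy. The sketch below models only those semantics in memory with `java.util.concurrent`. It is NOT the ActiveMQ or JMS API, and the class names `PointToPointQueue` and `Topic` are hypothetical:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// In-memory illustration only; not the ActiveMQ or JMS API.

// Point-to-point queue: each message is consumed by exactly one receiver,
// however many receivers are polling.
class PointToPointQueue<T> {
    private final BlockingQueue<T> messages = new LinkedBlockingQueue<>();

    void send(T message) {
        messages.add(message);
    }

    // Blocks until a message is available, then removes it from the queue.
    T receive() throws InterruptedException {
        return messages.take();
    }

    // Non-blocking variant: returns null when the queue is empty.
    T poll() {
        return messages.poll();
    }
}

// Publish/subscribe topic: every current subscriber gets its own copy
// of each message (the broadcast model).
class Topic<T> {
    private final List<BlockingQueue<T>> subscribers = new CopyOnWriteArrayList<>();

    // Each subscriber gets a private inbox; messages published before
    // subscribing are not delivered (no durable subscription here).
    BlockingQueue<T> subscribe() {
        BlockingQueue<T> inbox = new LinkedBlockingQueue<>();
        subscribers.add(inbox);
        return inbox;
    }

    void publish(T message) {
        for (BlockingQueue<T> inbox : subscribers) {
            inbox.add(message);
        }
    }
}
```

A real broker adds persistence, acknowledgements, and durable subscriptions on top of these two basic delivery models.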

ActiveMQ Artemis - docs URL

Apache ServiceMix - integration container (ESB) combining ActiveMQ, Camel, and more

https://servicemix.apache.org/docs/7.x/index.html

Heavyweight framework

Apache Camel - integration framework with connectors for many data sources

https://camel.apache.org/docs/

JDBC Drivers

https://www.geeksforgeeks.org/jdbc-drivers/

https://www.ibm.com/docs/en/i/7.4?topic=jdbc-types-drivers

https://en.wikipedia.org/wiki/JDBC_driver
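Whatever the driver type, application code uses the same standard `java.sql` API. `DriverManager` picks the driver from the `jdbc:<subprotocol>:` prefix of the URL, and JDBC 4.0+ drivers register themselves when their JAR is on the classpath. The helper below, with the hypothetical name `firstColumn`, is a minimal sketch of the usual pattern: try-with-resources plus a `PreparedStatement` with a bound parameter.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class JdbcQuery {
    // Hypothetical helper: runs a query with one String parameter and
    // returns the first column of every row as a String.
    static List<String> firstColumn(String url, String user, String password,
                                    String sql, String param) throws SQLException {
        List<String> values = new ArrayList<>();
        // try-with-resources closes the statement and connection even on errors.
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, param); // bind the value instead of concatenating SQL
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    values.add(rs.getString(1));
                }
            }
        }
        return values;
    }
}
```

For example, `firstColumn("jdbc:postgresql://localhost:5432/appdb", "app", "secret", "SELECT name FROM customer WHERE region = ?", "EU")` would work once the PostgreSQL driver JAR is on the classpath. The database, credentials, table, and column are all illustrative. If no registered driver accepts the URL, `getConnection` throws `SQLException`.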

BIRT (Business Intelligence and Reporting Tools) - the original open-source data reporting solution for analytics workbooks

https://eclipse.github.io/birt-website/

https://www.eclipse.org/community/eclipse_newsletter/2015/september/article3.php

Grafana - open-source data visualization on many data sources

https://grafana.com/docs/

https://grafana.com/docs/grafana/latest/introduction/

DBeaver - open-source database client tool

https://dbeaver.io/

List of web content management systems (Wikipedia)

https://en.wikipedia.org/wiki/List_of_content_management_systems

OpenCms

http://www.opencms.org/en/

https://en.wikipedia.org/wiki/OpenCms

http://www.opencms.org/en/news/230425-opencms-v1500.html

https://documentation.opencms.org/central/

XWiki

https://www.xwiki.org/xwiki/bin/view/Main/WebHome

https://www.xwiki.org/xwiki/bin/view/Documentation/UserGuide/Features/SecondGenerationWiki/

https://www.xwiki.org/xwiki/bin/view/Documentation/DevGuide/

https://www.xwiki.org/xwiki/bin/view/Documentation/

https://dev.xwiki.org/xwiki/bin/view/Community/SupportStrategy/DatabaseSupportStrategy


https://en.wikipedia.org/wiki/XWiki

JSPWiki

https://jspwiki.apache.org/

https://jspwiki-wiki.apache.org/Wiki.jsp?page=Documentation

https://jspwiki-wiki.apache.org/Wiki.jsp?page=Getting%20Started

https://jspwiki-wiki.apache.org/Wiki.jsp?page=ContributedPlugins#section-ContributedPlugins-ContributedPluginsPriorToV2.9.x

Drupal

https://www.drupal.org/

Eclipse

https://www.eclipse.org/documentation/

Visual Studio Code

https://code.visualstudio.com/docs

IntelliJ

https://www.jetbrains.com/help/idea/getting-started.html

Data Services Commercial Solutions


GCP 


AWS Aurora


AWS Redshift


Azure


Snowflake


Tealium




Cloud data warehouse solutions


https://www.scnsoft.com/analytics/data-warehouse/cloud

cloud-dwh-2022-Top 6 Cloud Data Warehouse Solutions.pdf file







Key Concepts

Data Use Cases & Decision Match Data Processing Flows

image-20240823-180636.png


Functional Data Layers Architecture

...

However, if Spark runs on YARN alongside other shared services, performance may degrade, and the extra RAM overhead can behave like a memory leak. For this reason, for batch-processing use cases, Hadoop (MapReduce) has been found to be the more efficient system.

...

Improve RAJG data virtualization layers with data consumption methods

https://www.linkedin.com/posts/rajkgrover_data-datamanagement-banking-activity-7221140607228911617-hgeR

Architecture styles define how different consumers access data; add consumption models from sources and lakes:

batch, transactional requests, events, streams

Add a column for MDM and governance

Data Services Methods

https://www.linkedin.com/posts/giorgiotorre1234_how-many-api-architecture-styles-do-you-activity-7059072388340064256-TM9U?utm_source=share&utm_medium=member_desktop

Architecture styles define how different components of an application programming interface (API) interact with one another.

...

https://drive.google.com/open?id=1ryNqjIE3LCY6Jxik_4V-urf2aCp9QLr0

Data Architecture vs Enterprise Architecture

https://drive.google.com/open?id=1O5q-cppIU3UZrFCJyQmHRWzROHTvKl-b


2024-07-eb-big-book-of-data-engineering-3rd-edition.pdf

Guidance and Best Practices
Databricks Assistant Tips and Tricks for Data Engineers
Applying Software Development and DevOps Best Practices to Delta Live Table Pipelines
Unity Catalog Governance in Action: Monitoring, Reporting and Lineage
Scalable Spark Structured Streaming for REST API Destinations
A Data Engineer’s Guide to Optimized Streaming With Protobuf and Delta Live Tables
Design Patterns for Batch Processing in Financial Services
How to Set Up Your First Federated Lakehouse
Orchestrating Data Analytics With Databricks Workflows
Schema Management and Drift Scenarios via Databricks Auto Loader
From Idea to Code: Building With the Databricks SDK for Python



Data Architecture Strategy

...

data-architecture-build-strategy-dataversitydatastrategyburbankfeb2018-180227040559.pdf file

Enterprise vs Embedded Databases

Enterprise databases > client-server model for multiple clients, shared data, and transactions

Can support SQL, NoSQL, or both

MySQL, Postgres, MongoDB, and more

Neo4j - graph DB

Vector DBs - for AI

Time-series DB - InfluxDB or MySQL jem

Embedded databases > a single application accesses the database; external access options (JDBC etc.)? Usually no

Can support SQL, NoSQL, or both

SQLite, CouchDB

SQLite alternatives

JDBC drivers - CData

CouchDB JDBC driver - CData
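The client-server vs. embedded split shows up directly in JDBC connection URLs. A client-server URL names a host and port. An embedded URL names a local file or directory, and the engine runs inside the application process. The sketch below builds URLs in the formats used by the PostgreSQL, MySQL Connector/J, SQLite (xerial), and Apache Derby drivers. The class and method names are hypothetical, and each URL still needs the matching driver JAR on the classpath before it can connect. Derby is included because it supports both modes: embedded, and network server on port 1527 by default.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class JdbcUrls {
    // Client-server (enterprise) databases: the URL points at a server,
    // and many clients can share the same database concurrently.
    static String postgres(String host, int port, String db) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + db;
    }

    static String mysql(String host, int port, String db) {
        return "jdbc:mysql://" + host + ":" + port + "/" + db;
    }

    // Embedded databases: the URL points at a local file or directory and
    // the engine runs inside this application's process.
    static String sqlite(String file) {
        return "jdbc:sqlite:" + file;
    }

    static String derbyEmbedded(String dir) {
        return "jdbc:derby:" + dir + ";create=true"; // create the DB if missing
    }

    // Derby can also run as a network server, giving external JDBC access.
    static String derbyNetwork(String host, int port, String db) {
        return "jdbc:derby://" + host + ":" + port + "/" + db;
    }

    // Same java.sql API in every case; only the URL and driver JAR differ.
    static Connection open(String url) throws SQLException {
        return DriverManager.getConnection(url);
    }

    public static void main(String[] args) {
        System.out.println(postgres("localhost", 5432, "appdb"));
        System.out.println(sqlite("app.db"));
    }
}
```

Server databases usually also need credentials, which can be passed with the three-argument `DriverManager.getConnection(url, user, password)` overload.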

Data Lake Platform Concepts

khub.Data Services - Candidate Solutions

DWH > Data Lake > Lake House > Data Mesh concepts

...

data-delta-lake-up-&-running_er2.pdf (O'Reilly ebook)

...

https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_a_WebSocket_server_in_Java

Writing WebSocket servers: https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers

https://github.com/mdn/content/blob/main/files/en-us/web/api/websockets_api/writing_a_websocket_server_in_java/index.md?plain=1
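The core step of a hand-written WebSocket server is the opening handshake defined in RFC 6455. The server reads the client's `Sec-WebSocket-Key` header, appends the fixed GUID `258EAFA5-E914-47DA-95CA-C5AB0DC85B11`, hashes the result with SHA-1, and returns the Base64 digest in `Sec-WebSocket-Accept` with a `101 Switching Protocols` response. The sketch below does only that step, using the JDK alone. It follows the general approach of the linked MDN article, but the class and method names are hypothetical, and frame encoding/decoding is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WebSocketHandshake {
    // Fixed GUID from RFC 6455, appended to the client's key before hashing.
    private static final String WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
    // '.' stops at line terminators, so the group ends before "\r\n".
    private static final Pattern KEY_HEADER =
        Pattern.compile("Sec-WebSocket-Key:[ \\t]*(.*)", Pattern.CASE_INSENSITIVE);

    // Pulls the Sec-WebSocket-Key value out of the raw HTTP upgrade request.
    static String extractKey(String request) {
        Matcher m = KEY_HEADER.matcher(request);
        if (!m.find()) {
            throw new IllegalArgumentException("not a WebSocket upgrade request");
        }
        return m.group(1).trim();
    }

    // Sec-WebSocket-Accept = Base64(SHA-1(key + GUID)).
    static String acceptKey(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                .digest((key + WS_GUID).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is required on every Java platform
        }
    }

    // The response the server writes back to complete the handshake.
    static String response(String request) {
        return "HTTP/1.1 101 Switching Protocols\r\n"
            + "Connection: Upgrade\r\n"
            + "Upgrade: websocket\r\n"
            + "Sec-WebSocket-Accept: " + acceptKey(extractKey(request)) + "\r\n\r\n";
    }
}
```

With the sample key from RFC 6455, `dGhlIHNhbXBsZSBub25jZQ==`, `acceptKey` returns `s3pPLMBiTxaQ9kYGzzhZRbK+xLo=`, which matches the value given in the RFC.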


...

disributed-db-sharding-strategies1.pdf

This article looks at four data sharding strategies for distributed SQL, including algorithmic, range, linear, and consistent hash.
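Two of these strategies can be contrasted in a few lines. Algorithmic sharding is shown here as `hash(key) mod N`, an assumed concrete form of "algorithmic". It spreads keys evenly, but changing N remaps most keys. Range sharding assigns each shard a contiguous key range, which keeps neighbouring keys together for range scans but can create hot spots. The sketch below is a generic illustration with hypothetical class names and Java's `String.hashCode()` as the hash, not any specific database's implementation. Linear and consistent-hash sharding are not covered here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Algorithmic (hash) sharding: shard = hash(key) mod N.
class HashSharding {
    static int shardFor(String key, int shardCount) {
        // floorMod keeps the result non-negative when hashCode() is negative.
        return Math.floorMod(key.hashCode(), shardCount);
    }
}

// Range sharding: shard i owns all keys from bounds.get(i) up to (but not
// including) the next shard's lower bound.
class RangeSharding {
    private final TreeMap<String, Integer> lowerBounds = new TreeMap<>();

    // The first bound should be "" so that every key falls into some shard.
    RangeSharding(List<String> bounds) {
        for (int i = 0; i < bounds.size(); i++) {
            lowerBounds.put(bounds.get(i), i);
        }
    }

    int shardFor(String key) {
        // floorEntry finds the greatest lower bound that is <= key.
        Map.Entry<String, Integer> e = lowerBounds.floorEntry(key);
        if (e == null) {
            throw new IllegalArgumentException("key below first range: " + key);
        }
        return e.getValue();
    }
}
```

For example, with bounds `"", "h", "p"`, the key `"apple"` lands in shard 0, `"hello"` in shard 1, and `"zebra"` in shard 2. Under hash sharding those same keys would be scattered with no ordering relationship.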

Data sharding helps in scalability and geo-distribution by horizontally partitioning data. A SQL table is decomposed into multiple sets of rows according to a specific sharding strategy. Each of these sets of rows is called a shard. These shards are distributed across multiple server nodes (containers, VMs, bare-metal) in a shared-nothing architecture. This ensures that the shards do not get bottlenecked by the compute, storage, and networking resources available at a single node. High availability is achieved by replicating each shard across multiple nodes. However, the application interacts with a SQL table as one logical unit and remains agnostic to the physical placement of the shards. In this section, we will outline the pros, cons, and our practical learnings from the sharding strategies adopted by these databases.
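Consistent hashing addresses both points in the paragraph above. Shards are placed across nodes so that adding or removing a node moves only a fraction of the keys, and replicas can be chosen by walking the ring to the next distinct nodes. The sketch below is a generic consistent-hash ring, not how any particular distributed SQL database implements sharding. The choices of MD5 for ring positions, a configurable number of virtual nodes per physical node, and the class name `ConsistentHashRing` are all illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Consistent hashing ring with virtual nodes and N-way replication.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Ring position: first 8 bytes of the MD5 digest read as a long.
    private static long position(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every Java platform
        }
    }

    // Each physical node is placed at several points to even out the load.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(position(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(position(node + "#" + i), node); // only if still owned by node
        }
    }

    // Walk clockwise from the key's position and collect the first `replicas`
    // distinct physical nodes: the first is the primary, the rest hold copies.
    List<String> nodesFor(String key, int replicas) {
        LinkedHashSet<String> chosen = new LinkedHashSet<>();
        long p = position(key);
        // tailMap = points at or after p; headMap = wrap around to the start.
        for (SortedMap<Long, String> part : List.of(ring.tailMap(p), ring.headMap(p))) {
            for (String node : part.values()) {
                if (chosen.size() == replicas) {
                    return new ArrayList<>(chosen);
                }
                chosen.add(node);
            }
        }
        return new ArrayList<>(chosen);
    }
}
```

When a node is added, a key's primary either stays the same or becomes the new node. Only the keys in the new node's ring segments move, unlike `hash mod N`, where most keys change shards.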


Data-Driven Organization Maturity Levels

...