- 1.1 Key Points
- 1.2 References
  - 1.2.1 Data Use Cases
  - 1.2.2 Data Governance Concepts
  - 1.2.3 Data Architecture
  - 1.2.4 Data Modeling
  - 1.2.5 Data Management
  - 1.2.6 Data Governance
  - 1.2.7 Data Services Open Solutions
    - 1.2.7.1 Data Services Commercial Solutions
    - 1.2.7.2 Cloud data warehouse solutions
2 Key Concepts
3 Potential Value Opportunities
4 Potential Challenges
5 Candidate Solutions
  - 5.1.1 Sample Data Services Pipeline Solution - Tealium
- 5.2 SFTP secure shell vs FTPS encrypted ftp explained
- 5.3 Step-by-step guide for Example
  - 5.3.1 sample code block
- 5.4 Recommended Next Steps
- 5.5 Related articles

Key Points

foundation for enterprise architecture and solutions
driven by business use cases
covers a variety of use cases: support, self-service, enterprise services, integration ...

References

Reference description with linked URLs	Notes

Data Use Cases


Data Governance Concepts

http://www.as400pro.com/tipListInq.php?cat=iSJava	Some AS/400 data-related articles, including articles by Jim Mason and others; TechTarget has since killed many of the links
https://www.roseindia.net/ https://www.roseindia.net/jdbc/jdbc-mysql/	roseindia.net
ISO 8000-150 Data Governance Standard

Data Architecture
Real-Time Data Architecture Patterns.pdf (GD)	EEP, 7 Vs, NFRs, Patterns: Search, Directories (Virtual, Logical, Physical), Registries, Streams, Events, Alerts, Transaction LC, Layers, Services, SCRUD, Async, Messages, Promises, Cache, Age, Stacks, Queues, Pub/Sub, Brokers, Handlers, Validation LC, Trust LC, Processes, Decisions, RDS, ODS, DWH, Batch / Real-time, Hashes, Proofs, CDC, Ledgers, Journals, Blocks, Controllers, Observers, Idempotency, Rollback, Replay, Versioning, Interfaces, Delegates, Adapters, Routers, Decorators, Models (Physical, Logical), MetaModels, MDM, Observability, BCP, Aggregates (Virtual, Persistent), Factories, Pipelines, Transformers, DSLs, Flow (Ingestion, Transform, Validation, Normalization, Edit, Process, Post, Store, Search, Retrieve, Transform, Present), Distributed, Replicated, Decentralized (Commit, Finalization), DRDA, GIGO2, Secrets, Rollups, SOLID, DATES2, Level Checks, Versions, Mutations, Mappings, Liquibase, Attributes (fixed, managed, unmanaged), Data Types, Functions, Procs, RPC, gRPC, Concurrency, Semaphores, Switches, Tokens, Credentials, Authorizations, IAM, Encryption, SDC, UTXO vs Lots, HTLC vs Escrow Transactions, Classes, MetaClasses
https://www.slideshare.net/Dataversity/das-slides-enterprise-architecture-vs-data-architecture	Data Architecture vs Enterprise Architecture
https://www.slideshare.net/lmartins_us/enterprise-data-architecture-deliverables	Enterprise Data Architecture Deliverables
https://www.slideshare.net/Dataversity/data-architecture-strategies-building-an-enterprise-data-strategy-where-to-start	Enterprise Data Strategy
https://www.slideshare.net/Dataversity/data-architecture-the-foundation-for-enterprise-architecture-and-governance	Data Architecture - Foundation for Enterprise Architecture

https://www.slideshare.net/Dataversity/data-architecture-strategies-artificial-intelligence-realworld-applications-for-your-organization	Data Architecture Strategies for AI
https://www.slideshare.net/Dataversity/data-architecture-best-practices-for-todays-rapidly-changing-data-landscape	Data Architecture Best Practices
Data-Virtualization-for-Dummies.pdf	Data Virtualization for Dummies
https://www.datacamp.com/community/blog/data-infrastructure-tools	Sample Data Infrastructure - datacamp *
https://www.slideshare.net/Dataversity/data-lake-architecture-modern-strategies-approaches	Data Lake Architecture Strategies
https://www.informatica.com/resources.asset.cd3434c8d2aae44c6071d19d9077ca60.pdf data-lake-concepts-2019-resources.asset.cd3434c8d2aae44c6071d19d9077ca60.pdf	Data Lake Design Principles - Informatica
data-lakes-Six-Guiding-Principles-for-Effective-Data-Lake-Pipelines.pdf
https://dzone.com/articles/four-data-sharding-strategies-for-distributed-sql disributed-db-sharding-strategies1.pdf	Data Sharding Strategies compared


snowflake-data-db-cloud-service-2333957-solution-brief-snowflake.pdf


Data Modeling	Models, MetaModels, MDG, RoundTripping, VersionMgt
sustainable data architecture concepts	basics on data concepts for blockchain *




Data Management
SFTP (SSH File Transfer Protocol) vs FTPS (FTP over SSL/TLS) explained	spiceworks.com-SFTP vs FTPS Understanding the 8 Key Differences.pdf




Data Governance
https://profisee.com/data-governance-what-why-how-who/ data-governance-profisee.com-Data Governance What Why How Who 15 Best Practices.pdf	Data Governance Concepts & Tools
Blockchain Data Compliance Services	Data Compliance
Sichern-data-compliance-whitepaper-short-version.201906.docx
DMX - Blockchain and Data Compliance Services.v2.pptx


Data Services Open Solutions
Apache Data Services

Ubuntu	https://docs.ubuntu.com/ https://help.ubuntu.com/stable/ubuntu-help/
Ubuntu Server docs	Ubuntu Server documentation
Ubuntu Multipass	Multipass is a tool to generate cloud-style Ubuntu VMs quickly on Linux, macOS, and Windows.
Docker	Home


Run multiple JDKs on macOS	Installing many JDKs on macOS using Homebrew and OpenJDK https://wiki.classe.cornell.edu/Computing/InstallingMultipleVersionsOfJavaOnMac Managing multiple Java versions in macOS Manually install a JDK version from
OpenJDK	https://openjdk.org/
OpenJDK 19	https://jdk.java.net/19/
OpenJDK 11	Archived OpenJDK GA Releases
Apache Tomcat	Apache Tomcat® - Welcome! Apache Tomcat 10 (10.1.42) - Documentation Index
Oracle MySQL	https://dev.mysql.com/doc/
Postgres	https://www.postgresql.org/docs/ https://www.postgresql.org/files/documentation/pdf/15/postgresql-15-US.pdf PostgreSQL v15 manual PDF link
CouchDB	https://docs.couchdb.org/en/stable/
SQLite	https://www.sqlite.org/docs.html
Derby Java DB	https://db.apache.org/derby/manuals/
JHipster Lite	https://github.com/jhipster/jhipster-lite https://hub.docker.com/r/jhipster/jhipster-lite JHipster Lite image
Grails	https://grails.org/documentation.html https://views.grails.org/latest/ https://cs4760.csl.mtu.edu/2019/assignments/cs4760-assignments/programming-assignments/1-building-your-first-app/ https://groovy-lang.org/documentation.html
Spark	https://github.com/apache/spark Apache-Spark-Beginners-Guide-2023-Ebook_8-Steps-V2.pdf link
Apache EventMesh	EventMesh is a new generation serverless event middleware for building distributed event-driven applications.
IPFS	https://docs.ipfs.tech/
Kafka	https://kafka.apache.org/documentation/ https://www.confluent.io/resources/online-talk/fundamentals-for-apache-kafka-2-part-series/ video
Hyperledger	https://hyperledger-fabric.readthedocs.io/en/release-2.5/
https://skywebteam.atlassian.net/wiki/spaces/KHUB/pages/54919289#mMessaging-ActiveMQ	ActiveMQ - topics and queues, broadcast models ( pub / sub ); ActiveMQ Artemis - docs url
Apache ServiceMix - ActiveMQ, Camel, Service mesh ...	https://servicemix.apache.org/docs/7.x/index.html heavyweight framework
Apache Camel - data source connections	https://camel.apache.org/docs/
JDBC Drivers	https://www.geeksforgeeks.org/jdbc-drivers/ https://www.ibm.com/docs/en/i/7.4?topic=jdbc-types-drivers https://en.wikipedia.org/wiki/JDBC_driver
BIRT - the original open-source data frames solution for analytics workbooks	https://eclipse.github.io/birt-website/ https://www.eclipse.org/community/eclipse_newsletter/2015/september/article3.php
Grafana - open-source data visualization on many data sources	https://grafana.com/docs/ https://grafana.com/docs/grafana/latest/introduction/
DBeaver open-source DB client tool	https://dbeaver.io/
Web CMS list from wikipedia	https://en.wikipedia.org/wiki/List_of_content_management_systems
OpenCMS	http://www.opencms.org/en/ https://en.wikipedia.org/wiki/OpenCms http://www.opencms.org/en/news/230425-opencms-v1500.html https://documentation.opencms.org/central/
XWiki	https://www.xwiki.org/xwiki/bin/view/Main/WebHome https://www.xwiki.org/xwiki/bin/view/Documentation/UserGuide/Features/SecondGenerationWiki/ https://www.xwiki.org/xwiki/bin/view/Documentation/DevGuide/ https://www.xwiki.org/xwiki/bin/view/Documentation/ https://dev.xwiki.org/xwiki/bin/view/Community/SupportStrategy/DatabaseSupportStrategy https://en.wikipedia.org/wiki/XWiki
JSPWiki	https://jspwiki.apache.org/ https://jspwiki-wiki.apache.org/Wiki.jsp?page=Documentation https://jspwiki-wiki.apache.org/Wiki.jsp?page=Getting%20Started https://jspwiki-wiki.apache.org/Wiki.jsp?page=ContributedPlugins#section-ContributedPlugins-ContributedPluginsPriorToV2.9.x
Drupal	https://www.drupal.org/
Eclipse	https://www.eclipse.org/documentation/
VS Code	https://code.visualstudio.com/docs
IntelliJ	https://www.jetbrains.com/help/idea/getting-started.html



Data Services Commercial Solutions
GCP
AWS Aurora
AWS Redshift
Azure
Snowflake
Tealium

Cloud data warehouse solutions
https://www.scnsoft.com/analytics/data-warehouse/cloud	Top 6 Cloud Data Warehouse Solutions (cloud-dwh-2022-Top 6 Cloud Data Warehouse Solutions.pdf)

Key Concepts

Data Use Cases & Decision Match Data Processing Flows

Functional Data Layers Architecture

https://www.linkedin.com/posts/rajkgrover_dataplatforms-businessintelligence-analytics-activity-7133748286368141313-RXgv

Source: Deloitte

The purpose of a data platform is to collect, store, transform and analyze data and make that data available to (business) users or other systems. It is often used for business intelligence, (advanced) analytics (such as machine learning) or as a data hub.

The platform consists of several components that can be categorized into common layers, each with a certain function. These layers are: Data Sources, Ingestion Layer, Processing Layer, Storage Layer, Analytics Layer, Visualization Layer, Security, and Data Governance (Figure 1).

Data Sources: This layer contains the different sources of the data platform. This can be any information system, such as ERP or CRM systems, but it can also be other sources such as Excel files, text files, pictures, audio, video or streaming sources such as IoT devices.

Ingestion Layer: The ingestion layer is responsible for loading the data from the data sources into the data platform. This layer is about extracting data from the source systems, checking the data quality and storing the data in the landing or staging area of the data platform.

Processing Layer: The processing layer is responsible for transforming the data so that it can be stored in the correct data model. Processing can be done in batches (scheduled at a specific time/day) or in real time, depending on the type of data source and the requirements for data availability.

Storage Layer: The data is stored in the storage layer. This can be a relational database or other storage technologies such as cloud storage, Hadoop, a NoSQL database or a graph database.

Analytics Layer: In the analytics layer the data is further processed (analyzed). This can involve all kinds of (advanced) analytics algorithms, for example for machine learning. The outcome of the analytics can be sent to the visualization layer or stored in the storage layer.

Visualization Layer: The data is presented to the end user in the visualization layer. This can be in the form of reports, dashboards, self-service BI tooling, or APIs so that the data can be used by other systems.

Security: One of the important tasks of a data platform is to guarantee that only users who are allowed to use the data have access. A common method is user authentication and authorization, but it can also be required that the data is encrypted (at rest and in transit) and that all activities on the data are audited so that it is known who has accessed or modified which data.

Data Governance: Data governance is about locating the data in a data catalog, collecting and storing metadata about the data, managing the master data and/or reference data, and providing insight into where the data in the data platform originates (i.e., data lineage).
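As a rough illustration of how these layers hand data to one another, here is a minimal in-memory sketch in Python. It is a toy under stated assumptions, not a reference design: the `DataPlatform` class, the `id`/`amount` record fields, and the quality rule (both fields must be present) are hypothetical, and in practice each layer would be a separate system (ingestion tool, database, BI tool, catalog).

```python
"""Toy sketch of data platform layers: ingest -> quality check -> staging
-> transform -> store -> analyze, with simple lineage tracking.
All names and rules here are hypothetical, for illustration only."""
from dataclasses import dataclass, field


@dataclass
class DataPlatform:
    staging: list = field(default_factory=list)   # landing / staging area
    storage: dict = field(default_factory=dict)   # storage layer, keyed by record id
    rejected: list = field(default_factory=list)  # records that failed quality checks
    lineage: dict = field(default_factory=dict)   # record id -> source system (governance)

    def ingest(self, source: str, records: list) -> None:
        """Ingestion layer: extract from a source, check quality, land in staging."""
        for rec in records:
            if "id" in rec and rec.get("amount") is not None:
                self.staging.append((source, rec))
            else:
                self.rejected.append((source, rec))

    def process(self) -> None:
        """Processing layer (batch): transform staged records into the target model."""
        for source, rec in self.staging:
            self.storage[rec["id"]] = {"id": rec["id"], "amount": float(rec["amount"])}
            self.lineage[rec["id"]] = source
        self.staging.clear()

    def total_amount(self) -> float:
        """Analytics layer: a trivial aggregate over the stored data."""
        return sum(r["amount"] for r in self.storage.values())


platform = DataPlatform()
platform.ingest("crm", [{"id": 1, "amount": "10.5"}, {"id": 2}])  # id 2 fails the quality check
platform.ingest("erp", [{"id": 3, "amount": 4}])
platform.process()
print(platform.total_amount())  # 14.5
print(platform.lineage)         # {1: 'crm', 3: 'erp'}
```

Keeping lineage as a side effect of processing is one simple way to give the governance layer its "where did this come from" answer without a separate lookup.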

Is Hadoop still valid for batch data processing?

Apache Spark and other open-source frameworks are better now for some use cases

logz.io/blog/hadoop-vs-spark/

Spark has been found to run 100 times faster in-memory, and 10 times faster on disk. It’s also been used to sort 100 TB of data 3 times faster than Hadoop MapReduce on one-tenth of the machines. Spark has particularly been found to be faster on machine learning applications, such as Naive Bayes and k-means.

Spark performance, as measured by processing speed, has been found to be superior to Hadoop's, for several reasons:

Spark keeps intermediate data in memory, so it is not bound by disk input/output every time it runs a selected part of a MapReduce-style task. It has proven to be much faster for many applications.
Spark's DAGs enable optimizations across steps. Hadoop MapReduce has no connection between successive MapReduce steps (each job writes its output to disk before the next one starts), so no performance tuning can occur at that level.

However, if Spark runs on YARN alongside other shared services, performance might degrade, and RAM overhead and memory leaks can occur. For this reason, for batch processing use cases, Hadoop has been found to be the more efficient system.
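The DAG point can be illustrated with a toy analogy in plain Python; it is not Spark or Hadoop code. The eager version keeps every intermediate result, like chained MapReduce jobs writing out each step's output; the lazy version only records transformations and runs them fused in one pass when `collect()` is called, loosely mirroring how Spark defers transformations until an action. The method names echo Spark's RDD vocabulary, but `eager_pipeline` and `LazyPipeline` are hypothetical.

```python
"""Toy analogy: eager stage-by-stage materialization (MapReduce-like) versus a
lazily recorded, fused pipeline executed only on an action (Spark-DAG-like)."""


def eager_pipeline(data, stages):
    """Run each stage over the full dataset and keep every intermediate result."""
    materialized = []
    current = list(data)
    for stage in stages:
        current = [stage(x) for x in current]
        materialized.append(current)  # stands in for writing a job's output to disk
    return current, len(materialized)


class LazyPipeline:
    """Record transformations; execute them fused in a single pass on collect()."""

    def __init__(self, data):
        self.data = data
        self.stages = []

    def map(self, fn):
        self.stages.append(fn)  # transformation: recorded, not executed yet
        return self

    def collect(self):
        # Action: each element flows through all stages with no
        # intermediate collections materialized.
        out = []
        for x in self.data:
            for fn in self.stages:
                x = fn(x)
            out.append(x)
        return out


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
eager_result, intermediates = eager_pipeline(range(5), stages)
lazy = LazyPipeline(range(5))
for s in stages:
    lazy.map(s)
print(eager_result, intermediates)  # [-1, 1, 3, 5, 7] 3
print(lazy.collect())               # [-1, 1, 3, 5, 7]
```

Both produce the same answer; the difference is that the eager path created three full intermediate datasets, which is the kind of cost a DAG planner can avoid.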

Improve RAJG data virtualization layers with data consumption methods

https://www.linkedin.com/posts/rajkgrover_data-datamanagement-banking-activity-7221140607228911617-hgeR

add consumption models from sources, lakes

batch, transaction request, events, streams

add column for MDM, governance

Data Services Methods

https://www.linkedin.com/posts/giorgiotorre1234_how-many-api-architecture-styles-do-you-activity-7059072388340064256-TM9U

Architecture styles define how different components of an application programming interface (API) interact with one another.

Here are the most used styles:

🔹SOAP:
Mature, comprehensive, XML-based
Best for enterprise applications

🔹RESTful:
Popular, easy-to-implement, HTTP methods
Ideal for web services

🔹GraphQL:
Query language, request specific data
Reduces network overhead, faster responses

🔹gRPC:
Modern, high-performance, Protocol Buffers
Suitable for microservices architectures

🔹WebSocket:
Real-time, bidirectional, persistent connections
Perfect for low-latency data exchange

🔹Webhook:
Event-driven, HTTP callbacks, asynchronous
Notifies systems when events occur


Source: ByteByteGo, Alex Xu
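To make the GraphQL point about reduced network overhead concrete, this toy sketch compares a REST-style response that returns a whole resource with a GraphQL-style field selection. It is not a GraphQL implementation; the `CUSTOMER` record, the `rest_get_customer` and `select_fields` helpers, and the query in the comment are hypothetical.

```python
"""Toy comparison: full REST resource vs GraphQL-style field selection."""
import json

CUSTOMER = {
    "id": 42,
    "name": "Acme Corp",
    "address": {"street": "1 Main St", "city": "Springfield", "zip": "12345"},
    "orders": [{"id": i, "total": i * 10.0} for i in range(1, 51)],
}


def rest_get_customer(customer_id: int) -> dict:
    """REST style: GET /customers/{id} returns the whole representation."""
    return CUSTOMER if customer_id == CUSTOMER["id"] else {}


def select_fields(obj: dict, selection: dict) -> dict:
    """GraphQL-style selection: keep only requested fields.
    A nested dict is a sub-selection; None marks a leaf field."""
    result = {}
    for key, sub in selection.items():
        value = obj[key]
        result[key] = select_fields(value, sub) if sub else value
    return result


# Roughly what a query like { customer(id: 42) { name address { city } } } asks for
selection = {"name": None, "address": {"city": None}}
full = json.dumps(rest_get_customer(42))
partial = json.dumps(select_fields(CUSTOMER, selection))
print(partial)                   # {"name": "Acme Corp", "address": {"city": "Springfield"}}
print(len(partial) < len(full))  # True
```

The client, not the server, decides the response shape, which is where the smaller payloads come from; the trade-off is more work on the server to resolve arbitrary selections.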

Data Event Message Communications

Most applications are integrated based on events with shared data.
An event occurs in PC1 ( Process Context 1 ) and 1 or more other dependent processes ( PC2 .. PCN ) will listen and react to the events and the related event data.
Sources and Handlers of Events? Function or Object? While pure functions can generate events, objects provide a context for the event beyond the current function.
Objects are key for automated processing of events. They go beyond functions, providing a valid context and the capability to handle responsibilities for events, behaviors and data.
They organize and simplify the use of functions and APIs.
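The PC1 to PC2 .. PCN flow above, with an object acting as the event source and providing context, can be sketched in-process as follows. The `EventSource` class, the `order.created` event type, and the PC names are hypothetical; across real processes the listener registry would be replaced by a broker or another transport.

```python
"""In-process sketch: an object in PC1 emits events; listeners standing in
for PC2..PCN react to the event and its data."""
from collections import defaultdict
from typing import Callable


class EventSource:
    """An object that gives its events a context (here, its name)."""

    def __init__(self, name: str):
        self.name = name
        self._listeners = defaultdict(list)  # event type -> list of handlers

    def on(self, event_type: str, handler: Callable[[dict], None]) -> None:
        """Register a dependent process's handler for an event type."""
        self._listeners[event_type].append(handler)

    def emit(self, event_type: str, data: dict) -> int:
        """Build the event with its source context and notify every listener."""
        event = {"type": event_type, "source": self.name, "data": data}
        for handler in self._listeners[event_type]:
            handler(event)
        return len(self._listeners[event_type])


pc1 = EventSource("PC1-orders")
received = []
pc1.on("order.created", lambda e: received.append(("PC2-billing", e["data"]["order_id"])))
pc1.on("order.created", lambda e: received.append(("PC3-shipping", e["data"]["order_id"])))
notified = pc1.emit("order.created", {"order_id": 7})
print(notified, received)  # 2 [('PC2-billing', 7), ('PC3-shipping', 7)]
```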

Event requirements

consider async event handling requirements vs sync handling
requires saving state for later processing, replays and recovery
is event access based on push or pull model? in many scenarios, push may be more efficient for real-time responsive application flows
is event message persistence required?
what are the replay and recovery requirements?
are messages broadcast or handled by specific handler?
is the design for a configured handler ( eg API, database, Web Sockets or RPC )?
is the design for a registered event listener ( pub / sub messages, SSE ( Server Sent Events ) )?
what are the V's ( Volatility, Volume, Variety, Variance, Value, Validity ) ?
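Several of these requirements (persistence, push vs pull access, replay and recovery) show up together in a small append-only event log. This is an in-memory, single-process sketch; the `EventLog` class and its methods are hypothetical, while log-based brokers such as Kafka or Pulsar provide durable, distributed versions of the same ideas.

```python
"""Sketch of an append-only event log with push subscribers, pull-by-offset
consumers, and replay for recovery. In-memory only; names are hypothetical."""
from typing import Callable

Listener = Callable[[int, dict], None]


class EventLog:
    def __init__(self):
        self._events: list = []          # persisted, append-only
        self._push_listeners: list = []  # registered push subscribers

    def subscribe(self, listener: Listener) -> None:
        """Push model: listener is called as each event is appended."""
        self._push_listeners.append(listener)

    def append(self, event: dict) -> int:
        offset = len(self._events)
        self._events.append(event)
        for listener in self._push_listeners:
            listener(offset, event)
        return offset

    def poll(self, from_offset: int, max_events: int = 10) -> list:
        """Pull model: the consumer tracks its own offset and asks for more."""
        end = min(from_offset + max_events, len(self._events))
        return [(i, self._events[i]) for i in range(from_offset, end)]

    def replay(self, handler: Listener, from_offset: int = 0) -> None:
        """Recovery: re-deliver stored events, e.g. to rebuild a consumer's state."""
        for offset, event in self.poll(from_offset, len(self._events)):
            handler(offset, event)


log = EventLog()
pushed = []
log.subscribe(lambda off, e: pushed.append(off))
for n in range(3):
    log.append({"type": "reading", "value": n})

batch = log.poll(from_offset=1, max_events=5)  # a pull consumer resuming at offset 1
rebuilt = []
log.replay(lambda off, e: rebuilt.append(e["value"]))
print(pushed, [off for off, _ in batch], rebuilt)  # [0, 1, 2] [1, 2] [0, 1, 2]
```

Because consumers hold their own offsets in the pull model, a consumer that crashes can resume or replay without the log needing to know about it.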

Frameworks for communicating Events

API service - a client can call an API to send an event object to a service for processing and a response ( can be sync or async invocation )
Database with SSE ( Server Sent Events ) and Streams. SSE doesn't require a database ( can be a service ) but the DB can persist the events for replay, recovery etc
Messaging - with optional persistence, supports a wide set of interaction models, including push or pull delivery to clients, async or sync, and broadcast or request / response handling
RPC - Remote Procedure Call - direct invocation of a remote process from the current process, passing an event object and optionally returning a result object ( can be sync or async invocation )
Web Sockets - a full-duplex, bidirectional, interactive communications model between 2 processes ( source and target ). A WebSocket is created by upgrading an HTTP connection
SSE - Server Sent Events - from a data service or custom API service - 1-way async messages from server to client
Distributed Files sent using a file service ( SFTP, Rsync etc )
Blockchain agents - most DLTs offer transaction finality and a variety of environment events
Custom Communications Service - custom comm apps can be full duplex
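For the SSE option, the wire format is simple enough to show directly. This sketch encodes and parses `text/event-stream` messages using the `id`, `event`, and `data` fields defined for SSE in the WHATWG HTML specification. The `format_sse` and `parse_sse` helpers are hypothetical, and the parser is simplified: it assumes `\n` line endings and ignores the `retry` field.

```python
"""Sketch of the SSE wire format: one-way messages streamed from server to
client over a single HTTP response. Helper names are hypothetical."""
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Encode one SSE message; multi-line data becomes several data: lines."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the message


def parse_sse(stream: str) -> list:
    """Simplified client-side parser: split on blank lines, join data: lines."""
    messages = []
    for block in stream.split("\n\n"):
        if not block.strip():
            continue
        msg = {"event": "message", "data": []}  # default event type is "message"
        for line in block.split("\n"):
            name, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if name == "data":
                msg["data"].append(value)
            elif name in ("event", "id"):
                msg[name] = value
        msg["data"] = "\n".join(msg["data"])
        messages.append(msg)
    return messages


stream = format_sse("price=101", event="tick", event_id="1") + format_sse("line1\nline2")
print(parse_sse(stream))
# [{'event': 'tick', 'data': 'price=101', 'id': '1'}, {'event': 'message', 'data': 'line1\nline2'}]
```

The `id` field is what lets a reconnecting browser client send `Last-Event-ID`, so a server that persists events (for example, in a database) can resume the stream instead of starting over.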

GEMS = Global Event Management System

supports the Modern Value Chain Networks

used by multiple solution layers: net apps, malls, stores, services providers, tools

modular solution that connects across networks and platforms with interfaces and adapters to connect many components

open standards-based platform composed primarily of sustainable, open frameworks with added open source connector services

project managed by an open foundation ( see Linux Foundation or ? )

provides solid NFR capabilities for most use cases

extensible on services, interfaces

manageable and maintainable

version tolerance with mutation policies

Steps to define GEMS

I have a clear idea of the problem I'm trying to solve but need to build a real doc to clarify the use cases with concrete examples. Firefly may be a big part of the solution since it can connect multiple DLT nets, can communicate in multiple ways at the services layer ( not just DLT ) and has basic event support capabilities.
If I had to name the solution, it's a global event workflow service that does pub / sub ( both push and pull models ) across multiple networks connected by a supernode model. There are concrete examples I need to specify.
I've built a simple version of that type of service in the past on a single distributed platform on IBM i because the platform had built-in support for event workflows that could be connected over a distributed net.

EDA > Event Driven Architecture for a simple event workflow solution

https://www.linkedin.com/posts/rajkgrover_eventdrivenarchitecture-microservices-transformpartner-activity-6988742530594942977-eGkE

Architectural blueprint for Event-Driven Architecture (EDA) and Microservices Systems

The following figure is an architectural diagram of an EDA-microservices-based enterprise system. Some microservices components and types are shown separately for better clarity of the architecture.

The EDA and microservices-specific components in this blueprint are:

- Event backbone. The event backbone is primarily responsible for transmission, routing, and serialization of events. It can provide APIs for processing event streams. The event backbone offers support for multiple serialization formats and has a major influence on architectural qualities such as fault tolerance, elastic scalability, throughput, and so on. Events can also be stored to create event stores. An event store is a key architectural pattern for recovery and resiliency.
- Services layer. The services layer consists of microservices, integration, and data and analytics services. These services expose their functionality through a variety of interfaces, including REST API, UI, or as EDA event producers and consumers. The services layer also contains services that are specific to EDA and that address cross-cutting concerns, such as orchestration services, streaming data processing services, and so on.
- Data layer. The data layer typically consists of two sublayers. In this blueprint, individual databases owned by microservices are not shown.
  - Caching layer, which provides distributed and in-memory data caches or grids to improve performance and support patterns such as CQRS. It is horizontally scalable and may also have some level of replication and persistence for resiliency.
  - Big data layer, which consists of data warehouses, ODS, data marts, and AI/ML model processing.
- Microservices chassis. The microservices chassis provides the necessary technical and cross-cutting services that are required by different layers of the system. It provides development and runtime capabilities. By using a microservices chassis, you can reduce design and development complexity and operating costs, while you improve time to market, quality of deliverables, and manageability of a huge number of microservices.
- Deployment platform. Elastic, cost-optimized, secure, and easy-to-use cloud platforms should be used. Developers should use as many PaaS services as possible to reduce maintenance and management overhead. The architecture should also provide for a hybrid cloud setup, so platforms such as Red Hat OpenShift should be considered.
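The blueprint mentions an event store for recovery and a caching layer that supports CQRS. A minimal sketch of how those pieces fit together, assuming a single in-memory service: commands append events to the store, a projection maintains a read cache, and the cache can be rebuilt by replaying the store. `AccountService` and its event types are hypothetical.

```python
"""Minimal CQRS sketch: command side appends events to an event store;
query side serves reads from a projected cache that can be rebuilt."""
from collections import defaultdict


class AccountService:
    def __init__(self):
        self.event_store = []                  # write model: append-only events
        self.balance_cache = defaultdict(int)  # read model: projected balances

    # Command side: validate, then record the change as an event.
    def deposit(self, account: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._record({"type": "Deposited", "account": account, "amount": amount})

    def withdraw(self, account: str, amount: int) -> None:
        # Simplification: validates against the read cache; a real system
        # would usually validate against the write model.
        if amount <= 0 or amount > self.balance(account):
            raise ValueError("invalid withdrawal")
        self._record({"type": "Withdrawn", "account": account, "amount": amount})

    def _record(self, event: dict) -> None:
        self.event_store.append(event)
        self._project(event)

    # Query side: projection and reads.
    def _project(self, event: dict) -> None:
        sign = 1 if event["type"] == "Deposited" else -1
        self.balance_cache[event["account"]] += sign * event["amount"]

    def balance(self, account: str) -> int:
        return self.balance_cache.get(account, 0)

    def rebuild_cache(self) -> None:
        """Recovery: discard the cache and replay every stored event."""
        self.balance_cache = defaultdict(int)
        for event in self.event_store:
            self._project(event)


svc = AccountService()
svc.deposit("A", 100)
svc.withdraw("A", 30)
svc.balance_cache.clear()      # simulate losing the cache
lost = svc.balance("A")        # 0 until the cache is rebuilt
svc.rebuild_cache()
print(lost, svc.balance("A"))  # 0 70
```

The event store is the source of truth, so the cache can be scaled out or thrown away freely; that separation is what lets the caching layer be horizontally scalable without becoming a consistency risk.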

Key architectural considerations
The following architectural considerations are extremely important for event-driven, microservices-based systems:

- Architectural patterns
- Technology stack
- Event modeling
- Processing topology
- Deployment topology
- Exception handling
- Leveraging event backbone capabilities
- Security
- Observability
- Fault tolerance and response

Source: IBM


Jim >>

The concepts shown are a good start but not adequate to meet the event solution models we are looking at. On our end, we are looking to define a more global model for different use cases than you have here. I'm sure your implementation can be successful for your use case.

Address the CAP theorem trade-offs for async ACID event transactions

Raj Grover >>

Event processing topology

https://www.linkedin.com/posts/rajkgrover_eventdrivenarchitecture-microservices-activity-6988742530594942977-rNFF/

In EDA, processing topology refers to the organization of producers, consumers, enterprise integration patterns, and topics and queues to provide event processing capability. They are basically event processing pipelines where parts of functional logic (processors) are joined together using enterprise integration patterns and queues and topics. Processing topology is a combination of the SEDA, EIP, and Pipes & Filters patterns. For complex event processing, multiple processing topologies can be connected to each other.
The following figure depicts a blueprint of a processing topology.
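A processing topology of this kind can be sketched with processors joined by queues, each stage running on its own thread (SEDA style), using filtering and enrichment steps in the spirit of enterprise integration patterns. The stage functions, the threshold, and the temperature-reading events are hypothetical, and in-process `queue.Queue` objects stand in for topics and queues on an event backbone.

```python
"""Sketch of a processing topology: filters (processors) joined by pipes
(queues), one thread per stage as in SEDA. Names are hypothetical."""
import queue
import threading
from typing import Callable, Optional

STOP = object()  # sentinel that shuts the pipeline down stage by stage


def stage(fn: Callable[[dict], Optional[dict]], inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Start a stage that reads from inbox, applies fn, and forwards non-None results."""
    def run():
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # propagate shutdown downstream
                break
            result = fn(item)
            if result is not None:  # returning None acts as a message filter
                outbox.put(result)
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t


def validate(e):  # message filter: drop events without a reading
    return e if "reading" in e else None


def enrich(e):    # content enricher: add a unit field
    return {**e, "unit": "C"}


def flag(e):      # decision step: flag readings above a threshold
    return {**e, "alert": e["reading"] > 30}


q_in, q_valid, q_enriched, q_out = (queue.Queue() for _ in range(4))
threads = [stage(validate, q_in, q_valid), stage(enrich, q_valid, q_enriched), stage(flag, q_enriched, q_out)]

for event in [{"id": 1, "reading": 25}, {"id": 2}, {"id": 3, "reading": 35}]:
    q_in.put(event)
q_in.put(STOP)
for t in threads:
    t.join()

results = []
while (item := q_out.get()) is not STOP:
    results.append(item)
print([(r["id"], r["alert"]) for r in results])  # [(1, False), (3, True)]
```

Because each stage only knows its inbox and outbox, stages can be reordered, scaled, or moved onto separate topics without changing the processor functions.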

compare to Firefly Core Stack for Event Management

Firefly: Web3 Blockchain framework - Firefly Features and Services

Apache Pulsar - Event messaging & streaming

https://skywebteam.atlassian.net/wiki/spaces/KHUB/pages/61112419#mApacheDataServices-ApachePulsar

Apache® Pulsar™ is an open-source, distributed messaging and streaming platform built for the cloud.

What is Pulsar

Apache Pulsar is an all-in-one messaging and streaming platform. Messages can be consumed and acknowledged individually or consumed as streams with less than 10ms of latency. Its layered architecture allows rapid scaling across hundreds of nodes, without data reshuffling.

Its features include multi-tenancy with resource separation and access control, geo-replication across regions, tiered storage and support for six official client languages. It supports up to one million unique topics and is designed to simplify your application architecture.

Pulsar is a Top 10 Apache Software Foundation project and has a vibrant and passionate community and user base spanning small companies and large enterprises.

Apache EventMesh

EventMesh is a new generation serverless event middleware for building distributed event-driven applications.

Key features EventMesh has to offer:

Built around the CloudEvents specification.
Rapidly extensible language SDK around gRPC protocols.
Rapidly extensible middleware via connectors such as Apache RocketMQ, Apache Kafka, Apache Pulsar, RabbitMQ, Redis, Pravega, and RDBMS (in progress) using JDBC.
Rapidly extensible controller such as Consul, Nacos, etcd and ZooKeeper.
Guaranteed at-least-once delivery.
Delivery of events between multiple EventMesh deployments.
Event schema management by catalog service.
Powerful event orchestration by a Serverless Workflow engine.
Powerful event filtering and transformation.
Rapid, seamless scalability.
Easy function development and framework integration.
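Because EventMesh is built around CloudEvents and guarantees at-least-once delivery, a consumer needs to recognize the standard envelope and tolerate redelivered events. This sketch builds and checks a CloudEvents 1.0 event in JSON form, then deduplicates on `source` plus `id`, which the CloudEvents spec lets consumers treat as identifying duplicates. The helper functions, source URI, and event type are hypothetical; this is not the EventMesh SDK.

```python
"""Sketch of a CloudEvents 1.0 JSON event plus consumer-side deduplication
for at-least-once delivery. Helper names and values are hypothetical."""
import json
import uuid
from datetime import datetime, timezone

REQUIRED = ("specversion", "id", "source", "type")  # required context attributes


def make_cloud_event(event_type: str, source: str, data: dict) -> dict:
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),  # RFC 3339 timestamp
        "datacontenttype": "application/json",
        "data": data,
    }


def validate_cloud_event(event: dict) -> list:
    """Return the names of any missing required context attributes."""
    return [attr for attr in REQUIRED if not event.get(attr)]


def deduplicate(events: list) -> list:
    """Drop redeliveries: events with the same (source, id) are duplicates."""
    seen, unique = set(), []
    for e in events:
        key = (e["source"], e["id"])
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique


event = make_cloud_event("com.example.order.created", "/orders/service", {"order_id": 7})
wire = json.dumps(event)  # structured-mode JSON as it might travel over a connector
print(validate_cloud_event(json.loads(wire)))                    # []
print(validate_cloud_event({"specversion": "1.0", "type": "x"}))  # ['id', 'source']
print(len(deduplicate([event, event])))                          # 1
```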