Database
Key Points
- SQL - DB2, Oracle, SQL Server, MySQL, Postgres
- NoSQL - MongoDB, CouchDB, Couchbase, Domino
- ACID vs CAP compliant transactions - Atomicity, Consistency, Isolation, Durability vs Consistency, Availability, Partition tolerance
- Isolation levels: Serializable, Repeatable Read, Read Committed, Read Uncommitted
- RDB normalization for an ODS = 3rd normal form - fields have no transitive dependencies ( eg carrying both whse_code and whse_name in an orderLineItem table violates 3NF )
- RDB normalization for an RDS = 1st normal form - atomic fields, each row identified by a primary key
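A minimal sketch of the 3NF point above, using sqlite3: whse_name depends on whse_code, not on the order line's key, so it belongs in its own warehouse table. All table and column names here are illustrative, not from a real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized: warehouse attributes live once, keyed by whse_code.
cur.execute("CREATE TABLE warehouse (whse_code TEXT PRIMARY KEY, whse_name TEXT)")
cur.execute("""CREATE TABLE order_line_item (
    order_id INTEGER, line_no INTEGER,
    whse_code TEXT REFERENCES warehouse,
    qty INTEGER,
    PRIMARY KEY (order_id, line_no))""")

cur.execute("INSERT INTO warehouse VALUES ('W1', 'East Coast DC')")
cur.execute("INSERT INTO order_line_item VALUES (1001, 1, 'W1', 5)")

# Renaming the warehouse is now a single-row update,
# not a scan of every order line item.
cur.execute("UPDATE warehouse SET whse_name = 'Newark DC' WHERE whse_code = 'W1'")
row = cur.execute("""SELECT oli.order_id, w.whse_name
                     FROM order_line_item oli
                     JOIN warehouse w USING (whse_code)""").fetchone()
print(row)  # (1001, 'Newark DC')
```

If whse_name were carried on every order line, that rename would be an update anomaly waiting to happen.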
References
Key Concepts
RDB Normalization
https://en.wikipedia.org/wiki/Database_normalization
Normal forms timeline (year introduced): UNF (1970), 1NF (1970), 2NF (1971), 3NF (1971), BCNF (1974), 4NF (1977), 5NF (1979), DKNF (1981), EKNF (1982), 6NF (2003), ETNF (2012)
Each successive form adds a requirement on top of the earlier ones:
1NF - primary key (no duplicate tuples); no repeating groups; atomic columns (cells hold a single value)
2NF - no partial functional dependencies of non-prime attributes on candidate keys (every non-trivial functional dependency either does not begin with a proper subset of a candidate key or ends with a prime attribute)
3NF - no transitive functional dependencies of non-prime attributes on candidate keys (every non-trivial functional dependency begins with a superkey or ends with a prime attribute)
EKNF - every non-trivial functional dependency either begins with a superkey or ends with an elementary prime attribute
BCNF - every non-trivial functional dependency begins with a superkey
4NF - every non-trivial multivalued dependency begins with a superkey
ETNF - every join dependency has a superkey component
5NF - every join dependency has only superkey components
DKNF - every constraint is a consequence of domain constraints and key constraints
6NF - every join dependency is trivial
Steps to normalize data
https://www.bkent.net/Doc/simple5.htm
good examples
Normalization concepts, quiz
rdb-normalization-how2-quiz-rdb.dp.ua-3 The normalization of relations during the DB designing.pdf
NoSQL databases are typically designed around CAP theorem trade-offs, often favoring availability and partition tolerance over strict consistency, rather than full ACID transactions
https://www.ibm.com/cloud/learn/cap-theorem
Consistency
Consistency means that all clients see the same data at the same time, no matter which node they connect to. For this to happen, whenever data is written to one node, it must be instantly forwarded or replicated to all the other nodes in the system before the write is deemed ‘successful.’
Availability
Availability means that any client making a request for data gets a response, even if one or more nodes are down. Another way to state this: all working nodes in the distributed system return a valid response for any request, without exception.
Partition tolerance
A partition is a communications break within a distributed system—a lost or temporarily delayed connection between two nodes. Partition tolerance means that the cluster must continue to work despite any number of communication breakdowns between nodes in the system.
Data models
Inmon - application driven data models
generally fit transaction processing environments well ( eg an ODS - operational data store )
typically important for real-time data capture, processing and event handling
feed analytics systems at some level of aggregation
Kimball - dimension driven data models
dimensions and facts
BWC - a low-level sales summary data mart allowed easy user analytics without paying for Power BI servers !!
PTP - different priority streams for data warehouse - real time dashboards vs stand alone analytics
PTP - custom user-defined data models, actions, events at runtime ( leverage MySQL with Java threads )
Some Common Data Modeling Considerations
Some EBC DB design considerations
db design should consider "next" release features
even if not supported in app in some cases
surrogate ids
every record's primary key is a single surrogate id
reduces update anomalies
use natural id key references vs surrogates for links
( eg orderNumber vs _id )
_id is good for speed but not for recovery, migrations
is it product.id or product.productNbr ??
add indexes where needed for lookups
define product wrapper to wrap any type of product
wrapped types all have std interface to expose
eg id, productNbr, qty, summary, description, status, availableDateRange, etc
the rest of a wrapped item is extensible, custom
accessible by full query of the product using virtual fields or plain gql
Java provides an interface which makes this easier in apps to build services
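The note above points at Java interfaces for this; the same wrapper idea is sketched below in Python. Every wrapped product type exposes the same core fields (id, productNbr, qty, status, etc.) through a standard interface, while the rest of the item stays free-form and extensible. All class and field names are illustrative, not a real API.

```python
from abc import ABC, abstractmethod

class ProductWrapper(ABC):
    """Standard interface every wrapped product type must expose."""

    @abstractmethod
    def core(self) -> dict:
        """Return the std fields: id, productNbr, qty, status, ..."""

class CatalogProduct(ProductWrapper):
    def __init__(self, _id, product_nbr, qty, status, **custom):
        self._id, self.product_nbr = _id, product_nbr
        self.qty, self.status = qty, status
        self.custom = custom          # extensible, type-specific attributes

    def core(self):
        return {"id": self._id, "productNbr": self.product_nbr,
                "qty": self.qty, "status": self.status}

p = CatalogProduct("507f1f", "PN-1001", 25, "active", color="red", lot="L9")
print(p.core()["productNbr"])   # services code against the std interface
print(p.custom["lot"])          # full query still reaches the custom fields
```

Services that only need the standard fields never see the custom ones, which keeps them stable as new product types are added.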
logical transactions vs physical
logical transaction can span
multiple records ( eg order w details )
multiple stores ( eg RDB, DLT )
multiple partitions ( local, remote stores )
must track completion of logical updates
ideally async updates vs sync for performance
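The completion-tracking point above can be sketched as follows: each physical write is recorded against a logical transaction id, and the logical transaction is complete only when every target store has acknowledged. Store names and the class shape are assumptions for illustration; acks may arrive asynchronously in any order.

```python
import uuid

class LogicalTxn:
    """Track one logical update that spans multiple physical stores."""

    def __init__(self, stores):
        self.txn_id = str(uuid.uuid4())
        self.pending = set(stores)    # stores still owing an ack
        self.done = set()

    def ack(self, store):
        """Called (possibly async) when a store commits its part."""
        self.pending.discard(store)
        self.done.add(store)

    @property
    def complete(self):
        return not self.pending

# One logical order update spanning an RDB and a DLT (illustrative names):
txn = LogicalTxn(["rdb", "dlt"])
txn.ack("rdb")
print(txn.complete)   # False: DLT write still outstanding
txn.ack("dlt")
print(txn.complete)   # True: logical update fully applied
```

Persisting this pending/done state (rather than holding it in memory) is what makes recovery possible after a crash mid-transaction.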
define usage models
define core data, first stores for initial verify
performance testing scenarios
as needed, create test cases
consider lot data without lot tracking ( accounting )
big difference
1 is simple db design
lot tracking is a chain of custody solution
denormalize transactions
reduces a high percent of lookups in theory
( eg customer.id, customerName )
when to carry a description field in addition to key id on reference to a master record ?
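A minimal sqlite3 sketch of that trade-off, with illustrative schema names: carrying customer_name on the transaction row makes the read path join-free, at the cost of extra writes (or accepted staleness) when the master record changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE sales_txn (
    txn_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    customer_name TEXT,   -- denormalized copy of customer.name
    amount REAL)""")
cur.execute("INSERT INTO customer VALUES (7, 'Acme Corp')")
cur.execute("INSERT INTO sales_txn VALUES (1, 7, 'Acme Corp', 99.50)")

# Read path needs no join:
print(cur.execute("SELECT customer_name FROM sales_txn"
                  " WHERE txn_id = 1").fetchone()[0])

# The cost: a rename must also touch every transaction row -- or be left
# stale as a point-in-time description, which is sometimes what you want.
cur.execute("UPDATE customer SET name = 'Acme Inc' WHERE id = 7")
cur.execute("UPDATE sales_txn SET customer_name = 'Acme Inc'"
            " WHERE customer_id = 7")
```

Whether the copy should track the master or freeze at transaction time is exactly the "description field" question above.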
dimension capture
when to capture dimension data on transactions for use in warehouse
minimize use of virtual fields for high performance, real time apps
use materialized views, not vfields
vfields ok for normal reporting etc with normal volumes on queries
aggregations
decide when to materialize aggregation
vs generate on the fly
( updates vs reads usage modeling )
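The materialize-vs-on-the-fly decision above, sketched with sqlite3 and illustrative table names: if reads dominate, pay on the write path to keep a summary row current; if writes dominate, just aggregate at read time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sale (day TEXT, amount REAL)")
cur.execute("CREATE TABLE daily_sales (day TEXT PRIMARY KEY, total REAL)")

def record_sale(day, amount):
    """Write path pays a little extra to keep the aggregate current."""
    cur.execute("INSERT INTO sale VALUES (?, ?)", (day, amount))
    cur.execute("""INSERT INTO daily_sales VALUES (?, ?)
                   ON CONFLICT(day) DO UPDATE
                   SET total = total + excluded.total""", (day, amount))

record_sale("2024-01-05", 10.0)
record_sale("2024-01-05", 15.0)

# Materialized read: single-row lookup.
print(cur.execute("SELECT total FROM daily_sales"
                  " WHERE day='2024-01-05'").fetchone()[0])
# On-the-fly equivalent: scans the detail rows every time.
print(cur.execute("SELECT SUM(amount) FROM sale"
                  " WHERE day='2024-01-05'").fetchone()[0])
```

Both paths print 25.0 here; usage modeling (update rate vs read rate) decides which cost you'd rather pay. The upsert syntax needs SQLite 3.24+.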
Basic Database Design Concepts Course Overview
https://docs.google.com/document/d/1rIt7LDGuAlM-n7vIVLc4A3r3LMyBCAcZrzSkWIKyfaQ/edit
Potential Value Opportunities
Potential Challenges
Candidate Solutions
Map Database Use Cases to Databases
👉Reasons for SQL:
Structured data
Strict schema
Relational data
Need for complex joins
Transactions
Clear patterns for scaling
More established: developers, community, code, tools, etc
Lookups by index are very fast
👉Reasons for NoSQL:
Semi-structured data
Dynamic or flexible schema
Non-relational data
No need for complex joins
Store many TB (or PB) of data
Very data-intensive workload
Very high throughput for IOPS
💁♂️Sample data well-suited for NoSQL:
Rapid ingest of clickstream and log data
Leaderboard or scoring data
Temporary data, such as a shopping cart
Frequently accessed ('hot') tables
Metadata/lookup table
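The shopping-cart case above illustrates why such data fits a document store: the whole cart is one self-contained, schema-flexible document, read and written as a unit with no joins. The field names below are illustrative; the dict is roughly the shape a MongoDB insert would store.

```python
import json

cart = {
    "_id": "cart-8842",
    "userId": "u-17",
    "items": [
        {"productNbr": "PN-1001", "qty": 2, "price": 9.99},
        # items can carry extra fields without a schema change:
        {"productNbr": "PN-2040", "qty": 1, "price": 24.50, "giftWrap": True},
    ],
}

# One document, one write.
doc = json.dumps(cart)

# One read returns everything needed to render the cart.
loaded = json.loads(doc)
total = sum(i["qty"] * i["price"] for i in loaded["items"])
print(round(total, 2))   # 44.48
```

In a normalized RDB the same read is a join across cart, cart_item, and product tables; for short-lived, per-user data that overhead buys little.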
MySQL
Configuration
1 GB memory is a minimum for a dev environment only; 2 GB recommended ( see links below )
http://www.tocker.ca/2014/03/10/configuring-mysql-to-use-minimal-memory.html
https://www.mysqlcalculator.com/
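A hedged my.cnf fragment in the spirit of the minimal-memory article linked above. These values are illustrative starting points for a small dev instance, not tuned recommendations; verify names and defaults against your MySQL version.

```ini
[mysqld]
performance_schema      = off   # saves a large fixed allocation
innodb_buffer_pool_size = 64M   # biggest single consumer; default is 128M
innodb_log_buffer_size  = 1M
key_buffer_size         = 8M    # MyISAM only; shrink further if all-InnoDB
max_connections         = 25    # per-connection buffers add up fast
tmp_table_size          = 8M
max_heap_table_size     = 8M
```

The buffer pool and per-connection buffers dominate the footprint, which is what the mysqlcalculator link above adds up.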
Postgres
recommend 2 core minimum
low memory < 256M ok to start
CouchDB
requirements
https://stackoverflow.com/questions/47302848/apache-couchdb-system-minimum-requirements
The spec is one CPU, 512 MB RAM and a 20 GB SSD disk. We run CouchDB in production on AWS using Docker. In general, we have found that we need at least a t2.medium for each node to keep up with the CPU and memory demands, i.e. each t2.medium instance has 2 CPU cores and 4 GB of RAM.
MongoDB
MongoDB Full Text Search with Lucene
works in aggregation pipeline – FMR sequence
available for Atlas clusters M30 or higher as of the 4.2 release
Auto-scale storage
OLD - storage auto-scaled up but not back down
NEW - auto-scales based on usage, not just disk