m Postgres RDB
Key Points
- very good open-source RDB with vendor support available
- SQL compliant
- good stored procedure support including external procedures
- PLSQL languages include: PLSQL, Python, Java
- drivers for jdbc, odbc
- use the Pipeline DB version for stream processing support
References
Reference_description_with_linked_URLs_______________________ | Notes______________________________________________________________ |
---|---|
https://www.postgresql.org/files/documentation/pdf/14/postgresql-14-US.pdf postgres-db-v14.pdf link | postgres documentation |
https://www.postgresql.org/docs/current/transaction-iso.html | |
https://www.postgresql.org/docs/9.1/plpython.html | python - internal language along with plsql |
java is externally supported | |
https://medium.com/dataseries/using-postgresql-for-real-time-iot-stream-processing-applications-965741c57315 | use PipelineDB for Postgres stream processing support Pipeline was sold and is now discontinued - no SSE for Postgres |
Key Concepts
Guide to PostgreSQL Data Change Tracking
Doesn't directly support state change tables ( temporal )
Strategies
source table trigger to a log table - has synchronous overhead
CREATE OR REPLACE FUNCTION audit.if_modified_func() RETURNS TRIGGER AS $body$
BEGIN
IF (TG_OP = 'UPDATE') THEN
INSERT INTO audit.logged_actions (schema_name,table_name,user_name,action,original_data,new_data,query)
VALUES (TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),ROW(OLD.*),ROW(NEW.*),current_query());
RETURN NEW;
elsif (TG_OP = 'DELETE') THEN
INSERT INTO audit.logged_actions (schema_name,table_name,user_name,action,original_data,query)
VALUES (TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),ROW(OLD.*),current_query());
RETURN OLD;
elsif (TG_OP = 'INSERT') THEN
INSERT INTO audit.logged_actions (schema_name,table_name,user_name,action,new_data,query)
VALUES (TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),ROW(NEW.*),current_query());
RETURN NEW;
END IF;
END;
$body$
LANGUAGE plpgsql;
CREATE TRIGGER my_table_if_modified_trigger
AFTER INSERT OR UPDATE OR DELETE ON my_table
FOR EACH ROW EXECUTE PROCEDURE if_modified_func();
Triggers and Notify/Listen
changes go through a pub/sub mechanism through a trigger to another system dedicated to reading and storing these data changes:
limitations >>
- “At most once” delivery. Listen/notify notifications are not persisted meaning if a listener disconnects, it may miss updates that happened before it reconnected again.
- 8K record size
app change tracking - app writes to db and log table concurrently
CDC ( Change Data Capture ) from binary replication log
With PostgreSQL, it is possible to connect to the Write-Ahead Log (WAL) that is used for data durability, recovery, and replication to other instances.
PostgreSQL supports two types of replications: physical replication and logical replication. The latter allows decoding WAL changes on a row level and filtering them out, for example, by table name. This is exactly what we need to implement data change tracking with CDC.
Here are the basic steps necessary for retrieving data changes by using logical replication:
1. Set wal_level
to logical
in postgresql.conf
and restart the database.
2. Create a publication like a “pub/sub channel” for receiving data changes:
3. Create a logical replication slot like a “cursor position” in the WAL:
4. Fetch the latest unread changes
To implement log-based CDC with PostgreSQL, I would recommend using the existing open-source solutions. The most popular one is Debezium.
There are downsides to CDC in Postgres, primarily performance
Complexity. Implementing CDC adds a lot of system complexity.
Debezium plugin for CDC in Postgres
Debezium is an open source project that provides a low latency data streaming platform for change data capture (CDC). You set up and configure Debezium to monitor your databases, and then your applications consume events for each row-level change made to the database. Only committed changes are visible, so your application doesn't have to worry about transactions or changes that are rolled back. Debezium provides a single model of all change events, so your application does not have to worry about the intricacies of each kind of database management system. Additionally, since Debezium records the history of data changes in durable, replicated logs, your application can be stopped and restarted at any time, and it will be able to consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely.
Potential Value Opportunities
Potential Challenges
Candidate Solutions
Step-by-step guide for Example
sample code block