Data Governance Concepts

Key Points


References


Key Concepts


Data governance (DG) is the process of managing the availability, usability, integrity and
security of the data in enterprise systems, based on internal data standards and policies that
also control data usage. Effective data governance ensures that data is consistent and
trustworthy and doesn't get misused. It's increasingly critical as organizations face new data
privacy regulations and rely more and more on data analytics to help optimize operations
and drive business decision-making.

A well-designed data governance program typically includes a governance team, a steering
committee that acts as the governing body, and a group of data stewards. They work
together to create the standards and policies for governing data, as well as implementation
and enforcement procedures that are primarily carried out by the data stewards. Ideally,
executives and other representatives from an organization's business operations take part,
in addition to the IT and data management teams.

While data governance is a core component of an overall data management strategy,
organizations need to focus on the expected business benefits of a governance program for
it to be successful.

Data Governance Problems

Without effective data governance, data inconsistencies in different systems across an organization might not get resolved. For example, customer names may be listed differently in sales, logistics and customer service systems. That could complicate data integration efforts and create data integrity issues that affect the accuracy of business intelligence (BI), enterprise reporting and analytics applications. In addition, data errors might not be identified and fixed, further affecting BI and analytics accuracy.
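
As a rough illustration of the customer-name example above, the sketch below groups differently spelled names from several systems under one comparison key. The normalization rules, the suffix list and the system names are assumptions made for this example, not part of any specific governance tool.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "co"}

def normalize_name(raw: str) -> str:
    """Return a comparison key for a customer name (illustrative rules only)."""
    # Strip accents, lowercase, and replace punctuation with spaces.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    # Drop legal suffixes and sort tokens so word order does not matter.
    tokens = [t for t in text.split() if t not in LEGAL_SUFFIXES]
    return " ".join(sorted(tokens))

def find_inconsistent_names(systems: dict) -> dict:
    """Map each comparison key to its (system, spelling) pairs when spellings differ."""
    variants = defaultdict(set)
    for system, names in systems.items():
        for name in names:
            variants[normalize_name(name)].add((system, name))
    return {
        key: sorted(pairs)
        for key, pairs in variants.items()
        if len({spelling for _, spelling in pairs}) > 1
    }
```

A data steward could review the returned groups and decide which spelling becomes the standard; real matching usually needs fuzzier rules than this.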

Data Governance Programs

A governance program helps ensure compliance with data privacy and protection laws, such as the European Union's GDPR and the California Consumer Privacy Act
(CCPA). An enterprise data governance program typically includes the development of
common data definitions and standard data formats that are applied in all business systems,
boosting data consistency for both business and compliance uses.


<<key < FACTUR3DT.IO - measure Data Governance Value, Impacts in the organization against defined OKRs, KPIs in the Data Governance Program

<<key < key benefits possible include

setting, implementing and enforcing well-defined data governance policies

data access controls and management

data quality management against internal standards and external regulations

compliance with data management policies and regulations

easy data availability for users and services 

automation of data management tasks 

data valuation based on usage vs objectives 

compliance with data usage controls (PII, GDPR, CCPA, etc.) and industry data controls (PCI, HIPAA, etc.)


Data Governance Mgt -  

Management team sets governance rules with Board approval

Data Governance committee enforces the rules and approves all policies

CDO - Chief Data Officer role administers data governance processes

Data stewards own and maintain their responsible data sets and stores

EA sets the data management architecture


Data Governance Components 

A published framework that sets out the mission statement for the program, its goals and how its success will be measured, as well as decision-making responsibilities and accountability for the various functions that will be part of the program.

Data Governance Tools

Data governance software can be used to automate aspects of managing a governance program. While data governance tools aren't a mandatory framework component, they support program and workflow management, collaboration, development of governance policies, process documentation, the creation of data catalogs and other functions. They can also be used in conjunction with data quality, metadata management and master data management (MDM) tools.

Data Governance Program Steps

Steps to take include the following to-do items:

  • identify data assets and existing informal governance processes
  • increase the data literacy and skills of end users
  • decide how to measure the success of a governance program
  • identify and update existing data management policies to meet program objectives, compliance and related regulations
  • identify and create the appropriate metrics, data controls and audit procedures for the policies
  • set up the data governance team and responsibilities, including data stewards
  • define the tools needed for each stakeholder to effectively manage their governance responsibilities
  • set up data classifications to support data policies and access controls ( AATR etc )
  • define common business metadata models and glossaries that map to the industry, the business and related standards
  • build and maintain the enterprise data catalog with all internal and external sources and related controls for quality and currency
  • define all clients for the catalog resources and their related responsibilities and access to data catalog sources
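
One way to picture the step on data classifications and access controls is a simple clearance check. The sketch below is only an assumption-laden example: the four classification levels, the role names and the clearance mapping are hypothetical and would come from the program's own policies.

```python
from enum import IntEnum

class Classification(IntEnum):
    """Hypothetical classification levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Hypothetical mapping of roles to the highest level each role may read.
ROLE_CLEARANCE = {
    "analyst": Classification.INTERNAL,
    "data_steward": Classification.CONFIDENTIAL,
    "compliance_officer": Classification.RESTRICTED,
}

def can_read(role: str, dataset_classification: Classification) -> bool:
    """Allow access only when the role's clearance covers the dataset's level."""
    clearance = ROLE_CLEARANCE.get(role)
    # Unknown roles get no access at all (deny by default).
    return clearance is not None and clearance >= dataset_classification
```

Denying unknown roles by default keeps a missing mapping from silently granting access.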


How to Build a Data Catalog - TechTarget article

Data Catalog: A Comprehensive Guide

  • Smith, Anne Marie, Ph.D. (2022). "How to build a data catalog: 10 key steps."

Table of Contents

  1. Introduction
    • Definition and Importance
  2. Why Data Catalogs are Essential
  3. 10 Steps to Building a Data Catalog
    • 3.1. Document Metadata Management Value
    • 3.2. Identify Data Stewardship Uses
    • 3.3. Design a Subject Area Model
    • 3.4. Build a Data Glossary
    • 3.5. Build a Data Dictionary
    • 3.6. Discover Metadata from Sources
    • 3.7. Profile the Data
    • 3.8. Identify Relationships Among Data Sources
    • 3.9. Capture Data Lineage
    • 3.10. Organize the Catalog for Users
  4. Best Practices for Building a Data Catalog
  5. Conclusion
  6. References

Introduction

A data catalog serves as a centralized reference tool enabling various users to explore, understand, and utilize data sets effectively. It collects metadata from diverse sources to create a searchable inventory, enhancing metadata management across an enterprise.

Why Data Catalogs are Essential

The primary goal of a data catalog is to overcome the challenges posed by data sprawl across different stores, making it hard for users to find relevant data. By offering a unified view and built-in search capabilities, data catalogs ensure operational and analytics initiatives are more effective, supporting data-driven decision-making.
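
The idea of a searchable inventory can be sketched in a few lines. This is a toy model, not how any commercial catalog works: the CatalogEntry fields and the search rules are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """Minimal metadata record for one data set (hypothetical fields)."""
    name: str
    source: str
    description: str
    tags: set = field(default_factory=set)

def search(catalog: list, term: str) -> list:
    """Return entries whose name or description contains the term, or that carry it as a tag."""
    needle = term.lower()
    return [
        entry for entry in catalog
        if needle in entry.name.lower()
        or needle in entry.description.lower()
        or needle in {tag.lower() for tag in entry.tags}
    ]
```

Because every entry records its source, one search spans data stores that would otherwise have to be queried one by one.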

Steps to Building a Data Catalog

3.1. Document Metadata Management Value

Highlight the benefits of metadata management to data governance, emphasizing the improved data quality and operational effectiveness it brings.

3.2. Identify Data Stewardship Uses

Distinguish between data catalogs, business glossaries, and data dictionaries to utilize each for effective metadata management.

3.3. Design a Subject Area Model (SAM)

Develop a SAM based on how the business uses data rather than on system boundaries. The SAM indicates where data is located and gives the data catalog its structure.

3.4. Build a Data Glossary

Create an enterprise-wide business glossary in collaboration with business data stewards, providing a foundational knowledge base for data catalog content.

3.5. Build a Data Dictionary

Compile comprehensive descriptions and mappings of data entities to guide metadata integration into the data catalog.

3.6. Discover Metadata from Sources

Identify and record metadata sources across the organization's databases and repositories for inclusion in the data catalog.

3.7. Profile the Data

Generate informative data profiles to aid users in understanding catalog metadata, focusing on both technical and business metadata aspects.
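
A basic technical profile for one column might look like the sketch below. The choice of statistics is an assumption; profiling tools typically report many more, such as patterns and value distributions.

```python
def profile_column(values: list) -> dict:
    """Compute simple profile statistics for one column, treating None as a null."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }
```

For example, `profile_column([3, None, 1, 3])` reports four values, one null, two distinct values, a minimum of 1 and a maximum of 3.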

3.8. Identify Relationships Among Data Sources

Uncover and document data relationships across different systems to facilitate comprehensive data understanding and usage.
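
One simple heuristic for spotting candidate relationships is to measure how many values of one column also appear in another. The sketch below assumes that a high overlap hints at a shared key; the 0.9 threshold is an arbitrary example value, and any candidate still needs confirmation by a data steward.

```python
def overlap_ratio(left: list, right: list) -> float:
    """Share of distinct values in `left` that also occur in `right`."""
    left_set, right_set = set(left), set(right)
    if not left_set:
        return 0.0
    return len(left_set & right_set) / len(left_set)

def is_candidate_relationship(left: list, right: list, threshold: float = 0.9) -> bool:
    """Flag a possible key relationship when the overlap meets the threshold."""
    return overlap_ratio(left, right) >= threshold
```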

3.9. Capture Data Lineage

Utilize ETL tools for data lineage documentation, tracking data origins and flows for error tracing and user understanding.
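
Once lineage has been captured, tracing an error back to its origins is a graph walk. The sketch below assumes lineage is stored as a plain mapping from each data set to its direct sources; the data set names in the usage note are hypothetical.

```python
def upstream(lineage: dict, dataset: str) -> set:
    """Return every data set that feeds `dataset`, directly or indirectly."""
    seen = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        node = stack.pop()
        if node not in seen:  # the seen set also guards against cycles
            seen.add(node)
            stack.extend(lineage.get(node, []))
    return seen
```

With `lineage = {"report": ["mart"], "mart": ["crm", "erp"]}`, calling `upstream(lineage, "report")` returns the set containing "mart", "crm" and "erp".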

3.10. Organize the Catalog for Users

Design the data catalog with a user-centric approach, ensuring accessibility and ease of use for data consumers.

Best Practices for Building a Data Catalog

  • Ensure data security and privacy through user permissions and sensitive data tagging.
  • Foster collaboration with user interaction features like rating, commenting, and chatting.
  • Develop user training programs for effective data catalog utilization.
  • Establish a maintenance process to keep the catalog current with evolving data assets and business needs.
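
Sensitive data tagging, from the first practice above, is often bootstrapped with pattern matching over sample rows. The sketch below is a minimal version of that idea; the two patterns and their labels are assumptions for illustration and would miss many real formats.

```python
import re

# Hypothetical patterns; production scanners use far richer rules.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}

def tag_sensitive_columns(sample_rows: list) -> dict:
    """Map column names to the sensitive-data labels seen in sampled values."""
    tags = {}
    for row in sample_rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue
            for label, pattern in PATTERNS.items():
                if pattern.fullmatch(value):
                    tags.setdefault(column, set()).add(label)
    return tags
```

The resulting tags can then drive user permissions, for example by hiding tagged columns from roles without clearance.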

Conclusion

A well-planned and implemented data catalog is integral to modern data management, offering invaluable support to data governance, metadata management, and user empowerment. It paves the way for a more informed and efficient operational and analytical environment.

References

  • Smith, Anne Marie, Ph.D. (2022). "How to build a data catalog: 10 key steps."



Raj G > How to Align Company Strategy with Data Governance Program article. ***

> Understand strategic goals
> Identify critical #dataAssets
> Establish data governance objectives
> Engage stakeholders
> Create a data governance framework
> Communicate the value proposition
> Establish #KPIs
> Integrate data governance into existing processes
> Provide training and awareness
> Continuous monitoring and improvement
> Adapt to changing strategies

Image Source: SpringerLink

Data Governance




Jim DG, EDM keys


My career is managing data anywhere:  goalie, janitor, whisperer, therapist

Data Mgt Concepts


Role keys- alignment on IT architecture, plans, priorities, business partnerships internal & external


SDP - virtual teams > discovery > assessment > plan > design > test > train > rollout > support > outcomes & impacts feedback 


From IT EA, EDM, Digital Transformation programs, create related program for EDG that is aligned, integrated w BUs, clients, vendor services to meet goals on EDM quality & services OKRs, DT, Compliance ( internal and external audits )


EDM - Enterprise Data Mgt - 4 R keys - data & services are: RIGHT, RELIABLE, RESPONSIVE, REACTIVE across the enterprise, clients and vendor services


DLT Architecture part of EA services tower w other IT lines > governance support on DLT, data > partners with innovation teams

EDM app services, infrastructure, support tools 


EDM - Commercial Tools, Solutions


Compare IBM, Oracle, Collibra, Microsoft,  < see Gartner Magic Quadrant for Data Governance

Informatica, Collibra for MDM

Couchbase for distributed NoSQL w JDBC tools, reporting

Pulsar & Solace on global events services

DLT w Firefly, Fabric, Besu

MySQL, Postgres, Oracle, Snowflake, MongoDB, AWS Aurora

Messaging > Kafka, MQ, ActiveMQ, Artemis ? 

Tools - EA, SA, ArchiMate, PlantUML, MDM repo ( NOT commercial ones but maybe integrated w Oracle )

App Mgr > 

Data Security - SailPoint, Guardium, 


EDM - Open-Source Tools, Solutions


Pulsar & Solace on global events services

DLT w Firefly, Fabric, Besu

MySQL, Postgres, Oracle, Snowflake, MongoDB, AWS Aurora

Messaging > Kafka, MQ, ActiveMQ, Artemis ? 

Tools - EA, SA, ArchiMate, PlantUML, MDM repo ( NOT commercial ones but maybe integrated w Oracle )

App Mgr > 

Data Security - SailPoint, Guardium, 



STH - DGMM - Data Governance Maturity Model


DGMM - DG Maturity Model

  • unmanaged
  • measured
  • planned
  • implemented
  • sustaining
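
The maturity levels can be treated as an ordered scale for gap analysis. The sketch below assumes the five levels are listed above in ascending order of maturity; the governance areas used as dictionary keys are hypothetical.

```python
from enum import IntEnum

class Maturity(IntEnum):
    """DGMM levels, assuming the listed order runs from lowest to highest."""
    UNMANAGED = 1
    MEASURED = 2
    PLANNED = 3
    IMPLEMENTED = 4
    SUSTAINING = 5

def maturity_gaps(current: dict, target: Maturity) -> dict:
    """Return, per area, how many levels it sits below the target."""
    return {
        area: int(target) - int(level)
        for area, level in current.items()
        if level < target
    }
```

Areas already at or above the target are left out, so the result reads as a to-do list for the assessment.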


the DG SDP

DG Goals

DG Assessments 

DG Metrics

DG Outcomes

DG Impacts

DG Strategy

DG Solutions

DG Programs

DG Management



RajG on Data Governance 


Deloitte on Data Governance Challenges


https://www.linkedin.com/posts/rajkgrover_datagovernance-generativeai-dataprivacy-activity-7145740183915720704-FQyX?utm_source=share&utm_medium=member_desktop

Top 15 #DataGovernance Challenges in Context of the #GenerativeAI and How to Mitigate Them

 

1.    Bias and Fairness

2.    Ethical Use and Misinformation

3.    #DataPrivacy

4.    #DataQuality Assurance

5.    Explainability

6.    Transparency

7.    Verification and Authenticity

8.    Regulatory Compliance

9.    #DataSecurity Risks and Malicious Use

10. #IntellectualProperty and Copyrights Issues

11. End-User Accountability

12. #HumanAICollaboration Guidelines

13. Public Perception and Trust

14. Community and Stakeholder Engagement

15. Real Time Monitoring and Control

 

Addressing these data governance challenges requires a multidisciplinary, holistic and proactive approach that involves collaboration between data scientists, ethicists, legal experts, and stakeholders.

 

How to Mitigate the Above Challenges:

 

1.    Bias Audits

2.    Diverse Training Data

3.    #ExplainableAI

4.    Transparency Guidelines

5.    Anonymization and Privacy Preserving Techniques

6.    Human Review of the Quality Assurance

7.    Encryption and Access Controls

8.    Regular Audits by Legal Expertise

9.    Clear Ownership Policies

10. Intervention Protocols for Real Time Monitoring
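
As one example of the privacy-preserving techniques in the list above, identifiers can be replaced with keyed hashes before data is used for model training. The sketch below uses HMAC-SHA256 from the Python standard library; the function name and key handling are assumptions. This is pseudonymization rather than full anonymization, since anyone holding the key can recompute the mapping.

```python
import hashlib
import hmac

def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (hex string)."""
    # The same value and key always give the same token, so joins still work.
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

The key should be stored under the encryption and access controls the list also mentions, separate from the pseudonymized data.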

 

Image Source: Deloitte

 

Data Governance Challenges with Generative AI



RajG Data Governance Concepts for Generative AI

Why is #datagovernance important in #GenerativeAI?

Data governance plays a pivotal role in fostering #innovation in the evolving AI landscape by ensuring responsible data practices, mitigating biases, and safeguarding privacy. A robust data governance strategy is the key to unlocking the full potential of your Generative AI use cases.






Other DG and EDM Concepts

Who should own Data Quality?

https://www.linkedin.com/posts/rajkgrover_dataquality-datamanagement-datagovernance-activity-7139936738956812288-eEwd/?utm_source=share&utm_medium=member_desktop

Ownership of data quality is a critical aspect of effective #datamanagement within an organization. While the specific team or role that owns data quality may vary depending on the organizational structure, size, and industry, there are several common approaches:

 

#DataGovernance Team:

 

Role: A dedicated Data Governance Team or Office is responsible for defining and enforcing data management policies, including data quality standards.

Responsibilities:

-Establishing data quality policies and procedures.

-Defining data quality metrics and benchmarks.

-Monitoring and enforcing data quality standards.

 

#DataStewardship Team:

Role: Data stewards are individuals or a team responsible for the management and oversight of specific sets of data.

Responsibilities:

-Ensuring data quality at the operational level.

-Resolving data quality issues and discrepancies.

-Collaborating with business units to improve data quality.

 

IT or Data Management Team:

Role: The IT or Data Management Team, including database administrators and data engineers, may be responsible for technical aspects of data quality.

Responsibilities:

-Implementing data quality tools and technologies.

-Monitoring and optimizing data quality processes.

-Collaborating with business units to understand data requirements.

 

#BusinessAnalysts or #DataAnalysts:

Role: Business analysts or data analysts who work closely with business units and understand data requirements can play a role in ensuring data quality.

Responsibilities:

-Profiling and analyzing data to identify quality issues.

-Collaborating with data stewards to address data quality concerns.

-Participating in the definition of data quality rules.

 

Quality Assurance (QA) Team:

Role: In organizations with a strong QA function, the QA team may be involved in ensuring data quality for systems and applications.

Responsibilities:

-Applying QA principles to data-related processes.

-Conducting data validation and testing.

-Collaborating with data owners and stewards.

 

Business Units and Data Owners:

Role: In a decentralized model, business units or data owners may have ownership of data quality for the data they generate and use.

Responsibilities:

-Defining and maintaining data quality requirements.

-Taking ownership of data quality improvement initiatives.

-Collaborating with data stewards and IT teams.

 

#ChiefDataOfficer (#CDO) or Chief Analytics Officer (CAO):

Role: The CDO or CAO may have a strategic role in setting the overall vision for data quality and ensuring alignment with business goals.

Responsibilities:

-Setting the strategic direction for data quality.

-Advocating for data quality best practices.

-Collaborating with executive leadership to prioritize data quality initiatives.
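
Whoever owns data quality, the "metrics and benchmarks" responsibility listed under the governance team above comes down to measurable checks. The sketch below shows one such metric, completeness, compared against a minimum benchmark; the field names and thresholds in the usage note are hypothetical.

```python
def completeness(rows: list, field: str) -> float:
    """Share of rows where `field` is present and not empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)

def check_benchmarks(rows: list, benchmarks: dict) -> dict:
    """Return (score, passed) per field for the given minimum completeness values."""
    results = {}
    for field, minimum in benchmarks.items():
        score = completeness(rows, field)
        results[field] = (score, score >= minimum)
    return results
```

A call such as `check_benchmarks(rows, {"email": 0.7, "phone": 0.5})` gives stewards a pass or fail per field that can be tracked over time.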

 

Image Source: Eckerson Group

Data Governance Structure











Analytics COE Models for Enterprises


Sample Digital Transformation Project Roadmap Checkpoints








Potential Value Opportunities




STH Data Governance Process


DG Assessment




Key Questions for DQM assessment

Answer 5WH2 ( How, History )


25 Questions for the Leadership to Address the #DataGovernance Gaps:

1.    Have we conducted a comprehensive inventory of our data assets?
2.    Do we know who owns and is responsible for each dataset?
3.    Have we implemented a clear #dataclassification system?
4.    Are our employees trained on how to handle data based on its classification?
5.    What metrics are we using to measure #dataquality?
6.    How robust are our current data access controls?
7.    When was the last time we reviewed and updated access permissions?
8.    Do we have clear policies for data retention and disposal?
9.    Are these policies being consistently applied across all departments?
10. Have we established a data governance committee?
11. How effective is our current data governance structure?
12. How complete and accurate is our #metadata?
13. Are we effectively using metadata to enhance data usability and governance?
14. Can we trace the origin and transformations of our critical data?
15. How are we documenting and visualizing #datalineage?
16. Are we fully compliant with all relevant data #regulations?
17. How are we identifying and mitigating data-related risks?
18. How effective is our current data governance training program?
19. Are all employees aware of their data governance responsibilities?
20. Do we have the right tools to support our data governance efforts?
21. How can we leverage #technology to automate and improve our data governance?
22. What #KPIs are we using to measure data governance effectiveness?
23. How often are we reporting on data governance to the board and executives?
24. What processes do we have in place for continuously improving our data governance?
25. When was our last data governance audit, and what were the key findings?

Image Source: SpringerLink

Strategy and Data Governance


Data governance framework for e-government


DGMA - DG Maturity Assessment


DG Build 


DG Management 


DG Maintenance



Potential Challenges



Candidate Solutions



Gartner on ABI Platforms - Analytics and BI

https://www.gartner.com/reviews/market/data-and-analytics-governance-platforms

DGM-Magic Quadrant for Analytics and Business Intelligence Platforms.pdf.   link

DGM-Magic Quadrant for Analytics and Business Intelligence Platforms.pdf. file

Data and analytics leaders use ABI platforms to support the needs of IT, analysts, consumers and data scientists. While integration with cloud ecosystems and business applications is a crucial selection requirement, buyers also need platforms to support governance, interoperability and AI.


Definition

Analytics and business intelligence platforms — enabled by IT and augmented by AI — empower users to model, analyze and share data.
Analytics and business intelligence (ABI) platforms enable organizations to understand their data. For example, what are the dimensions of their data — such as product, customer, time, and geography? People need to be able to ask questions about their data (e.g., which customers are likely to churn? Which salespeople are not reaching their quotas?). They need to be able to create measures from their data, such as on-time delivery, accidents in the workplace and customer or employee satisfaction. Organizations need to blend modeled and non modeled data to create new data pipelines that can be explored to find anomalies and other insights. ABI platforms make all of this possible.
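
To make "dimensions" and "measures" concrete, the sketch below computes an on-time delivery rate (a measure) grouped by a chosen dimension. The record layout with "promised" and "delivered" dates is an assumption for this example and does not reflect any particular ABI platform.

```python
from collections import defaultdict
from datetime import date

def on_time_rate_by(deliveries: list, dimension: str) -> dict:
    """Return the share of on-time deliveries for each value of `dimension`."""
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for delivery in deliveries:
        key = delivery[dimension]
        totals[key] += 1
        # A delivery counts as on time if it arrived on or before the promised date.
        if delivery["delivered"] <= delivery["promised"]:
            on_time[key] += 1
    return {key: on_time[key] / totals[key] for key in totals}
```

Swapping the dimension argument (for example region, product or customer) re-slices the same measure without changing the calculation.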


Mandatory Features

  • Data visualization: Support for highly interactive dashboards and the exploration of data through the manipulation of chart images.
  • Governance: Governance capabilities track usage and manage how information is secured, shared and promoted.
  • Reporting: This capability provides pixel-perfect, paginated reports that can be scheduled and burst to a large user community.
  • Analytics catalog: Display of analytic content that makes it easy to find and consume. The catalog is searchable and makes recommendations to users.
  • Data preparation: Support for drag-and-drop, user-driven combination of data from different sources, and creating analytic models (such as user-defined measures, sets, groups and hierarchies).
  • Data science integration: Capabilities that enable augmented development and prototyping of composable data science and machine learning (DSML) models by citizen data scientists and data scientists with sophisticated integration into the broader DSML ecosystem.



DGM Platforms - 2024 

DGM-2024-Best Data and Analytics Governance Platforms Reviews 2024 _ Gartner Peer Insights.pdf  link

DGM-2024-Best Data and Analytics Governance Platforms Reviews 2024 _ Gartner Peer Insights.pdf file


What are Data and Analytics Governance Platforms?

A D&A governance platform is a set of integrated business capabilities that helps business leaders and users evaluate and implement a diverse set of governance policies and monitor and enforce those policies across their organizations’ business systems. These platforms are unique from data management and discrete governance tools in that data management and such tools focus on policy execution, whereas these platforms are used primarily by business roles — not only or even specifically IT roles.


Azure Purview is integrated in Azure platform, provides end to end data insight capabilities to enable data catalog, data classification and data lineage. Comparing with other similar products

Azure Fabric 


"Atlan: The New Cornerstone In Our Data Management Journey"

Atlan has proven to be an outstanding asset for our organization as we navigate the intricacies of active data management. Atlan seamlessly integrated into our modern data stack, with Snowflake
IDQ - Informatica Data Quality

"Balancing Diverse Financial Data Landscape"

Our organization recently build some kind of data center for our existing financial business units such as insurance, vehicle & heavy equipment financing, fintech, bank, etc. and surely we needs t

"Alation: A New Benchmark in Vendor Partnership Success"

We look for a real partnership with our vendors, and Alation has set new standards in this regard. They have been strongly invested in our successful implementation, providing support, training

Informatica Cloud Data Governance and Catalog
by Informatica

"Plan your Informatica Tool Suite Pathway "

Support and service have always been very responsive. Kept us current on updates to the products we use.


Collibra Data Intelligence Platform
by Collibra
"Great platform to keep your data under control"

Data is everything nowadays, so you need to manage it properly. This is the best tool to do so.



Step-by-step guide for Example



 



Recommended Next Steps