The Ultimate Guide to Data Governance
The world of data management and analytics has come a long way since 1970, the year IBM mathematician Edgar F. Codd introduced his “relational database” framework. A precursor to modern data lakes and other data management systems, it organized information into tables of rows and columns rather than rigid hierarchies, making data far easier to access for people other than database specialists.
The COVID-19 pandemic has amplified the importance and value of enterprise data to unprecedented heights. According to an October 2020 study by Teradata, 91% of global business leaders surveyed said the importance of data within their organizations has “skyrocketed” since the onset of COVID-19. In addition, 88% of executives view data as a strategic asset to their business, while 94% agree data is an essential asset and, more importantly, key to recovery and the path forward.
However, data-driven insights and technologies are only as accurate as the data going into them. As the old adage goes, “bad data in means bad data out.” Ensuring data accuracy across the data management and analytics lifecycle is not just about delivering meaningful business insights (though that is very important), but also about building trust with customers, employees and other stakeholders.
Siloed applications focused on specific tasks are increasingly becoming things of the past. Forward-thinking organizations are now looking to integrate data flows across systems and lines of business. Because next-generation technologies such as machine learning (ML), artificial intelligence (AI) and predictive analytics run on data, ensuring data accuracy and quality is paramount to implementing these tools successfully.
In fact, according to the same Teradata study cited above, 77% of global business leaders say that their organizations are more focused on data accuracy than ever before.
Given the importance of enterprise data to an organization’s current and future success, it should no longer fall solely on the shoulders of data scientists and analytics teams to “handle” data. As data is a shared organizational asset, everyone across the enterprise should be responsible for ensuring data is properly collected, stored and used.
What is Data Governance?
The Data Governance Institute defines data governance as "a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods."
In other words, data governance refers to the people, processes, and technologies involved with data acquisition, archiving and usage.
While “data management” is a technical discipline concerned with controlling and organizing data, “data governance” is, essentially, a business strategy for data.
As Gregory Vial, assistant professor of IT at HEC Montréal, wrote in a recent article for the MIT Sloan Management Review, data governance should be “a bridge that translates a strategic vision acknowledging the importance of data for the organization and codifying it into practices and guidelines that support operations, ensuring that products and services are delivered to customers.”
Generally speaking, the goals of data governance are to:
- Define what constitutes data
- Establish internal rules for data usage
- Maximize data accuracy and usability by standardizing data systems, policies, procedures and standards
- Define roles and assign accountability to employees responsible for data assets throughout their lifecycle
- Protect data from external and internal threats through access management
- Maintain regulatory compliance
- Streamline and strengthen data-related training efforts
- Implement improved monitoring and tracking mechanisms for data quality and other data-related activities
- Promote data literacy and a shared understanding of data as an asset
What is a Data Governance Framework?
Essentially a how-to guide for your data governance efforts, a “Data Governance Framework” spells out how an organization sets up and enforces data governance. In other words, it formally documents all data-related policies and procedures.
*Image above sourced from "The DGI Data Governance Framework © The Data Governance Institute," https://profisee.com/data-governance-what-why-how-who/#a8
Pillars of Data Governance
According to DAMA International, data governance frameworks should address:
Data Quality
One of the primary deliverables of data governance is data quality. By controlling how data within an organization is collected, stored, processed and managed, data governance helps ensure data is accurate, complete, timely, and consistent with all requirements and business rules.
There are six common dimensions of data quality standards that data governance should address:
- Completeness / Comprehensiveness
- Consistency / Reliability
- Validity / Integrity
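As a rough illustration of how the dimensions listed above might be measured, the sketch below scores a handful of records for completeness, consistency and validity. The sample records, field names, allowed country codes and email pattern are all assumptions made for the example, not part of any standard.

```python
import re

# Hypothetical sample records; the field names are illustrative assumptions.
records = [
    {"id": 1, "email": "ana@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "us"},
    {"id": 3, "email": "not-an-email", "country": "CA"},
]

# Deliberately simple email pattern, assumed for this sketch only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    filled = sum(1 for row in rows if row.get(field))
    return filled / len(rows)


def consistency(rows, field, allowed):
    """Share of rows whose value is exactly one of the agreed codes."""
    matching = sum(1 for row in rows if row.get(field) in allowed)
    return matching / len(rows)


def validity(rows, field, pattern):
    """Share of non-empty values that match the expected format."""
    values = [row[field] for row in rows if row.get(field)]
    if not values:
        return 0.0
    return sum(1 for value in values if pattern.match(value)) / len(values)


report = {
    "email_completeness": completeness(records, "email"),
    "country_consistency": consistency(records, "country", {"US", "CA"}),
    "email_validity": validity(records, "email", EMAIL_PATTERN),
}
print(report)
```

In practice the thresholds for an acceptable score, and who is accountable when a score falls below them, would be set by the governance program rather than by the code.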
Data Ownership & Data Stewardship
Data ownership and stewardship refer to the “who” of data governance. They outline who is responsible for each data-related activity.
Effective data governance frameworks not only assign responsibilities, but also include a well-documented description of the roles and how they all interact. According to SAS, these roles usually include:
- The data governance council. Composed of senior staff familiar with both the operations and strategic direction of the organization, the council is responsible for determining the high-level policies of the program and approving the procedures developed to carry out those policies.
- Data Owners. Business and IT leaders who are responsible for ensuring that information within a specific data domain is governed across systems and lines of business. They provide feedback to the council and get regular updates on the progress of the program.
- Data Stewards. The subject-matter experts responsible for executing the policies enacted by the data governance council. They are responsible for the quality of the data in the organization, helping maximize its value.
- Data producers or consumers. Those who create data through an application or use data to drive decisions as part of a business process. They are the ones who execute the data governance strategy.
Data Architecture
The physical manifestation of data governance strategy, data architecture “defines the blueprint for managing data assets by aligning with organizational strategy to establish strategic data requirements and designs to meet these requirements.” The standardization of policies and procedures in the data architecture prevents duplication of effort and reduces the complexity caused by multiple, differing implementations of similar operations.
Data Modeling and Design
“The process of discovering, analyzing, representing and communicating data requirements in a precise form called the data model,” data modeling and design helps organizations better understand and manage massive volumes of data. According to Dataversity, data modeling typically focuses on the design of a specific database at the physical level, or a particular business area at the logical or conceptual level.
Data Storage and Operations
Many organizations lack consistent management policies and run multiple databases with differing levels of data protection, security, and service delivery. This lack of consistent oversight increases the risk of data breach and loss.
Your data governance strategy should include management policies around database operations. Typical policies include controlling database environments, performance levels and service delivery, data protection, lifecycle management, and licensing.
Metadata
Metadata is the data that describes other data. The unsung hero of data analytics, metadata captures granular information about a specific piece of data, such as its file type, format, origin and date. This “lineage” provides context for data usage, helps prove data integrity and establishes trust.
Given the complexity of data systems, creating and sustaining an enterprise-wide view of and easy access to underlying metadata can be challenging. However, as metadata “encapsulates the conceptual, logical, and physical information required to transform disparate data sets into a coherent set of models for analysis,” it is absolutely critical to include a metadata management strategy in your data governance framework.
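To make the idea of a metadata record concrete, here is a minimal sketch of one in Python. The `DatasetMetadata` class, its fields and the sample values are hypothetical and chosen only to mirror the attributes mentioned above (file type, origin, date and lineage); real metadata catalogs track far more.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record describing one dataset."""

    name: str
    file_type: str
    origin: str
    created: date
    # Ordered list of processing steps applied to the dataset so far.
    lineage: list = field(default_factory=list)

    def record_step(self, description):
        """Append a processing step so the dataset's history stays traceable."""
        self.lineage.append(description)


# Illustrative values only.
meta = DatasetMetadata("customer_orders", "csv", "crm_export", date(2021, 1, 15))
meta.record_step("deduplicated by order id")
meta.record_step("joined to product reference table")
print(meta)
```

Keeping lineage as an append-only list is one simple design choice; it lets an analyst see, in order, every transformation applied before trusting the data.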
Data Security
Cybercrime and its associated costs are on the rise. In fact, according to the Ponemon Institute, security breaches have increased by 11% since 2018 and 67% since 2014. Furthermore, in 2019, the average cost of a data breach was $3.92 million.
By defining processes for safeguarding and accessing data, data governance protects against data breaches as well as inappropriate use of data. It also helps ensure data is classified and stored according to its sensitivity.
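One way to picture classification by sensitivity is a lookup that compares a role's clearance with a dataset's classification. The sketch below assumes four made-up sensitivity tiers, two example datasets and two example roles; none of these names come from a specific standard or product.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity tiers, ordered from least to most sensitive."""

    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


# Assumed classifications and clearances, for illustration only.
DATASET_CLASSIFICATION = {
    "marketing_site_stats": Sensitivity.PUBLIC,
    "patient_records": Sensitivity.RESTRICTED,
}
ROLE_CLEARANCE = {
    "analyst": Sensitivity.INTERNAL,
    "compliance_officer": Sensitivity.RESTRICTED,
}


def can_access(role, dataset):
    """Allow access only when the role's clearance covers the dataset's tier."""
    clearance = ROLE_CLEARANCE.get(role)
    level = DATASET_CLASSIFICATION.get(dataset)
    if clearance is None or level is None:
        return False  # Deny by default when either side is unclassified.
    return clearance >= level


print(can_access("analyst", "patient_records"))
print(can_access("compliance_officer", "patient_records"))
```

Denying access when a dataset has no classification is a deliberate choice in this sketch: it pushes teams to classify data before anyone can use it.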
Data integration and interoperability
Advanced and predictive analytics require the seamless integration of information from a wide array of sources, applications and formats. By establishing standards for all data uses including common data definitions and data quality best practices, a robust data governance approach helps accelerate data integration and systems interoperability.
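Common data definitions can be pictured as a shared mapping from each system's field names to one agreed vocabulary. The following sketch assumes two hypothetical source systems ("crm" and "billing") and invented field names; it shows only the renaming step, not a full integration pipeline.

```python
# Hypothetical mapping from each source system's field names to the
# organization's agreed (canonical) names.
CANONICAL_FIELDS = {
    "crm": {"cust_id": "customer_id", "email_addr": "email"},
    "billing": {"CustomerNumber": "customer_id", "Email": "email"},
}


def to_canonical(source, record):
    """Rename a record's fields to the canonical names for its source."""
    mapping = CANONICAL_FIELDS[source]
    unknown = set(record) - set(mapping)
    if unknown:
        # Surface unmapped fields instead of silently passing them through.
        raise KeyError(f"unmapped fields from {source}: {sorted(unknown)}")
    return {mapping[key]: value for key, value in record.items()}


crm_row = to_canonical("crm", {"cust_id": 42, "email_addr": "ana@example.com"})
billing_row = to_canonical(
    "billing", {"CustomerNumber": 42, "Email": "ana@example.com"}
)
print(crm_row == billing_row)
```

Once both systems speak the same vocabulary, records describing the same customer can be compared and combined directly.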
Unstructured Data Management
Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Examples include emails, images and other documents that don’t reside in a traditional database format.
The volume of unstructured data is growing exponentially. In fact, Gartner estimates that as much as 80% of enterprise data is unstructured. However, just because data is “unstructured” doesn’t mean it’s any less valuable. Though unstructured data contains many quality dimensions and can be especially difficult to validate, classify and organize, it cannot be overlooked given the severity of security and regulatory risk involved in doing so.
Reference and Master Data
Reference and master data provide the context for transactional data. They enable organizations to understand operational data and to analyze disparately collected data effectively.
As Anne Marie Smith, Ph.D., CDMP puts it, master data “are the critical nouns of a business, and generally fall into four groupings: people (e.g., customer, employee, vendor, etc.), things (e.g., product, item, widget, etc.), places (e.g., office locations and geographic divisions), and concepts (e.g., contract, claim, account, etc.).”
While Master Data Management (MDM) is the process of defining and maintaining how master data will be created, integrated, maintained, and used throughout the enterprise, data governance sets the rules for, and adjudicates, the operational processes executed within MDM. In other words, because the rules created within data governance ensure the quality and privacy of master data, MDM requires data governance.
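The way reference and master data give context to transactions can be sketched as a simple enrichment step. The lookup tables, field names and sample order below are invented for illustration and stand in for whatever reference and master data stores an organization actually maintains.

```python
# Hypothetical reference data: a shared list of country codes.
COUNTRY_CODES = {"US": "United States", "CA": "Canada"}

# Hypothetical master data: one agreed record per customer.
CUSTOMERS = {101: {"name": "Acme Corp", "country": "US"}}


def enrich(transaction):
    """Attach customer and country context to a raw transaction."""
    customer = CUSTOMERS[transaction["customer_id"]]
    return {
        **transaction,
        "customer_name": customer["name"],
        "country_name": COUNTRY_CODES[customer["country"]],
    }


enriched = enrich({"order_id": 1, "customer_id": 101, "amount": 250.0})
print(enriched)
```

On its own the transaction only says that customer 101 spent 250.0; the master and reference records are what make it readable as an order from a named customer in a named country.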
Data Warehousing, Business Intelligence (BI) and Analytics
At many organizations, data warehousing, business intelligence (BI) and analytics have evolved into a separate data management system. Effective data governance of these systems helps optimize analytical data processing and enables improved access to decision support data for reporting and analysis. In addition, by creating a unified understanding of data, data governance encourages collaboration across the enterprise and leads to more dynamic uses of analytics.
According to a 2016 Forbes survey of more than 400 senior executives, 78% said that data governance was either vital or important to their BI operations, and 65% said governance is a useful means of empowering end users to uncover new insights.
Regulatory Compliance
From HIPAA to GDPR, there are numerous global regulations in place designed to protect people’s privacy and ensure good business practices. By boosting data accuracy and streamlining reporting capabilities, data governance helps organizations stay compliant with these regulations.
Data Governance Solutions
As data governance is more of a strategic undertaking than a technical discipline, there is no one-size-fits-all solution. However, there are numerous tools and technologies that can help organizations govern, control and protect enterprise data more effectively and holistically.
Some examples of these solutions are:
- Collibra Data Governance helps organizations understand their ever-growing amounts of data in a way that scales with growth and change, so that teams can trust and use their data to improve their business.
- The erwin Data Intelligence Suite (erwin DI) combines data catalog and data literacy capabilities for greater awareness of and access to available data assets, guidance on their use, and guardrails to ensure data policies and best practices are followed.
- SAS Data Management helps you access the data you need, create rules, collaborate with other teams and manage metadata so you're prepared to run analytics for better decision making.
- Informatica Axon is designed to engage all constituencies, technical and business, to effectively govern an organization's data. The solution enables enterprise data governance programs across a wide array of industries including highly regulated industries such as financial services, healthcare, life sciences, insurance and others to connect people and data for better business decisions.
Agile Data Governance
Effective data governance is anything but a “one-size-fits-all” set of rules and requirements. Though setting clear, robust rules and procedures to ensure data quality and security is a must, these guidelines should not be so rigid that they hinder the strategic use of data.
It’s also important to remember that not all data is created equal. Data that is especially sensitive (e.g., medical records) should be subject to very different standards than data that is less so. With that in mind, it’s critical that you account for these nuances within your data governance framework.
Effective data governance requires a delicate balance of control and agility. Some key considerations for building a flexible data governance approach are:
- Have a clear focus, but don’t be overly specific
- Be creative about enforcement
- Ensure scalability and leverage flexible architectures