Categories
erwin Expert Blog Data Intelligence

The Top 8 Benefits of Data Lineage

It’s important we recognize the benefits of data lineage.

As corporate data governance programs have matured, the inventory of agreed-to data policies has grown rapidly. These include guidelines for data quality assurance, regulatory compliance and data democratization, among other information utilization initiatives.

Organizations that struggle to translate their defined data policies into implemented processes and procedures are starting to identify tools and technologies that can help put those policies into practice.

One such technique, data lineage, is gaining prominence as a core operational business component of the data governance technology architecture. Data lineage encompasses processes and technology to provide full-spectrum visibility into the ways that data flow across the enterprise.

To data-driven businesses, the benefits of data lineage are significant. Data lineage tools are used to survey, document and enable data stewards to query and visualize the end-to-end flow of information units from their origination points through the series of transformation and processing stages to their final destination.

The Benefits of Data Lineage

Data stewards are attracted to data lineage because it supports a number of different governance practices, including:

1. Operational intelligence

At its core, data lineage captures the mappings of the rapidly growing number of data pipelines in the organization. Visualizing the information flow landscape provides insight into the “demographics” of data consumption and use, answering questions such as “which data sources feed the greatest number of downstream targets” or “which data analysts use data that is ingested from a specific data source.” Collecting this intelligence about the data landscape better positions the data stewards to enforce governance policies.
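As a rough illustration of the first question, lineage mappings can be treated as a directed graph and each origin source ranked by how many downstream nodes it reaches. This is a minimal sketch, not how any particular lineage tool works; the dataset names and the `EDGES` list are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (upstream, downstream); names are illustrative only.
EDGES = [
    ("crm_db", "customer_staging"),
    ("erp_db", "orders_staging"),
    ("customer_staging", "customer_mart"),
    ("orders_staging", "sales_mart"),
    ("customer_mart", "sales_mart"),
    ("sales_mart", "revenue_dashboard"),
]


def build_graph(edges):
    """Map each node to the set of nodes directly downstream of it."""
    graph = defaultdict(set)
    for up, down in edges:
        graph[up].add(down)
    return graph


def downstream(graph, start):
    """Return every node reachable from start, found breadth-first."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def rank_sources(edges):
    """Rank origin sources (nodes with no upstream) by downstream reach."""
    graph = build_graph(edges)
    targets = {down for _, down in edges}
    sources = {up for up, _ in edges} - targets
    ranked = ((s, len(downstream(graph, s))) for s in sources)
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


print(rank_sources(EDGES))
```

In this toy landscape `crm_db` reaches four downstream nodes and `erp_db` reaches three, so a steward might prioritize governance controls on `crm_db` first.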

2. Business terminology consistency

One of the most confounding data governance challenges is understanding the semantics of business terminology within data management contexts. Because application development was traditionally isolated within each business function, the same (or similar) terms are used in different data models, even though the designers did not take the time to align definitions and meanings. Data lineage allows the data stewards to find common business terms, review their definitions, and determine where there are inconsistencies in the ways the terms are used.

3. Data incident root cause analysis

It has long been asserted that when a data consumer finds a data error, the error most likely was introduced into the environment at an earlier stage of processing. Yet without a “roadmap” that indicates the processing stages through which the data passed, it is difficult to determine where the error was actually introduced. Using data lineage, though, a data steward can insert validation probes within the information flow to validate data values and determine the stage in the data pipeline where an error originated.
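The validation-probe idea can be sketched as follows: run a sample record through the stages in lineage order and check it after each one, reporting the first stage whose output fails. The stage names, the deliberately buggy transformation, and the `is_valid` rule are all assumptions made for illustration.

```python
def first_failing_stage(stages, record, is_valid):
    """Apply stages in lineage order and probe the record after each one.

    Returns the name of the first stage whose output fails the probe,
    or None if the record stays valid through the whole pipeline.
    """
    for name, transform in stages:
        record = transform(record)
        if not is_valid(record):
            return name
    return None


# Hypothetical three-stage pipeline; the middle stage contains a deliberate bug.
STAGES = [
    ("ingest", lambda r: dict(r)),
    ("normalize_currency", lambda r: {**r, "amount": r["amount"] * -1}),
    ("load_mart", lambda r: r),
]


def amount_is_non_negative(record):
    """Example probe: amounts should never be negative."""
    return record["amount"] >= 0


print(first_failing_stage(STAGES, {"amount": 10.0}, amount_is_non_negative))
```

Here the probe flags `normalize_currency`, which is the stage where the sign error is introduced, rather than `load_mart`, where a consumer would first notice it.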

4. Data quality remediation assessment

Root cause analysis is just the first part of the data quality process. Once the data steward has determined where the data flaw was introduced, the next step is to determine why the error occurred. Again, using a data lineage mapping, the steward can trace backward through the information flow to examine the standardizations and transformations applied to the data, validate that transformations were correctly performed, or identify one (or more) performed incorrectly, resulting in the data flaw.
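Tracing backward can be pictured as walking the lineage graph in reverse and listing each transformation along the way, so the steward has a checklist of standardizations to review. The sketch below assumes a hypothetical `UPSTREAM` mapping from each dataset to its parents and the transformation applied on each hop.

```python
from collections import deque

# Hypothetical lineage: dataset -> list of (upstream dataset, transformation applied).
UPSTREAM = {
    "revenue_dashboard": [("sales_mart", "aggregate_by_region")],
    "sales_mart": [
        ("orders_staging", "convert_currency"),
        ("customer_mart", "join_on_customer_id"),
    ],
    "orders_staging": [("erp_db", "standardize_dates")],
    "customer_mart": [("crm_db", "deduplicate")],
}


def trace_back(upstream, dataset):
    """List (dataset, transformation, parent) hops, nearest hops first."""
    hops, seen, queue = [], {dataset}, deque([dataset])
    while queue:
        current = queue.popleft()
        for parent, transform in upstream.get(current, []):
            hops.append((current, transform, parent))
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return hops


for current, transform, parent in trace_back(UPSTREAM, "revenue_dashboard"):
    print(f"{current} <- {transform} <- {parent}")
```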

5. Impact analysis

The enterprise is always subject to change: externally imposed requirements (such as regulatory compliance) evolve, internal business directives may affect user expectations, and ingested data source models may change unexpectedly. When there is a change to the environment, it is valuable to assess the impact on the enterprise application landscape. In the event of a change in data expectations, data lineage provides a way to determine which downstream applications and processes are affected by the change and helps in planning for application updates.
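In graph terms, impact analysis is a forward traversal from the changed source, filtered down to the nodes that are applications. The following is a minimal sketch under that assumption; the `DOWNSTREAM` mapping and the `APPLICATIONS` set are hypothetical.

```python
from collections import deque

# Hypothetical lineage: node -> nodes that consume it directly.
DOWNSTREAM = {
    "vendor_feed": ["orders_staging"],
    "orders_staging": ["sales_mart"],
    "sales_mart": ["revenue_dashboard", "forecast_model"],
    "crm_db": ["customer_mart"],
    "customer_mart": ["churn_report"],
}

# Hypothetical set of nodes that are end-user applications rather than datasets.
APPLICATIONS = {"revenue_dashboard", "forecast_model", "churn_report"}


def impacted_applications(downstream, applications, changed):
    """Return the applications reachable downstream of the changed node."""
    seen, queue = set(), deque([changed])
    while queue:
        for nxt in downstream.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen & applications)


print(impacted_applications(DOWNSTREAM, APPLICATIONS, "vendor_feed"))
```

A change to `vendor_feed` touches the revenue dashboard and the forecast model but not the churn report, which narrows the update plan to two applications.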

6. Performance assessment

Not only does lineage provide a collection of mappings of data pipelines, it allows for the identification of potential performance bottlenecks. Data pipeline stages with many incoming paths are candidate bottlenecks. Using a set of data lineage mappings, the performance analyst can profile execution times across different pipelines and redistribute processing to eliminate bottlenecks.
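The "many incoming paths" heuristic amounts to counting each stage's in-degree in the lineage graph. The sketch below flags stages at or above a threshold; the edge list and the `min_inputs` default are assumptions, and a real assessment would combine this with measured execution times.

```python
from collections import Counter

# Hypothetical lineage edges (upstream, downstream).
EDGES = [
    ("crm_db", "customer_staging"),
    ("web_logs", "customer_staging"),
    ("erp_db", "orders_staging"),
    ("customer_staging", "sales_mart"),
    ("orders_staging", "sales_mart"),
    ("pricing_feed", "sales_mart"),
    ("sales_mart", "revenue_dashboard"),
]


def candidate_bottlenecks(edges, min_inputs=2):
    """Return stages with at least min_inputs incoming paths, busiest first."""
    in_degree = Counter(down for _, down in edges)
    flagged = [node for node, count in in_degree.items() if count >= min_inputs]
    return sorted(flagged, key=lambda node: (-in_degree[node], node))


print(candidate_bottlenecks(EDGES))
```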

7. Policy compliance

Data policies can be implemented through the specification of business rules. Compliance with these business rules can be facilitated using data lineage by embedding business rule validation controls across the data pipelines. These controls can generate alerts when there are noncompliant data instances.
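One way to picture such a control is a set of rules, each tied to a pipeline stage, that are evaluated against the records passing through that stage and produce alert messages for violations. This is an illustrative sketch only; the `Rule` class, rule names, and stage name are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A business rule attached to one pipeline stage."""
    name: str
    stage: str
    check: Callable[[dict], bool]


# Hypothetical rules derived from data policies.
RULES = [
    Rule("email_present", "customer_staging", lambda r: bool(r.get("email"))),
    Rule("age_non_negative", "customer_staging", lambda r: r.get("age", 0) >= 0),
]


def run_controls(rules, stage, records):
    """Check every record at a stage and return one alert per violation."""
    alerts = []
    for index, record in enumerate(records):
        for rule in rules:
            if rule.stage == stage and not rule.check(record):
                alerts.append(f"{stage}: record {index} violates {rule.name}")
    return alerts


sample = [
    {"email": "a@example.com", "age": 30},
    {"email": "", "age": -1},
]
for alert in run_controls(RULES, "customer_staging", sample):
    print(alert)
```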

8. Auditability of data pipelines

In many cases, regulatory compliance is a combination of enforcing a set of defined data policies along with a capability for demonstrating that the overall process is compliant. Data lineage provides visibility into the data pipelines and information flows that can be audited, thereby supporting the compliance process.

Evaluating Enterprise Data Lineage Tools

While data lineage benefits are obvious, large organizations with complex data pipelines and data flows do face challenges in embracing the technology to document the enterprise data pipelines. These include:

  • Surveying the enterprise – Gathering information about the sources, flows and configurations of data pipelines.
  • Maintenance – Configuring a means to maintain an up-to-date view of the data pipelines.
  • Deliverability – Providing a way to give data consumers visibility into the lineage maps.
  • Sustainability – Ensuring that the processes for producing data lineage mappings can be maintained over time.

Producing a collection of up-to-date data lineage mappings that are easily reviewed by different data consumers depends on addressing these challenges. When considering data lineage tools, keep these issues in mind as you evaluate how well the tools can meet your data governance needs.

erwin Data Intelligence (erwin DI) helps organizations automate their data lineage initiatives. Learn more about data lineage with erwin DI.

Categories
erwin Expert Blog Data Governance

Data Governance Frameworks: The Key to Successful Data Governance Implementation

A strong data governance framework is central to successful data governance implementation in any data-driven organization because it ensures that data is properly maintained, protected and maximized.

Despite this, enterprises often face pushback when implementing a new data governance initiative or trying to mature an existing one.

Let’s assume you have some form of informal data governance operation with some strengths to build on and some weaknesses to correct. Some parts of the organization are engaged and behind the initiative, while others are skeptical about its relevance or benefits.

Some other common data governance implementation obstacles include:

  • Questions about where to begin and how to prioritize which data streams to govern first
  • Issues regarding data quality and ownership
  • Concerns about data lineage
  • Competing projects and resources (time, people and funding)

By using a data governance framework, organizations can formalize their data governance implementation and subsequent adherence to it. This addresses common concerns including data quality and data lineage, and provides a clear path to successful data governance implementation.

In this blog, we will cover three key steps to successful data governance implementation. We will also look into how we can expand the scope and depth of a data governance framework to ensure data governance standards remain high.

Data Governance Implementation in 3 Steps

When maturing or implementing data governance and/or a data governance framework, an accurate assessment of the ‘here and now’ is key. Then you can rethink the path forward, identifying any current policies or business processes that should be incorporated, while being careful not to repeat the mistakes of prior iterations.

With this in mind, here are three steps we recommend for implementing data governance and a data governance framework.

Step 1: Shift the culture toward data governance

Data governance isn’t something to set and forget; it’s a strategic approach that needs to evolve over time in response to new opportunities and challenges. Therefore, a successful data governance framework has to become part of the organization’s culture, but such a shift requires listening – and remembering that it’s about people, empowerment and accountability.

In most cases, a new data governance framework requires people – those in IT and across the business, including risk management and information security – to change how they work. Any concerns they raise or recommendations they make should be considered. You can encourage feedback through surveys, workshops and open dialog.

Once input has been discussed and a plan agreed upon, it is critical to update roles and responsibilities, provide training and ensure ongoing communication. Many organizations now have internal certifications for different data governance roles, and the people who hold them wear these badges with pride.

A top-down management approach will get a data governance initiative off the ground, but only bottom-up cultural adoption will carry it out.

Step 2: Refine the data governance framework

The right capabilities and tools are important for fueling an accurate, real-time data pipeline and governing it for maximum security, quality and value. For example:

Data cataloging – Organizations implementing a data governance framework will benefit from automated metadata harvesting, data mapping, code generation and data lineage with reference data management, lifecycle management and data quality. With these capabilities, you can efficiently integrate and activate enterprise data within a single, unified catalog in accordance with business requirements.

Data literacy – Being able to discover what data is available and understand what it means in common, standardized terms is important because data elements may mean different things to different parts of the organization. A business glossary answers this need, as does the ability for stakeholders to view data relevant to their roles and understand it within a business context through a role-based portal.

Such tools are further enhanced if they can be integrated across data and business architectures and when they promote self-service and collaboration, which also are important to the cultural shift.

 

Step 3: Prioritize then scale the data governance framework

Because data governance is ongoing, it’s important to prioritize the initial areas of focus and scale from there. Organizations that start with 30 to 50 data items are generally more successful than those that attempt more than 1,000 in the early stages.

Find some representative (familiar) data items and create examples for data ownership, quality, lineage and definition so stakeholders can see real examples of the data governance framework in action. For example:

  • Data ownership model showing a data item, its definition, producers, consumers, stewards and quality rules (for profiling)
  • Workflow showing the creation, enrichment and approval of the above data item to demonstrate collaboration
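The two examples above could be prototyped with a small record type for the data item plus a simple status workflow. The field names, status values, and sample item below are assumptions chosen to mirror the bullets, not a schema from any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """Hypothetical ownership record for one governed data item."""
    name: str
    definition: str
    producers: list = field(default_factory=list)
    consumers: list = field(default_factory=list)
    stewards: list = field(default_factory=list)
    quality_rules: list = field(default_factory=list)
    status: str = "created"


# Assumed workflow: creation, then enrichment, then approval.
NEXT_STATUS = {"created": "enriched", "enriched": "approved"}


def advance(item):
    """Move the item to the next workflow status, or fail if it is final."""
    if item.status not in NEXT_STATUS:
        raise ValueError(f"{item.name} is already {item.status}")
    item.status = NEXT_STATUS[item.status]
    return item


item = DataItem(
    name="customer_email",
    definition="Primary email address used to contact a customer.",
    producers=["crm_db"],
    consumers=["marketing_analytics"],
    stewards=["customer data steward"],
    quality_rules=["must not be empty", "must contain '@'"],
)
advance(item)  # created -> enriched
advance(item)  # enriched -> approved
print(item.status)
```

Walking one familiar item through all three statuses gives stakeholders a concrete picture of who owns it, who uses it, and how changes get approved.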

Whether your organization is just adopting data governance or the goal is to refine an existing data governance framework, the erwin DG RediChek will provide helpful insights to guide you in the journey.