Categories
erwin Expert Blog Data Governance Data Intelligence

Demystifying Data Lineage: Tracking Your Data’s DNA

Getting the most out of your data requires getting a handle on data lineage. That’s knowing what data you have, where it is, and where it came from – plus understanding its quality and value to the organization.

But you can’t understand your data in a business context much less track data lineage, its physical existence and maximize its security, quality and value if it’s scattered across different silos in numerous applications.

Data lineage provides a way of tracking data from its origin to destination across its lifespan and all the processes it’s involved in. It also plays a vital role in data governance. Beyond the simple ability to know where the data came from and whether or not it can be trusted, there’s an element of statutory reporting and compliance that often requires a knowledge of how that same data (known or unknown, governed or not) has changed over time.

A platform that provides insights like data lineage, impact analysis, full-history capture, and other data management features serves as a central hub from which everything can be learned and discovered about the data – whether a data lake, a data vault or a traditional data warehouse.

In a traditional data management organization, Excel spreadsheets are used to manage the incoming data design, what’s known as the “pre-ETL” mapping documentation, but this does not provide any sort of visibility or auditability. In fact, each unit of work represented in these ‘mapping documents’ becomes an independent variable in the overall system development lifecycle, and therefore nearly impossible to learn from much less standardize.

The key to accuracy and integrity in any exercise is to eliminate the opportunity for human error – which does not mean eliminating humans from the process but incorporating the right tools to reduce the likelihood of error as the human beings apply their thought processes to the work.

Data Lineage

Data Lineage: A Crucial First Step for Data Governance

Knowing what data you have and where it lives and where it came from is complicated. The lack of visibility and control around “data at rest” combined with “data in motion,” as well as difficulties with legacy architectures, means organizations spend more time finding the data they need rather than using it to produce meaningful business outcomes.

Organizations need to create and sustain an enterprise-wide view of and easy access to underlying metadata, but that’s a tall order with numerous data types and data sources that were never designed to work together and data infrastructures that have been cobbled together over time with disparate technologies, poor documentation and little thought for downstream integration. So the applications and initiatives that depend on a solid data infrastructure may be compromised, resulting in faulty analyses.

These issues can be addressed with a strong data management strategy underpinned by technology that enables the data quality the business requires, which encompasses data cataloging (integration of data sets from various sources), mapping, versioning, business rules and glossaries maintenance and metadata management (associations and lineage).

An automated, metadata-driven framework for cataloging data assets and their flows across the business provides an efficient, agile and dynamic way to generate data lineage from operational source systems (databases, data models, file-based systems, unstructured files and more) across the information management architecture; construct business glossaries; assess what data aligns with specific business rules and policies; and inform how that data is transformed, integrated and federated throughout business processes – complete with full documentation.

Centralized design, immediate lineage and impact analysis, and change-activity logging means you will always have answers readily available, or just a few clicks away. Subsets of data can be identified and generated via predefined templates, generic designs generated from standard mapping documents, and pushed via ETL process for faster processing via automation templates.

With automation, data quality is systemically assured and the data pipeline is seamlessly governed and operationalized to the benefit of all stakeholders. Without such automation, business transformation will be stymied. Companies, especially large ones with thousands of systems, files and processes, will be particularly challenged by a manual approach. And outsourcing these data management efforts to professional services firms only increases costs and schedule delays.

With erwin Mapping Manager, organizations can automate enterprise data mapping and code generation for faster time-to-value and greater accuracy when it comes to data movement projects, as well as synchronize “data in motion” with data management and governance efforts.

Map data elements to their sources within a single repository to determine data lineage, deploy data warehouses and other Big Data solutions, and harmonize data integration across platforms. The web-based solution reduces the need for specialized, technical resources with knowledge of ETL and database procedural code, while making it easy for business analysts, data architects, ETL developers, testers and project managers to collaborate for faster decision-making.

Data Lineage

Categories
erwin Expert Blog

Solving the Enterprise Data Dilemma

Due to the adoption of data-driven business, organizations across the board are facing their own enterprise data dilemmas.

This week erwin announced its acquisition of metadata management and data governance provider AnalytiX DS. The combined company touches every piece of the data management and governance lifecycle, enabling enterprises to fuel automated, high-quality data pipelines for faster speed to accurate, actionable insights.

Why Is This a Big Deal?

From digital transformation to AI, and everything in between, organizations are flooded with data. So, companies are investing heavily in initiatives to use all the data at their disposal, but they face some challenges. Chiefly, deriving meaningful insights from their data – and turning them into actions that improve the bottom line.

This enterprise data dilemma stems from three important but difficult questions to answer: What data do we have? Where is it? And how do we get value from it?

Large enterprises use thousands of unharvested, undocumented databases, applications, ETL processes and procedural code that make it difficult to gather business intelligence, conduct IT audits, and ensure regulatory compliance – not to mention accomplish other objectives around customer satisfaction, revenue growth and overall efficiency and decision-making.

The lack of visibility and control around “data at rest” combined with “data in motion”, as well as difficulties with legacy architectures, means these organizations spend more time finding the data they need rather than using it to produce meaningful business outcomes.

To remedy this, enterprises need smarter and faster data management and data governance capabilities, including the ability to efficiently catalog and document their systems, processes and the associated data without errors. In addition, business and IT must collaborate outside their traditional operational silos.

But this coveted state of data nirvana isn’t possible without the right approach and technology platform.

Enterprise Data: Making the Data Management-Data Governance Love Connection

Enterprise Data: Making the Data Management-Data Governance Love Connection

Bringing together data management and data governance delivers greater efficiencies to technical users and better analytics to business users. It’s like two sides of the same coin:

  • Data management drives the design, deployment and operation of systems that deliver operational and analytical data assets.
  • Data governance delivers these data assets within a business context, tracks their physical existence and lineage, and maximizes their security, quality and value.

Although these disciplines approach data from different perspectives and are used to produce different outcomes, they have a lot in common. Both require a real-time, accurate picture of an organization’s data landscape, including data at rest in data warehouses and data lakes and data in motion as it is integrated with and used by key applications.

However, creating and maintaining this metadata landscape is challenging because this data in its various forms and from numerous sources was never designed to work in concert. Data infrastructures have been cobbled together over time with disparate technologies, poor documentation and little thought for downstream integration, so the applications and initiatives that depend on data infrastructure are often out-of-date and inaccurate, rendering faulty insights and analyses.

Organizations need to know what data they have and where it’s located, where it came from and how it got there, what it means in common business terms [or standardized business terms] and be able to transform it into useful information they can act on – all while controlling its access.

To support the total enterprise data management and governance lifecycle, they need an automated, real-time, high-quality data pipeline. Then every stakeholder – data scientist, ETL developer, enterprise architect, business analyst, compliance officer, CDO and CEO – can fuel the desired outcomes with reliable information on which to base strategic decisions.

Enterprise Data: Creating Your “EDGE”

At the end of the day, all industries are in the data business and all employees are data people. The success of an organization is not measured by how much data it has, but by how well it’s used.

Data governance enables organizations to use their data to fuel compliance, innovation and transformation initiatives with greater agility, efficiency and cost-effectiveness.

Organizations need to understand their data from different perspectives, identify how it flows through and impacts the business, aligns this business view with a technical view of the data management infrastructure, and synchronizes efforts across both disciplines for accuracy, agility and efficiency in building a data capability that impacts the business in a meaningful and sustainable fashion.

The persona-based erwin EDGE creates an “enterprise data governance experience” that facilitates collaboration between both IT and the business to discover, understand and unlock the value of data both at rest and in motion.

By bringing together enterprise architecture, business process, data mapping and data modeling, erwin’s approach to data governance enables organizations to get a handle on how they handle their data. With the broadest set of metadata connectors and automated code generation, data mapping and cataloging tools, the erwin EDGE Platform simplifies the total data management and data governance lifecycle.

This single, integrated solution makes it possible to gather business intelligence, conduct IT audits, ensure regulatory compliance and accomplish any other organizational objective by fueling an automated, high-quality and real-time data pipeline.

With the erwin EDGE, data management and data governance are unified and mutually supportive, with one hand aware and informed by the efforts of the other to:

  • Discover data: Identify and integrate metadata from various data management silos.
  • Harvest data: Automate the collection of metadata from various data management silos and consolidate it into a single source.
  • Structure data: Connect physical metadata to specific business terms and definitions and reusable design standards.
  • Analyze data: Understand how data relates to the business and what attributes it has.
  • Map data flows: Identify where to integrate data and track how it moves and transforms.
  • Govern data: Develop a governance model to manage standards and policies and set best practices.
  • Socialize data: Enable stakeholders to see data in one place and in the context of their roles.

An integrated solution with data preparation, modeling and governance helps businesses reach data governance maturity – which equals a role-based, collaborative data governance system that serves both IT and business users equally. Such maturity may not happen overnight, but it will ultimately deliver the accurate and actionable insights your organization needs to compete and win.

Your journey to data nirvana begins with a demo of the enhanced erwin Data Governance solution. Register now.

erwin ADS webinar