Tag: data cataloging

Top 5 Data Catalog Benefits

Post author By David Loshin
Post date August 7, 2019
No Comments on Top 5 Data Catalog Benefits

A data catalog benefits organizations in a myriad of ways. With the right data catalog tool, organizations can automate enterprise metadata management – including data cataloging, data mapping, data quality and code generation for faster time to value and greater accuracy for data movement and/or deployment projects.

Data cataloging helps curate internal and external datasets for a range of content authors. Gartner says this doubles business benefits and ensures effective management and monetization of data assets in the long-term if linked to broader data governance, data quality and metadata management initiatives.

But even with this in mind, the importance of data cataloging is growing. In the regulated data world (GDPR, HIPAA etc) organizations need to have a good understanding of their data lineage – and the data catalog benefits to data lineage are substantial.

Data lineage is a core operational business component of data governance technology architecture, encompassing the processes and technology to provide full-spectrum visibility into the ways data flows across an enterprise.

There are a number of different approaches to data lineage. Here, I outline the common approach, and the approach incorporating data cataloging – including the top 5 data catalog benefits for understanding your organization’s data lineage.

Data Lineage – The Common Approach

The most common approach for assembling a collection of data lineage mappings traces data flows in a reverse manner. The process begins with the target or data end-point, and then traversing the processes, applications, and ETL tasks in reverse from the target.

For example, to determine the mappings for the data pipelines populating a data warehouse, a data lineage tool might begin with the data warehouse and examine the ETL tasks that immediately proceed the loading of the data into the target warehouse.

The data sources that feed the ETL process are added to a “task list,” and the process is repeated for each of those sources. At each stage, the discovered pieces of lineage are documented. At the end of the sequence, the process will have reverse-mapped the pipelines for populating that warehouse.

While this approach does produce a collection of data lineage maps for selected target systems, there are some drawbacks.

First, this approach focuses only on assembling the data pipelines populating the selected target system but does not necessarily provide a comprehensive view of all the information flows and how they interact.
Second, this process produces the information that can be used for a static view of the data pipelines, but the process needs to be executed on a regular basis to account for changes to the environment or data sources.
Third, and probably most important, this process produces a technical view of the information flow, but it does not necessarily provide any deeper insights into the semantic lineage, or how the data assets map to the corresponding business usage models.

A Data Catalog Offers an Alternate Data Lineage Approach

An alternate approach to data lineage combines data discovery and the use of a data catalog that captures data asset metadata with a data mapping framework that documents connections between the data assets.

This data catalog approach also takes advantage of automation, but in a different way: using platform-specific data connectors, the tool scans the environment for storing each data asset and imports data asset metadata into the data catalog.

When data asset structures are similar, the tool can compare data element domains and value sets, and automatically create the data mapping.

In turn, the data catalog approach performs data discovery using the same data connectors to parse the code involved in data movement, such as major ETL environments and procedural code – basically any executable task that moves data.

The information collected through this process is reverse engineered to create mappings from source data sets to target data sets based on what was discovered.

For example, you can map the databases used for transaction processing, determine that subsets of the transaction processing database are extracted and moved to a staging area, and then parse the ETL code to infer the mappings.

These direct mappings also are documented in the data catalog. In cases where the mappings are not obvious, a tool can help a data steward manually map data assets into the catalog.

The result is a data catalog that incorporates the structural and semantic metadata associated with each data asset as well as the direct mappings for how that data set is populated.

Learn more about data cataloging.

And this is a powerful representative paradigm – instead of capturing a static view of specific data pipelines, it allows a data consumer to request a dynamically-assembled lineage from the documented mappings.

By interrogating the catalog, the current view of any specific data lineage can be rendered on the fly that shows all points of the data lineage: the origination points, the processing stages, the sequences of transformations, and the final destination.

Materializing the “current active lineage” dynamically reduces the risk of having an older version of the lineage that is no longer relevant or correct. When new information is added to the data catalog (such as a newly-added data source of a modification to the ETL code), dynamically-generated views of the lineage will be kept up-to-date automatically.

Top 5 Data Catalog Benefits for Understanding Data Lineage

A data catalog benefits data lineage in the following five distinct ways:

1. Accessibility

The data catalog approach allows the data consumer to query the tool to materialize specific data lineage mappings on demand.

2. Currency

The data lineage is rendered from the most current data in the data catalog.

3. Breadth

As the number of data assets documented in the data catalog increases, the scope of the materializable lineage expands accordingly. With all corporate data assets cataloged, any (or all!) data lineage mappings can be produced on demand.

4. Maintainability and Sustainability

Since the data lineage mappings are not managed as distinct artifacts, there are no additional requirements for maintenance. As long as the data catalog is kept up to date, the data lineage mappings can be materialized.

5. Semantic Visibility

In addition to visualizing the physical movement of data across the enterprise, the data catalog approach allows the data steward to associate business glossary terms, data element definitions, data models, and other semantic details with the different mappings. Additional visualization methods can demonstrate where business terms are used, how they are mapped to different data elements in different systems, and the relationships among these different usage points.

One can impose additional data governance controls with project management oversight, which allows you to designate data lineage mappings in terms of the project life cycle (such as development, test or production).

Aside from these data catalog benefits, this approach allows you to reduce the amount of manual effort for accumulating the information for data lineage and continually reviewing the data landscape to maintain consistency, thus providing a greater return on investment for your data intelligence budget.

Learn more about data cataloging.

erwin Expert Blog

Using Strategic Data Governance to Manage GDPR/CCPA Complexity

Post author By Danny Sandwell
Post date July 12, 2019
1 Comment on Using Strategic Data Governance to Manage GDPR/CCPA Complexity

In light of recent, high-profile data breaches, it’s past-time we re-examined strategic data governance and its role in managing regulatory requirements.

News broke earlier this week of British Airways being fined 183 million pounds – or $228 million – by the U.K. for alleged violations of the European Union’s General Data Protection Regulation (GDPR). While not the first, it is the largest penalty levied since the GDPR went into effect in May 2018.

Given this, Oppenheimer & Co. cautions:

“European regulators could accelerate the crackdown on GDPR violators, which in turn could accelerate demand for GDPR readiness. Although the CCPA [California Consumer Privacy Act, the U.S. equivalent of GDPR] will not become effective until 2020, we believe that new developments in GDPR enforcement may influence the regulatory framework of the still fluid CCPA.”

With all the advance notice and significant chatter for GDPR/CCPA, why aren’t organizations more prepared to deal with data regulations?

In a word? Complexity.

The complexity of regulatory requirements in and of themselves is aggravated by the complexity of the business and data landscapes within most enterprises.

So it’s important to understand how to use strategic data governance to manage the complexity of regulatory compliance and other business objectives …

Designing and Operationalizing Regulatory Compliance Strategy

It’s not easy to design and deploy compliance in an environment that’s not well understood and difficult in which to maneuver. First you need to analyze and design your compliance strategy and tactics, and then you need to operationalize them.

Modern, strategic data governance, which involves both IT and the business, enables organizations to plan and document how they will discover and understand their data within context, track its physical existence and lineage, and maximize its security, quality and value. It also helps enterprises put these strategic capabilities into action by:

Understanding their business, technology and data architectures and their inter-relationships, aligning them with their goals and defining the people, processes and technologies required to achieve compliance.
Creating and automating a curated enterprise data catalog, complete with physical assets, data models, data movement, data quality and on-demand lineage.
Activating their metadata to drive agile data preparation and governance through integrated data glossaries and dictionaries that associate policies to enable stakeholder data literacy.

Five Steps to GDPR/CCPA Compliance

With the right technology, GDPR/CCPA compliance can be automated and accelerated in these five steps:

Catalog systems

Harvest, enrich/transform and catalog data from a wide array of sources to enable any stakeholder to see the interrelationships of data assets across the organization.

Govern PII “at rest”

Classify, flag and socialize the use and governance of personally identifiable information regardless of where it is stored.

Govern PII “in motion”

Scan, catalog and map personally identifiable information to understand how it moves inside and outside the organization and how it changes along the way.

Manage policies and rules

Govern business terminology in addition to data policies and rules, depicting relationships to physical data catalogs and the applications that use them with lineage and impact analysis views.

Strengthen data security

Identify regulatory risks and guide the fortification of network and encryption security standards and policies by understanding where all personally identifiable information is stored, processed and used.

How erwin Can Help

erwin is the only software provider with a complete, metadata-driven approach to data governance through our integrated enterprise modeling and data intelligence suites. We help customers overcome their data governance challenges, with risk management and regulatory compliance being primary concerns.

However, the erwin EDGE also delivers an “enterprise data governance experience” in terms of agile innovation and business transformation – from creating new products and services to keeping customers happy to generating more revenue.

Whatever your organization’s key drivers are, a strategic data governance approach – through business process, enterprise architecture and data modeling combined with data cataloging and data literacy – is key to success in our modern, digital world.

If you’d like to get a handle on handling your data, you can sign up for a free, one-on-one demo of erwin Data Intelligence.

For more information on GDPR/CCPA, we’ve also published a white paper on the Regulatory Rationale for Integrating Data Management and Data Governance.

erwin Expert Blog

The Data Governance (R)Evolution

Post author By Mariann McDonagh
Post date September 28, 2018
No Comments on The Data Governance (R)Evolution

Data governance continues to evolve – and quickly.

Historically, Data Governance 1.0 was siloed within IT and mainly concerned with cataloging data to support search and discovery. However, it fell short in adding value because it neglected the meaning of data assets and their relationships within the wider data landscape.

Then the push for digital transformation and Big Data created the need for DG to come out of IT’s shadows – Data Governance 2.0 was ushered in with principles designed for modern, data-driven business. This approach acknowledged the demand for collaborative data governance, the tearing down of organizational silos, and spreading responsibilities across more roles.

But this past year we all witnessed a data governance awakening – or as the Wall Street Journal called it, a “global data governance reckoning.” There was tremendous data drama and resulting trauma – from Facebook to Equifax and from Yahoo to Aetna. The list goes on and on. And then, the European Union’s General Data Protection Regulation (GDPR) took effect, with many organizations scrambling to become compliant.

So where are we today?

Simply put, data governance needs to be a ubiquitous part of your company’s culture. Your stakeholders encompass both IT and business users in collaborative relationships, so that makes data governance everyone’s business.

Data governance underpins data privacy, security and compliance. Additionally, most organizations don’t use all the data they’re flooded with to reach deeper conclusions about how to grow revenue, achieve regulatory compliance, or make strategic decisions. They face a data dilemma: not knowing what data they have or where some of it is—plus integrating known data in various formats from numerous systems without a way to automate that process.

To accelerate the transformation of business-critical information into accurate and actionable insights, organizations need an automated, real-time, high-quality data pipeline. Then every stakeholder—data scientist, ETL developer, enterprise architect, business analyst, compliance officer, CDO and CEO—can fuel the desired outcomes based on reliable information.

Connecting Data Governance to Your Organization

Data Mapping & Data Governance

The automated generation of the physical embodiment of data lineage—the creation, movement and transformation of transactional and operational data for harmonization and aggregation—provides the best route for enabling stakeholders to understand their data, trust it as a well-governed asset and use it effectively. Being able to quickly document lineage for a standardized, non-technical environment brings business alignment and agility to the task of building and maintaining analytics platforms.

Data Modeling & Data Governance

Data modeling discovers and harvests data schema, and analyzes, represents and communicates data requirements. It synthesizes and standardizes data sources for clarity and consistency to back up governance requirements to use only controlled data. It benefits from the ability to automatically map integrated and cataloged data to and from models, where they can be stored in a central repository for re-use across the organization.

Business Process Modeling & Data Governance

Business process modeling reveals the workflows, business capabilities and applications that use particular data elements. That requires that these assets be appropriately governed components of an integrated data pipeline that rests on automated data lineage and business glossary creation.

Enterprise Architecture & Data Governance

Data flows and architectural diagrams within enterprise architecture benefit from the ability to automatically assess and document the current data architecture. Automatically providing and continuously maintaining business glossary ontologies and integrated data catalogs inform a key part of the governance process.

The EDGE Revolution

By bringing together enterprise architecture, business process, data mapping and data modeling, erwin’s approach to data governance enables organizations to get a handle on how they handle their data and realize its maximum value. With the broadest set of metadata connectors and automated code generation, data mapping and cataloging tools, the erwin EDGE Platform simplifies the total data management and data governance lifecycle.

This single, integrated solution makes it possible to gather business intelligence, conduct IT audits, ensure regulatory compliance and accomplish any other organizational objective by fueling an automated, high-quality and real-time data pipeline.

The erwin EDGE creates an “enterprise data governance experience” that facilitates collaboration between both IT and the business to discover, understand and unlock the value of data both at rest and in motion.

With the erwin EDGE, data management and data governance are unified and mutually supportive of business stakeholders and IT to:

Discover data: Identify and integrate metadata from various data management silos.
Harvest data: Automate the collection of metadata from various data management silos and consolidate it into a single source.
Structure data: Connect physical metadata to specific business terms and definitions and reusable design standards.
Analyze data: Understand how data relates to the business and what attributes it has.
Map data flows: Identify where to integrate data and track how it moves and transforms.
Govern data: Develop a governance model to manage standards and policies and set best practices.
Socialize data: Enable stakeholders to see data in one place and in the context of their roles.

If you’ve enjoyed this latest blog series, then you’ll want to request a copy of Solving the Enterprise Data Dilemma, our new e-book that highlights how to answer the three most important data management and data governance questions: What data do we have? Where is it? And how do we get value from it?