Tag: data mapping

How Metadata Makes Data Meaningful

Post author By Mariann McDonagh
Post date December 12, 2019

Metadata is an important part of data governance, and as a result, most nascent data governance programs are rife with project plans for assessing and documenting metadata. But in many scenarios, it seems that the underlying driver of metadata collection projects is that it’s just something you do for data governance.

So most early-stage data governance managers kick off a series of projects to profile data, make inferences about data element structure and format, and store the presumptive metadata in some metadata repository. But are these rampant and often uncontrolled projects to collect metadata properly motivated?

There is rarely a clear directive about how metadata will be used. Therefore, before launching metadata collection tasks, it is important to specify how the knowledge embedded in the corporate metadata should be used.

Managing metadata should not be a sub-goal of data governance. Today, metadata is at the heart of enterprise data management, governance and intelligence efforts, and it should have a clear strategy rather than being just something you do.

What Is Metadata?

Quite simply, metadata is data about data. It’s generated every time data is captured at a source, accessed by users, moved through an organization, integrated or augmented with other data from other sources, profiled, cleansed and analyzed. Metadata is valuable because it provides information about the attributes of data elements that can be used to guide strategic and operational decision-making. It answers these important questions:

What data do we have?
Where did it come from?
Where is it now?
How has it changed since it was originally created or captured?
Who is authorized to use it and how?
Is it sensitive or are there any risks associated with it?
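To make these questions concrete, here is a minimal sketch of a metadata record in which each field answers one of them. The class name, field names and sample values are hypothetical illustrations, not erwin's metamodel or any product's schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetadataRecord:
    """Hypothetical metadata record; each field answers one of the questions above."""
    element: str                                                 # What data do we have?
    origin: str                                                  # Where did it come from?
    location: str                                                # Where is it now?
    change_history: List[str] = field(default_factory=list)     # How has it changed?
    authorized_roles: List[str] = field(default_factory=list)   # Who may use it?
    sensitive: bool = False                                      # Is it sensitive or risky?

    def record_change(self, description: str) -> None:
        """Append a change event so the history travels with the element."""
        self.change_history.append(description)

    def can_access(self, role: str) -> bool:
        """Return True if the given role is authorized to use this element."""
        return role in self.authorized_roles


email = MetadataRecord(
    element="customer.email",
    origin="crm.contacts.email_address",
    location="warehouse.dim_customer.email",
    authorized_roles=["marketing_analyst"],
    sensitive=True,
)
email.record_change("lower-cased and trimmed during load")
print(email.can_access("marketing_analyst"), email.can_access("intern"))
```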

The Role of Metadata in Data Governance

Organizations don’t know what they don’t know, and this problem is only getting worse. As data continues to proliferate, so does the need for data and analytics initiatives to make sense of it all. Here are some benefits of metadata management for data governance use cases:

Better Data Quality: Data issues and inconsistencies within integrated data sources or targets are identified in real time, improving overall data quality and reducing the time to insight and/or repair.
Quicker Project Delivery: Accelerate Big Data deployments, Data Vaults, data warehouse modernization, cloud migration, etc., by up to 70 percent.
Faster Speed to Insights: Reverse the current 80/20 rule that keeps highly paid knowledge workers too busy finding, understanding and resolving errors or inconsistencies to actually analyze source data.
Greater Productivity & Reduced Costs: Being able to rely on automated and repeatable metadata management processes results in greater productivity. Some erwin customers report productivity gains of 85+% for coding, 70+% for metadata discovery, up to 50% for data design, up to 70% for data conversion, and up to 80% for data mapping.
Regulatory Compliance: Regulations such as GDPR, HIPAA, BCBS and CCPA impose privacy and security mandates on personally identifiable information (PII) and other sensitive data, so that data needs to be tagged, its lineage documented, and its flows depicted for traceability.
Digital Transformation: Knowing what data exists and its value potential promotes digital transformation by improving digital experiences, enhancing digital operations, driving digital innovation and building digital ecosystems.
Enterprise Collaboration: With the business driving alignment between data governance and strategic enterprise goals and IT handling the technical mechanics of data management, the door opens to finding, trusting and using data to effectively meet organizational objectives.

Giving Metadata Meaning

So how do you give metadata meaning? While this sounds like a deep philosophical question, the reality is the right tools can make all the difference.

erwin Data Intelligence (erwin DI) combines data management and data governance processes in an automated flow.

It’s unique in its ability to automatically harvest, transform and feed metadata from a wide array of data sources, operational processes, business applications and data models into a central data catalog and then make it accessible and understandable within the context of role-based views.


erwin DI sits on a common metamodel that is open and extensible and comes with a full set of APIs. A comprehensive set of erwin-owned standard data connectors is included for automated harvesting, refreshing and version-controlled metadata management. Optional erwin Smart Data Connectors reverse-engineer ETL code of all types and connect bi-directionally with reporting and other ecosystem tools. These connectors offer the fastest and most accurate path to data lineage, impact analysis and other detailed graphical relationships.

Additionally, erwin DI is part of the larger erwin EDGE platform that integrates data modeling, enterprise architecture, business process modeling, data cataloging and data literacy. We know our customers need an active metadata-driven approach to:

Understand their business, technology and data architectures and the relationships between them
Create and automate a curated enterprise data catalog, complete with physical assets, data models, data movement, data quality and on-demand lineage
Activate their metadata to drive agile and well-governed data preparation with integrated business glossaries and data dictionaries that provide business context for stakeholder data literacy

erwin was named a Leader in Gartner’s “2019 Magic Quadrant for Metadata Management Solutions.”

Click here to get a free copy of the report.

Click here to request a demo of erwin DI.

erwin Expert Blog

Metadata Management, Data Governance and Automation

Post author By Mariann McDonagh
Post date November 6, 2019

Can the 80/20 Rule Be Reversed?

erwin released its State of Data Governance Report in February 2018, just a few months before the General Data Protection Regulation (GDPR) took effect.

This research showed that the majority of responding organizations weren’t actually prepared for GDPR, nor did they have the understanding, executive support and budget for data governance – although they recognized the importance of it.


Of course, data governance has evolved with astonishing speed, both in response to data privacy and security regulations and because organizations see the potential for using it to accomplish other organizational objectives.

But many of the world’s top brands still seem to be challenged in implementing and sustaining effective data governance programs (hello, Facebook).

We wonder why.

Too Much Time, Too Few Insights

According to IDC’s “Data Intelligence in Context” Technology Spotlight sponsored by erwin, “professionals who work with data spend 80 percent of their time looking for and preparing data and only 20 percent of their time on analytics.”

Specifically, 80 percent of data professionals’ time is spent on data discovery, preparation and protection, and only 20 percent on analysis leading to insights.


In most companies, an incredible amount of data flows from multiple sources in a variety of formats and is constantly being moved and federated across a changing system landscape.

Often these enterprises are heavily regulated, so they need a well-defined data integration model that will help avoid data discrepancies and remove barriers to enterprise business intelligence and other meaningful use.

IT teams need the ability to smoothly generate hundreds of mappings and ETL jobs. They need their data mappings to fall under governance and audit controls, with instant access to dynamic impact analysis and data lineage.

But most organizations, especially those competing in the digital economy, don’t have enough time or money for data management using manual processes. Outsourcing is also expensive, with inevitable delays because these vendors are dependent on manual processes too.


The Role of Data Automation

Data governance maturity includes the ability to rely on automated and repeatable processes.

For example, automatically importing mappings from developers’ Excel sheets, flat files, Access and ETL tools into a comprehensive mappings inventory, complete with automatically generated and meaningful documentation of the mappings, is a powerful way to support governance while providing real insight into data movement — for data lineage and impact analysis — without interrupting system developers’ normal work methods.
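As a rough illustration of that import step, the sketch below assumes a developer's mapping spreadsheet has been exported to CSV with hypothetical columns (`source_table`, `source_column`, `target_table`, `target_column`, `rule`). It loads the rows into an in-memory inventory and generates one plain-language line of documentation per mapping; a real mapping manager would persist and version these records.

```python
import csv
import io
from typing import Dict, List

# Hypothetical CSV export of a developer's mapping spreadsheet.
SHEET = """source_table,source_column,target_table,target_column,rule
crm.contacts,email_address,dw.dim_customer,email,LOWER(TRIM(x))
crm.contacts,created,dw.dim_customer,created_date,CAST(x AS DATE)
"""


def load_mappings(text: str, origin: str) -> List[Dict[str, str]]:
    """Parse mapping rows and tag each one with the file it came from."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["origin"] = origin
    return rows


def document(mapping: Dict[str, str]) -> str:
    """Generate a one-line, human-readable description of a mapping."""
    return (
        f"{mapping['source_table']}.{mapping['source_column']} -> "
        f"{mapping['target_table']}.{mapping['target_column']} "
        f"via {mapping['rule']} (from {mapping['origin']})"
    )


inventory = load_mappings(SHEET, origin="dev_mappings.xlsx")
for m in inventory:
    print(document(m))
```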

GDPR compliance, for instance, requires a business to discover source-to-target mappings with all accompanying transactions, such as which business rules in the repository are applied to them, in order to comply with audits.

When data movement has been tracked and version-controlled, it's possible to conduct data archeology (that is, reverse-engineering code from existing XML within the ETL layer) to uncover what has happened in the past and incorporate it into a mapping manager for fast and accurate recovery.
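Data archeology depends on each ETL tool's own export format, which varies by vendor. The sketch below assumes an invented, simplified XML job definition and uses Python's standard `xml.etree.ElementTree` to recover source-to-target mappings from it.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, simplified ETL job export; real tools each use their own schema.
JOB_XML = """
<job name="load_customer">
  <source table="stg.customer"/>
  <target table="dw.dim_customer"/>
  <field from="cust_id" to="customer_key"/>
  <field from="email" to="email" expr="LOWER(email)"/>
</job>
"""


def extract_mappings(xml_text: str) -> List[Tuple[str, str, str]]:
    """Recover (source column, target column, transformation) triples from one job."""
    root = ET.fromstring(xml_text.strip())
    src = root.find("source").get("table")
    tgt = root.find("target").get("table")
    mappings = []
    for f in root.findall("field"):
        # A field without an expression is treated as a straight copy.
        expr = f.get("expr", "direct move")
        mappings.append((f"{src}.{f.get('from')}", f"{tgt}.{f.get('to')}", expr))
    return mappings


for mapping in extract_mappings(JOB_XML):
    print(mapping)
```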

With automation, data professionals can meet the above needs at a fraction of the cost of the traditional, manual way. To summarize, just some of the benefits of data automation are:

• Centralized and standardized code management with all automation templates stored in a governed repository
• Better quality code and minimized rework
• Business-driven data movement and transformation specifications
• Superior data movement job designs based on best practices
• Greater agility and faster time-to-value in data preparation, deployment and governance
• Cross-platform support of scripting languages and data movement technologies
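The first benefit, automation templates stored once and reused, can be sketched with Python's `string.Template`. The template text, table names and mapping format here are hypothetical; the point is that each load script is rendered from governed metadata instead of being hand-written.

```python
from string import Template
from typing import Dict, List

# A hypothetical automation template, kept once in a governed repository
# and reused for every mapping instead of hand-writing each load script.
INSERT_TEMPLATE = Template(
    "INSERT INTO $target_table ($target_cols)\n"
    "SELECT $source_exprs\n"
    "FROM $source_table;"
)


def generate_load_sql(source_table: str, target_table: str,
                      mappings: List[Dict[str, str]]) -> str:
    """Render one INSERT ... SELECT statement from a list of column mappings."""
    target_cols = ", ".join(m["target"] for m in mappings)
    # Use the transformation rule when present, otherwise copy the column as-is.
    source_exprs = ", ".join(m.get("rule", m["source"]) for m in mappings)
    return INSERT_TEMPLATE.substitute(
        target_table=target_table,
        target_cols=target_cols,
        source_exprs=source_exprs,
        source_table=source_table,
    )


print(generate_load_sql(
    "stg.customer",
    "dw.dim_customer",
    [{"source": "cust_id", "target": "customer_key"},
     {"source": "email", "target": "email", "rule": "LOWER(email)"}],
))
```

Because every statement comes from the same template, a fix to the template propagates to all generated code on the next run, which is where the "minimized rework" benefit comes from.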

One global pharmaceutical giant reduced costs by 70 percent and generated 95 percent of production code with “zero touch.” With automation, the company improved the time to business value and significantly reduced the costly re-work associated with error-prone manual processes.

Help Us Help You by Taking a Brief Survey

With 2020 just around the corner and another data regulation, the California Consumer Privacy Act (CCPA), about to take effect, we're working with Dataversity on another research project.


And this time, you guessed it – we’re focusing on data automation and how it could impact metadata management and data governance.

We would appreciate your input and will release the findings in January 2020.

Click here to take the brief survey


Top 5 Data Catalog Benefits

Post author By David Loshin
Post date August 7, 2019

A data catalog benefits organizations in a myriad of ways. With the right data catalog tool, organizations can automate enterprise metadata management – including data cataloging, data mapping, data quality and code generation for faster time to value and greater accuracy for data movement and/or deployment projects.

Data cataloging helps curate internal and external datasets for a range of content authors. Gartner says this doubles business benefits and ensures effective management and monetization of data assets in the long-term if linked to broader data governance, data quality and metadata management initiatives.

Beyond these benefits, the importance of data cataloging is growing. In the regulated data world (GDPR, HIPAA, etc.), organizations need a good understanding of their data lineage, and the data catalog benefits to data lineage are substantial.

Data lineage is a core operational business component of data governance technology architecture, encompassing the processes and technology to provide full-spectrum visibility into the ways data flows across an enterprise.

There are a number of different approaches to data lineage. Here, I outline the common approach, and the approach incorporating data cataloging – including the top 5 data catalog benefits for understanding your organization’s data lineage.

Data Lineage – The Common Approach

The most common approach for assembling a collection of data lineage mappings traces data flows in reverse. The process begins with the target or data end-point and then traverses the processes, applications and ETL tasks backward from the target.

For example, to determine the mappings for the data pipelines populating a data warehouse, a data lineage tool might begin with the data warehouse and examine the ETL tasks that immediately precede the loading of the data into the target warehouse.

The data sources that feed the ETL process are added to a “task list,” and the process is repeated for each of those sources. At each stage, the discovered pieces of lineage are documented. At the end of the sequence, the process will have reverse-mapped the pipelines for populating that warehouse.
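This reverse, task-list-driven traversal can be sketched as a breadth-first walk. The `FEEDS` dictionary is a hypothetical stand-in for what a lineage tool would learn by examining ETL tasks. Note that the result is a snapshot: if `FEEDS` changes, the walk has to be run again.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical ETL inventory: each target lists the sources its load tasks read.
FEEDS: Dict[str, List[str]] = {
    "warehouse.sales": ["staging.orders", "staging.customers"],
    "staging.orders": ["oltp.orders"],
    "staging.customers": ["oltp.customers", "crm.contacts"],
}


def reverse_lineage(target: str) -> List[Tuple[str, str]]:
    """Walk backward from a target, returning the (source, target) edges found."""
    edges: List[Tuple[str, str]] = []
    task_list = deque([target])   # data stores still to be examined
    seen: Set[str] = {target}
    while task_list:
        current = task_list.popleft()
        for source in FEEDS.get(current, []):   # ETL tasks that load `current`
            edges.append((source, current))      # document this piece of lineage
            if source not in seen:
                seen.add(source)
                task_list.append(source)         # repeat the process for the source
    return edges


for edge in reverse_lineage("warehouse.sales"):
    print(edge)
```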

While this approach does produce a collection of data lineage maps for selected target systems, there are some drawbacks.

First, this approach focuses only on assembling the data pipelines populating the selected target system but does not necessarily provide a comprehensive view of all the information flows and how they interact.
Second, this process produces the information that can be used for a static view of the data pipelines, but the process needs to be executed on a regular basis to account for changes to the environment or data sources.
Third, and probably most important, this process produces a technical view of the information flow, but it does not necessarily provide any deeper insights into the semantic lineage, or how the data assets map to the corresponding business usage models.

A Data Catalog Offers an Alternate Data Lineage Approach

An alternate approach to data lineage combines data discovery and the use of a data catalog that captures data asset metadata with a data mapping framework that documents connections between the data assets.

This data catalog approach also takes advantage of automation, but in a different way: using platform-specific data connectors, the tool scans the environment in which each data asset is stored and imports data asset metadata into the data catalog.

When data asset structures are similar, the tool can compare data element domains and value sets, and automatically create the data mapping.
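One simple way to compare value sets is Jaccard similarity (shared distinct values divided by all distinct values). This is an illustrative choice, not necessarily the measure any particular tool uses, and the column samples and 0.5 threshold below are assumptions.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared distinct values divided by all distinct values (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def propose_mappings(source: Dict[str, Set[str]], target: Dict[str, Set[str]],
                     threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Pair each source column with its most similar target column, if similar enough."""
    if not target:
        return []
    proposals = []
    for s_col, s_vals in source.items():
        best = max(target, key=lambda t_col: jaccard(s_vals, target[t_col]))
        score = jaccard(s_vals, target[best])
        if score >= threshold:
            proposals.append((s_col, best, round(score, 2)))
    return proposals


# Hypothetical value samples profiled from two systems.
src = {"cntry": {"US", "CA", "MX"}, "state": {"NY", "CA", "TX"}}
tgt = {"country_code": {"US", "CA", "MX", "GB"}, "region": {"NY", "TX", "CA", "FL"}}
print(propose_mappings(src, tgt))
```

Columns that fall below the threshold are the non-obvious cases a data steward would map by hand.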

In turn, the data catalog approach performs data discovery using the same data connectors to parse the code involved in data movement, such as major ETL environments and procedural code – basically any executable task that moves data.

The information collected through this process is reverse engineered to create mappings from source data sets to target data sets based on what was discovered.

For example, you can map the databases used for transaction processing, determine that subsets of the transaction processing database are extracted and moved to a staging area, and then parse the ETL code to infer the mappings.
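Parsing ETL code in general requires a proper SQL or tool-specific parser. As a deliberately narrow sketch, the regular expression below handles only a single `INSERT INTO ... (cols) SELECT ... FROM ...` statement and pairs target columns with select expressions by position; the table and column names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches only one simple statement shape: INSERT INTO t (cols) SELECT exprs FROM s
INSERT_SELECT = re.compile(
    r"INSERT\s+INTO\s+(\S+)\s*\((.*?)\)\s*SELECT\s+(.*?)\s+FROM\s+(\S+)",
    re.IGNORECASE | re.DOTALL,
)


def infer_mappings(sql: str) -> List[Tuple[str, str, str]]:
    """Return (source table, select expression, target column) triples by position."""
    match = INSERT_SELECT.search(sql)
    if match is None:
        return []
    target, cols, exprs, source = match.groups()
    source = source.rstrip(";")
    target_cols = [c.strip() for c in cols.split(",")]
    # Naive comma split: expressions such as COALESCE(a, b) would need a real parser.
    select_exprs = [e.strip() for e in exprs.split(",")]
    return [(source, expr, f"{target}.{col}")
            for expr, col in zip(select_exprs, target_cols)]


etl_sql = """
INSERT INTO staging.orders (order_id, amount_usd)
SELECT id, amount * fx_rate
FROM oltp.orders
WHERE order_date >= CURRENT_DATE - 1;
"""
for mapping in infer_mappings(etl_sql):
    print(mapping)
```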

These direct mappings also are documented in the data catalog. In cases where the mappings are not obvious, a tool can help a data steward manually map data assets into the catalog.

The result is a data catalog that incorporates the structural and semantic metadata associated with each data asset as well as the direct mappings for how that data set is populated.


This is a powerful paradigm: instead of capturing a static view of specific data pipelines, it allows a data consumer to request a dynamically assembled lineage from the documented mappings.

By interrogating the catalog, the current view of any specific data lineage can be rendered on the fly, showing all points of the data lineage: the origination points, the processing stages, the sequences of transformations and the final destination.

Materializing the “current active lineage” dynamically reduces the risk of relying on an older version of the lineage that is no longer relevant or correct. When new information is added to the data catalog (such as a newly added data source or a modification to the ETL code), dynamically generated views of the lineage are kept up to date automatically.
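The difference between a stored lineage map and a materialized one can be sketched as a catalog that stores only direct mappings and assembles paths on request. `LineageCatalog` is a hypothetical class, not a product API, and it assumes the documented mappings contain no cycles.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LineageCatalog:
    """Hypothetical catalog that stores direct mappings only, never whole lineage maps."""

    def __init__(self) -> None:
        # target asset -> set of assets that directly feed it
        self.upstream: Dict[str, Set[str]] = defaultdict(set)

    def add_mapping(self, source: str, target: str) -> None:
        """Document one direct source-to-target mapping."""
        self.upstream[target].add(source)

    def lineage(self, asset: str) -> List[List[str]]:
        """Assemble every path from an origination point to `asset` at request time."""
        sources = sorted(self.upstream.get(asset, set()))
        if not sources:
            return [[asset]]  # nothing feeds this asset: it is an origination point
        paths = []
        for src in sources:
            for path in self.lineage(src):   # assumes the mappings contain no cycles
                paths.append(path + [asset])
        return paths


catalog = LineageCatalog()
catalog.add_mapping("oltp.orders", "staging.orders")
catalog.add_mapping("staging.orders", "dw.sales")
print(catalog.lineage("dw.sales"))

catalog.add_mapping("web.orders", "staging.orders")  # a newly added source
print(catalog.lineage("dw.sales"))
```

Because only the direct mappings are stored, documenting the new `web.orders` source changes the very next lineage request; there is no separate lineage artifact to regenerate.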

Top 5 Data Catalog Benefits for Understanding Data Lineage

A data catalog benefits data lineage in the following five distinct ways:

1. Accessibility

The data catalog approach allows the data consumer to query the tool to materialize specific data lineage mappings on demand.

2. Currency

The data lineage is rendered from the most current data in the data catalog.

3. Breadth

As the number of data assets documented in the data catalog increases, the scope of the materializable lineage expands accordingly. With all corporate data assets cataloged, any (or all!) data lineage mappings can be produced on demand.

4. Maintainability and Sustainability

Since the data lineage mappings are not managed as distinct artifacts, there are no additional requirements for maintenance. As long as the data catalog is kept up to date, the data lineage mappings can be materialized.

5. Semantic Visibility

In addition to visualizing the physical movement of data across the enterprise, the data catalog approach allows the data steward to associate business glossary terms, data element definitions, data models, and other semantic details with the different mappings. Additional visualization methods can demonstrate where business terms are used, how they are mapped to different data elements in different systems, and the relationships among these different usage points.
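A minimal sketch of semantic visibility, assuming a data steward has already linked business terms to physical data elements: the hypothetical `TERM_LINKS` and `MAPPINGS` lists below let you ask where a term is used, grouped by system, and which documented mappings connect those usage points.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical links a data steward has recorded between business terms and data elements.
TERM_LINKS: List[Tuple[str, str]] = [
    ("Customer", "crm.contacts.contact_id"),
    ("Customer", "dw.dim_customer.customer_key"),
    ("Net Revenue", "dw.fact_sales.net_amount"),
]
# Hypothetical documented source-to-target mappings.
MAPPINGS: List[Tuple[str, str]] = [
    ("crm.contacts.contact_id", "dw.dim_customer.customer_key"),
]


def where_used(term: str) -> Dict[str, List[str]]:
    """Group the data elements linked to a business term by system (first name segment)."""
    by_system: Dict[str, List[str]] = defaultdict(list)
    for linked_term, element in TERM_LINKS:
        if linked_term == term:
            by_system[element.split(".")[0]].append(element)
    return dict(by_system)


def term_mappings(term: str) -> List[Tuple[str, str]]:
    """Return documented mappings whose source and target both carry the term."""
    elements = {e for t, e in TERM_LINKS if t == term}
    return [m for m in MAPPINGS if m[0] in elements and m[1] in elements]


print(where_used("Customer"))
print(term_mappings("Customer"))
```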

You can also impose additional data governance controls with project management oversight, which allows you to designate data lineage mappings in terms of the project life cycle (such as development, test or production).

Aside from these data catalog benefits, this approach reduces the manual effort of accumulating data lineage information and continually reviewing the data landscape to maintain consistency, providing a greater return on investment for your data intelligence budget.

Learn more about data cataloging.


A Guide to CCPA Compliance and How the California Consumer Privacy Act Compares to GDPR

Post author By Bunny Tharpe
Post date April 18, 2019

California Consumer Privacy Act (CCPA) compliance shares many of the same requirements as the European Union’s General Data Protection Regulation (GDPR).

While the CCPA has been signed into law, organizations have until Jan. 1, 2020, to comply with its mandates. Luckily, many organizations have already laid the regulatory groundwork for it because of their efforts to comply with GDPR.

However, there are some key differences that we’ll explore in the Q&A below.

Data governance, thankfully, provides a framework for compliance with either or both – in addition to other regulatory mandates your organization may be subject to.

CCPA Compliance Requirements vs. GDPR FAQ

Does CCPA apply to not-for-profit organizations?

No, CCPA compliance only applies to for-profit organizations. GDPR compliance is required for any organization, public or private (including not-for-profit).

What for-profit businesses does CCPA apply to?

The mandate for CCPA compliance only applies if a for-profit organization meets at least one of the following criteria:

Has an annual gross revenue exceeding $25 million
Collects, sells or shares the personal data of 50,000 or more consumers, households or devices
Earns 50% or more of its annual revenue by selling consumers’ personal information
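Because these thresholds are alternatives (meeting any one of them is enough), the applicability test reduces to a simple disjunction. The function below is a simplified screening sketch with hypothetical parameter names; it assumes the organization already handles California residents' data.

```python
def ccpa_applies(for_profit: bool, annual_revenue_usd: float,
                 consumers_households_devices: int,
                 share_of_revenue_from_selling_pi: float) -> bool:
    """Simplified screen: for-profit AND at least one of the three thresholds."""
    if not for_profit:
        return False                                   # not-for-profits are out of scope
    return (
        annual_revenue_usd > 25_000_000                # revenue exceeds $25 million
        or consumers_households_devices >= 50_000      # 50,000+ consumers/households/devices
        or share_of_revenue_from_selling_pi >= 0.5     # 50%+ of revenue from selling PI
    )


print(ccpa_applies(True, 30_000_000, 0, 0.0))
print(ccpa_applies(True, 1_000_000, 10_000, 0.1))
```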

Does the CCPA apply outside of California?

As the name suggests, the legislation is designed to protect the personal data of consumers who reside in the state of California.

But like GDPR, CCPA compliance has impacts outside the area of origin. This means businesses located outside California that sell to (or collect the data of) California residents must also comply.

Does the CCPA exclude anything that GDPR doesn’t?

GDPR encompasses all categories of “personal data,” with no distinctions.

CCPA does make distinctions, particularly when other regulations may overlap. These include:

Medical information covered by the Confidentiality of Medical Information Act (CMIA) and the Health Insurance Portability and Accountability Act (HIPAA)
Personal information covered by the Gramm-Leach-Bliley Act (GLBA)
Personal information covered by the Driver’s Privacy Protection Act (DPPA)
Clinical trial data
Information sold to or by consumer reporting agencies
Publicly available personal information (federal, state and local government records)

What about access requests?

Under the GDPR, organizations must make any personal data collected from an EU citizen available upon request.

CCPA compliance only requires data collected within the last 12 months to be shared upon request.
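The difference in scope can be sketched as a filter over collected records. This assumes each record carries a hypothetical `collected_on` date and approximates 12 months as 365 days; it illustrates the rule as stated here rather than a compliance implementation.

```python
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical records held about one consumer.
RECORDS: List[Dict] = [
    {"field": "email", "collected_on": date(2019, 6, 1)},
    {"field": "phone", "collected_on": date(2018, 6, 1)},
]


def records_to_disclose(records: List[Dict], request_date: date,
                        regime: str) -> List[Dict]:
    """Select the records to return for an access request under each regime."""
    if regime == "GDPR":
        return list(records)                            # everything collected, regardless of age
    if regime == "CCPA":
        cutoff = request_date - timedelta(days=365)     # ~12 months, an approximation
        return [r for r in records if r["collected_on"] >= cutoff]
    raise ValueError(f"unknown regime: {regime}")


print([r["field"] for r in records_to_disclose(RECORDS, date(2020, 1, 15), "CCPA")])
```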

Does the CCPA include the right to opt out?

CCPA, like GDPR, gives consumers/citizens the right to opt out of the processing of their personal data.

However, CCPA compliance only requires an organization to observe an opt-out request when it comes to the sale of personal data. GDPR does not make any distinctions between “selling” personal data and any other kind of data processing.

To meet CCPA compliance opt-out standards, organizations must provide a “Do Not Sell My Personal Information” link on their home pages.

Does the CCPA require individuals to willingly opt in?

No. Whereas the GDPR requires informed consent before an organization sells an individual’s information, organizations under the scope of the CCPA can still assume consent. The only exception involves the personal information of children under 16: children aged at least 13 but under 16 can consent themselves, but if the consumer is under 13, a parent or guardian must authorize the sale of said child’s personal data.
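Because the age boundaries are easy to misread, here is the CCPA opt-in rule for children expressed as a small function. The function name and return strings are hypothetical.

```python
def who_must_authorize_sale(age: int) -> str:
    """Who must opt in before a consumer's personal data can be sold under CCPA."""
    if age < 13:
        return "parent or guardian"   # under 13: parental authorization required
    if age < 16:
        return "consumer"             # 13 through 15: the child opts in themselves
    return "no opt-in required"       # 16 and over: consent may be assumed; opt-out applies


for age in (12, 13, 15, 16):
    print(age, who_must_authorize_sale(age))
```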

What about fines for CCPA non-compliance?

In theory, fines for CCPA non-compliance are potentially more far reaching than those of GDPR because there is no ceiling for CCPA penalties. Under GDPR, penalties have a ceiling of 4% of global annual revenue or €20 million, whichever is greater. GDPR recently resulted in a record fine for Google.

Organizations that fail to comply with the CCPA can be fined up to $7,500 per violation, and there is no upper ceiling on the total.
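A quick worked comparison of the two penalty structures as described above. The revenue figures and violation count are hypothetical, and how violations are counted in practice is a legal question this sketch does not address.

```python
def gdpr_max_fine_eur(global_annual_revenue_eur: float) -> float:
    """GDPR ceiling: the greater of 4% of global annual revenue or EUR 20 million."""
    return max(0.04 * global_annual_revenue_eur, 20_000_000)


def ccpa_max_fine_usd(violations: int, per_violation: float = 7_500) -> float:
    """CCPA as described here: up to $7,500 per violation, with no overall ceiling."""
    return violations * per_violation


# Hypothetical figures for illustration only.
print(gdpr_max_fine_eur(100_000_000))    # smaller company: EUR 20M is the greater figure
print(gdpr_max_fine_eur(2_000_000_000))  # larger company: 4% of revenue is the greater figure
print(ccpa_max_fine_usd(10_000))         # grows linearly with the number of violations
```

The GDPR figure is bounded for any given revenue, while the CCPA figure keeps growing with the violation count; that is the sense in which CCPA fines are "potentially more far reaching."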

Data Governance for Regulatory Compliance

While CCPA has a narrower geography and focus than GDPR, compliance is still a serious effort for organizations under its scope. And as data-driven business continues to expand, so too will the pressure on lawmakers to regulate how organizations process data. Remember the Facebook hearings and now inquiries into Google and Twitter, for example?

Regulatory compliance remains a key driver for data governance. After all, to understand how to meet data regulations, an organization must first understand its data.

An effective data governance initiative should enable just that, by giving an organization the tools to:

Discover data: Identify and interrogate metadata from various data management silos
Harvest data: Automate the collection of metadata from various data management silos and consolidate it into a single source
Structure data: Connect physical metadata to specific business terms and definitions and reusable design standards
Analyze data: Understand how data relates to the business and what attributes it has
Map data flows: Identify where to integrate data and track how it moves and transforms
Govern data: Develop a governance model to manage standards and policies and set best practices
Socialize data: Enable all stakeholders to see data in one place in their own context

A Regulatory EDGE

The erwin EDGE software platform creates an “enterprise data governance experience” to transform how all stakeholders discover, understand, govern and socialize data assets. It includes enterprise modeling, data cataloging and data literacy capabilities, giving organizations visibility and control over their disparate architectures and all the supporting data.

Both IT and business stakeholders have role-based, self-service access to the information they need to collaborate in making strategic decisions. And because many of the associated processes can be automated, you reduce errors and increase the speed and quality of your data pipeline. This data intelligence unlocks knowledge and value.

The erwin EDGE provides the most agile, efficient and cost-effective means of launching and sustaining a strategic and comprehensive data governance initiative, whether you wish to deploy on premises or in the cloud. But you don’t have to implement every component of the erwin EDGE all at once to see strategic value.

Because of the platform’s federated design, you can address your organization’s most urgent needs, such as regulatory compliance, first. Then you can proactively address other organizational objectives, such as operational efficiency, revenue growth, increasing customer satisfaction and improving overall decision-making.

You can learn more about leveraging data governance to navigate the changing tide of data regulations here.


What’s Business Process Modeling Got to Do with It? – Choosing A BPM Tool

Post author By Bunny Tharpe
Post date March 21, 2019

With business process modeling (BPM) being a key component of data governance, choosing a BPM tool is a dilemma many businesses either face now or will face soon.

Historically, BPM didn’t necessarily have to be tied to an organization’s data governance initiative.

However, data-driven business and the regulations that oversee it are becoming increasingly extensive, so the need to view data governance as a collective effort – in terms of personnel and the tools that make up the strategy – is becoming harder to ignore.

Data governance also relies on business process modeling and analysis to drive improvement, including identifying business practices susceptible to security, compliance or other risks and adding controls to mitigate exposures.

Choosing a BPM Tool: An Overview

As part of a data governance strategy, a BPM tool aids organizations in visualizing their business processes, system interactions and organizational hierarchies to ensure elements are aligned and core operations are optimized.

The right BPM tool also helps organizations increase productivity, reduce errors and mitigate risks to achieve strategic objectives.

With insights from the BPM tool, you can clarify roles and responsibilities – which in turn should influence an organization’s policies about data ownership and make data lineage easier to manage.

Organizations also can use a BPM tool to identify the staff who function as “unofficial data repositories.” This has both a primary and secondary function:

1. Organizations can document employee processes to ensure vital information isn’t lost should an employee choose to leave.

2. It is easier to identify areas where expertise may need to be bolstered.

Organizations that adopt a BPM tool also enjoy greater process efficiency. This is through a combination of improving existing processes or designing new process flows, eliminating unnecessary or contradictory steps, and documenting results in a shareable format that is easy to understand so the organization is pulling in one direction.

Silo Buster

Understanding the typical use cases for business process modeling is the first step. As with any tech investment, it’s important to understand how the technology will work in the context of your organization/business.

For example, it’s counter-productive to invest in a solution that reduces informational silos only to introduce a new technological silo through a lack of integration.

Ideally, organizations want a BPM tool that works in conjunction with the wider data management platform and data governance initiative – not one that works against them.

That means the tool must support data imports from and integrations with external sources, enable in-tool collaboration to reduce departmental silos and, most crucially, tap into a central metadata repository to ensure consistency across the whole data management and governance initiative.

The lack of a central metadata repository is a far too common thorn in an organization’s side. Without one, organizations have to juggle multiple versions because changes to the underlying data aren’t automatically updated across the platform.

It also means organizations waste crucial time manually manufacturing and maintaining data quality, when an automation framework could achieve the same goal instantaneously, without human error and with greater consistency.

A central metadata repository ensures an organization can acknowledge and get behind a single source of truth. This has a wealth of favorable consequences, including greater cohesion across the organization, better data quality and trust, and faster decision-making with fewer false starts due to plans based on misleading information.

Three Key Questions to Ask When Choosing a BPM Tool

Organizations in the market for a BPM tool should also consider the following:

1. Configurability: Does the tool support the ability to model and analyze business processes with links to data, applications and other aspects of your organization? And how easy is this to achieve?

2. Role-based views: Can the tool develop integrated business models for a single source of truth but with different views for different stakeholders based on their needs – making regulatory compliance more manageable? Does it enable cross-functional and enterprise collaboration through discussion threads, surveys and other social features?

3. Business and IT infrastructure interoperability: How well does the tool integrate with other key components of data governance including enterprise architecture, data modeling, data cataloging and data literacy? Can it aid in providing data intelligence to connect all the pieces of the data management and governance lifecycles?

For more information and to find out how such a solution can integrate with your organization and current data management and data governance initiatives, click here.


Four Use Cases Proving the Benefits of Metadata-Driven Automation

Post author By Bunny Tharpe
Post date February 7, 2019

Organizations cannot hope to make the most of a data-driven strategy without at least some degree of metadata-driven automation.

The volume and variety of data has snowballed, and so has its velocity. As such, traditional – and mostly manual – processes associated with data management and data governance have broken down. They are time-consuming and prone to human error, making compliance, innovation and transformation initiatives more complicated, which is less than ideal in the information age.

So it’s safe to say that organizations can’t reap the rewards of their data without automation.

Data scientists and other data professionals can spend up to 80 percent of their time bogged down trying to understand source data or addressing errors and inconsistencies.

That’s time needed and better used for data analysis.

By implementing metadata-driven automation, organizations across industries can free their highly skilled, well-paid data pros to focus on finding the goods: actionable insights that will fuel the business.

Metadata-Driven Automation in the BFSI Industry

The banking, financial services and insurance (BFSI) industry typically deals with higher data velocity and tighter regulations than most. This heavily regulated environment is rife with data management bottlenecks.

These bottlenecks are only made worse when organizations attempt to get by with systems and tools that are not purpose-built.

For example, one BFSI company found that manually managing data mappings for its enterprise data warehouse via MS Excel spreadsheets had become cumbersome and unsustainable.

After embracing metadata-driven automation and custom code automation templates, it saved hundreds of thousands of dollars in code generation and development costs and achieved more work in less time with fewer resources. ROI on the automation solutions was realized within the first year.

Metadata-Driven Automation in the Pharmaceutical Industry

Despite its shortcomings, the Excel spreadsheet method for managing data mappings is common within many industries.

But with the amount of data organizations need to process in today’s business climate, this manual approach makes change management and determining end-to-end lineage a significant and time-consuming challenge.

One global pharmaceutical giant headquartered in the United States experienced such issues until it adopted metadata-driven automation. Then the pharma company was able to scan in all source and target system metadata and maintain it within a single repository. Users now view end-to-end data lineage from the source layer to the reporting layer within seconds.

On the whole, the implementation resulted in extraordinary time savings and a total cost reduction of 60 percent.

Metadata-Driven Automation in the Insurance Industry

Insurance is another industry that has to cope with high data velocity and stringent data regulations. Plus many organizations in this sector find that they’ve outgrown their systems.

For example, an insurance company using a CDMA product to centralize data mappings is probably missing certain critical features, such as versioning, impact analysis and lineage, which increases costs, time to market and errors.

By adopting metadata-driven automation, organizations can standardize the pre-ETL data mapping process and better manage data integration through the change and release process. As a result, both internal data mapping teams and cross-functional teams get easy and fast web-based access to data mappings and valuable information such as impact analysis and lineage.

Here is the story of a business that adopted such an approach and achieved operational excellence, an 80 percent reduction in delivery time and ROI within 12 months.

Metadata-Driven Automation for a Non-Profit

Another common issue cited by organizations using manual data mapping is ballooning complexity and subsequent confusion.

Any organization expanding its data-driven focus without sufficiently maturing data management initiative(s) will experience this at some point.

One of the world’s largest humanitarian organizations, with millions of members and volunteers operating all over the world, was confronted with this exact issue.

It recognized the need for a solution to standardize the pre-ETL data mapping process to make data integration more efficient and cost-effective.

With metadata-driven automation, the organization would be able to scan and store metadata and data dictionaries in a central repository, as well as manage the business definitions and data dictionary for legacy systems contributing data to the enterprise data warehouse.
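To make the idea of a central repository more concrete, here is a minimal Python sketch. It assumes that scanned column metadata and a legacy data dictionary can both be keyed by a fully qualified `system.table.column` name; the class names and sample values are hypothetical and are not part of any erwin product API.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    system: str
    table: str
    column: str
    data_type: str
    business_definition: str = ""  # filled in from the data dictionary when available

    @property
    def key(self) -> str:
        # Fully qualified name, so metadata from many systems can share one repository
        return f"{self.system}.{self.table}.{self.column}"


@dataclass
class MetadataRepository:
    columns: dict[str, ColumnMetadata] = field(default_factory=dict)

    def register(self, col: ColumnMetadata) -> None:
        """Store (or overwrite) scanned metadata for one column."""
        self.columns[col.key] = col

    def apply_dictionary(self, definitions: dict[str, str]) -> int:
        """Attach business definitions; return how many matched a registered column."""
        matched = 0
        for key, definition in definitions.items():
            if key in self.columns:
                self.columns[key].business_definition = definition
                matched += 1
        return matched

    def undefined(self) -> list[str]:
        """Columns that still lack a business definition, e.g. for data stewards to review."""
        return sorted(k for k, c in self.columns.items() if not c.business_definition)


repo = MetadataRepository()
repo.register(ColumnMetadata("legacy_crm", "member", "mbr_id", "INTEGER"))
repo.register(ColumnMetadata("legacy_crm", "member", "join_dt", "DATE"))
matched = repo.apply_dictionary({"legacy_crm.member.mbr_id": "Unique member identifier"})
print(matched, repo.undefined())
```

Keying on the fully qualified name is a simplifying choice for the sketch; a real repository would also need to track versions, owners and lineage.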

By adopting such an approach, the organization realized time savings across all IT development and cross-functional testing teams. Additionally, they were able to more easily manage mappings, code sets, reference data and data validation rules.

Again, ROI was achieved within a year.

A Universal Solution for Metadata-Driven Automation

Metadata-driven automation is a capability any organization can benefit from – regardless of industry, as demonstrated by the various real-world use cases chronicled here.

The erwin Automation Framework is a key component of the erwin EDGE platform for comprehensive data management and data governance.

With it, data professionals realize these industry-agnostic benefits:

Centralized and standardized code management with all automation templates stored in a governed repository
Better quality code and minimized rework
Business-driven data movement and transformation specifications
Superior data movement job designs based on best practices
Greater agility and faster time-to-value in data preparation, deployment and governance
Cross-platform support of scripting languages and data movement technologies

Learn more about metadata-driven automation as it relates to data preparation and enterprise data mapping.

Join one of our weekly erwin Mapping Manager demos.

Five Benefits of an Automation Framework for Data Governance

Post author By Sam Benedict and John Carter
Post date January 24, 2019

Organizations are responsible for governing more data than ever before, making a strong automation framework a necessity. But what exactly is an automation framework and why does it matter?

In most companies, an incredible amount of data flows from multiple sources in a variety of formats and is constantly being moved and federated across a changing system landscape.

Often these enterprises are heavily regulated, so they need a well-defined data integration model that helps avoid data discrepancies and removes barriers to enterprise business intelligence and other meaningful use.

With an automation framework, data professionals can meet these needs at a fraction of the cost of the traditional manual way.

In data governance terms, an automation framework refers to a metadata-driven universal code generator that works hand in hand with enterprise data mapping for:

Pre-ETL enterprise data mapping
Governing metadata
Governing and versioning source-to-target mappings throughout the lifecycle
Data lineage, impact analysis and business rules repositories
Automated code generation

Such automation enables organizations to bypass bottlenecks, including human error and the time required to complete these tasks manually.

In fact, relying on automated and repeatable processes can yield up to 50 percent savings in design, up to 70 percent savings in conversion and up to 70 percent acceleration in total project delivery.

So without further ado, here are the five key benefits of an automation framework for data governance.

Benefits of an Automation Framework for Data Governance

Creates simplicity, reliability, consistency and customization for the integrated development environment.

Code automation templates (CATs) can be created – for virtually any process and any tech platform – using the SDK scripting language or the solution’s published libraries to completely automate common, manual data integration tasks.
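A code automation template is, at its core, parameterized code that gets filled in from mapping metadata. The sketch below uses Python's standard `string.Template` to render an `INSERT ... SELECT` load statement from a list of source-to-target column mappings. The template text, table names and column names are hypothetical, and this is not the erwin SDK scripting language.

```python
from string import Template

# Hypothetical code automation template; $-placeholders are filled from mapping metadata
INSERT_SELECT_CAT = Template(
    "INSERT INTO $target_table ($target_columns)\n"
    "SELECT $source_expressions\n"
    "FROM $source_table;"
)


def render_load_sql(source_table: str, target_table: str,
                    mappings: list[tuple[str, str]]) -> str:
    """Render load SQL from (target_column, source_expression) pairs, in column order."""
    return INSERT_SELECT_CAT.substitute(
        target_table=target_table,
        target_columns=", ".join(target for target, _ in mappings),
        source_expressions=", ".join(expr for _, expr in mappings),
        source_table=source_table,
    )


sql = render_load_sql(
    "stg_customer",
    "dw_customer",
    [("customer_id", "cust_no"),
     ("full_name", "TRIM(first_nm) || ' ' || TRIM(last_nm)")],
)
print(sql)
```

Because the template is kept separate from the metadata, the same template can be reused for every mapping that follows the same load pattern, which is where the consistency comes from.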

CATs are designed and developed by senior automation experts to ensure they are compliant with industry or corporate standards as well as with an organization’s best practice and design standards.

The 100-percent metadata-driven approach is critical to creating reliable and consistent CATs.

It is possible to scan, pull in and configure metadata sources and targets using standard or custom adapters and connectors for databases, ERP, cloud environments, files, data modeling, BI reports and Big Data to document data catalogs, data mappings, ETL (XML code) and even SQL procedures of any type.
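Scanning a source usually means querying the database's own catalog. As an illustration only, the following sketch harvests table and column metadata from SQLite using its `sqlite_master` table and `PRAGMA table_info`. Other DBMSs expose similar catalogs, and a commercial adapter would wrap this kind of query for you. The `policy` table is a made-up example.

```python
import sqlite3


def scan_sqlite(conn: sqlite3.Connection) -> list[dict]:
    """Harvest table and column metadata from a SQLite database into plain dicts."""
    catalog = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Table names come from the catalog itself, not from user input.
        # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, col_type, notnull, _default, pk in conn.execute(
            f"PRAGMA table_info('{table}')"
        ):
            catalog.append({
                "table": table,
                "column": name,
                "type": col_type,
                "nullable": not notnull,
                "primary_key": bool(pk),
            })
    return catalog


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policy (policy_id INTEGER PRIMARY KEY, "
    "holder_name TEXT NOT NULL, premium REAL)"
)
catalog = scan_sqlite(conn)
for entry in catalog:
    print(entry)
```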

Provides blueprints anyone in the organization can use.

It can stage DDL from source metadata for the target DBMS; generate profiling and test SQL for test automation of data integration projects; and generate source-to-target mappings and ETL jobs for leading ETL tools, among other capabilities.
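Staging DDL can be derived directly from harvested metadata. This sketch takes column metadata in the shape a scanner might produce (column name, type, nullability) and emits a `CREATE TABLE` statement for a staging table. The type map, the `stg_` prefix and the fallback to `VARCHAR(255)` are assumptions chosen for illustration; a real framework would maintain a vetted map for each source and target DBMS pair.

```python
# Hypothetical source-to-target type map (SQLite-style names to a generic target DBMS)
TYPE_MAP = {
    "INTEGER": "BIGINT",
    "TEXT": "VARCHAR(255)",
    "REAL": "DOUBLE PRECISION",
    "DATE": "DATE",
}


def stage_ddl(table: str, columns: list[dict], prefix: str = "stg_") -> str:
    """Build CREATE TABLE DDL for a staging copy of a source table."""
    lines = []
    for col in columns:
        # Unknown source types fall back to a wide character column
        target_type = TYPE_MAP.get(col["type"].upper(), "VARCHAR(255)")
        null_clause = "" if col["nullable"] else " NOT NULL"
        lines.append(f"    {col['column']} {target_type}{null_clause}")
    return f"CREATE TABLE {prefix}{table} (\n" + ",\n".join(lines) + "\n);"


source_columns = [
    {"column": "policy_id", "type": "INTEGER", "nullable": False},
    {"column": "holder_name", "type": "TEXT", "nullable": False},
    {"column": "premium", "type": "REAL", "nullable": True},
]
ddl = stage_ddl("policy", source_columns)
print(ddl)
```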

It can also populate and maintain Big Data sets by generating Pig, Sqoop, MapReduce, Spark and Python scripts, and more.

Incorporates data governance into the system development process.

An organization can achieve a more comprehensive and sustainable data governance initiative than it ever could with a homegrown solution.

An automation framework’s ability to automatically create, version, manage and document source-to-target mappings matters greatly, both to data governance maturity and to a shorter time to value.

This eliminates the duplication that occurs when project teams are siloed and prevents the loss of knowledge capital due to employee attrition.

Another valuable capability is coordination between data governance and the SDLC, including automated metadata harvesting and cataloging from a wide array of sources for real-time metadata synchronization with core data governance capabilities and artifacts.

Proves the value of data lineage and impact analysis for governance and risk assessment.

Automated reverse-engineering of ETL code into natural language enables a more intuitive lineage view for data governance.

With end-to-end lineage, it is possible to view data movement from source to stage, stage to EDW, and on to a federation of marts and reporting structures, providing a comprehensive and detailed view of data in motion.
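Conceptually, end-to-end lineage is a walk over a directed graph whose edges are the source-to-target hops recorded in mappings and ETL code. The following Python sketch reconstructs every path that feeds a mart column, hop by hop. The element names are hypothetical, and the code assumes the lineage graph has no cycles.

```python
from collections import defaultdict

# Hypothetical (source_element, target_element) hops harvested from mappings and ETL code
EDGES = [
    ("crm.customer.cust_no", "stg.customer.cust_no"),
    ("stg.customer.cust_no", "edw.dim_customer.customer_id"),
    ("billing.account.cust_ref", "stg.account.cust_ref"),
    ("stg.account.cust_ref", "edw.dim_customer.customer_id"),
    ("edw.dim_customer.customer_id", "mart_sales.customer.customer_id"),
]


def lineage_paths(element: str, edges: list[tuple[str, str]]) -> list[list[str]]:
    """Return every origin-to-element path, listing the origin system first."""
    feeds = defaultdict(list)  # target -> the elements that feed it
    for src, tgt in edges:
        feeds[tgt].append(src)

    def walk(node: str) -> list[list[str]]:
        parents = feeds.get(node)
        if not parents:  # nothing feeds this element, so it is an origin
            return [[node]]
        return [path + [node] for parent in parents for path in walk(parent)]

    return walk(element)


for path in lineage_paths("mart_sales.customer.customer_id", EDGES):
    print(" -> ".join(path))
```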

The process includes leveraging existing mapping documentation and auto-documented mappings to quickly render graphical source-to-target lineage views including transformation logic that can be shared across the enterprise.

Similarly, impact analysis – which involves data mapping and lineage across tables, columns, systems, business rules, projects, mappings and ETL processes – provides insight into potential data risks and enables fast and thorough remediation when needed.
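Impact analysis traverses the same kind of graph in the opposite direction: starting from an element that is about to change, it collects everything downstream. A breadth-first search is one simple way to do this, sketched below in Python with hypothetical element names. The `seen` set keeps the search from revisiting elements, so it also terminates if the graph contains cycles.

```python
from collections import defaultdict, deque

# Hypothetical lineage hops: (upstream element, downstream element)
HOPS = [
    ("crm.customer.cust_no", "stg.customer.cust_no"),
    ("stg.customer.cust_no", "edw.dim_customer.customer_id"),
    ("edw.dim_customer.customer_id", "mart_sales.customer.customer_id"),
    ("edw.dim_customer.customer_id", "report.churn_dashboard"),
    ("billing.account.cust_ref", "stg.account.cust_ref"),
]


def impacted_elements(changed: str, hops: list[tuple[str, str]]) -> list[str]:
    """Breadth-first search downstream from `changed`; returns affected elements in visit order."""
    downstream = defaultdict(list)
    for src, tgt in hops:
        downstream[src].append(tgt)
    seen = {changed}
    queue = deque([changed])
    impacted = []
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            if child not in seen:  # skip elements already reached; also stops cycles
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


print(impacted_elements("crm.customer.cust_no", HOPS))
```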

Performing impact analysis across the organization while complying with industry regulations requires detailed data mapping and lineage.

Supports a wide spectrum of business needs.

Intelligent automation delivers enhanced capability, increased efficiency and effective collaboration to every stakeholder in the data value chain: data stewards, architects, scientists and analysts, as well as business intelligence developers, IT professionals and business consumers.

It makes it easier for them to handle jobs such as data warehousing by leveraging source-to-target mapping and ETL code generation and job standardization.

It’s easier to map, move and test data for regular maintenance of existing structures, movement from legacy systems to new systems during a merger or acquisition, or a modernization effort.

erwin’s Approach to Automation for Data Governance: The erwin Automation Framework

Mature and sustainable data governance requires collaboration from both IT and the business, backed by a technology platform that accelerates the time to data intelligence.

Part of the erwin EDGE portfolio for an “enterprise data governance experience,” the erwin Automation Framework transforms enterprise data into accurate and actionable insights by connecting all the pieces of the data management and data governance lifecycle.

As with all erwin solutions, it embraces any data from anywhere (Any²), with automation for relational, unstructured, on-premises and cloud-based data assets, and with data movement specifications harvested and coupled with CATs.

If your organization would like to realize all the benefits explained above – and gain an “edge” in how it approaches data governance, you can start by joining one of our weekly demos for erwin Mapping Manager.

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. But the attempts to standardize data across the entire enterprise haven’t produced the desired results.

A company can’t effectively implement data governance – documenting and applying business rules and processes, analyzing the impact of changes and conducting audits – if it fails at data management.

The problem usually starts by relying on manual integration methods for data preparation and mapping. It’s only when companies take their first stab at manually cataloging and documenting operational systems, processes and the associated data, both at rest and in motion, that they realize how time-consuming the entire data prepping and mapping effort is, and why that work is sure to be compounded by human error and data quality issues.

To effectively promote business transformation, as well as fulfil regulatory and compliance mandates, there can’t be any mishaps.

Clearly, the manual road makes it very challenging to discover and synthesize data that resides in different formats in thousands of unharvested, undocumented databases, applications, ETL processes and procedural code.

Consider the problematic issue of manually mapping source system fields (typically source files or database tables) to target system fields (such as different tables in target data warehouses or data marts).

These source mappings generally are documented across a slew of unwieldy spreadsheets in their “pre-ETL” stage as the input for ETL development and testing. However, the ETL design process often suffers as it evolves because spreadsheet mapping data isn’t updated or may be incorrectly updated thanks to human error. So questions linger about whether transformed data can be trusted.
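One way automation can keep spreadsheet mappings honest is to validate each mapping row against the schemas most recently harvested from the source and target systems. The sketch below reads a mapping sheet exported to CSV and flags rows whose columns no longer exist. The sheet layout, table names and schema snapshot are all hypothetical.

```python
import csv
import io

# Hypothetical pre-ETL mapping sheet, exported from a spreadsheet to CSV
MAPPING_CSV = """source_table,source_column,target_table,target_column
crm_customer,cust_no,dw_customer,customer_id
crm_customer,email_addr,dw_customer,email
crm_customer,fax_no,dw_customer,fax
"""

# Hypothetical current schemas, e.g. as harvested by a metadata scanner
CURRENT_SCHEMA = {
    "crm_customer": {"cust_no", "email_address"},
    "dw_customer": {"customer_id", "email"},
}


def stale_mappings(mapping_csv: str, schema: dict[str, set[str]]) -> list[tuple[int, str]]:
    """Return (sheet_row, problem) for mapping rows that reference missing columns."""
    problems = []
    reader = csv.DictReader(io.StringIO(mapping_csv))
    for row_no, row in enumerate(reader, start=2):  # row 1 of the sheet is the header
        for side in ("source", "target"):
            table, column = row[f"{side}_table"], row[f"{side}_column"]
            if column not in schema.get(table, set()):
                problems.append((row_no, f"{side} {table}.{column} not found"))
    return problems


for problem in stale_mappings(MAPPING_CSV, CURRENT_SCHEMA):
    print(problem)
```

In this example the source column `email_addr` was renamed to `email_address`, so the sheet silently went out of date, which is exactly the kind of drift described above.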

Data Quality Obstacles

The sad truth is that high-paid knowledge workers like data scientists spend up to 80 percent of their time finding and understanding source data and resolving errors or inconsistencies, rather than analyzing it for real value.

Statistics are similar for major data integration projects, such as data warehousing and master data management, where data stewards are challenged to identify and document data lineage and sensitive data elements.

So how can businesses produce value from their data when errors are introduced through manual integration processes? How can enterprise stakeholders gain accurate and actionable insights when data can’t be easily and correctly translated into business-friendly terms?

How can organizations master seamless data discovery, movement, transformation, and IT and business collaboration to reverse the ratio of preparation to value delivered?

What’s needed to overcome these obstacles is establishing an automated, real-time, high-quality and metadata-driven pipeline useful for everyone, from data scientists to enterprise architects to business analysts to C-level execs.

Doing so will require a hearty data management strategy and technology for automating the timely delivery of quality data that measures up to business demands.

From there, they need a sturdy data governance strategy and technology to automatically link and sync well-managed data with core capabilities for auditing, statutory reporting and compliance requirements as well as to drive business insights.

Creating a High-Quality Data Pipeline

Working hand-in-hand, data management and data governance provide a real-time, accurate picture of the data landscape, including “data at rest” in databases, data lakes and data warehouses and “data in motion” as it is integrated with and used by key applications. And there’s control of that landscape to facilitate insight and collaboration and limit risk.

With a metadata-driven, automated, real-time, high-quality data pipeline, all stakeholders can access data that they now are able to understand and trust and which they are authorized to use. At last they can base strategic decisions on what is a full inventory of reliable information.

The integration of data management and governance also supports industry needs to fulfill regulatory and compliance mandates, ensuring that audits are not compromised by the inability to discover key data or by failing to tag sensitive data as part of integration processes.
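Tagging sensitive data during integration can start with simple rules that look at column names and sample values. The Python sketch below is a deliberately naive classifier: the rule set, tag names and 50 percent threshold are assumptions for illustration only, and production classification would need far more robust patterns plus human review.

```python
import re

# Hypothetical rules: (tag, column-name pattern, sample-value pattern)
RULES = [
    ("EMAIL", re.compile(r"e_?mail", re.I), re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    ("PHONE", re.compile(r"phone|tel", re.I), re.compile(r"^\+\d[\d\s-]{7,}$")),
    ("NATIONAL_ID", re.compile(r"ssn|national_id", re.I), re.compile(r"^\d{3}-\d{2}-\d{4}$")),
]


def tag_columns(columns: dict[str, list[str]], threshold: float = 0.5) -> dict[str, set[str]]:
    """Map each column name to the sensitivity tags suggested by RULES."""
    tags = {}
    for name, samples in columns.items():
        non_empty = [s for s in samples if s]
        found = set()
        for tag, name_re, value_re in RULES:
            hits = sum(1 for s in non_empty if value_re.match(s))
            # Tag on a name match, or when more than `threshold` of samples match the pattern
            if name_re.search(name) or (non_empty and hits / len(non_empty) > threshold):
                found.add(tag)
        tags[name] = found
    return tags


tags = tag_columns({
    "contact": ["a@example.com", "b@example.org", ""],
    "cust_ssn": ["123-45-6789"],
    "city": ["Oslo", "Lima"],
})
print(tags)
```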

Data-driven insights, agile innovation, business transformation and regulatory compliance are the fruits of data preparation/mapping and enterprise modeling (business process, enterprise architecture and data modeling) that revolves around a data governance hub.

erwin Mapping Manager (MM) combines data management and data governance processes in an automated flow through the integration lifecycle from data mapping for harmonization and aggregation to generating the physical embodiment of data lineage – that is the creation, movement and transformation of transactional and operational data.

Its hallmark is a consistent approach to data delivery (business glossaries connect physical metadata to specific business terms and definitions) and metadata management (via data mappings).

Massive Marriott Data Breach: Data Governance for Data Security

Post author By Danny Sandwell
Post date December 5, 2018

Organizations have been served yet another reminder of the value of data governance for data security.

Hotel and hospitality powerhouse Marriott recently revealed a massive data breach that led to the theft of personal data for an astonishing 500 million customers of its Starwood hotels. This is the second-largest data breach in recent history, surpassed only by Yahoo’s breach of 3 billion accounts in 2013, for which it has agreed to pay a $50 million settlement to more than 200 million customers.

Now that Marriott has taken a major hit to its corporate reputation, it has two moves:

Respond: Marriott’s response to its data breach so far has not received glowing reviews. But beyond how it communicates with affected customers, the company must examine how the breach occurred in the first place. This means understanding the context of its data – what assets exist and where, the relationship between them and enterprise systems and processes, and how and by what parties the data is used – to determine the specific vulnerability.
Fix it: Marriott must fix the problem, and quickly, to ensure it doesn’t happen again. This step involves a lot of analysis. A data governance solution would make it a lot less painful by providing visibility into the full data landscape – linkages, processes, people and so on. Then more context-sensitive data security architectures can be put in place to protect corporate and consumer data privacy.

The GDPR Factor

It’s been six months since the General Data Protection Regulation (GDPR) took effect. While fines for noncompliance have been minimal to date, we expect them to increase dramatically in the coming year. Marriott’s bad situation could worsen in this regard without holistic data governance in place to identify whose data, and what data, was taken.

Data management and data governance, together, play a vital role in compliance, including GDPR. It’s easier to protect sensitive data when you know what it is, where it’s stored and how it needs to be governed.

Truly understanding an organization’s data, including the data’s value and quality, requires a harmonized approach embedded in business processes and enterprise architecture. Such an integrated enterprise data governance experience helps organizations understand what data they have, where it is, where it came from, its value, its quality and how it’s used and accessed by people and applications.

Data Governance for Data Security: Lessons Learned

Other companies should learn (like pronto) that they need to be prepared. At this point it’s not if, but when, a data breach will rear its ugly head. Preparation is your best bet for avoiding the entire fiasco – from the painstaking process of identifying what happened and why to notifying customers that their data and their trust in your organization have been compromised.

A well-formed security architecture that is driven by and aligned with data intelligence is your best defense. However, if there is nefarious intent, a hacker will find a way. So being prepared means you can minimize your risk exposure and the damage to your reputation.

Multiple components must be considered to effectively support a data governance, security and privacy trinity. They are:

Data models
Enterprise architecture
Business process models

What’s key to remember is that these components act as links in the data governance chain by making it possible to understand what data serves the organization, its connection to the enterprise architecture, and all the business processes it touches.

Creating policies for data handling and accountability and driving culture change so people understand how to properly work with data are two important components of a data governance initiative, as is the technology for proactively managing data assets.

Without the ability to harvest metadata schemas and business terms; analyze data attributes and relationships; impose structure on definitions; and view all data in one place according to each user’s role within the enterprise, businesses will be hard pressed to stay in step with governance standards and best practices around security and privacy.

As a consequence, the private information held within organizations will continue to be at risk. Organizations suffering data breaches will be deprived of the benefits they had hoped to realize from the money spent on security technologies and the time invested in developing data privacy classifications. They also may face heavy fines and other financial, not to mention PR, penalties.

Less Pain, More Gain

Most organizations don’t have enough time or money for data management using manual processes. And outsourcing is also expensive, with inevitable delays because these vendors are dependent on manual processes too. Furthermore, manual processes require manual analysis and auditing, which is always more expensive and time-consuming.

So the more processes an organization can automate, the lower its risk of human error, which is actually the primary cause of most data breaches. And automated processes are much easier to analyze and audit because everything is captured, versioned and available for review in a log somewhere. You can read more about automation in our 10 Reasons to Automate Data Mapping and Data Preparation.

And to learn more about how data governance underpins data security and privacy, click here.

Data Modeling and Data Mapping: Results from Any Data Anywhere

Post author By Andrew McGovern
Post date November 20, 2018

A unified approach to data modeling and data mapping could be the breakthrough that many data-driven organizations need.

In most of the conversations I have with clients, they express the need for a viable solution to model their data, as well as the ability to capture and document the metadata within their environments.

Data modeling is an integral part of any data management initiative. Organizations use data models to tame “data at rest” for business use, governance and technical management of databases of all types.

What Is Data Modeling?

However, once an organization understands what data it has and how it’s structured via data models, it needs answers to other critical questions: Where did it come from? Did it change along the journey? Where does it go from here?

Data Mapping: Taming “Data in Motion”

Knowing how data moves throughout technical and business data architectures is key for true visibility, context and control of all data assets.

Managing data in motion has been a difficult, time-consuming task that involves mapping source elements to the data model, defining the required transformations, and/or providing the same for downstream targets.

Historically, it either has been outsourced to ETL/ELT developers who often create a siloed, technical infrastructure opaque to the business, or business-friendly mappings have been kept in an assortment of unwieldy spreadsheets that are difficult to consolidate and reuse, much less capable of accommodating new requirements.

What if you could combine data at rest and data in motion to create an efficient, accurate and real-time data pipeline that also includes lineage? Then you can spend your time finding the data you need and using it to produce meaningful business outcomes.

Good news … you can.

Automated Data Mapping

Your data modelers can continue to use erwin Data Modeler (DM) as the foundation of your database management system, documenting, enforcing and improving those standards. But instead of relying on data models to disseminate metadata information, you can scan and integrate any data source and present it to all interested parties – automatically.

erwin Mapping Manager (MM) shifts the management of metadata away from data models to a dedicated, automated platform. It can collect metadata from any source, including JSON documents, erwin data models, databases and ERP systems, out of the box.

This functionality underscores our Any² data approach by collecting any data from anywhere. And erwin MM can schedule data collection and create versions for comparison to clearly identify any changes.
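Comparing two scheduled metadata snapshots comes down to a set difference on element names plus a check of each shared element's properties. This sketch compares only data types, and the snapshot format (a dict from `table.column` to type) is an assumption made for brevity, not erwin MM's storage model.

```python
def compare_versions(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Compare two metadata snapshots mapping 'table.column' to data type."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    retyped = sorted(
        (key, old[key], new[key])
        for key in old.keys() & new.keys()
        if old[key] != new[key]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


v1 = {"orders.id": "INTEGER", "orders.amount": "DECIMAL(10,2)", "orders.note": "TEXT"}
v2 = {"orders.id": "BIGINT", "orders.amount": "DECIMAL(10,2)", "orders.channel": "VARCHAR(20)"}
diff = compare_versions(v1, v2)
print(diff)
```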

Metadata definitions can be enhanced using extended data properties, and detailed data lineages can be created based on collected metadata. End users can quickly search for information and see specific data in the context of business processes.

To summarize the key features current data modeling customers seem to be most excited about:

Easy import of legacy mappings, plus share and reuse mappings and transformations
Metadata catalog to automatically harvest any data from anywhere
Comprehensive upstream and downstream data lineage
Versioning with comparison features
Impact analysis

And all of these features support and can be integrated with erwin Data Governance. The end result is knowing what data you have and where it is so you can fuel a fast, high-quality and complete pipeline of any data from anywhere to accomplish your organizational objectives.

Want to learn more about a unified approach to data modeling and data mapping? Join us for our weekly demo to see erwin MM in action for yourself.