Categories
erwin Expert Blog Metadata Management

7 Benefits of Metadata Management

Metadata management is key to wringing all the value possible from data assets.

However, most organizations don’t use all the data at their disposal to reach deeper conclusions about how to drive revenue, achieve regulatory compliance or accomplish other strategic objectives.

What Is Metadata?

Analyst firm Gartner defines metadata as “information that describes various facets of an information asset to improve its usability throughout its life cycle. It is metadata that turns information into an asset.”

Quite simply, metadata is data about data. It’s generated every time data is captured at a source, accessed by users, moved through an organization, integrated or augmented with other data from other sources, profiled, cleansed and analyzed.

It’s valuable because it provides information about the attributes of data elements that can be used to guide strategic and operational decision-making. Metadata management is the administration of data that describes other data, with an emphasis on associations and lineage. It involves establishing policies and processes to ensure information can be integrated, accessed, shared, linked, analyzed and maintained across an organization.

Metadata Answers Key Questions

A strong data management strategy and supporting technology enable the data quality the business requires, including data cataloging (integration of data sets from various sources), mapping, versioning, business rules and glossaries maintenance and metadata management (associations and lineage).

Metadata answers a lot of important questions:

  • What data do we have?
  • Where did it come from?
  • Where is it now?
  • How has it changed since it was originally created or captured?
  • Who is authorized to use it and how?
  • Is it sensitive or are there any risks associated with it?
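
To make these questions concrete, here is a minimal sketch of what a single catalog entry might look like. The `MetadataRecord` class, its field names and the sample values are all hypothetical, chosen only to line up with the questions above; they are not taken from any particular product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetadataRecord:
    """Hypothetical catalog entry; every field name here is illustrative."""
    name: str        # What data do we have?
    source: str      # Where did it come from?
    location: str    # Where is it now?
    changes: List[str] = field(default_factory=list)           # How has it changed?
    authorized_roles: List[str] = field(default_factory=list)  # Who may use it?
    sensitive: bool = False                                     # Is it sensitive?

    def can_access(self, role: str) -> bool:
        """Return True if the given role is authorized to use this data."""
        return role in self.authorized_roles


# Made-up example entry for a customer email column.
record = MetadataRecord(
    name="customer_email",
    source="crm.contacts",
    location="warehouse.dim_customer",
    changes=["lowercased", "deduplicated"],
    authorized_roles=["marketing_analyst"],
    sensitive=True,
)

print(record.can_access("marketing_analyst"))
print(record.can_access("intern"))
```

A real catalog would hold far richer detail, but even this small structure shows how each question maps to something that can be stored and queried.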

Metadata also helps your organization to:

  • Discover data. Identify and interrogate metadata from various data management silos.
  • Harvest data. Automate the collection of metadata from various data management silos and consolidate it into a single source.
  • Structure and deploy data sources. Connect physical metadata to specific data models, business terms, definitions and reusable design standards.
  • Analyze metadata. Understand how data relates to the business and what attributes it has.
  • Map data flows. Identify where to integrate data and track how it moves and transforms.
  • Govern data. Develop a governance model to manage standards, policies and best practices and associate them with physical assets.
  • Socialize data. Empower stakeholders to see data in one place and in the context of their roles.

The Benefits of Metadata Management

1. Better data quality. With automation, data quality is systemically assured with the data pipeline seamlessly governed and operationalized to the benefit of all stakeholders. Data issues and inconsistencies within integrated data sources or targets are identified in real time, improving overall data quality by shortening time to insights and/or repair. It’s also easier to map, move and test data, whether for regular maintenance of existing structures, a move from legacy systems to new systems during a merger or acquisition, or a modernization effort.

2. Quicker project delivery. Automated enterprise metadata management provides greater accuracy and up to 70 percent acceleration in project delivery for data movement and/or deployment projects. It harvests metadata from various data sources, maps any data element from source to target and harmonizes data integration across platforms. With this accurate picture of your metadata landscape, you can accelerate Big Data deployments, Data Vaults, data warehouse modernization, cloud migration, etc.

3. Faster speed to insights. High-paid knowledge workers like data scientists spend up to 80 percent of their time finding and understanding source data and resolving errors or inconsistencies, rather than analyzing it for real value. That equation can be reversed with stronger data operations and analytics leading to insights more quickly, with access/connectivity to underlying metadata and its lineage. Technical resources are free to concentrate on the highest-value projects, while business analysts, data architects, ETL developers, testers and project managers can collaborate more easily for faster decision-making.

4. Greater productivity & reduced costs. Being able to rely on automated and repeatable metadata management processes results in greater productivity. For example, one erwin DI customer has experienced a steep improvement in productivity – more than 85 percent – because manually intensive and complex coding efforts have been automated and 70+ percent because of seamless access to and visibility of all metadata, including end-to-end lineage. Significant data design and conversion savings, up to 50 percent and 70 percent respectively, also are possible with data mapping costs going down as much as 80 percent.

5. Regulatory compliance. Regulations such as the General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), Basel Committee on Banking Supervision (BCBS) and the California Consumer Privacy Act (CCPA) particularly affect sectors such as finance, retail, healthcare and pharmaceutical/life sciences. When key data isn’t discovered, harvested, cataloged, defined and standardized as part of integration processes, audits may be flawed. With automated metadata management, sensitive data is automatically tagged, its lineage automatically documented, and its flows depicted so that it is easily found and its use across workflows easily traced.

6. Digital transformation. Knowing what data exists and its value potential promotes digital transformation by 1) improving digital experiences because you understand how the organization interacts with and supports customers, 2) enhancing digital operations because data preparation and analysis projects happen faster, 3) driving digital innovation because data can be used to deliver new products and services, and 4) building digital ecosystems because organizations need to establish platforms and partnerships to scale and grow.

7. An enterprise data governance experience. Stakeholders include both IT and business users in collaborative relationships, which makes data governance everyone’s business. Modern, strategic data governance must be an ongoing initiative, and it requires everyone from executives on down to rethink their data duties and assume new levels of cooperation and accountability. With business data stakeholders driving alignment between data governance and strategic enterprise goals and IT handling the technical mechanics of data management, the door opens to finding, trusting and using data to effectively meet any organizational objective.

An Automated Solution

When approached manually, metadata management is expensive, time-consuming, error-prone and can’t keep pace with a dynamic enterprise data management infrastructure.

And while integrating and automating data management and data governance is still a new concept for many organizations, its advantages are clear.

erwin’s metadata management offering, the erwin Data Intelligence Suite (erwin DI), includes data catalog, data literacy and automation capabilities for greater awareness of and access to data assets, guidance on their use, and guardrails to ensure data policies and best practices are followed. Its automated, metadata-driven framework gives organizations visibility and control over their disparate data streams – from harvesting to aggregation and integration, including transformation with complete upstream and downstream lineage and all the associated documentation.

erwin has been named a leader in the Gartner 2020 “Magic Quadrant for Metadata Management Solutions” for two consecutive years.

What is Data Lineage? Top 5 Benefits of Data Lineage

What is Data Lineage and Why is it Important?

Data lineage is the journey data takes from its creation through its transformations over time. It describes a certain dataset’s origin, movement, characteristics and quality.

Tracing the source of data is an arduous task.

Many large organizations, in their desire to modernize with technology, have acquired several different systems with various data entry points and transformation rules for data as it moves into and across the organization.

These tools range from enterprise service bus (ESB) products, data integration tools, and extract, transform and load (ETL) tools to procedural code, application programming interfaces (APIs), file transfer protocol (FTP) processes, and even business intelligence (BI) reports that further aggregate and transform data.

With all these diverse data sources, and especially when systems are integrated, it is difficult to understand the complicated data web they form, much less get a simple visual flow. This is why data’s lineage must be tracked and why its role is so vital to business operations, providing the ability to understand where data originates, how it is transformed, and how it moves into, across and outside a given organization.

Data Lineage Use Case: From Tracing COVID-19’s Origins to Data-Driven Business

A lot of theories have emerged about the origin of the coronavirus. A recent University of California San Francisco (UCSF) study conducted a genetic analysis of COVID-19 to determine how the virus was introduced specifically to California’s Bay Area.

It detected at least eight different viral lineages in 29 patients in February and early March, suggesting no regional patient zero but rather multiple independent introductions of the pathogen. The professor who directed the study said, “it’s like sparks entering California from various sources, causing multiple wildfires.”

Just as understanding viral lineage is key to stopping this and other potential pandemics, understanding the origin of data is key to a successful data-driven business.

Top Five Data Lineage Benefits

From my perspective in working with customers of various sizes across multiple industries, I’d like to highlight five data lineage benefits:

1. Business Impact

Data is crucial to every organization’s survival. For that reason, businesses must think about the flow of data across multiple systems that fuel organizational decision-making.

For example, the marketing department uses demographics and customer behavior to forecast sales. The CEO also makes decisions based on performance and growth statistics. An understanding of the data’s origins and history helps answer questions about the origin of data in Key Performance Indicator (KPI) reports, including:

  • How are the report tables and columns defined in the metadata?
  • Who are the data owners?
  • What are the transformation rules?

Without data lineage, these functions are irrelevant, so it makes sense for a business to have a clear understanding of where data comes from, who uses it, and how it transforms. Also, when there is a change to the environment, it is valuable to assess the impacts to the enterprise application landscape.

In the event of a change in data expectations, data lineage provides a way to determine which downstream applications and processes are affected by the change and helps in planning for application updates.
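
As a rough illustration of that kind of impact analysis, the sketch below walks a small lineage graph downstream from a changed asset. The `LINEAGE` mapping, the asset names and the `downstream_impact` function are all made up for this example; a real lineage tool would harvest these relationships automatically rather than hard-code them.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets in its list.
LINEAGE = {
    "crm.contacts": ["staging.customers"],
    "staging.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["report.sales_forecast", "report.kpi_dashboard"],
    "erp.orders": ["report.sales_forecast"],
}


def downstream_impact(changed_asset, lineage):
    """Return every asset reachable downstream of changed_asset, sorted by name."""
    seen = set()
    queue = deque([changed_asset])
    while queue:
        current = queue.popleft()
        for target in lineage.get(current, []):
            if target not in seen:  # guard against revisiting shared targets
                seen.add(target)
                queue.append(target)
    return sorted(seen)


# Which assets are affected if the staging customer table changes?
print(downstream_impact("staging.customers", LINEAGE))
```

The breadth-first walk is just one simple choice here; the point is that once lineage is captured as a graph, "what is affected by this change?" becomes a straightforward traversal.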

2. Compliance & Auditability

Business terms and data policies should be implemented through standardized and documented business rules. Compliance with these business rules can be tracked through data lineage, incorporating auditability and validation controls across data transformations and pipelines to generate alerts when there are non-compliant data instances.
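
A validation control of this kind could be sketched as follows. The rule names, the `Alert` tuple, the `load_trades` step name and the sample trade records are all assumptions made for illustration; actual business rules would come from your own documented policies.

```python
from typing import Callable, Dict, List, NamedTuple


class Alert(NamedTuple):
    step: str     # pipeline step where the check ran
    rule: str     # name of the business rule that failed
    record: dict  # the non-compliant data instance


# Hypothetical business rules, expressed as named predicates over a record.
RULES: Dict[str, Callable[[dict], bool]] = {
    "amount_is_non_negative": lambda r: r["amount"] >= 0,
    "currency_is_iso_code": lambda r: len(r["currency"]) == 3 and r["currency"].isupper(),
}


def validate_step(step: str, records: List[dict]) -> List[Alert]:
    """Apply every rule to every record and collect an alert per violation."""
    alerts = []
    for record in records:
        for rule_name, check in RULES.items():
            if not check(record):
                alerts.append(Alert(step, rule_name, record))
    return alerts


# Made-up records passing through a hypothetical "load_trades" step.
alerts = validate_step("load_trades", [
    {"amount": 100.0, "currency": "USD"},
    {"amount": -5.0, "currency": "usd"},
])
for alert in alerts:
    print(alert.step, alert.rule, alert.record)
```

Because each alert records the pipeline step where it was raised, the violations can be tied back to a specific point in the lineage, which is what makes the check auditable.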

Regulatory compliance places greater transparency demands on firms when it comes to tracing and auditing data. For example, capital markets trading firms must understand their data’s origins and history to support risk management, data governance and reporting for various regulations such as BCBS 239 and MiFID II.

Also, different organizational stakeholders (customers, employees and auditors) need to be able to understand and trust reported data. Data lineage offers proof that the data provided is reflected accurately.

3. Data Governance

An automated data lineage solution stitches together metadata for understanding and validating data usage, as well as mitigating the associated risks.

It can auto-document end-to-end upstream and downstream data lineage, revealing any changes that have been made, by whom and when.

This data ownership, accountability and traceability is foundational to a sound data governance program.

See: The Benefits of Data Governance

4. Collaboration

Analytics and reporting are data-dependent, making collaboration among different business groups and/or departments crucial.

The visualization of data lineage can help business users spot the inherent connections of data flows and thus provide greater transparency and auditability.

Seeing data pipelines and information flows further supports compliance efforts.

5. Data Quality

Data quality is affected by data’s movement, transformation, interpretation and selection through people, process and technology.

Root-cause analysis is the first step in repairing data quality. Once a data steward determines where a data flaw was introduced, the reason for the error can be determined.

With data lineage and mapping, the data steward can trace the information flow backward to examine the standardizations and transformations applied to confirm whether they were performed correctly.

See Data Lineage in Action

Data lineage tools document the flow of data into and out of an organization’s systems. They capture end-to-end lineage and ensure proper impact analysis can be performed in the event of problems or changes to data assets as they move across pipelines.

The erwin Data Intelligence Suite (erwin DI) automatically generates end-to-end data lineage, down to the column level and between repositories. You can view data flows from source systems to the reporting layers, including intermediate transformation and business logic.

Join us for the next live demo of erwin Data Intelligence (DI) to see metadata-driven, automated data lineage in action.

Data Governance for Smart Data Distancing

Hello from my home office! I hope you and your family are staying safe, practicing social distancing, and of course, washing your hands.

These are indeed strange days. During this coronavirus emergency, we are all being deluged by data from politicians, government agencies, news outlets, social media and websites, including valid facts but also opinions and rumors.

Happily for us data geeks, the general public is being told how important our efforts and those of data scientists are to analyzing, mapping and ultimately shutting down this pandemic.

Yay, data geeks!

Unfortunately though, not all of the incoming information is of equal value, ethically sourced, rigorously prepared or even good.

As we work to protect the health and safety of those around us, we need to understand the nuances of meaning for the received information as well as the motivations of information sources to make good decisions.

On a very personal level, separating the good information from the bad becomes a matter of life and potential death. On a business level, decisions based on bad external data may have the potential to cause business failures.

In business, data is the food that feeds the body or enterprise. Better data makes the body stronger and provides a foundation for the use of analytics and data science tools to reduce errors in decision-making. Ultimately, it gives our businesses the strength to deliver better products and services to our customers.

How then, as a business, can we ensure that the data we consume is of good quality?

Distancing from Third-Party Data

Just as we are practicing social distancing in our personal lives, so too we must practice data distancing in our professional lives.

In regard to third-party data, we should ask ourselves: How was the data created? What formulas were used? Does the definition (description, classification, allowable range of values, etc.) of incoming, individual data elements match our internal definitions of those data elements?

If we reflect on the coronavirus example, we can ask: How do individual countries report their data? Do individual countries use the same testing protocols? Are infections universally defined the same way (based on widely administered tests or only hospital admissions)? Are asymptomatic infections reported? Are all countries using the same methods and formulas to collect and calculate infections, recoveries and deaths?

In our businesses, it is vital that we work to develop a deeper understanding of the sources, methods and quality of incoming third-party data. This deeper understanding will help us make better decisions about the risks and rewards of using that external data.

Data Governance Methods for Data Distancing

We’ve received lots of instructions lately about how to wash our hands to protect ourselves from coronavirus. Perhaps we thought we already knew how to wash our hands, but nonetheless, a refresher course has been worthwhile.

Similarly, perhaps we think we know how to protect our business data, but maybe a refresher would be useful here as well?

Here are a few steps you can take to protect your business:

  • Establish comprehensive third-party data sharing guidelines (for both inbound and outbound data). These guidelines should include communicating with third parties about how they make changes to collection and calculation methods.
  • Rationalize external data dictionaries to our internal data dictionaries and understand where differences occur and how we will overcome those differences.
  • Ingest third-party data into a quarantined area where it can be profiled and measured for quality, completeness and correctness and, where necessary, cleansed.
  • Periodically review all data ingestion or data-sharing policies, processes and procedures to ensure they remain aligned to business needs and goals.
  • Establish data-sharing training programs so all data stakeholders understand associated security considerations, contextual meaning, and when and when not to share and/or ingest third-party data.
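
As one possible illustration of the quarantine step, the sketch below profiles incoming rows against an internal data dictionary before anything is allowed further into the business. The `DATA_DICTIONARY` layout, the field names and the sample rows are hypothetical, and the checks (completeness, type and minimum value) are deliberately simple.

```python
# Hypothetical internal data dictionary: expected type and allowed range per field.
DATA_DICTIONARY = {
    "country": {"type": str},
    "confirmed_cases": {"type": int, "min": 0},
}


def profile_quarantined(rows, dictionary):
    """Measure completeness and count invalid values for each dictionary field."""
    total = len(rows)
    report = {}
    for field_name, spec in dictionary.items():
        missing = 0
        invalid = 0
        for row in rows:
            value = row.get(field_name)
            if value is None:
                missing += 1  # absent or null value
            elif not isinstance(value, spec["type"]):
                invalid += 1  # wrong type versus our internal definition
            elif "min" in spec and value < spec["min"]:
                invalid += 1  # outside the allowable range
        report[field_name] = {
            "completeness": (total - missing) / total if total else 0.0,
            "invalid": invalid,
        }
    return report


# Made-up third-party rows sitting in the quarantine area.
quarantine = [
    {"country": "A", "confirmed_cases": 10},
    {"country": "B", "confirmed_cases": -3},
    {"country": None, "confirmed_cases": "12"},
    {"country": "D"},
]
report = profile_quarantined(quarantine, DATA_DICTIONARY)
print(report)
```

Only data that clears thresholds you define on a report like this would then be released from quarantine, with the rest cleansed or sent back to the provider.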

erwin Data Intelligence for Data Governance and Distancing

With solutions like those in the erwin Data Intelligence Suite (erwin DI), organizations can auto-document their metadata; classify their data with respect to privacy, contractual and regulatory requirements; attach data-sharing and management policies; and implement an appropriate level of data security.

If you believe the management of your third-party data interfaces could benefit from a review or tune-up, feel free to reach out to me and my colleagues here at erwin.

We’d be happy to provide a demo of how to use erwin DI for data distancing.

Very Meta … Unlocking Data’s Potential with Metadata Management Solutions

Untapped data, if mined, represents tremendous potential for your organization. While there has been a lot of talk about big data over the years, the real hero in unlocking the value of enterprise data is metadata, or the data about the data.

However, most organizations don’t use all the data they’re flooded with to reach deeper conclusions about how to drive revenue, achieve regulatory compliance or make other strategic decisions. They don’t know exactly what data they have or even where some of it is.

Quite honestly, knowing what data you have and where it lives is complicated. And to truly understand it, you need to be able to create and sustain an enterprise-wide view of and easy access to underlying metadata.

This isn’t an easy task. Organizations are dealing with numerous data types and data sources that were never designed to work together and data infrastructures that have been cobbled together over time with disparate technologies, poor documentation and with little thought for downstream integration.

As a result, the applications and initiatives that depend on a solid data infrastructure may be compromised, leading to faulty analysis and insights.

Metadata Is the Heart of Data Intelligence

A recent IDC Innovators: Data Intelligence Report says that getting answers to such questions as “where is my data, where has it been, and who has access to it” requires harnessing the power of metadata.

Metadata is generated every time data is captured at a source, accessed by users, moved through an organization, and then profiled, cleansed, aggregated, augmented and used for analytics to guide operational or strategic decision-making.

In fact, data professionals spend 80 percent of their time looking for and preparing data and only 20 percent of their time on analysis, according to IDC.

To flip this 80/20 rule, they need an automated metadata management solution for:

• Discovering data – Identify and interrogate metadata from various data management silos.
• Harvesting data – Automate the collection of metadata from various data management silos and consolidate it into a single source.
• Structuring and deploying data sources – Connect physical metadata to specific data models, business terms, definitions and reusable design standards.
• Analyzing metadata – Understand how data relates to the business and what attributes it has.
• Mapping data flows – Identify where to integrate data and track how it moves and transforms.
• Governing data – Develop a governance model to manage standards, policies and best practices and associate them with physical assets.
• Socializing data – Empower stakeholders to see data in one place and in the context of their roles.

Addressing the Complexities of Metadata Management

The complexities of metadata management can be addressed with a strong data management strategy coupled with metadata management software to enable the data quality the business requires.

This encompasses data cataloging (integration of data sets from various sources), mapping, versioning, business rules and glossary maintenance, and metadata management (associations and lineage).

erwin has developed the only data intelligence platform that provides organizations with a complete and contextual depiction of the entire metadata landscape.

It is the only solution that can automatically harvest, transform and feed metadata from operational processes, business applications and data models into a central data catalog and then make it accessible and understandable within the context of role-based views.

erwin’s ability to integrate and continuously refresh metadata from an organization’s entire data ecosystem, including business processes, enterprise architecture and data architecture, forms the foundation for enterprise-wide data discovery, literacy, governance and strategic usage.

Organizations then can take a data-driven approach to business transformation, speed to insights, and risk management.

With erwin, organizations can:

1. Deliver a trusted metadata foundation through automated metadata harvesting and cataloging
2. Standardize data management processes through a metadata-driven approach
3. Centralize data-driven projects around centralized metadata for planning and visibility
4. Accelerate data preparation and delivery through metadata-driven automation
5. Master data management platforms through metadata abstraction
6. Accelerate data literacy through contextual metadata enrichment and integration
7. Leverage a metadata repository to derive lineage, impact analysis and enable audit/oversight ability

With erwin Data Intelligence as part of the erwin EDGE platform, you know what data you have, where it is, where it’s been and how it transformed along the way, plus you can understand sensitivities and risks.

With an automated, real-time, high-quality data pipeline, enterprise stakeholders can base strategic decisions on a full inventory of reliable information.

Many of our customers are hard at work addressing metadata management challenges, and that’s why erwin was named a Leader in Gartner’s “2019 Magic Quadrant for Metadata Management Solutions.”

Business Process Can Make or Break Data Governance

Data governance isn’t a one-off project with a defined endpoint. It’s an on-going initiative that requires active engagement from executives and business leaders.

Data governance, today, comes back to the ability to understand critical enterprise data within a business context, track its physical existence and lineage, and maximize its value while ensuring quality and security.

Historically, little attention has focused on what can literally make or break any data governance initiative — turning it from a launchpad for competitive advantage to a recipe for disaster. Data governance success hinges on business process modeling and enterprise architecture.

To put it even more bluntly, successful data governance* must start with business process modeling and analysis.

*See: Three Steps to Successful & Sustainable Data Governance Implementation

Passing the Data Governance Ball

For years, data governance was the volleyball passed back and forth over the net between IT and the business, with neither side truly owning it. However, once an organization understands that IT and the business are both responsible for data, it needs to develop a comprehensive, holistic strategy for data governance that is capable of four things:

  1. Reaching every stakeholder in the process
  2. Providing a platform for understanding and governing trusted data assets
  3. Delivering the greatest benefit from data wherever it lives, while minimizing risk
  4. Helping users understand the impact of changes made to a specific data element across the enterprise.

To accomplish this, a modern data governance strategy needs to be interdisciplinary to break down traditional silos. Enterprise architecture is important because it aligns IT and the business, mapping a company’s applications and the associated technologies and data to the business functions and value streams they enable.

The business process and analysis component is vital because it defines how the business operates and ensures employees understand and are accountable for carrying out the processes for which they are responsible. Enterprises can clearly define, map and analyze workflows and build models to drive process improvement, as well as identify business practices susceptible to the greatest security, compliance or other risks and where controls are most needed to mitigate exposures.

Slow Down, Ask Questions

In a rush to implement a data governance methodology and system, organizations can forget that a system must serve a process – and be governed/controlled by one.

To choose the correct system and implement it effectively and efficiently, you must know – in every detail – all the processes it will impact. You need to ask these important questions:

  1. How will it impact them?
  2. Who needs to be involved?
  3. When do they need to be involved?

These questions are the same ones we ask in data governance. They involve impact analysis, ownership and accountability, control and traceability – all of which effectively documented and managed business processes enable.

Data sets are not important in and of themselves. Data sets become important in terms of how they are used, who uses them and what their use is – and all this information is described in the processes that generate, manipulate and use them. So unless we know what those processes are, how can any data governance implementation be complete or successful?

Processes need to be open and shared in a concise, consistent way so all parts of the organization can investigate, ask questions, and then add their feedback and information layers. In other words, processes need to be alive and central to the organization because only then will the use of data and data governance be truly effective.

A Failure to Communicate

Consider this scenario: We’ve perfectly captured our data lineage, so we know what our data sets mean, how they’re connected, and who’s responsible for them – not a simple task but a massive win for any organization. Now a breach occurs. Will any of the above information tell us why it happened? Or where? No! It will tell us what else is affected and who can manage the data layer(s), but unless we find and address the process failure that led to the breach, it is guaranteed to happen again.

By knowing where data is used – the processes that use and manage it – we can quickly, even instantly, identify where a failure occurs. Starting with data lineage (meaning our forensic analysis starts from our data governance system), we can identify the source and destination processes and the associated impacts throughout the organization.

We can know which processes need to change and how. We can anticipate the pending disruptions to our operations and, more to the point, the costs involved in mitigating and/or addressing them.

But knowing all the above requires that our processes – our essential and operational business architecture – be accurately captured and modelled. Instituting data governance without processes is like building a castle on sand.

Rethinking Business Process Modeling and Analysis

Modern organizations need a business process modeling and analysis tool with easy access to all the operational layers across the organization – from high-level business architecture all the way down to data.

Such a system should be flexible, adjustable, easy-to-use and capable of supporting multiple layers simultaneously, allowing users to start in their comfort zones and mature as they work toward their organization’s goals.

The erwin EDGE is one of the most comprehensive software platforms for managing an organization’s data governance and business process initiatives, as well as the whole data architecture. It allows natural, organic growth throughout the organization. Bringing data governance and business process management together under the same platform also provides a unique data governance experience because of its integrated, collaborative approach.

Start your free, cloud-based trial of erwin Business Process and see how some of the world’s largest enterprises have benefited from its centralized repository and integrated, role-based views.

We’d also be happy to show you our data governance software, which includes data cataloging and data literacy capabilities.

Top 5 Data Catalog Benefits

A data catalog benefits organizations in a myriad of ways. With the right data catalog tool, organizations can automate enterprise metadata management – including data cataloging, data mapping, data quality and code generation for faster time to value and greater accuracy for data movement and/or deployment projects.

Data cataloging helps curate internal and external datasets for a range of content authors. Gartner says this doubles business benefits and ensures effective management and monetization of data assets in the long term if linked to broader data governance, data quality and metadata management initiatives.

Beyond these benefits, the importance of data cataloging is growing. In the regulated data world (GDPR, HIPAA, etc.), organizations need to have a good understanding of their data lineage – and the data catalog benefits to data lineage are substantial.

Data lineage is a core operational business component of data governance technology architecture, encompassing the processes and technology to provide full-spectrum visibility into the ways data flows across an enterprise.

There are a number of different approaches to data lineage. Here, I outline the common approach, and the approach incorporating data cataloging – including the top 5 data catalog benefits for understanding your organization’s data lineage.

Data Lineage – The Common Approach

The most common approach for assembling a collection of data lineage mappings traces data flows in reverse. The process begins with the target or data end-point and then traverses the processes, applications, and ETL tasks backward from that target.

For example, to determine the mappings for the data pipelines populating a data warehouse, a data lineage tool might begin with the data warehouse and examine the ETL tasks that immediately precede the loading of the data into the target warehouse.

The data sources that feed the ETL process are added to a “task list,” and the process is repeated for each of those sources. At each stage, the discovered pieces of lineage are documented. At the end of the sequence, the process will have reverse-mapped the pipelines for populating that warehouse.
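The task-list procedure described above can be sketched as a breadth-first walk backward from the target. This is only an illustration under simple assumptions, not how any particular lineage tool works: the `FEEDS` dictionary, the system names and the `reverse_lineage` function are all hypothetical.

```python
from collections import deque

# Hypothetical metadata: each target maps to the sources that feed it via an ETL task.
FEEDS = {
    "warehouse": ["staging_orders", "staging_customers"],
    "staging_orders": ["orders_db"],
    "staging_customers": ["crm_db"],
}

def reverse_lineage(target, feeds):
    """Walk backward from target, recording each (source, target) hop once."""
    task_list = deque([target])
    seen = {target}
    hops = []
    while task_list:
        current = task_list.popleft()
        for source in feeds.get(current, []):
            hops.append((source, current))  # document the discovered piece of lineage
            if source not in seen:
                seen.add(source)
                task_list.append(source)  # repeat the process for this source
    return hops

lineage_hops = reverse_lineage("warehouse", FEEDS)
```

The result is a static list of hops for one target, which is exactly why the drawbacks below apply: it has to be re-run whenever the environment changes.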

While this approach does produce a collection of data lineage maps for selected target systems, there are some drawbacks.

  • First, this approach focuses only on assembling the data pipelines populating the selected target system but does not necessarily provide a comprehensive view of all the information flows and how they interact.
  • Second, this process produces the information that can be used for a static view of the data pipelines, but the process needs to be executed on a regular basis to account for changes to the environment or data sources.
  • Third, and probably most important, this process produces a technical view of the information flow, but it does not necessarily provide any deeper insights into the semantic lineage, or how the data assets map to the corresponding business usage models.

A Data Catalog Offers an Alternate Data Lineage Approach

An alternate approach to data lineage combines data discovery and the use of a data catalog that captures data asset metadata with a data mapping framework that documents connections between the data assets.

This data catalog approach also takes advantage of automation, but in a different way: using platform-specific data connectors, the tool scans the environment that stores each data asset and imports data asset metadata into the data catalog.

When data asset structures are similar, the tool can compare data element domains and value sets, and automatically create the data mapping.
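To make the idea of comparing value sets concrete, here is a minimal sketch. It assumes Jaccard overlap as the similarity measure and a 0.5 cut-off, both of which are stand-ins chosen for illustration; the column names and the `propose_mappings` helper are hypothetical, and a real tool would likely weigh data types, domains and names as well.

```python
def jaccard(a, b):
    """Overlap between two value sets, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def propose_mappings(source_cols, target_cols, threshold=0.5):
    """Pair each source column with its best-overlapping target column."""
    mappings = {}
    for s_name, s_values in source_cols.items():
        best_name, best_score = None, 0.0
        for t_name, t_values in target_cols.items():
            score = jaccard(s_values, t_values)
            if score > best_score:
                best_name, best_score = t_name, score
        # Only propose a mapping when the overlap clears the (assumed) threshold.
        if best_name is not None and best_score >= threshold:
            mappings[s_name] = best_name
    return mappings

source = {"cust_state": ["NY", "CA", "TX"], "order_status": ["open", "shipped"]}
target = {"state_code": ["NY", "CA", "TX", "WA"], "status": ["open", "shipped", "closed"]}
proposed = propose_mappings(source, target)
```

Here `cust_state` is paired with `state_code` and `order_status` with `status`. Columns that fall below the threshold are left unmapped for a data steward to review.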

In turn, the data catalog approach performs data discovery using the same data connectors to parse the code involved in data movement, such as major ETL environments and procedural code – basically any executable task that moves data.

The information collected through this process is reverse engineered to create mappings from source data sets to target data sets based on what was discovered.

For example, you can map the databases used for transaction processing, determine that subsets of the transaction processing database are extracted and moved to a staging area, and then parse the ETL code to infer the mappings.

These direct mappings also are documented in the data catalog. In cases where the mappings are not obvious, a tool can help a data steward manually map data assets into the catalog.

The result is a data catalog that incorporates the structural and semantic metadata associated with each data asset as well as the direct mappings for how that data set is populated.
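As a rough illustration of inferring mappings from data movement code, the sketch below pulls source and target tables out of simple `INSERT INTO ... SELECT ... FROM` statements. The regular expression is deliberately narrow and is an assumption for this example only; the table names are hypothetical, and joins, subqueries or procedural code would need a proper SQL parser.

```python
import re

# Deliberately narrow pattern: INSERT INTO <target> ... FROM <source>, one source table.
INSERT_SELECT = re.compile(
    r"INSERT\s+INTO\s+([\w.]+).*?\bFROM\s+([\w.]+)",
    re.IGNORECASE | re.DOTALL,
)

def infer_mappings(etl_sql):
    """Return (source, target) pairs found in a script of INSERT ... SELECT statements."""
    pairs = []
    for statement in etl_sql.split(";"):
        match = INSERT_SELECT.search(statement)
        if match:
            target, source = match.group(1), match.group(2)
            pairs.append((source, target))
    return pairs

etl_script = """
INSERT INTO staging.orders (id, total)
SELECT order_id, amount FROM oltp.orders WHERE amount > 0;
INSERT INTO dw.fact_orders
SELECT id, total FROM staging.orders;
"""
inferred = infer_mappings(etl_script)
```

Each inferred pair is a direct mapping of the kind that would be documented in the catalog.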

Learn more about data cataloging.

And this is a powerful representative paradigm – instead of capturing a static view of specific data pipelines, it allows a data consumer to request a dynamically-assembled lineage from the documented mappings.

By interrogating the catalog, the current view of any specific data lineage can be rendered on the fly that shows all points of the data lineage: the origination points, the processing stages, the sequences of transformations, and the final destination.

Materializing the “current active lineage” dynamically reduces the risk of having an older version of the lineage that is no longer relevant or correct. When new information is added to the data catalog (such as a newly-added data source or a modification to the ETL code), dynamically-generated views of the lineage will be kept up-to-date automatically.
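A small sketch shows why dynamically assembled lineage stays current. The `Catalog` class, its methods and the asset names are hypothetical; the point is only that the catalog stores direct mappings and nothing else, so lineage is computed at request time.

```python
class Catalog:
    """Hypothetical catalog that stores only direct source -> target mappings."""

    def __init__(self):
        self.mappings = set()

    def add_mapping(self, source, target):
        self.mappings.add((source, target))

    def lineage(self, asset):
        """Assemble every upstream hop for asset from the current mappings."""
        hops, frontier, seen = set(), [asset], {asset}
        while frontier:
            current = frontier.pop()
            for source, target in self.mappings:
                if target == current:
                    hops.add((source, target))
                    if source not in seen:
                        seen.add(source)
                        frontier.append(source)
        return hops

catalog = Catalog()
catalog.add_mapping("oltp.orders", "staging.orders")
catalog.add_mapping("staging.orders", "dw.fact_orders")
before = catalog.lineage("dw.fact_orders")

# A newly cataloged source shows up in the next request without any rebuild.
catalog.add_mapping("partner_feed.csv", "staging.orders")
after = catalog.lineage("dw.fact_orders")
```

Because no lineage artifact is stored, there is nothing to go stale: `after` contains the new hop as soon as the mapping is cataloged.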

Top 5 Data Catalog Benefits for Understanding Data Lineage

A data catalog benefits data lineage in the following five distinct ways:

1. Accessibility

The data catalog approach allows the data consumer to query the tool to materialize specific data lineage mappings on demand.

2. Currency

The data lineage is rendered from the most current data in the data catalog.

3. Breadth

As the number of data assets documented in the data catalog increases, the scope of the materializable lineage expands accordingly. With all corporate data assets cataloged, any (or all!) data lineage mappings can be produced on demand.

4. Maintainability and Sustainability

Since the data lineage mappings are not managed as distinct artifacts, there are no additional requirements for maintenance. As long as the data catalog is kept up to date, the data lineage mappings can be materialized.

5. Semantic Visibility

In addition to visualizing the physical movement of data across the enterprise, the data catalog approach allows the data steward to associate business glossary terms, data element definitions, data models, and other semantic details with the different mappings. Additional visualization methods can demonstrate where business terms are used, how they are mapped to different data elements in different systems, and the relationships among these different usage points.

One can impose additional data governance controls with project management oversight, which allows you to designate data lineage mappings in terms of the project life cycle (such as development, test or production).

Aside from these data catalog benefits, this approach allows you to reduce the amount of manual effort for accumulating the information for data lineage and continually reviewing the data landscape to maintain consistency, thus providing a greater return on investment for your data intelligence budget.

Learn more about data cataloging.

Categories
erwin Expert Blog Data Intelligence

The Top 8 Benefits of Data Lineage

It’s important we recognize the benefits of data lineage.

As corporate data governance programs have matured, the inventory of agreed-to data policies has grown rapidly. These include guidelines for data quality assurance, regulatory compliance and data democratization, among other information utilization initiatives.

Organizations that are challenged by translating their defined data policies into implemented processes and procedures are starting to identify tools and technologies that can supplement the ways organizational data policies can be implemented and practiced.

One such technique, data lineage, is gaining prominence as a core operational business component of the data governance technology architecture. Data lineage encompasses processes and technology to provide full-spectrum visibility into the ways that data flow across the enterprise.

To data-driven businesses, the benefits of data lineage are significant. Data lineage tools are used to survey, document and enable data stewards to query and visualize the end-to-end flow of information units from their origination points through the series of transformation and processing stages to their final destination.

The Benefits of Data Lineage

Data stewards are attracted to data lineage because the benefits of data lineage help in a number of different governance practices, including:

1. Operational intelligence

At its core, data lineage captures the mappings of the rapidly growing number of data pipelines in the organization. Visualizing the information flow landscape provides insight into the “demographics” of data consumption and use, answering questions such as “what data sources feed the greatest number of downstream sources” or “which data analysts use data that is ingested from a specific data source.” Collecting this intelligence about the data landscape better positions the data stewards for enforcing governance policies.
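A question such as “what data sources feed the greatest number of downstream sources” becomes a simple graph query once the mappings exist. The sketch below assumes lineage is available as a list of (source, target) edges; the edge list and asset names are hypothetical.

```python
def downstream(node, edges):
    """All assets reachable from node by following source -> target edges."""
    reached, stack = set(), [node]
    while stack:
        current = stack.pop()
        for source, target in edges:
            if source == current and target not in reached:
                reached.add(target)
                stack.append(target)
    return reached

# Hypothetical lineage edges.
EDGES = [
    ("crm_db", "staging_customers"),
    ("orders_db", "staging_orders"),
    ("staging_customers", "warehouse"),
    ("staging_orders", "warehouse"),
    ("warehouse", "sales_dashboard"),
    ("warehouse", "churn_model"),
    ("web_logs", "clickstream_mart"),
]

# Origination points are assets that nothing else feeds.
sources = {s for s, _ in EDGES} - {t for _, t in EDGES}
reach = {s: len(downstream(s, EDGES)) for s in sources}
ranked = sorted(reach.items(), key=lambda item: (-item[1], item[0]))
```

In this example `crm_db` and `orders_db` each feed four downstream assets, while `web_logs` feeds one.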

2. Business terminology consistency

One of the most confounding data governance challenges is understanding the semantics of business terminology within data management contexts. Because application development was traditionally isolated within each business function, the same (or similar) terms are used in different data models, even though the designers did not take the time to align definitions and meanings. Data lineage allows the data stewards to find common business terms, review their definitions, and determine where there are inconsistencies in the ways the terms are used.

3. Data incident root cause analysis

It has long been asserted that when a data consumer finds a data error, the error most likely was introduced into the environment at an earlier stage of processing. Yet without a “roadmap” that indicates the processing stages through which the data were processed, it is difficult to speculate where the error was actually introduced. Using data lineage, though, a data steward can insert validation probes within the information flow to validate data values and determine the stage in the data pipeline where an error originated.
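The validation-probe idea can be sketched as follows. This assumes the pipeline can be replayed stage by stage for a sample record and that the record is valid on entry; the stage names, the deliberate bug and the `first_failing_stage` helper are hypothetical.

```python
def first_failing_stage(record, stages, is_valid):
    """Run record through stages in order; return the name of the first stage
    whose output fails the probe, or None if every output passes."""
    value = record
    for name, transform in stages:
        value = transform(value)
        if not is_valid(value):  # validation probe after each stage
            return name
    return None

# Hypothetical three-stage pipeline with a deliberate sign bug in "convert_currency".
stages = [
    ("parse", lambda r: {**r, "amount": float(r["amount"])}),
    ("convert_currency", lambda r: {**r, "amount": r["amount"] * -1.1}),
    ("round", lambda r: {**r, "amount": round(r["amount"], 2)}),
]

culprit = first_failing_stage({"amount": "100"}, stages, lambda r: r["amount"] >= 0)
```

Here the probe reports `convert_currency`, the stage that first produced a negative amount, even though a data consumer would only have seen the error after the final stage.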

4. Data quality remediation assessment

Root cause analysis is just the first part of the data quality process. Once the data steward has determined where the data flaw was introduced, the next step is to determine why the error occurred. Again, using a data lineage mapping, the steward can trace backward through the information flow to examine the standardizations and transformations applied to the data, validate that transformations were correctly performed, or identify one (or more) performed incorrectly, resulting in the data flaw.

5. Impact analysis

The enterprise is always subject to changes; externally-imposed requirements (such as regulatory compliance) evolve, internal business directives may affect user expectations, and ingested data source models may change unexpectedly. When there is a change to the environment, it is valuable to assess the impacts to the enterprise application landscape. In the event of a change in data expectations, data lineage provides a way to determine which downstream applications and processes are affected by the change and helps in planning for application updates.

6. Performance assessment

Not only does lineage provide a collection of mappings of data pipelines, it allows for the identification of potential performance bottlenecks. Data pipeline stages with many incoming paths are candidate bottlenecks. Using a set of data lineage mappings, the performance analyst can profile execution times across different pipelines and redistribute processing to eliminate bottlenecks.
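Finding stages with many incoming paths amounts to counting incoming edges per stage. The sketch below assumes lineage is available as (source, target) edges and uses an arbitrary cut-off of three incoming paths; the edges, stage names and threshold are hypothetical.

```python
from collections import Counter

# Hypothetical lineage edges between pipeline stages.
EDGES = [
    ("crm", "merge"),
    ("orders", "merge"),
    ("web", "merge"),
    ("merge", "warehouse"),
    ("warehouse", "report"),
]

# Count how many paths arrive at each stage.
incoming = Counter(target for _, target in EDGES)

# Assumed threshold: three or more incoming paths marks a candidate bottleneck.
candidates = [stage for stage, count in incoming.items() if count >= 3]
```

A candidate list like this only says where to look. Execution times still need to be profiled before any processing is redistributed.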

7. Policy compliance

Data policies can be implemented through the specification of business rules. Compliance with these business rules can be facilitated using data lineage by embedding business rule validation controls across the data pipelines. These controls can generate alerts when there are noncompliant data instances.
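As a minimal sketch of a business rule validation control, each rule below is a named predicate applied to every record, and each failure produces an alert. The rule names, the sample rows and the `check` function are hypothetical, and a real control would sit inside the pipeline rather than run over a list.

```python
# Hypothetical business rules expressed as named predicates over a record.
RULES = {
    "email_present": lambda row: bool(row.get("email")),
    "age_in_range": lambda row: row.get("age") is not None and 0 <= row["age"] <= 120,
}

def check(rows, rules):
    """Return an alert (row index, rule name) for every noncompliant data instance."""
    alerts = []
    for index, row in enumerate(rows):
        for rule_name, rule in rules.items():
            if not rule(row):
                alerts.append((index, rule_name))
    return alerts

rows = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 41},
    {"email": "c@example.com", "age": 130},
]
alerts = check(rows, RULES)
```

The second row violates `email_present` and the third violates `age_in_range`, so two alerts are raised.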

8. Auditability of data pipelines

In many cases, regulatory compliance is a combination of enforcing a set of defined data policies along with a capability for demonstrating that the overall process is compliant. Data lineage provides visibility into the data pipelines and information flows that can be audited, thereby supporting the compliance process.

Evaluating Enterprise Data Lineage Tools

While data lineage benefits are obvious, large organizations with complex data pipelines and data flows do face challenges in embracing the technology to document the enterprise data pipelines. These include:

  • Surveying the enterprise – Gathering information about the sources, flows and configurations of data pipelines.
  • Maintenance – Configuring a means to maintain an up-to-date view of the data pipelines.
  • Deliverability – Providing a way to give data consumers visibility to the lineage maps.
  • Sustainability – Ensuring sustainability of the processes for producing data lineage mappings.

Producing a collection of up-to-date data lineage mappings that are easily reviewed by different data consumers depends on addressing these challenges. When evaluating data lineage tools, keep these issues in mind to judge how well the tools can meet your data governance needs.

erwin Data Intelligence (erwin DI) helps organizations automate their data lineage initiatives. Learn more about data lineage with erwin DI.

Categories
erwin Expert Blog

Constructing a Digital Transformation Strategy: Putting the Data in Digital Transformation

Having a clearly defined digital transformation strategy is an essential best practice for successful digital transformation. But what makes a digital transformation strategy viable?

Part Two of the Digital Transformation Journey …

In our last blog on driving digital transformation, we explored how business architecture and process (BP) modeling are pivotal factors in a viable digital transformation strategy.

Enterprise architecture (EA) and BP modeling squeeze risk out of the digital transformation process by helping organizations really understand their businesses as they are today. They give organizations the ability to identify what challenges and opportunities exist, and provide a low-cost, low-risk environment to model new options and collaborate with key stakeholders to figure out what needs to change, what shouldn’t change, and what the most important changes are.

Once you’ve determined what part(s) of your business you’ll be innovating, the next step in a digital transformation strategy is using data to get there.

Constructing a Digital Transformation Strategy: Data Enablement

Many organizations prioritize data collection as part of their digital transformation strategy. However, few organizations truly understand their data or know how to consistently maximize its value.

If your business is like most, you collect and analyze some data from a subset of sources to make product improvements, enhance customer service, reduce expenses and inform other, mostly tactical decisions.

The real question is: are you reaping all the value you can from all your data? Probably not.

Most organizations don’t use all the data they’re flooded with to reach deeper conclusions or make other strategic decisions. They don’t know exactly what data they have or even where some of it is, and they struggle to integrate known data in various formats and from numerous systems—especially if they don’t have a way to automate those processes.

How does your business become more adept at wringing all the value it can from its data?

The reality is there’s not enough time, people and money for true data management using manual processes. Therefore, an automation framework for data management has to be part of the foundations of a digital transformation strategy.

Your organization won’t be able to take complete advantage of analytics tools to become data-driven unless you establish a foundation for agile and complete data management.

You need automated data mapping and cataloging through the integration lifecycle process, inclusive of data at rest and data in motion.

An automated, metadata-driven framework for cataloging data assets and their flows across the business provides an efficient, agile and dynamic way to generate data lineage from operational source systems (databases, data models, file-based systems, unstructured files and more) across the information management architecture; construct business glossaries; assess what data aligns with specific business rules and policies; and inform how that data is transformed, integrated and federated throughout business processes—complete with full documentation.

Without this framework and the ability to automate many of its processes, business transformation will be stymied. Companies, especially large ones with thousands of systems, files and processes, will be particularly challenged by taking a manual approach. Outsourcing these data management efforts to professional services firms only delays schedules and increases costs.

With automation, data quality is systemically assured. The data pipeline is seamlessly governed and operationalized to the benefit of all stakeholders.

Constructing a Digital Transformation Strategy: Smarter Data

Ultimately, data is the foundation of the new digital business model. Companies that have the ability to harness, secure and leverage information effectively may be better equipped than others to promote digital transformation and gain a competitive advantage.

While data collection and storage continue to happen at a dramatic clip, organizations typically analyze and use less than 0.5 percent of the information they take in – that’s a huge loss of potential. Companies have to know what data they have and understand what it means in common, standardized terms so they can act on it to the benefit of the organization.

Unfortunately, organizations spend a lot more time searching for data than actually putting it to work. In fact, data professionals spend 80 percent of their time looking for and preparing data and only 20 percent of their time on analysis, according to IDC.

The solution is data intelligence. It improves IT and business data literacy and knowledge, supporting enterprise data governance and business enablement.

It helps solve the lack of visibility and control over “data at rest” in databases, data lakes and data warehouses and “data in motion” as it is integrated with and used by key applications.

Organizations need a real-time, accurate picture of the metadata landscape to:

  • Discover data – Identify and interrogate metadata from various data management silos.
  • Harvest data – Automate metadata collection from various data management silos and consolidate it into a single source.
  • Structure and deploy data sources – Connect physical metadata to specific data models, business terms, definitions and reusable design standards.
  • Analyze metadata – Understand how data relates to the business and what attributes it has.
  • Map data flows – Identify where to integrate data and track how it moves and transforms.
  • Govern data – Develop a governance model to manage standards, policies and best practices and associate them with physical assets.
  • Socialize data – Empower stakeholders to see data in one place and in the context of their roles.

The Right Tools

When it comes to digital transformation (like most things), organizations want to do it right. Do it faster. Do it cheaper. And do it without the risk of breaking everything. To accomplish all of this, you need the right tools.

The erwin Data Intelligence (DI) Suite is the heart of the erwin EDGE platform for creating an “enterprise data governance experience.” erwin DI combines data cataloging and data literacy capabilities to provide greater awareness of and access to available data assets, guidance on how to use them, and guardrails to ensure data policies and best practices are followed.

erwin Data Catalog automates enterprise metadata management, data mapping, reference data management, code generation, data lineage and impact analysis. It efficiently integrates and activates data in a single, unified catalog in accordance with business requirements. With it, you can:

  • Schedule ongoing scans of metadata from the widest array of data sources.
  • Keep metadata current with full versioning and change management.
  • Easily map data elements from source to target, including data in motion, and harmonize data integration across platforms.

erwin Data Literacy provides self-service, role-based, contextual data views. It also provides a business glossary for the collaborative definition of enterprise data in business terms, complete with built-in accountability and workflows. With it, you can:

  • Enable data consumers to define and discover data relevant to their roles.
  • Facilitate the understanding and use of data within a business context.
  • Ensure the organization is fluent in the language of data.

With data governance and intelligence, enterprises can discover, understand, govern and socialize mission-critical information. And because many of the associated processes can be automated, you reduce errors and reliance on technical resources while increasing the speed and quality of your data pipeline to accomplish whatever your strategic objectives are, including digital transformation.

Check out our latest whitepaper, Data Intelligence: Empowering the Citizen Analyst with Democratized Data.

Categories
erwin Expert Blog

Using Strategic Data Governance to Manage GDPR/CCPA Complexity

In light of recent, high-profile data breaches, it’s past time we re-examined strategic data governance and its role in managing regulatory requirements.

News broke earlier this week of British Airways being fined 183 million pounds – or $228 million – by the U.K. for alleged violations of the European Union’s General Data Protection Regulation (GDPR). While not the first, it is the largest penalty levied since the GDPR went into effect in May 2018.

Given this, Oppenheimer & Co. cautions:

“European regulators could accelerate the crackdown on GDPR violators, which in turn could accelerate demand for GDPR readiness. Although the CCPA [California Consumer Privacy Act, the U.S. equivalent of GDPR] will not become effective until 2020, we believe that new developments in GDPR enforcement may influence the regulatory framework of the still fluid CCPA.”

With all the advance notice and significant chatter for GDPR/CCPA, why aren’t organizations more prepared to deal with data regulations?

In a word? Complexity.

The complexity of regulatory requirements in and of themselves is aggravated by the complexity of the business and data landscapes within most enterprises.

So it’s important to understand how to use strategic data governance to manage the complexity of regulatory compliance and other business objectives …

Designing and Operationalizing Regulatory Compliance Strategy

It’s not easy to design and deploy compliance in an environment that’s not well understood and difficult to maneuver in. First you need to analyze and design your compliance strategy and tactics, and then you need to operationalize them.

Modern, strategic data governance, which involves both IT and the business, enables organizations to plan and document how they will discover and understand their data within context, track its physical existence and lineage, and maximize its security, quality and value. It also helps enterprises put these strategic capabilities into action by:

  • Understanding their business, technology and data architectures and their inter-relationships, aligning them with their goals and defining the people, processes and technologies required to achieve compliance.
  • Creating and automating a curated enterprise data catalog, complete with physical assets, data models, data movement, data quality and on-demand lineage.
  • Activating their metadata to drive agile data preparation and governance through integrated data glossaries and dictionaries that associate policies to enable stakeholder data literacy.

Five Steps to GDPR/CCPA Compliance

With the right technology, GDPR/CCPA compliance can be automated and accelerated in these five steps:

  1. Catalog systems

Harvest, enrich/transform and catalog data from a wide array of sources to enable any stakeholder to see the interrelationships of data assets across the organization.

  2. Govern PII “at rest”

Classify, flag and socialize the use and governance of personally identifiable information regardless of where it is stored.

  3. Govern PII “in motion”

Scan, catalog and map personally identifiable information to understand how it moves inside and outside the organization and how it changes along the way.

  4. Manage policies and rules

Govern business terminology in addition to data policies and rules, depicting relationships to physical data catalogs and the applications that use them with lineage and impact analysis views.

  5. Strengthen data security

Identify regulatory risks and guide the fortification of network and encryption security standards and policies by understanding where all personally identifiable information is stored, processed and used.
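To give a feel for what classifying and flagging PII can look like in practice, here is a deliberately simple sketch. It assumes cataloged columns are available with a few sample values, and it flags a column either by a hint in its name or because every sample looks like an email address. The patterns, column names and `flag_pii` helper are hypothetical, and real classification needs far broader rules plus human review.

```python
import re

# Hypothetical name hints and value pattern; real classifiers need far more care.
NAME_HINTS = re.compile(r"(email|phone|ssn|birth|passport)", re.IGNORECASE)
EMAIL_VALUE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def flag_pii(columns):
    """Return {column_name: reason} for columns that look like PII."""
    flagged = {}
    for name, samples in columns.items():
        if NAME_HINTS.search(name):
            flagged[name] = "name"
        elif samples and all(EMAIL_VALUE.match(str(v)) for v in samples):
            flagged[name] = "values"
    return flagged

columns = {
    "customer_email": ["a@example.com"],
    "contact": ["b@example.org", "c@example.net"],
    "order_total": [19.99, 5.00],
    "date_of_birth": ["1990-01-01"],
}
pii = flag_pii(columns)
```

Note that `contact` is caught only because its values were sampled, which is one reason to scan the data itself as well as the column names.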

How erwin Can Help

erwin is the only software provider with a complete, metadata-driven approach to data governance through our integrated enterprise modeling and data intelligence suites. We help customers overcome their data governance challenges, with risk management and regulatory compliance being primary concerns.

However, the erwin EDGE also delivers an “enterprise data governance experience” in terms of agile innovation and business transformation – from creating new products and services to keeping customers happy to generating more revenue.

Whatever your organization’s key drivers are, a strategic data governance approach – through business process, enterprise architecture and data modeling combined with data cataloging and data literacy – is key to success in our modern, digital world.

If you’d like to get a handle on handling your data, you can sign up for a free, one-on-one demo of erwin Data Intelligence.

For more information on GDPR/CCPA, we’ve also published a white paper on the Regulatory Rationale for Integrating Data Management and Data Governance.

Categories
erwin Expert Blog Data Governance

Data Governance Frameworks: The Key to Successful Data Governance Implementation

A strong data governance framework is central to successful data governance implementation in any data-driven organization because it ensures that data is properly maintained, protected and maximized.

But despite this fact, enterprises often face pushback when implementing a new data governance initiative or trying to mature an existing one.

Let’s assume you have some form of informal data governance operation with some strengths to build on and some weaknesses to correct. Some parts of the organization are engaged and behind the initiative, while others are skeptical about its relevance or benefits.

Some other common data governance implementation obstacles include:

  • Questions about where to begin and how to prioritize which data streams to govern first
  • Issues regarding data quality and ownership
  • Concerns about data lineage
  • Competing projects and resources (time, people and funding)

By using a data governance framework, organizations can formalize their data governance implementation and subsequent adherence to it. This addresses common concerns including data quality and data lineage, and provides a clear path to successful data governance implementation.

In this blog, we will cover three key steps to successful data governance implementation. We will also look into how we can expand the scope and depth of a data governance framework to ensure data governance standards remain high.

Data Governance Implementation in 3 Steps

When maturing or implementing data governance and/or a data governance framework, an accurate assessment of the ‘here and now’ is key. Then you can rethink the path forward, identifying any current policies or business processes that should be incorporated, being careful to avoid repeating the mistakes of prior iterations.

With this in mind, here are three steps we recommend for implementing data governance and a data governance framework.

Data Governance Framework

Step 1: Shift the culture toward data governance

Data governance isn’t something to set and forget; it’s a strategic approach that needs to evolve over time in response to new opportunities and challenges. Therefore, a successful data governance framework has to become part of the organization’s culture, but such a shift requires listening – and remembering that it’s about people, empowerment and accountability.

In most cases, a new data governance framework requires people – those in IT and across the business, including risk management and information security – to change how they work. Any concerns they raise or recommendations they make should be considered. You can encourage feedback through surveys, workshops and open dialog.

Once input has been discussed and a plan agreed upon, it is critical to update roles and responsibilities, provide training and ensure ongoing communication. Many organizations now have internal certifications for different data governance roles, and the people who earn them wear these badges with pride.

A top-down management approach will get a data governance initiative off the ground, but only bottom-up cultural adoption will carry it out.

Step 2: Refine the data governance framework

The right capabilities and tools are important for fueling an accurate, real-time data pipeline and governing it for maximum security, quality and value. For example:

Data cataloging: Organizations implementing a data governance framework will benefit from automated metadata harvesting, data mapping, code generation and data lineage with reference data management, lifecycle management and data quality. With these capabilities, you can efficiently integrate and activate enterprise data within a single, unified catalog in accordance with business requirements.

Data literacy: Being able to discover what data is available and understand what it means in common, standardized terms is important because data elements may mean different things to different parts of the organization. A business glossary answers this need, as does the ability for stakeholders to view data relevant to their roles and understand it within a business context through a role-based portal.

Such tools are further enhanced if they can be integrated across data and business architectures and when they promote self-service and collaboration, which also are important to the cultural shift.

 

Step 3: Prioritize then scale the data governance framework

Because data governance is ongoing, it’s important to prioritize the initial areas of focus and scale from there. Organizations that start with 30 to 50 data items are generally more successful than those that attempt more than 1,000 in the early stages.

Find some representative (familiar) data items and create examples for data ownership, quality, lineage and definition so stakeholders can see real examples of the data governance framework in action. For example:

  • Data ownership model showing a data item, its definition, producers, consumers, stewards and quality rules (for profiling)
  • Workflow showing the creation, enrichment and approval of the above data item to demonstrate collaboration
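The data ownership model in the first example could be captured in a small record like the one sketched below. The `DataItem` class, its field names and the sample rule are hypothetical, chosen only to mirror the elements listed above (definition, producers, consumers, stewards and quality rules for profiling).

```python
from dataclasses import dataclass, field

@dataclass
class DataItem:
    """Hypothetical record for one governed data item."""
    name: str
    definition: str
    producers: list = field(default_factory=list)
    consumers: list = field(default_factory=list)
    stewards: list = field(default_factory=list)
    quality_rules: dict = field(default_factory=dict)  # rule name -> predicate

    def profile(self, values):
        """Return the share of values that pass each quality rule."""
        if not values:
            return {}
        return {
            rule: sum(1 for v in values if passes(v)) / len(values)
            for rule, passes in self.quality_rules.items()
        }

customer_email = DataItem(
    name="customer_email",
    definition="Primary email address used to contact a customer.",
    producers=["CRM"],
    consumers=["Marketing", "Support"],
    stewards=["Customer Data Steward"],
    quality_rules={"contains_at": lambda v: "@" in v},
)
scores = customer_email.profile(["a@example.com", "bad", "c@example.org", "d@example.com"])
```

Profiling a handful of familiar values this way gives stakeholders a concrete number to discuss: here, three of the four sample values pass the rule.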

Whether your organization is just adopting data governance or the goal is to refine an existing data governance framework, the erwin DG RediChek will provide helpful insights to guide you in the journey.