
Are Data Governance Bottlenecks Holding You Back?

Better decision-making has now topped compliance as the primary driver of data governance. However, organizations still encounter a number of bottlenecks that may hold them back from fully realizing the value of their data in producing timely and relevant business insights.

While acknowledging that data governance is about more than risk management and regulatory compliance may indicate that companies are more confident in their data, the data governance practice is nonetheless growing in complexity because of more:

  • Data to handle, much of it unstructured
  • Sources, like IoT
  • Points of integration
  • Regulations

Without an accurate, high-quality, real-time enterprise data pipeline, it will be difficult to uncover the necessary intelligence to make optimal business decisions.

So what’s holding organizations back from fully using their data to make better, smarter business decisions?

Data Governance Bottlenecks

erwin’s 2020 State of Data Governance and Automation report, based on a survey of business and technology professionals at organizations of various sizes and across numerous industries, examined the role of automation in data governance and intelligence efforts. It uncovered a number of obstacles that organizations have to overcome to improve their data operations.

The No.1 bottleneck, according to 62 percent of respondents, was documenting complete data lineage. Understanding the quality of source data is the next most serious bottleneck (58 percent); followed by finding, identifying, and harvesting data (55 percent); and curating assets with business context (52 percent).

The report revealed that all but two of the possible bottlenecks were marked by more than 50 percent of respondents. Clearly, there’s a massive need for a data governance framework to keep these obstacles from stymying enterprise innovation.

As we zeroed in on the bottlenecks of day-to-day operations, 25 percent of respondents said length of project/delivery time was the most significant challenge, followed by data quality/accuracy at 24 percent, time to value at 16 percent, and reliance on developer and other technical resources at 13 percent.


Overcoming Data Governance Bottlenecks

The 80/20 rule describes the unfortunate reality for many data stewards: they spend 80 percent of their time finding, cleaning and reorganizing huge amounts of data and only 20 percent on actual data analysis.

In fact, we found that close to 70 percent of our survey respondents spent an average of 10 or more hours per week on data-related activities, most of it searching for and preparing data.

What can you do to reverse the 80/20 rule and subsequently overcome data governance bottlenecks?

1. Don’t ignore the complexity of data lineage: Supporting data lineage manually is a risky endeavor, and businesses that attempt it will find it’s not sustainable given data’s constant movement from one place to another via multiple routes – especially when lineage must be documented correctly down to the column level. Adopting automated end-to-end lineage makes it possible to view data movement from the source to reporting structures, providing a comprehensive and detailed view of data in motion.

2. Automate code generation: Alleviate the need for developers to hand code connections from data sources to target schema. Mapping data elements to their sources within a single repository to determine data lineage and harmonize data integration across platforms reduces the need for specialized technical resources with knowledge of ETL and database procedural code (a minimal sketch of this idea follows this list). It also makes it easier for business analysts, data architects, ETL developers, testers and project managers to collaborate for faster decision-making.

3. Use an integrated impact analysis solution: Automating data due diligence for IT delivers operational intelligence to the business. Business users benefit from automated impact analysis to better assess the value of and prioritize individual data sets. Impact analysis is equally important to IT for automatically tracking changes and understanding how data from one system feeds other systems and reports. This is an aspect of data lineage, created from technical metadata, that ensures nothing “breaks” along the change train.

4. Put data quality first: Users must have confidence in the data they use for analytics. Automatically matching business terms with data assets and documenting lineage down to the column level are critical to good decision-making. If this hasn’t been the practice to date, enterprises should take a few steps back and review data quality measures before jumping into automating data analytics.

5. Catalog data using a solution with a broad set of metadata connectors: Ensure all data sources can be leveraged, including big data, ETL platforms, BI reports, modeling tools, mainframe and relational data, as well as data from many other types of systems. Don’t settle for a data catalog from an emerging vendor that only supports a narrow swath of newer technologies, and don’t rely on a catalog from a legacy provider that may supply only connectors for standard, more mature data sources.

6. Stress data literacy: You want to ensure that data assets are used strategically. Automation expedites the benefits of data cataloging. Curating internal and external datasets for a range of content authors doubles business benefits and ensures effective management and monetization of data assets in the long term, if linked to broader data governance, data quality and metadata management initiatives. There’s a clear connection to data literacy here because of its foundation in business glossaries and socializing data so all stakeholders can view and understand it within the context of their roles.

7. Make automation the norm across all data governance processes: Too many companies still live in a world where data governance is a high-level mandate, not practically implemented. To fully realize the advantages of data governance and the power of data intelligence, data operations must be automated across the board. Without automated data management, the governance housekeeping load on the business will be so great that data quality will inevitably suffer. Being able to account for all enterprise data and resolve disparity in data sources and silos using manual approaches is wishful thinking.

8. Craft your data governance strategy before making any investments: Gather multiple stakeholders—both business and IT— with multiple viewpoints to discover where their needs mesh and where they diverge and what represents the greatest pain points to the business. Solve for these first, but build buy-in by creating a layered, comprehensive strategy that ultimately will address most issues. From there, it’s on to matching your needs to an automated data governance solution that squares with business and IT – both for immediate requirements and future plans.
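
To make step 2 concrete, here’s a minimal sketch of mapping-driven code generation. It is illustrative only: the mapping rows, table names and transformation tags are invented, and a real solution would handle many source/target pairs, data types and error cases.

```python
# Illustrative sketch: generate a simple ETL statement from a mappings
# inventory, so developers don't hand-code each source-to-target connection.

mappings = [
    # (source_table, source_column, target_table, target_column, transformation)
    ("crm.customer", "cust_id",    "dw.dim_customer", "customer_key", None),
    ("crm.customer", "first_name", "dw.dim_customer", "first_name",   "UPPER"),
    ("crm.customer", "last_name",  "dw.dim_customer", "last_name",    "UPPER"),
]

def generate_insert(mappings):
    """Emit an INSERT ... SELECT for mappings sharing one source/target pair."""
    src, tgt = mappings[0][0], mappings[0][2]
    tgt_cols = ", ".join(m[3] for m in mappings)
    src_exprs = ", ".join(f"{m[4]}({m[1]})" if m[4] else m[1] for m in mappings)
    return f"INSERT INTO {tgt} ({tgt_cols})\nSELECT {src_exprs}\nFROM {src};"

print(generate_insert(mappings))
```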

Register now for “The What & Why of Data Governance,” the first in a new, six-part webinar series on the practice of data governance and how to proactively deal with its complexities. The webinar takes place on Tuesday, Feb. 23 at 3 p.m. GMT/10 a.m. ET.


Top 6 Benefits of Automating End-to-End Data Lineage

Replace manual and recurring tasks for fast, reliable data lineage and overall data governance


It’s paramount that organizations understand the benefits of automating end-to-end data lineage. Critically, it makes it easier to get a clear view of how information is created and flows into, across and outside an enterprise.

The importance of end-to-end data lineage is widely understood and ignoring it is risky business. But it’s also important to understand why and how automation plays a critical role.

Benjamin Franklin said, “Lost time is never found again.” According to erwin’s “2020 State of Data Governance and Automation” report, close to 70 percent of data professional respondents say they spend an average of 10 or more hours per week on data-related activities, and most of that time is spent searching for and preparing data.

Data automation reduces the loss of time in collecting, processing and storing large chunks of data because it replaces manual processes (and human errors) with intelligent processes, software and artificial intelligence (AI).

Automating end-to-end data lineage helps organizations further focus their available resources on more important and strategic tasks, which ultimately provides greater value.

For example, automatically importing mappings from developers’ Excel sheets, flat files, Access and ETL tools into a comprehensive mappings inventory, complete with auto-generated and meaningful documentation of the mappings, is a powerful way to support overall data governance.

According to the erwin report, documenting complete data lineage is the data operation with the largest gap between its current level of automation (25%) and its perceived value as an automation target (65%).

Doing Data Lineage Right

Eliminating manual tasks is not the only reason to adopt automated data lineage. Replacing recurring tasks that don’t rely on human intelligence for completion is where automation makes an even bigger difference. Here are six benefits of automating end-to-end data lineage:

  1. Reduced Errors and Operational Costs

Data quality is crucial to every organization. Automated data capture can significantly reduce errors when compared to manual entry. Company documents can be filled out, stored, retrieved, and used more accurately and this, in turn, can save organizations a significant amount of money.

The 1-10-100 rule, commonly used in business circles, states that preventing an error will cost an organization $1, correcting an error already made will cost $10, and allowing an error to stand will cost $100.

Ratios will vary depending on the magnitude of the mistake and the company involved, of course, but the point remains: adopting the most reliable means of preventing a mistake is the best approach in the long run.
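
Worked through, the math is stark: under that ratio, preventing 1,000 errors costs about $1,000, correcting them after the fact costs $10,000, and letting them stand costs $100,000.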

  2. Faster Business Turnaround

Speed, or faster time to market, is a driving force behind most organizations’ efforts with data lineage automation. More work can be done when you are not waiting on someone to manually process data or forms.

For example, when everything can be scanned using RFID technology, it can be documented and confirmed instantaneously, cutting hours of work down to seconds.

This opens opportunities for employees to train for more profitable roles, allowing organizations to reinvest in their employees. With complex data architectures and systems within so many organizations, tracking data in motion and data at rest is daunting to say the least.

Harvesting the data through automation seamlessly removes ambiguity and speeds up processing, improving time-to-market capabilities.

  3. Compliance and Auditability

Regulatory compliance places greater transparency demands on firms when it comes to tracing and auditing data.

For example, capital markets trading firms must implement data lineage to support risk management, data governance and reporting for various regulations such as the Basel Committee on Banking Supervision’s standard number 239 (BCBS 239) and Markets in Financial Instruments Directive (MiFID II).

Business terms and data policies should be implemented through standardized and documented business rules. Compliance with these business rules can be tracked through data lineage, incorporating auditability and validation controls across data transformations and pipelines to generate alerts when there are non-compliant data instances.

Also, different organizational stakeholders (customers, employees and auditors) need to understand and trust reported data. Automated data lineage ensures captured data is accurate and consistent across its trajectory.
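
As a simple illustration of such a validation control, the sketch below applies one business rule at a single pipeline stage and emits alerts that carry lineage context. The rule, stage and field names are invented for the example.

```python
# Illustrative validation control: apply a documented business rule at one
# pipeline stage and emit alerts that carry lineage context.

def validate_stage(records, stage_name, rule, rule_name):
    alerts = []
    for rec in records:
        if not rule(rec):
            alerts.append({
                "stage": stage_name,   # where in the lineage the check ran
                "rule": rule_name,     # which documented business rule failed
                "record_id": rec["id"],
            })
    return alerts

trades = [{"id": 1, "notional": 5_000_000}, {"id": 2, "notional": -100}]
print(validate_stage(trades, "staging.trades",
                     lambda r: r["notional"] > 0, "notional_must_be_positive"))
```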

  4. Consistency, Clarity and Greater Efficiency

Data lineage automation can help improve efficiency and ensure accuracy. The more streamlined your processes, the more efficient your business. The more efficient your business, the more money you save on daily operations.

For example, backing up your data effectively and routinely is important. Data is one of the most important assets for any business.

However, different types of data need to be treated differently. Some data needs to be backed up daily while some types of data demand weekly or monthly backups.

With automation in place, you just need to develop backup strategies for your data with a consistent scheduling process. The actual job of backing things up will be managed by the system processes you set up for consistency and clarity.
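
One way to express that consistent scheduling process is as a declarative policy the automation reads and executes, so people maintain the policy rather than the backups themselves. The tiers and dataset names below are hypothetical.

```python
# Hypothetical declarative backup policy: the automation runs the jobs;
# people only maintain this policy.
BACKUP_POLICY = {
    "orders_db":     {"frequency": "daily",   "retention_days": 35},
    "audit_logs":    {"frequency": "weekly",  "retention_days": 365},
    "archive_dumps": {"frequency": "monthly", "retention_days": 90},
}

for dataset, rule in BACKUP_POLICY.items():
    print(f"schedule {dataset}: {rule['frequency']}, keep {rule['retention_days']} days")
```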

  5. Improved Customer and Employee Satisfaction

Employee disengagement is a more severe problem than you might think. A recent study has shown that it costs U.S. businesses around $300 billion annually, nearly equal to the U.S. defense budget. Disengaged employees still give you their time, but not their best effort.

With data lineage automation, employers can automate such repetitive tasks and free up time for high-value work. According to a Smartsheet report, 69% of employees thought automation would reduce time wasted during their workday, and 59% thought they would have more than six spare hours per week if repetitive jobs were automated.

  6. Governance Enforcement

Data lineage automation is a great way to implement governance in any business. Any task that an automated process completes is always documented and has traceability.

For every task, you get clear logs that tell you what was done, who did it and when it was done. As stated before, automation plays a major role in reducing human errors and speeds up tasks that need to be performed repeatedly.

If you have not made the jump to digital yet, you are probably wading through high volumes of resources and manual processes daily. There is no denying the fact that automating business processes contributes immensely to an organization’s success. 

Automated Data Lineage in Action

Automated data lineage tools document the flow of data into and out of an organization’s systems. They capture end-to-end lineage and ensure proper impact analysis can be performed in the event of problems or changes to data assets as they move across pipelines.

erwin Data Intelligence (erwin DI) helps bind business terms to technical data assets with a complete data lineage of scanned metadata assets. Automating data capture frees up resources to focus on more strategic and useful tasks.

It automatically generates end-to-end data lineage, down to the column level and between repositories. You can view data flows from source systems to the reporting layers, including intermediate transformation and business logic.

Request your own demo of erwin DI to see metadata-driven, automated data lineage in action.



Why You Need End-to-End Data Lineage

Not Documenting End-to-End Data Lineage Is Risky Business – Understanding your data’s origins is key to successful data governance.

Not everyone understands what end-to-end data lineage is or why it is important. In a previous blog, I explained that data lineage is basically the history of data, including a data set’s origin, characteristics, quality and movement over time.

This information is critical to regulatory compliance, change management and data governance, not to mention delivering an optimal customer experience. But given the volume, velocity and variety of data (the three Vs of data) we generate today, producing and keeping up with end-to-end data lineage is complex and time-consuming.

Yet given this era of digital transformation and fierce competition, understanding what data you have, where it came from, how it’s changed since creation or acquisition, and whether it poses any risks is paramount to optimizing its value. Furthermore, faulty decision-making based on inconsistent analytics and inaccurate reporting can cost millions.


Data Lineage Tells an Important Origin Story

End-to-end data lineage explains how information flows into, across and outside an organization. And knowing how information was created, its origin and quality may have greater value than a given data set’s current state.

For example, data lineage provides a way to determine which downstream applications and processes are affected by a change in data expectations and helps in planning for application updates.

As I mentioned above, the three Vs of data and the integration of systems make it difficult to understand the resulting data web, much less capture a simple visual of that flow. Yet a consistent view of data and how it flows is paramount to the success of enterprise data governance and any data-driven initiative.

Whether you need to drill down for a granular view of a particular data set or create a high-level summary to describe a particular system and the data it relies on, end-to-end data lineage must be documented and tracked, with an emphasis on the dynamics of data processing and movement as opposed to data structures. Data lineage helps answer questions about the origin of data in key performance indicator (KPI) reports, including:

  • How are the report tables and columns defined in the metadata?
  • Who are the data owners?
  • What are the transformation rules?

Five Consequences of Ignoring Data Lineage

Why do so many organizations struggle with end-to-end data lineage?

The struggle is real for a number of reasons. At the top of the list, organizations are dealing with more data than ever before using systems that weren’t designed to communicate effectively with one another.

Next, their IT and business stakeholders have a difficult time collaborating. And many organizations have relied mostly on manual processes – if data lineage documentation has been attempted at all.

The risks of ignoring end-to-end data lineage are just too great. Let’s look at some of those consequences:

  1. Derailed Projects

Effectively managing business operations is a key factor to success – especially for organizations that are in the midst of digital transformation. Failures in business processes attributed to errors can be a big problem.

For example, in a typical business scenario where an incorrect data set is discovered within a report, it can take a team days or sometimes weeks to find the source of the error – derailing the project and costing time and money.

  2. Policy Bloat and Unruly Rules

The business glossary environment must represent the actual environment – i.e., be refreshed and synced – otherwise it becomes obsolete. You need real collaboration.

Data dictionaries, glossaries and policies can’t live in different formats and in different places. It is common for these to be expressed in different ways, depending on the database and underlying storage technology, but this causes policy bloat and rules that no organization, team or employee will understand, let alone realistically manage.

Effective data governance requires that business glossaries, data dictionaries and data privacy policies live in one central location, so they can be easily tracked, monitored and updated over time.

  3. Major Inefficiencies

Successful data migration and upgrades rely on seamless integration of tools and processes with coordinated efforts of people/resources. A passive approach frequently relies on creating new copies of data, usually with sensitive identifiers removed or obscured.

Not only does this passive approach create inefficiencies in determining what data to copy, how to copy it and where to store the copy, it also creates new volumes of data that become harder to track over time. Yet again, a passive approach to data cannot scale. Direct access to the same live data across the organization is required.

  4. Not Knowing Where Your Data Is

Metadata management and manual mapping are a challenge to most organizations. Data comes in all shapes, sizes and formats, and there is no way to know what type of data a project will need – or even where that data will sit.

Some data might be in the cloud, some on premise, and sometimes projects will require a hybrid approach. All data must be governed, regardless of where it is located.

  5. Privacy and Compliance Challenges

Privacy and compliance personnel know the rules that must be applied to data but may not necessarily know the technology. Automated data governance requires that anyone, with any level of expertise, be able to understand what rules (e.g., privacy policies) are applied to enterprise data.

Organizations with established data governance must empower both those with technical skill sets and those with privacy and compliance knowledge, so all teams can play a meaningful role controlling how data is used.

For more information on data lineage, get the free white paper, Tech Brief: Data Lineage.



What is Data Lineage? Top 5 Benefits of Data Lineage

What is Data Lineage and Why is it Important?

Data lineage is the journey data takes from its creation through its transformations over time. It describes a certain dataset’s origin, movement, characteristics and quality.

Tracing the source of data is an arduous task.

Many large organizations, in their desire to modernize with technology, have acquired several different systems with various data entry points and transformation rules for data as it moves into and across the organization.


These tools include enterprise service bus (ESB) products; data integration tools; extract, transform and load (ETL) tools; procedural code; application programming interfaces (APIs); file transfer protocol (FTP) processes; and even business intelligence (BI) reports that further aggregate and transform data.

With all these diverse data sources and integrated systems, it is difficult to understand the complicated data web they form, much less get a simple visual of the flow. This is why data’s lineage must be tracked and why its role is so vital to business operations, providing the ability to understand where data originates, how it is transformed, and how it moves into, across and outside a given organization.

Data Lineage Use Case: From Tracing COVID-19’s Origins to Data-Driven Business

A lot of theories have emerged about the origin of the coronavirus. A recent University of California San Francisco (UCSF) study conducted a genetic analysis of COVID-19 to determine how the virus was introduced specifically to California’s Bay Area.

It detected at least eight different viral lineages in 29 patients in February and early March, suggesting no regional patient zero but rather multiple independent introductions of the pathogen. The professor who directed the study said, “it’s like sparks entering California from various sources, causing multiple wildfires.”

Much like understanding viral lineage is key to stopping this and other potential pandemics, understanding the origin of data is key to a successful data-driven business.

Top Five Data Lineage Benefits

From my perspective in working with customers of various sizes across multiple industries, I’d like to highlight five data lineage benefits:

1. Business Impact

Data is crucial to every organization’s survival. For that reason, businesses must think about the flow of data across multiple systems that fuel organizational decision-making.

For example, the marketing department uses demographics and customer behavior to forecast sales. The CEO also makes decisions based on performance and growth statistics. An understanding of the data’s origins and history helps answer questions about the origin of data in key performance indicator (KPI) reports, including:

  • How are the report tables and columns defined in the metadata?
  • Who are the data owners?
  • What are the transformation rules?

Without data lineage, these questions go unanswered, so it makes sense for a business to have a clear understanding of where data comes from, who uses it and how it is transformed. Also, when there is a change to the environment, it is valuable to assess the impacts to the enterprise application landscape.

In the event of a change in data expectations, data lineage provides a way to determine which downstream applications and processes are affected by the change and helps in planning for application updates.

2. Compliance & Auditability

Business terms and data policies should be implemented through standardized and documented business rules. Compliance with these business rules can be tracked through data lineage, incorporating auditability and validation controls across data transformations and pipelines to generate alerts when there are non-compliant data instances.

Regulatory compliance places greater transparency demands on firms when it comes to tracing and auditing data. For example, capital markets trading firms must understand their data’s origins and history to support risk management, data governance and reporting for various regulations such as BCBS 239 and MiFID II.

Also, different organizational stakeholders (customers, employees and auditors) need to be able to understand and trust reported data. Data lineage offers proof that the data provided is reflected accurately.

3. Data Governance

An automated data lineage solution stitches together metadata for understanding and validating data usage, as well as mitigating the associated risks.

It can auto-document end-to-end upstream and downstream data lineage, revealing any changes that have been made, by whom and when.

This data ownership, accountability and traceability is foundational to a sound data governance program.

See: The Benefits of Data Governance

4. Collaboration

Analytics and reporting are data-dependent, making collaboration among different business groups and/or departments crucial.

The visualization of data lineage can help business users spot the inherent connections of data flows and thus provide greater transparency and auditability.

Seeing data pipelines and information flows further supports compliance efforts.

5. Data Quality

Data quality is affected by data’s movement, transformation, interpretation and selection through people, process and technology.

Root-cause analysis is the first step in repairing data quality. Once a data steward determines where a data flaw was introduced, the reason for the error can be determined.

With data lineage and mapping, the data steward can trace the information flow backward to examine the standardizations and transformations applied to confirm whether they were performed correctly.

See Data Lineage in Action

Data lineage tools document the flow of data into and out of an organization’s systems. They capture end-to-end lineage and ensure proper impact analysis can be performed in the event of problems or changes to data assets as they move across pipelines.

The erwin Data Intelligence Suite (erwin DI) automatically generates end-to-end data lineage, down to the column level and between repositories. You can view data flows from source systems to the reporting layers, including intermediate transformation and business logic.

Join us for the next live demo of erwin Data Intelligence (DI) to see metadata-driven, automated data lineage in action.




How Metadata Makes Data Meaningful

Metadata is an important part of data governance, and as a result, most nascent data governance programs are rife with project plans for assessing and documenting metadata. But in many scenarios, it seems that the underlying driver of metadata collection projects is that it’s just something you do for data governance.

So most early-stage data governance managers kick off a series of projects to profile data, make inferences about data element structure and format, and store the presumptive metadata in some metadata repository. But are these rampant and often uncontrolled projects to collect metadata properly motivated?

There is rarely a clear directive about how metadata will be used. Therefore, prior to launching metadata collection tasks, it is important to specifically direct how the knowledge embedded within the corporate metadata should be used.

Managing metadata should not be a sub-goal of data governance. Today, metadata is the heart of enterprise data management and governance/intelligence efforts and should have a clear strategy – rather than just something you do.


What Is Metadata?

Quite simply, metadata is data about data. It’s generated every time data is captured at a source, accessed by users, moved through an organization, integrated or augmented with other data from other sources, profiled, cleansed and analyzed. Metadata is valuable because it provides information about the attributes of data elements that can be used to guide strategic and operational decision-making. It answers these important questions (see the sketch after this list):

  • What data do we have?
  • Where did it come from?
  • Where is it now?
  • How has it changed since it was originally created or captured?
  • Who is authorized to use it and how?
  • Is it sensitive or are there any risks associated with it?
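
As an illustration (not a prescribed schema), a single catalog entry could capture the answers to those questions as structured fields:

```python
# Illustrative only: one way a catalog entry might record the answers
# to the questions above for a single data element.
customer_email = {
    "name": "customer_email",                    # what data do we have?
    "origin": "crm.customer, nightly extract",   # where did it come from?
    "location": "dw.dim_customer.email",         # where is it now?
    "change_history": ["2019-04-02: masked for non-privileged roles"],
    "authorized_use": {"marketing": "campaigns", "support": "case lookup"},
    "sensitivity": "PII",                        # risks associated with it
}
```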

The Role of Metadata in Data Governance

Organizations don’t know what they don’t know, and this problem is only getting worse. As data continues to proliferate, so does the need for data and analytics initiatives to make sense of it all. Here are some benefits of metadata management for data governance use cases:

  • Better Data Quality: Data issues and inconsistencies within integrated data sources or targets are identified in real time to improve overall data quality while shortening the time to insights and/or repair.
  • Quicker Project Delivery: Accelerate Big Data deployments, Data Vaults, data warehouse modernization, cloud migration, etc., by up to 70 percent.
  • Faster Speed to Insights: Reverse the current 80/20 rule that keeps high-paid knowledge workers too busy finding, understanding and resolving errors or inconsistencies to actually analyze source data.
  • Greater Productivity & Reduced Costs: Being able to rely on automated and repeatable metadata management processes results in greater productivity. Some erwin customers report productivity gains of 85+% for coding, 70+% for metadata discovery, up to 50% for data design, up to 70% for data conversion, and up to 80% for data mapping.
  • Regulatory Compliance: Regulations such as GDPR, HIPAA, BCBS 239 and CCPA include data privacy and security mandates, so sensitive data (e.g., PII) needs to be tagged, its lineage documented, and its flows depicted for traceability.
  • Digital Transformation: Knowing what data exists and its value potential promotes digital transformation by improving digital experiences, enhancing digital operations, driving digital innovation and building digital ecosystems.
  • Enterprise Collaboration: With the business driving alignment between data governance and strategic enterprise goals and IT handling the technical mechanics of data management, the door opens to finding, trusting and using data to effectively meet organizational objectives.

Giving Metadata Meaning

So how do you give metadata meaning? While this sounds like a deep philosophical question, the reality is the right tools can make all the difference.

erwin Data Intelligence (erwin DI) combines data management and data governance processes in an automated flow.

It’s unique in its ability to automatically harvest, transform and feed metadata from a wide array of data sources, operational processes, business applications and data models into a central data catalog and then make it accessible and understandable within the context of role-based views.

erwin DI sits on a common metamodel that is open, extensible and comes with a full set of APIs. A comprehensive set of erwin-owned standard data connectors is included for automated harvesting, refreshing and version-controlled metadata management. Optional erwin Smart Data Connectors reverse-engineer ETL code of all types and connect bi-directionally with reporting and other ecosystem tools. These connectors offer the fastest and most accurate path to data lineage, impact analysis and other detailed graphical relationships.

Additionally, erwin DI is part of the larger erwin EDGE platform that integrates data modeling, enterprise architecture, business process modeling, data cataloging and data literacy. We know our customers need an active metadata-driven approach to:

  • Understand their business, technology and data architectures and the relationships between them
  • Create and automate a curated enterprise data catalog, complete with physical assets, data models, data movement, data quality and on-demand lineage
  • Activate their metadata to drive agile and well-governed data preparation with integrated business glossaries and data dictionaries that provide business context for stakeholder data literacy

erwin was named a Leader in Gartner’s “2019 Magic Quadrant for Metadata Management Solutions.”

Click here to get a free copy of the report.

Click here to request a demo of erwin DI.



Metadata Management, Data Governance and Automation

Can the 80/20 Rule Be Reversed?

erwin released its State of Data Governance Report in February 2018, just a few months before the General Data Protection Regulation (GDPR) took effect.

This research showed that the majority of responding organizations weren’t actually prepared for GDPR, nor did they have the understanding, executive support and budget for data governance – although they recognized the importance of it.

Of course, data governance has evolved with astonishing speed, both in response to data privacy and security regulations and because organizations see the potential for using it to accomplish other organizational objectives.

But many of the world’s top brands still seem to be challenged in implementing and sustaining effective data governance programs (hello, Facebook).

We wonder why.

Too Much Time, Too Few Insights

According to IDC’s “Data Intelligence in Context” Technology Spotlight sponsored by erwin, “professionals who work with data spend 80 percent of their time looking for and preparing data and only 20 percent of their time on analytics.”

Specifically, 80 percent of data professionals’ time is spent on data discovery, preparation and protection, and only 20 percent on analysis leading to insights.

In most companies, an incredible amount of data flows from multiple sources in a variety of formats and is constantly being moved and federated across a changing system landscape.

Often these enterprises are heavily regulated, so they need a well-defined data integration model that will help avoid data discrepancies and remove barriers to enterprise business intelligence and other meaningful use.

IT teams need the ability to smoothly generate hundreds of mappings and ETL jobs. They need their data mappings to fall under governance and audit controls, with instant access to dynamic impact analysis and data lineage.

But most organizations, especially those competing in the digital economy, don’t have enough time or money for data management using manual processes. Outsourcing is also expensive, with inevitable delays because these vendors are dependent on manual processes too.

The Role of Data Automation

Data governance maturity includes the ability to rely on automated and repeatable processes.

For example, automatically importing mappings from developers’ Excel sheets, flat files, Access and ETL tools into a comprehensive mappings inventory, complete with automatically generated and meaningful documentation of the mappings, is a powerful way to support governance while providing real insight into data movement — for data lineage and impact analysis — without interrupting system developers’ normal work methods.
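
A minimal sketch of that kind of import, assuming one mapping per spreadsheet row with invented column names; a real implementation would add validation, version control and richer documentation:

```python
import pandas as pd

# Hypothetical layout: each row of a developer's workbook is one mapping with
# columns source_table, source_column, target_table, target_column.
workbooks = ["team_a_mappings.xlsx", "team_b_mappings.xlsx"]

inventory = pd.concat([pd.read_excel(p) for p in workbooks], ignore_index=True)

# Auto-generate minimal, human-readable documentation per mapping.
inventory["doc"] = (
    inventory["source_table"] + "." + inventory["source_column"]
    + " -> " + inventory["target_table"] + "." + inventory["target_column"]
)
inventory.to_csv("mappings_inventory.csv", index=False)
```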

GDPR compliance, for instance, requires a business to discover source-to-target mappings with all accompanying transactions – such as the business rules in the repository applied to the data – to comply with audits.

When data movement has been tracked and version-controlled, it’s possible to conduct data archeology – that is, reverse-engineering code from existing XML within the ETL layer – to uncover what has happened in the past and incorporate it into a mapping manager for fast and accurate recovery.

With automation, data professionals can meet the above needs at a fraction of the cost of the traditional, manual way. To summarize, just some of the benefits of data automation are:

• Centralized and standardized code management with all automation templates stored in a governed repository
• Better quality code and minimized rework
• Business-driven data movement and transformation specifications
• Superior data movement job designs based on best practices
• Greater agility and faster time-to-value in data preparation, deployment and governance
• Cross-platform support of scripting languages and data movement technologies

One global pharmaceutical giant reduced costs by 70 percent and generated 95 percent of production code with “zero touch.” With automation, the company improved the time to business value and significantly reduced the costly re-work associated with error-prone manual processes.


Help Us Help You by Taking a Brief Survey

With 2020 just around the corner and another data regulation about to take effect, the California Consumer Privacy Act (CCPA), we’re working with Dataversity on another research project.

And this time, you guessed it – we’re focusing on data automation and how it could impact metadata management and data governance.

We would appreciate your input and will release the findings in January 2020.

Click here to take the brief survey


Business Process Can Make or Break Data Governance

Data governance isn’t a one-off project with a defined endpoint. It’s an on-going initiative that requires active engagement from executives and business leaders.

Data governance, today, comes back to the ability to understand critical enterprise data within a business context, track its physical existence and lineage, and maximize its value while ensuring quality and security.


Historically, little attention has been paid to what can literally make or break any data governance initiative – turning it from a launchpad for competitive advantage to a recipe for disaster. Data governance success hinges on business process modeling and enterprise architecture.

To put it even more bluntly, successful data governance* must start with business process modeling and analysis.

*See: Three Steps to Successful & Sustainable Data Governance Implementation


Passing the Data Governance Ball

For years, data governance was the volleyball passed back and forth over the net between IT and the business, with neither side truly owning it. However, once an organization understands that IT and the business are both responsible for data, it needs to develop a comprehensive, holistic strategy for data governance that is capable of four things:

  1. Reaching every stakeholder in the process
  2. Providing a platform for understanding and governing trusted data assets
  3. Delivering the greatest benefit from data wherever it lives, while minimizing risk
  4. Helping users understand the impact of changes made to a specific data element across the enterprise

To accomplish this, a modern data governance strategy needs to be interdisciplinary to break down traditional silos. Enterprise architecture is important because it aligns IT and the business, mapping a company’s applications and the associated technologies and data to the business functions and value streams they enable.


The business process and analysis component is vital because it defines how the business operates and ensures employees understand and are accountable for carrying out the processes for which they are responsible. Enterprises can clearly define, map and analyze workflows and build models to drive process improvement, as well as identify business practices susceptible to the greatest security, compliance or other risks and where controls are most needed to mitigate exposures.

Slow Down, Ask Questions

In a rush to implement a data governance methodology and system, organizations can forget that a system must serve a process – and be governed/controlled by one.

To choose the correct system and implement it effectively and efficiently, you must know – in every detail – all the processes it will impact. You need to ask these important questions:

  1. How will it impact them?
  2. Who needs to be involved?
  3. When do they need to be involved?

These questions are the same ones we ask in data governance. They involve impact analysis, ownership and accountability, control and traceability – all of which effectively documented and managed business processes enable.

Data sets are not important in and of themselves. Data sets become important in terms of how they are used, who uses them and what their use is – and all this information is described in the processes that generate, manipulate and use them. So unless we know what those processes are, how can any data governance implementation be complete or successful?

Processes need to be open and shared in a concise, consistent way so all parts of the organization can investigate, ask questions, and then add their feedback and information layers. In other words, processes need to be alive and central to the organization because only then will the use of data and data governance be truly effective.

A Failure to Communicate

Consider this scenario: We’ve perfectly captured our data lineage, so we know what our data sets mean, how they’re connected, and who’s responsible for them – not a simple task but a massive win for any organization. Now a breach occurs. Will any of the above information tell us why it happened? Or where? No! It will tell us what else is affected and who can manage the data layer(s), but unless we find and address the process failure that led to the breach, it is guaranteed to happen again.

By knowing where data is used – the processes that use and manage it – we can quickly, even instantly, identify where a failure occurs. Starting with data lineage (meaning our forensic analysis starts from our data governance system), we can identify the source and destination processes and the associated impacts throughout the organization.

We can know which processes need to change and how. We can anticipate the pending disruptions to our operations and, more to the point, the costs involved in mitigating and/or addressing them.

But knowing all the above requires that our processes – our essential and operational business architecture – be accurately captured and modeled. Instituting data governance without processes is like building a castle on sand.

Rethinking Business Process Modeling and Analysis

Modern organizations need a business process modeling and analysis tool with easy access to all the operational layers across the organization – from high-level business architecture all the way down to data.

Such a system should be flexible, adjustable, easy-to-use and capable of supporting multiple layers simultaneously, allowing users to start in their comfort zones and mature as they work toward their organization’s goals.

The erwin EDGE is one of the most comprehensive software platforms for managing an organization’s data governance and business process initiatives, as well as the whole data architecture. It allows natural, organic growth throughout the organization and the assimilation of data governance and business process management under the same platform provides a unique data governance experience because of its integrated, collaborative approach.

Start your free, cloud-based trial of erwin Business Process and see how some of the world’s largest enterprises have benefited from its centralized repository and integrated, role-based views.

We’d also be happy to show you our data governance software, which includes data cataloging and data literacy capabilities.



Top 5 Data Catalog Benefits

A data catalog benefits organizations in a myriad of ways. With the right data catalog tool, organizations can automate enterprise metadata management – including data cataloging, data mapping, data quality and code generation for faster time to value and greater accuracy for data movement and/or deployment projects.

Data cataloging helps curate internal and external datasets for a range of content authors. Gartner says this doubles business benefits and ensures effective management and monetization of data assets in the long-term if linked to broader data governance, data quality and metadata management initiatives.

Even so, the importance of data cataloging is growing. In the regulated data world (GDPR, HIPAA, etc.), organizations need a good understanding of their data lineage – and the data catalog benefits to data lineage are substantial.

Data lineage is a core operational business component of data governance technology architecture, encompassing the processes and technology to provide full-spectrum visibility into the ways data flows across an enterprise.

There are a number of different approaches to data lineage. Here, I outline the common approach, and the approach incorporating data cataloging – including the top 5 data catalog benefits for understanding your organization’s data lineage.


Data Lineage – The Common Approach

The most common approach for assembling a collection of data lineage mappings traces data flows in reverse. The process begins with the target or data endpoint and then traverses the processes, applications and ETL tasks backward from the target.

For example, to determine the mappings for the data pipelines populating a data warehouse, a data lineage tool might begin with the data warehouse and examine the ETL tasks that immediately precede the loading of the data into the target warehouse.

The data sources that feed the ETL process are added to a “task list,” and the process is repeated for each of those sources. At each stage, the discovered pieces of lineage are documented. At the end of the sequence, the process will have reverse-mapped the pipelines for populating that warehouse.
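
In algorithmic terms, this is a worklist traversal running backward from the target. A toy sketch, assuming a registry that maps each target to the ETL tasks that load it (a real tool discovers this from ETL metadata):

```python
from collections import deque

# Toy registry: target dataset -> list of (etl_task, source datasets).
LOADS = {
    "dw.sales_mart": [("load_mart", ["stg.orders", "stg.customers"])],
    "stg.orders":    [("stage_ord", ["erp.orders"])],
    "stg.customers": [("stage_cust", ["crm.customer"])],
}

def reverse_lineage(target):
    """Walk backward from the target, documenting each discovered mapping."""
    mappings, queue, seen = [], deque([target]), {target}
    while queue:
        tgt = queue.popleft()
        for task, sources in LOADS.get(tgt, []):
            for src in sources:
                mappings.append((src, task, tgt))
                if src not in seen:
                    seen.add(src)
                    queue.append(src)
    return mappings

for src, task, tgt in reverse_lineage("dw.sales_mart"):
    print(f"{src} --[{task}]--> {tgt}")
```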

While this approach does produce a collection of data lineage maps for selected target systems, there are some drawbacks.

  • First, this approach focuses only on assembling the data pipelines populating the selected target system but does not necessarily provide a comprehensive view of all the information flows and how they interact.
  • Second, this process produces the information that can be used for a static view of the data pipelines, but the process needs to be executed on a regular basis to account for changes to the environment or data sources.
  • Third, and probably most important, this process produces a technical view of the information flow, but it does not necessarily provide any deeper insights into the semantic lineage, or how the data assets map to the corresponding business usage models.

A Data Catalog Offers an Alternate Data Lineage Approach

An alternate approach to data lineage combines data discovery and the use of a data catalog that captures data asset metadata with a data mapping framework that documents connections between the data assets.

This data catalog approach also takes advantage of automation, but in a different way: using platform-specific data connectors, the tool scans the environment where each data asset is stored and imports the asset’s metadata into the data catalog.

When data asset structures are similar, the tool can compare data element domains and value sets, and automatically create the data mapping.
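
A toy version of that comparison: propose a mapping when column names match and sampled value sets overlap strongly. The threshold and samples are invented.

```python
def propose_mapping(src_col, src_values, tgt_col, tgt_values, threshold=0.8):
    """Suggest src_col -> tgt_col when names match and value sets overlap
    by at least `threshold` (Jaccard similarity)."""
    if src_col.lower() != tgt_col.lower():
        return None
    a, b = set(src_values), set(tgt_values)
    overlap = len(a & b) / len(a | b) if (a | b) else 0.0
    return (src_col, tgt_col, round(overlap, 2)) if overlap >= threshold else None

print(propose_mapping("country", ["US", "DE", "FR", "JP"],
                      "COUNTRY", ["US", "DE", "FR", "JP", "BR"]))
```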

In turn, the data catalog approach performs data discovery using the same data connectors to parse the code involved in data movement, such as major ETL environments and procedural code – basically any executable task that moves data.

The information collected through this process is reverse engineered to create mappings from source data sets to target data sets based on what was discovered.

For example, you can map the databases used for transaction processing, determine that subsets of the transaction processing database are extracted and moved to a staging area, and then parse the ETL code to infer the mappings.
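
A deliberately naive sketch of that inference step: pull the source and target out of a simple INSERT ... SELECT with a regular expression. Real connectors parse ETL code far more robustly.

```python
import re

etl_code = "INSERT INTO staging.orders SELECT order_id, total FROM erp.orders"

# Toy pattern only; nowhere near a real SQL parser.
m = re.search(r"INSERT INTO\s+(\S+)\s+SELECT\s+(.+?)\s+FROM\s+(\S+)", etl_code, re.I)
if m:
    target, columns, source = m.groups()
    print(f"inferred mapping: {source} ({columns}) -> {target}")
```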

These direct mappings also are documented in the data catalog. In cases where the mappings are not obvious, a tool can help a data steward manually map data assets into the catalog.

The result is a data catalog that incorporates the structural and semantic metadata associated with each data asset as well as the direct mappings for how that data set is populated.

Learn more about data cataloging.


And this is a powerful representative paradigm – instead of capturing a static view of specific data pipelines, it allows a data consumer to request a dynamically assembled lineage from the documented mappings.

By interrogating the catalog, the current view of any specific data lineage can be rendered on the fly, showing all points of the lineage: the origination points, the processing stages, the sequences of transformations, and the final destination.

Materializing the “current active lineage” dynamically reduces the risk of having an older version of the lineage that is no longer relevant or correct. When new information is added to the data catalog (such as a newly added data source or a modification to the ETL code), dynamically generated views of the lineage will be kept up to date automatically.

Top 5 Data Catalog Benefits for Understanding Data Lineage

A data catalog benefits data lineage in the following five distinct ways:

1. Accessibility

The data catalog approach allows the data consumer to query the tool to materialize specific data lineage mappings on demand.

2. Currency

The data lineage is rendered from the most current data in the data catalog.

3. Breadth

As the number of data assets documented in the data catalog increases, the scope of the materializable lineage expands accordingly. With all corporate data assets cataloged, any (or all!) data lineage mappings can be produced on demand.

4. Maintainability and Sustainability

Since the data lineage mappings are not managed as distinct artifacts, there are no additional requirements for maintenance. As long as the data catalog is kept up to date, the data lineage mappings can be materialized.

5. Semantic Visibility

In addition to visualizing the physical movement of data across the enterprise, the data catalog approach allows the data steward to associate business glossary terms, data element definitions, data models, and other semantic details with the different mappings. Additional visualization methods can demonstrate where business terms are used, how they are mapped to different data elements in different systems, and the relationships among these different usage points.

One can impose additional data governance controls with project management oversight, which allows you to designate data lineage mappings in terms of the project life cycle (such as development, test or production).

Aside from these data catalog benefits, this approach allows you to reduce the amount of manual effort for accumulating the information for data lineage and continually reviewing the data landscape to maintain consistency, thus providing a greater return on investment for your data intelligence budget.

Learn more about data cataloging.


The Top 8 Benefits of Data Lineage

It’s important we recognize the benefits of data lineage.

As corporate data governance programs have matured, the inventory of agreed-to data policies has grown rapidly. These include guidelines for data quality assurance, regulatory compliance and data democratization, among other information utilization initiatives.

Organizations that are challenged by translating their defined data policies into implemented processes and procedures are starting to identify tools and technologies that can supplement the ways organizational data policies can be implemented and practiced.

One such technique, data lineage, is gaining prominence as a core operational business component of the data governance technology architecture. Data lineage encompasses processes and technology to provide full-spectrum visibility into the ways that data flow across the enterprise.

To data-driven businesses, the benefits of data lineage are significant. Data lineage tools are used to survey, document and enable data stewards to query and visualize the end-to-end flow of information units from their origination points through the series of transformation and processing stages to their final destination.


The Benefits of Data Lineage

Data stewards are attracted to data lineage because the benefits of data lineage help in a number of different governance practices, including:

1. Operational intelligence

At its core, data lineage captures the mappings of the rapidly growing number of data pipelines in the organization. Visualizing the information flow landscape provides insight into the “demographics” of data consumption and use, answering questions such as “what data sources feed the greatest number of downstream sources” or “which data analysts use data that is ingested from a specific data source.” Collecting this intelligence about the data landscape better positions the data stewards for enforcing governance policies.
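
Answering a question like “which sources feed the greatest number of downstream sources” reduces to a reachability count over the lineage graph. A small sketch, with invented edges:

```python
# Invented lineage edges: dataset -> datasets it feeds.
FEEDS = {
    "crm.customer":  ["stg.customers"],
    "erp.orders":    ["stg.orders"],
    "stg.customers": ["dw.sales_mart", "dw.churn_model"],
    "stg.orders":    ["dw.sales_mart"],
    "dw.sales_mart": ["bi.revenue_report"],
}

def downstream(node):
    """All datasets reachable downstream of `node`."""
    seen, stack = set(), [node]
    while stack:
        for nxt in FEEDS.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

reach = {n: len(downstream(n)) for n in FEEDS}
print(max(reach, key=reach.get))  # crm.customer feeds the most downstream datasets
```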

2. Business terminology consistency

One of the most confounding data governance challenges is understanding the semantics of business terminology within data management contexts. Because application development was traditionally isolated within each business function, the same (or similar) terms are used in different data models, even though the designers did not take the time to align definitions and meanings. Data lineage allows the data stewards to find common business terms, review their definitions, and determine where there are inconsistencies in the ways the terms are used.

3. Data incident root cause analysis

It has long been asserted that when a data consumer finds a data error, the error most likely was introduced into the environment at an earlier stage of processing. Yet without a “roadmap” that indicates the processing stages through which the data were processed, it is difficult to speculate where the error was actually introduced. Using data lineage, though, a data steward can insert validation probes within the information flow to validate data values and determine the stage in the data pipeline where an error originated.
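
One way to realize those probes: run the same validation on the output of each stage and report the first stage where it fails. The stages and check below are illustrative.

```python
# Illustrative root-cause probe: the first stage whose output fails the
# validation is where the flaw was introduced.
def first_failing_stage(stages, check):
    for name, output_rows in stages:
        if not all(check(row) for row in output_rows):
            return name
    return None

stages = [
    ("extract",     [{"amount": 10.0}, {"amount": 7.5}]),
    ("standardize", [{"amount": 10.0}, {"amount": -7.5}]),  # sign flipped here
    ("load",        [{"amount": 10.0}, {"amount": -7.5}]),
]
print(first_failing_stage(stages, lambda r: r["amount"] >= 0))  # "standardize"
```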

4. Data quality remediation assessment

Root cause analysis is just the first part of the data quality process. Once the data steward has determined where the data flaw was introduced, the next step is to determine why the error occurred. Again, using a data lineage mapping, the steward can trace backward through the information flow to examine the standardizations and transformations applied to the data, validate that transformations were correctly performed, or identify one (or more) performed incorrectly, resulting in the data flaw.

5. Impact analysis

The enterprise is always subject to changes; externally-imposed requirements (such as regulatory compliance) evolve, internal business directives may affect user expectations, and ingested data source models may change unexpectedly. When there is a change to the environment, it is valuable to assess the impacts to the enterprise application landscape. In the event of a change in data expectations, data lineage provides a way to determine which downstream applications and processes are affected by the change and helps in planning for application updates.

6. Performance assessment

Not only does lineage provide a collection of mappings of data pipelines, it also allows for the identification of potential performance bottlenecks. Data pipeline stages with many incoming paths are candidate bottlenecks. Using a set of data lineage mappings, a performance analyst can profile execution times across different pipelines and redistribute processing to eliminate them.
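
The “many incoming paths” heuristic can be read straight off the lineage mappings. A minimal sketch with an invented edge list and an arbitrary threshold:

```python
from collections import Counter

edges = [
    ("crm_db", "warehouse"), ("erp_db", "warehouse"),
    ("web_logs", "warehouse"), ("warehouse", "report"),
]

in_degree = Counter(dst for _, dst in edges)
THRESHOLD = 3  # arbitrary cutoff, for illustration only
for stage, incoming in in_degree.items():
    if incoming >= THRESHOLD:
        print(f"Candidate bottleneck: '{stage}' ({incoming} incoming paths)")
```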

7. Policy compliance

Data policies can be implemented through the specification of business rules. Data lineage facilitates compliance with these rules by embedding validation controls across the data pipelines; the controls generate alerts when noncompliant data instances appear.
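
A sketch of such controls, with business rules expressed as predicates over records (the rules and records are invented):

```python
rules = {
    "ssn_masked":  lambda r: not r.get("ssn", "").isdigit(),
    "country_set": lambda r: bool(r.get("country")),
}

records = [
    {"ssn": "***-**-1234", "country": "US"},   # compliant
    {"ssn": "123456789",   "country": ""},     # violates both rules
]

for i, record in enumerate(records):
    for name, check in rules.items():
        if not check(record):
            print(f"ALERT: record {i} violates policy '{name}'")
```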

8. Auditability of data pipelines

In many cases, regulatory compliance combines enforcing a set of defined data policies with a capability for demonstrating that the overall process is compliant. Data lineage provides visibility into the data pipelines and information flows that can be audited, thereby supporting the compliance process.

Evaluating Enterprise Data Lineage Tools

While the benefits of data lineage are clear, large organizations with complex data pipelines and data flows do face challenges in embracing the technology to document them. These include:

  • Surveying the enterprise – Gathering information about the sources, flows and configurations of data pipelines.
  • Maintenance – Configuring a means to maintain an up-to-date view of the data pipelines.
  • Deliverability – Providing a way to give data consumers visibility into the lineage maps.
  • Sustainability – Ensuring sustainability of the processes for producing data lineage mappings.

Producing a collection of up-to-date data lineage mappings that different data consumers can easily review depends on addressing these challenges. Keep them in mind when evaluating how well candidate data lineage tools can meet your data governance needs.

erwin Data Intelligence (erwin DI) helps organizations automate their data lineage initiatives. Learn more about data lineage with erwin DI.


Constructing a Digital Transformation Strategy: Putting the Data in Digital Transformation

Having a clearly defined digital transformation strategy is an essential best practice for successful digital transformation. But what makes a digital transformation strategy viable?

Part Two of the Digital Transformation Journey …

In our last blog on driving digital transformation, we explored how enterprise architecture (EA) and business process (BP) modeling are pivotal factors in a viable digital transformation strategy.

EA and BP modeling squeeze risk out of the digital transformation process by helping organizations really understand their businesses as they are today. They provide the ability to identify what challenges and opportunities exist, and offer a low-cost, low-risk environment to model new options and collaborate with key stakeholders to figure out what needs to change, what shouldn’t change, and which changes matter most.

Once you’ve determined which part(s) of your business you’ll be innovating, the next step in a digital transformation strategy is using data to get there.


Constructing a Digital Transformation Strategy: Data Enablement

Many organizations prioritize data collection as part of their digital transformation strategy. However, few organizations truly understand their data or know how to consistently maximize its value.

If your business is like most, you collect and analyze some data from a subset of sources to make product improvements, enhance customer service, reduce expenses and inform other, mostly tactical decisions.

The real question is: are you reaping all the value you can from all your data? Probably not.

Most organizations don’t use all the data they’re flooded with to reach deeper conclusions or make other strategic decisions. They don’t know exactly what data they have or even where some of it is, and they struggle to integrate known data in various formats and from numerous systems—especially if they don’t have a way to automate those processes.

How does your business become more adept at wringing all the value it can from its data?

The reality is there isn’t enough time, money or people for true data management using manual processes. Therefore, an automation framework for data management has to be part of the foundation of a digital transformation strategy.

Your organization won’t be able to take complete advantage of analytics tools to become data-driven unless you establish a foundation for agile and complete data management.

You need automated data mapping and cataloging through the integration lifecycle process, inclusive of data at rest and data in motion.

An automated, metadata-driven framework for cataloging data assets and their flows across the business provides an efficient, agile and dynamic way to:

  • Generate data lineage from operational source systems (databases, data models, file-based systems, unstructured files and more) across the information management architecture.
  • Construct business glossaries.
  • Assess which data aligns with specific business rules and policies.
  • Inform how that data is transformed, integrated and federated throughout business processes – complete with full documentation.
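
To make the automation idea concrete, here is a toy sketch of metadata-driven lineage harvesting: deriving flow edges from ETL code instead of documenting them by hand. The regex handles only simplistic INSERT ... SELECT statements and is purely illustrative; the scripts are invented.

```python
import re

etl_scripts = [
    "INSERT INTO warehouse.orders SELECT * FROM staging.orders",
    "INSERT INTO marts.sales SELECT region, amount FROM warehouse.orders",
]

pattern = re.compile(r"INSERT INTO\s+(\S+)\s+SELECT\s+.*?\s+FROM\s+(\S+)", re.I)

lineage = []
for sql in etl_scripts:
    match = pattern.search(sql)
    if match:
        target, source = match.groups()
        lineage.append((source, target))

for source, target in lineage:
    print(f"{source} -> {target}")
```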

Without this framework and the ability to automate many of its processes, business transformation will be stymied. Companies, especially large ones with thousands of systems, files and processes, will be particularly challenged by taking a manual approach. Outsourcing these data management efforts to professional services firms only delays schedules and increases costs.

With automation, data quality is systemically assured. The data pipeline is seamlessly governed and operationalized to the benefit of all stakeholders.

Constructing a Digital Transformation Strategy: Smarter Data

Ultimately, data is the foundation of the new digital business model. Companies that have the ability to harness, secure and leverage information effectively may be better equipped than others to promote digital transformation and gain a competitive advantage.

While data collection and storage continue at a dramatic clip, organizations typically analyze and use less than 0.5 percent of the information they take in – that’s a huge loss of potential. Companies have to know what data they have and understand what it means in common, standardized terms so they can act on it to the benefit of the organization.

Unfortunately, organizations spend far more time searching for data than actually putting it to work. In fact, data professionals spend 80 percent of their time looking for and preparing data and only 20 percent of their time on analysis, according to IDC.

The solution is data intelligence. It improves IT and business data literacy and knowledge, supporting enterprise data governance and business enablement.

It helps solve the lack of visibility and control over “data at rest” in databases, data lakes and data warehouses and “data in motion” as it is integrated with and used by key applications.

Organizations need a real-time, accurate picture of the metadata landscape to:

  • Discover data – Identify and interrogate metadata from various data management silos.
  • Harvest data – Automate metadata collection from various data management silos and consolidate it into a single source.
  • Structure and deploy data sources – Connect physical metadata to specific data models, business terms, definitions and reusable design standards.
  • Analyze metadata – Understand how data relates to the business and what attributes it has.
  • Map data flows – Identify where to integrate data and track how it moves and transforms.
  • Govern data – Develop a governance model to manage standards, policies and best practices and associate them with physical assets.
  • Socialize data – Empower stakeholders to see data in one place and in the context of their roles.
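
As a minimal sketch of what ties these steps together, a catalog record can link physical metadata to a business term, policies and lineage. The field names below are invented for illustration, not any vendor’s schema:

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    physical_name: str            # e.g., a column in a source system
    source_system: str
    business_term: str            # glossary link
    definition: str
    policies: list = field(default_factory=list)
    downstream: list = field(default_factory=list)  # lineage targets

entry = CatalogEntry(
    physical_name="CUST_NM",
    source_system="crm_db",
    business_term="Customer Name",
    definition="Legal name of a purchasing party.",
    policies=["PII: mask in non-production environments"],
    downstream=["warehouse.customers.name"],
)
print(entry.business_term, "->", entry.downstream)
```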

The Right Tools

When it comes to digital transformation (like most things), organizations want to do it right. Do it faster. Do it cheaper. And do it without the risk of breaking everything. To accomplish all of this, you need the right tools.

The erwin Data Intelligence (DI) Suite is the heart of the erwin EDGE platform for creating an “enterprise data governance experience.” erwin DI combines data cataloging and data literacy capabilities to provide greater awareness of and access to available data assets, guidance on how to use them, and guardrails to ensure data policies and best practices are followed.

erwin Data Catalog automates enterprise metadata management, data mapping, reference data management, code generation, data lineage and impact analysis. It efficiently integrates and activates data in a single, unified catalog in accordance with business requirements. With it, you can:

  • Schedule ongoing scans of metadata from the widest array of data sources.
  • Keep metadata current with full versioning and change management.
  • Easily map data elements from source to target, including data in motion, and harmonize data integration across platforms.

erwin Data Literacy provides self-service, role-based, contextual data views. It also provides a business glossary for the collaborative definition of enterprise data in business terms, complete with built-in accountability and workflows. With it, you can:

  • Enable data consumers to define and discover data relevant to their roles.
  • Facilitate the understanding and use of data within a business context.
  • Ensure the organization is fluent in the language of data.

With data governance and intelligence, enterprises can discover, understand, govern and socialize mission-critical information. And because many of the associated processes can be automated, you reduce errors and reliance on technical resources while increasing the speed and quality of your data pipeline to accomplish whatever your strategic objectives are, including digital transformation.

Check out our latest whitepaper, Data Intelligence: Empowering the Citizen Analyst with Democratized Data.
