
Do I Need a Data Catalog?

If you’re serious about a data-driven strategy, you’re going to need a data catalog.

Organizations need a data catalog because it enables them to create a seamless way for employees to access and consume data and business assets in an organized manner.

Given the value data-driven insight can provide, the reason organizations need a data catalog becomes clear.

It’s no surprise that most organizations’ data is fragmented and siloed across numerous sources (e.g., legacy systems, data warehouses, flat files stored on individual desktops and laptops, and modern, cloud-based repositories).

These fragmented data environments make data governance a challenge since business stakeholders, data analysts and other users are unable to discover data or run queries across an entire data set. This also diminishes the value of data as an asset.

In certain circumstances, data catalogs combine physical system catalogs, critical data elements, and key performance measures with clearly defined product and sales goals.

You can also manage the effectiveness of your business and ensure you understand which systems are critical for business continuity and for measuring corporate performance.

The data catalog is a searchable asset that enables all data – including even formerly siloed tribal knowledge – to be cataloged and more quickly exposed to users for analysis.

Organizations with particularly deep data stores might need a data catalog with advanced capabilities, such as automated metadata harvesting to speed up the data preparation process.
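As an illustration, automated harvesting can be as simple as introspecting a source system’s schema and recording what it finds. Below is a minimal sketch in Python using SQLAlchemy’s inspector; the connection string is a placeholder, and this is illustrative only, not erwin’s implementation.

```python
# Minimal metadata-harvesting sketch; illustrative, not erwin's implementation.
# The connection string is a placeholder for a real source system.
from sqlalchemy import create_engine, inspect

engine = create_engine("sqlite:///warehouse.db")  # placeholder source
inspector = inspect(engine)

catalog = {}
for table in inspector.get_table_names():
    catalog[table] = [
        {"name": col["name"], "type": str(col["type"]), "nullable": col["nullable"]}
        for col in inspector.get_columns(table)
    ]
# 'catalog' now holds harvested technical metadata, ready to be enriched
# with business definitions, ownership and lineage.
```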

For example, before users can effectively and meaningfully engage with robust business intelligence (BI) platforms, they must have a way to ensure that the most relevant, important and valuable data sets are included in analysis.

The optimal, most streamlined way to achieve this is with a data catalog, which provides a first stop for users ahead of working in BI platforms.

As a collective intelligent asset, a data catalog should include capabilities for collecting and continually enriching or curating the metadata associated with each data asset to make them easier to identify, evaluate and use properly.


Three Types of Metadata in a Data Catalog

A data catalog uses metadata, data that describes or summarizes data, to create an informative and searchable inventory of all data assets in an organization.

These assets can include, but are not limited to, structured data, unstructured data (including documents, web pages, email, social media content, mobile data, images, audio, video and reports) and query results. The metadata provides information about the asset that makes it easier to locate, understand and evaluate.

For example, Amazon handles millions of different products, and yet we, as consumers, can find almost anything about everything very quickly.

Beyond Amazon’s advanced search capabilities, the company also provides detailed information about each product, the seller’s information, shipping times, reviews, and a list of companion products. Sales are measured down to a zip code territory level across product categories.

Another classic example is the online or card catalog at a library. Each card or listing contains information about a book or publication (e.g., title, author, subject, publication date, edition, location) that makes the publication easier for a reader to find and to evaluate.

There are many types of metadata, but a data catalog deals primarily with three: technical metadata, operational or “process” metadata, and business metadata.

Technical Metadata

Technical metadata describes how the data is organized and stored, along with its transformations and lineage. It is structural and describes data objects such as tables, columns, rows, indexes and connections.

This aspect of the metadata guides data experts on how to work with the data (e.g. for analysis and integration purposes).

Operational Metadata

Operational metadata describes the systems that process data, the applications in those systems, and the rules in those applications. Also called “process” metadata, it describes a data asset’s creation and records when, how and by whom it has been accessed, used, updated or changed.

Operational metadata provides information about the asset’s history and lineage, which can help an analyst decide if the asset is recent enough for the task at hand, if it comes from a reliable source, if it has been updated by trustworthy individuals, and so on.


Business Metadata

Business metadata, sometimes referred to as external metadata, covers the business aspects of a data asset. It defines the functionality of the data captured, the definitions of the data and its elements, and how the data is used within the business.

This is the area that binds all users together in terms of the consistency and usage of cataloged data assets.

Tools should be provided that enable data experts to explore the data catalogs, curate and enrich the metadata with tags, associations, ratings, annotations, and any other information and context that helps users find data faster and use it with confidence.
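To make the three metadata types concrete, here is a minimal sketch of what a single catalog entry might hold; the field names are illustrative assumptions, not a standard schema or erwin’s data model.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    # Technical metadata: how the data is organized and stored.
    table: str
    columns: list[str]
    source_system: str
    # Operational ("process") metadata: history and lineage.
    created_by: str
    last_updated: str               # e.g., an ISO-8601 timestamp
    upstream_sources: list[str]
    # Business metadata: definitions and usage within the business.
    business_definition: str
    owner: str
    # Curation added by data experts: tags, ratings, annotations.
    tags: set[str] = field(default_factory=set)
    rating: float | None = None

entry = CatalogEntry(
    table="sales_orders",
    columns=["order_id", "customer_id", "order_date", "amount"],
    source_system="erp",
    created_by="etl_nightly",
    last_updated="2020-06-01T02:00:00Z",
    upstream_sources=["erp.orders_raw"],
    business_definition="Confirmed customer orders, one row per order.",
    owner="sales-ops",
    tags={"sales", "finance"},
)
```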

Why You Need a Data Catalog – Three Business Benefits of Data Catalogs

When data professionals can help themselves to the data they need – without IT intervention, without relying on finding experts or colleagues for advice, without limiting themselves to only the assets they know about, and without worrying about governance and compliance – the entire organization benefits.

Catalog critical systems and data elements, plus enable the calculation and evaluation of key performance measures. It’s also important to understand data lineage and be able to analyze the impacts to critical systems and essential business processes if a change occurs.

  1. Makes data accessible and usable, reducing operational costs while accelerating time to value

Open your organization’s data door, making it easier to access, search and understand information assets. A data catalog is the core of data analysis for decision-making, so automating its curation and access with the associated business context will enable stakeholders to spend more time analyzing it for meaningful insights they can put into action.

Data assets need to be properly scanned, documented, tagged and annotated with their definitions, ownership, lineage and usage. Automating the cataloging of data assets saves initial development time and streamlines its ongoing maintenance and governance.

Automating the curation of data assets also accelerates the time to value for analytics/insights reporting and significantly reduces operational costs.

  2. Ensures regulatory compliance

Regulations like the California Consumer Privacy Act (CCPA) and the European Union’s General Data Protection Regulation (GDPR) require organizations to know where all their customer, prospect and employee data resides to ensure its security and privacy.

A fine for noncompliance or damage to your reputation is the last thing you need, so use a data catalog to centralize data management and the associated usage policies and guardrails.

See a Data Catalog in Action

The erwin Data Intelligence Suite (erwin DI) provides data catalog and data literacy capabilities with built-in automation so you can accomplish all the above and much more.

Request your own demo of erwin DI.



Overcoming the 80/20 Rule – Finding More Time with Data Intelligence

The 80/20 rule is well known. It describes an unfortunate reality for many data stewards, who spend 80 percent of their time finding, cleaning and reorganizing huge amounts of data, and only 20 percent of their time on actual data analysis.

That’s a lot of wasted time.

Earlier this year, erwin released its 2020 State of Data Governance and Automation (DGA) report. About 70 percent of the DGA report respondents – a combination of roles from data architects to executive managers – say they spend an average of 10 or more hours per week on data-related activities.

COVID-19 has changed the way we work – essentially overnight – and may change how companies work moving forward. Companies like Twitter, Shopify and Box have announced that they are moving to a permanent work-from-home status as their new normal.

For much of our time as data stewards, collecting, revising and building consensus around our metadata has meant balancing free time on multiple calendars against multiple competing priorities so we can pull the appropriate data stakeholders into a room to discuss term definitions, the rules for measuring “clean” data, and the processes and applications that use the data.


This style of data governance most often presents us with eight one-hour opportunities per day (40 one-hour opportunities per week) to meet.

As the 80/20 rule suggests, getting through hundreds, or perhaps thousands of individual business terms using this one-hour meeting model can take … a … long … time.

Now that pulling stakeholders into a room has been disrupted …  what if we could use this as 40 opportunities to update the metadata PER DAY?

What if we could buck the trend, and overcome the 80/20 rule?

Overcoming the 80/20 Rule with Micro Governance for Metadata

Micro governance is a strategy that leverages native workflow functionality.

erwin Data Intelligence (DI) offers a Workflow Manager that creates persistent, reusable, role-based workflows so that edits to the metadata for any term can move, for example, from draft to under review to approved to published.

Using a defined workflow eliminates the need for hour-long meetings with multiple stakeholders in a room. Now users can suggest edits, review changes and approve changes on their own schedules! Using micro governance, these steps should take less than 10 minutes per term:

  • Log on to the DI Suite
  • Open your work queue to see items requiring your attention
  • Review and/or approve changes
  • Log out

That’s it!
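To illustrate how such a role-based workflow can be enforced, here is a minimal, hypothetical sketch of the draft → under review → approved → published progression. It is not erwin Workflow Manager’s API; the roles and transition rules are assumptions.

```python
# Hypothetical term-edit workflow sketch; not erwin's API.
ALLOWED_TRANSITIONS = {
    "draft": {"under_review"},
    "under_review": {"approved", "draft"},  # a reviewer can send a term back
    "approved": {"published"},
    "published": set(),
}

def advance(term: dict, new_state: str, user_role: str) -> dict:
    """Move a glossary term to a new workflow state if permitted."""
    if new_state not in ALLOWED_TRANSITIONS[term["state"]]:
        raise ValueError(f"cannot move from {term['state']} to {new_state}")
    if new_state in {"approved", "published"} and user_role != "steward":
        raise PermissionError("only a data steward may approve or publish")
    return {**term, "state": new_state}

term = {"name": "Net Revenue", "state": "draft"}
term = advance(term, "under_review", user_role="analyst")
term = advance(term, "approved", user_role="steward")
term = advance(term, "published", user_role="steward")
```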

And as a bonus, when stakeholders need to discuss edits to achieve consensus, the Collaboration Center within the Business Glossary Manager facilitates conversations between stakeholders that persist and are attached directly to the business term. No more searching through months of email conversations or forgetting to cc a key stakeholder.

Using the DI Suite Workflow Manager and the Collaboration Center, and assuming an 8-hour workday, we should each have 48 opportunities for 10 minutes of micro-governance stewardship each day.

A Culture of Micro Governance

In these days when we are all working at home, and face-to-face meetings are all but impossible, we should see this time as an opportunity to develop a culture of micro governance around our metadata.

This new way of thinking and acting will help us continuously improve our transparency and semantic understanding of our data while staying connected and collaborating with each other.

When we finally get back into the office, the micro governance ethos we’ve built while at home will help make our data governance programs more flexible, responsive and agile. And ultimately, we’ll take up less of our colleagues’ precious time.

Request a free demo of erwin DI.



The Top Five Data Intelligence Benefits

Data intelligence benefits data-driven organizations immensely. Primarily, it’s about helping organizations make more intelligent decisions based on their data.

It does this by affording organizations greater visibility and control over “data at rest” in databases, data lakes and data warehouses and “data in motion” as it’s integrated with and used by key applications.

For more context, see: What is Data Intelligence?

The Top 5 Data Intelligence Benefits

Through a better understanding of the data an organization has available – including its lineage, associated metadata and access permissions – an organization’s data-driven decisions are afforded more context and, ultimately, a greater likelihood of successful implementation.

Considering this, the benefits of data intelligence are huge, and include:

1. Improved consumer profiling and segmentation

Customer profiling and segmentation enables businesses and marketers to better understand their target consumer and group them together according to common characteristics and behavior.

Businesses will be able to cluster and classify consumers according to demographics, purchasing behavior, experience with products and services, and much more. Having a holistic view of customers’ preferences, transactions and purchasing behavior enables businesses to make better decisions regarding the products and services they provide. Great examples are BMW Mini, Comfort Keepers, and Teleflora.
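As a toy illustration of segmentation, the sketch below clusters customers on two behavioral features with k-means via scikit-learn. The features, values and cluster count are assumptions for demonstration, not any particular company’s method.

```python
# Toy customer-segmentation sketch using k-means (scikit-learn).
import numpy as np
from sklearn.cluster import KMeans

# Each row: [annual_spend, purchases_per_year]; illustrative values.
customers = np.array([
    [200, 2], [250, 3], [2200, 25], [2400, 22],
    [900, 10], [1100, 12], [180, 1], [2600, 30],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
for features, segment in zip(customers, kmeans.labels_):
    print(features, "-> segment", segment)
```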

2. A greater understanding of company investments

Data intelligence can provide business data with greater context regarding the progress and effectiveness of a company’s investments. Businesses that partner with IT companies can develop data intelligence tailored to monitoring and evaluating their current investments, as well as forecasting potential future investments.

If a business’s current investments are not as effective as expected, data intelligence tools can provide guidance on the best avenues to invest in. Big IT companies even have off-the-shelf data analytics software ready to be configured by a company to its needs.

3. The ability to apply real-time data in marketing strategies

With real-time analytics, businesses can utilize information such as regional or local sales patterns, inventory level summaries, local event trends, sales history, or seasonal factors to refresh marketing models and strategies and direct them to better serve their customers.

Businesses can use real-time data analytics to better meet customer needs as they arise and improve customer satisfaction. Dickey’s BBQ Pit was able to utilize data analytics across all its stores and, using the resulting information, adjust its promotions strategy from weekly to around every 12 to 24 hours.

4. A greater opportunity to enhance logistical and operational planning

Data intelligence can also enable businesses to enhance their operational and logistical planning. Insights on things such as delivery times, optimal event dates, potential external factors, potential route roadblocks, and optimal warehousing locations can help optimize operations and logistics.

Data intelligence can take raw, untimely and incomprehensible data and present it as aggregated, condensed, digestible and usable information. UPS employed the Orion route optimization system and was able to cut 364 million miles from its routes globally.

5. An enhanced capacity to improve customer experience

To keep pace with technology, businesses have been employing more tools and methods that incorporate modern technology, like Machine Learning and the Internet of Things (IoT), to enhance the consumer experience.

Information derived from tools like customer profiling analyses can provide insight into consumer purchasing behavior, which a business can then use to tailor its products and services to match the needs of its target consumers. Businesses can also use such information to provide customers with a user-centric customer experience.


Transforming Industries with Data Intelligence

With big data, and tools such as Artificial Intelligence, Machine Learning and Data Mining, organizations collect and analyze large amounts of data reliably and more efficiently. From Amazon to Airbnb, over the last decade we’ve seen organizations that take advantage of the aforementioned data intelligence benefits to manage large data volumes rise to pole position in their industries.

Now, in 2020, the benefits of data intelligence are enjoyed by organizations from a plethora of different markets and industries.

Data intelligence transforms the way industries operate by enabling businesses to analyze and understand information faster through more understandable models and aggregated trends.

Here’s how data intelligence is benefiting some of the most common industries:

Travel

The travel industry has used data intelligence to enhance the quality and range of products and services it provides travelers, as well as to optimize pricing strategies for future travel offerings.

Businesses in the travel industry can analyze historical trends on peak travel seasons and customer key performance indicators (KPIs) and can adjust services, amenities and packages to match customer needs.

Education

Educators can provide a more valuable learning experience and environment for students. With the use of data intelligence tools, educational institutes can provide teachers with a more holistic view of a student’s academic performance.

Teachers can spot avenues for academic improvement and provide their students with support in the areas where they need help.

Healthcare

Several hospitals have also employed data intelligence tools in their services and operational processes. These hospitals are making use of dashboards that provide summary information on hospital patient trends, treatment costs, and waiting times.

Beyond dashboards, these data intelligence tools also provide healthcare institutions with an encompassing view of hospital- and care-critical data that hospitals can use to improve the quality and level of service and increase their economic efficiency.

Retail

The retail industry has also employed data intelligence in developing tools to better forecast and plan according to supply and demand trends and consumer key performance indicators (KPIs).

Businesses, both small and large, have made use of dashboards to monitor and illustrate transaction trends and product consumption rates. Tools such as these dashboards provide insight into customer purchasing patterns and transaction value that businesses such as Teleflora are leveraging to provide better products and services.

Data Intelligence Trends

With its rate of success evident among many of the most successful organizations in history, data intelligence is clearly no fad. Therefore, it’s important to keep an eye on both the current and upcoming data intelligence trends:

Real-time enterprise is the market.

Businesses, small and big, will employ real-time data analytics and data-driven products and services, because that is what consumers will demand going forward.

Expanding big data.

Rather than moving away from big data, organizations will expand it, incorporating more multifaceted data and more data analytics methods and tools for more well-rounded insights and information.

Graph analytics and associative technology for better results.

Businesses and IT companies will move forward with using the natural associations within their data, applying associative technology to derive better data for decision-making.

DataOps and self-service.

DataOps will make business data processes more efficient and agile, enabling businesses to provide self-service interactions across their transactions and services.

Data literacy as a service.

More businesses will be integrating data intelligence, increasing demand for the relevant skills and for experienced, dedicated development teams. Data literacy and data intelligence will further become in-demand services.

Expanding search to multiform interaction.

Simple searches will be expanded to incorporate multifaceted search technology, from analyzing human expressions to transaction pattern analysis, providing more robust search capabilities.

Ethical computing becomes crucial.

As technology becomes more ingrained in our day-to-day activities and consumes even more personal data, ethics and responsible computing will become essential in safeguarding consumer privacy and rights.

Incorporating blockchain technology into more industries.

Blockchain enables more secure and complex transaction record-keeping for businesses. More businesses employing data intelligence will incorporate blockchain to support their processes.

Data quality management.

As exponential amounts of data are consumed and processed, quality data governance and management will be essential, including oversight of data collection and processing.

Enhanced data discovery and visualization.

With improved tools to process large volumes of data, tools geared toward transforming this data into understandable and digestible information will be highly coveted.

As a Data-Driven Global Society, We Must Adapt

Data is what drives all of our actions, from an individual deciding what to eat in the morning to an entire global enterprise deciding what the next big global product will be. How we collect, process and use that data is what differs. Businesses will eventually move toward data-driven strategies and business models, and with that shift will come increased partnerships with IT companies or the hiring of dedicated in-house development teams.

With a global market at hand, businesses can also employ a remote team and be assured of the same quality of work. How businesses go about it may be diverse, but the direction is toward data-driven enterprises providing consumer-centric products and services.

This is a guest post from IT companies in Ukraine, a Ukraine-based software development company that provides top-level outsourcing services. 


What is a Data Catalog?

The easiest way to understand a data catalog is to look at how libraries catalog books and manuals in a hierarchical structure, making it easy for anyone to find exactly what they need.

Similarly, a data catalog enables businesses to create a seamless way for employees to access and consume data and business assets in an organized manner.

By combining physical system catalogs, critical data elements, and key performance measures with clearly defined product and sales goals, you can manage the effectiveness of your business and ensure you understand which systems are critical for business continuity and for measuring corporate performance.

As illustrated above, a data catalog is essential to business users because it synthesizes all the details about an organization’s data assets across multiple data sources. It organizes them into a simple, easy-to-digest format and then publishes them to data communities for knowledge-sharing and collaboration.

Another foundational purpose of a data catalog is to streamline, organize and process the thousands, if not millions, of an organization’s data assets to help consumers/users search for specific datasets and understand metadata, ownership, data lineage and usage.
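As a toy illustration of that kind of dataset search, the sketch below matches a keyword against names, definitions and tags. The entry structure and field names are hypothetical.

```python
# Toy catalog keyword search; entry fields are hypothetical.
entries = [
    {"name": "sales_orders", "definition": "Confirmed customer orders",
     "owner": "sales-ops", "tags": ["sales", "orders"]},
    {"name": "web_sessions", "definition": "Clickstream sessions by visitor",
     "owner": "marketing", "tags": ["web", "behavior"]},
]

def search(entries, keyword):
    keyword = keyword.lower()
    return [
        e for e in entries
        if keyword in e["name"].lower()
        or keyword in e["definition"].lower()
        or any(keyword in tag for tag in e["tags"])
    ]

print([e["name"] for e in search(entries, "sales")])  # ['sales_orders']
```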

Look at Amazon and how it handles millions of different products, and yet we, as consumers, can find almost anything about everything very quickly.

Beyond Amazon’s advanced search capabilities, the company also provides detailed information about each product, the seller’s information, shipping times, reviews and a list of companion products. It measures sales down to a zip-code territory level across product categories.

Data Catalog Use Case Example: Crisis Proof Your Business

One of the biggest lessons we’re learning from the global COVID-19 pandemic is the importance of data, specifically using a data catalog to comply, collaborate and innovate to crisis-proof our businesses.

As COVID-19 continues to spread, organizations are evaluating and adjusting their operations in terms of both risk management and business continuity. Data is critical to these decisions, such as how to ramp up and support remote employees, re-engineer processes, change entire business models, and adjust supply chains.

Think about the pandemic itself and the numerous global entities involved in identifying it, tracking its trajectory, and providing guidance to governments, healthcare systems and the general public. One example is the European Union (EU) Open Data Portal, which is used to document, catalog and govern EU data related to the pandemic. This information has helped:

  • Provide daily updates
  • Give guidance to governments, health professionals and the public
  • Support the development and approval of treatments and vaccines
  • Help with crisis coordination, including repatriation and humanitarian aid
  • Put border controls in place
  • Assist with supply chain control and consular coordination

So one of the biggest lessons we’re learning from COVID-19 is the need for data collection, management and governance. What’s the best way to organize data and ensure it is supported by business policies and well-defined, governed systems, data elements and performance measures?

According to Gartner, “organizations that offer a curated catalog of internal and external data to diverse users will realize twice the business value from their data and analytics investments than those that do not.”


5 Advantages of Using a Data Catalog for Crisis Preparedness & Business Continuity

The World Bank has been able to provide an array of real-time data, statistical indicators, and other types of data relevant to the coronavirus pandemic through its authoritative data catalogs. The World Bank data catalogs contain datasets, policies, critical data elements and measures useful for analyzing and modeling the virus’s trajectory to help organizations measure its impact.

What can your organization learn from this example when it comes to crisis preparedness and business continuity? By developing and maintaining a data catalog as part of a larger data governance program supported by stakeholders across the organization, you can:

  1. Catalog and Share Information Assets

Catalog critical systems and data elements, plus enable the calculation and evaluation of key performance measures. It’s also important to understand data lineage and be able to analyze the impacts to critical systems and essential business processes if a change occurs.

  2. Clearly Document Data Policies and Rules

Managing a remote workforce creates new challenges and risks. Do employees have remote access to essential systems? Do they know what the company’s work-from-home policies are? Do employees understand how to handle sensitive data? Are they equipped to maintain data security and privacy? A data catalog with self-service access serves up the correct policies and procedures.

  3. Reduce Operational Costs While Accelerating Time to Value

Datasets need to be properly scanned, documented, tagged and annotated with their definitions, ownership, lineage and usage. Automating the cataloging of data assets saves initial development time and streamlines its ongoing maintenance and governance. Automating the curation of data assets also accelerates the time to value for analytics/insights reporting and significantly reduces operational costs.

  4. Make Data Accessible & Usable

Open your organization’s data door, making it easier to access, search and understand information assets. A data catalog is the core of data analysis for decision-making, so automating its curation and access with the associated business context will enable stakeholders to spend more time analyzing it for meaningful insights they can put into action.

  5. Ensure Regulatory Compliance

Regulations like the California Consumer Privacy Act (CCPA) and the European Union’s General Data Protection Regulation (GDPR) require organizations to know where all their customer, prospect and employee data resides to ensure its security and privacy.

A fine for noncompliance is the last thing you need on top of everything else your organization is dealing with, so using a data catalog centralizes data management and the associated usage policies and guardrails.

See a Data Catalog in Action

The erwin Data Intelligence Suite (erwin DI) provides data catalog and data literacy capabilities with built-in automation so you can accomplish all of the above and more.

Join us for the next live demo of erwin DI.



erwin Recognized as a March 2020 Gartner Peer Insights Customers’ Choice for Metadata Management Solutions

We’re excited about our recognition as a March 2020 Gartner Peer Insights Customers’ Choice for Metadata Management Solutions.  Our team here at erwin takes great pride in this distinction because customer feedback has always shaped our products and services.

The Gartner Peer Insights Customers’ Choice is a recognition of vendors in the metadata management solutions market by verified end-user professionals, taking into account both the number of reviews and the overall user ratings. To ensure fair evaluation, Gartner maintains rigorous criteria for recognizing vendors with a high customer satisfaction rate.

erwin’s metadata management offering, the erwin Data Intelligence Suite (erwin DI), comprises erwin Data Catalog (erwin DC) and erwin Data Literacy (erwin DL) with built-in automation for greater visibility, understanding and use of enterprise data.

The solutions work in tandem to automate the processes involved in harvesting, integrating, activating and governing enterprise data according to business requirements. This automation results in greater accuracy, faster analysis and better decision-making for data governance and digital transformation initiatives.

Metadata management is key to sustainable data governance and any other organizational effort that is data-driven. erwin DC automates enterprise metadata management, data mapping, data cataloging, code generation, data profiling and data lineage. erwin DL provides integrated business glossary management and self-service data discovery tools so both IT and business users can find data relevant to their roles and understand it within a business context.

Together as erwin DI, these solutions give organizations a complete and clear view of their metadata landscape, including semantic, business and technical elements.


Everyone at erwin is honored to be named as a March 2020 Customers’ Choice for Metadata Management Solutions. To learn more about this distinction, or to read the reviews written about our products by the IT professionals who use them, please visit Customers’ Choice.

And to all of our customers who submitted reviews, thank you! We appreciate you and look forward to building on the experience that led to this distinction!

Customer input will continue to guide our technology road map and the entire customer journey. In fact, it has influenced our entire corporate direction as we expanded our focus from data modeling to enterprise modeling and data governance/intelligence.

Data underpins every type of architecture – business, technology and data – so it only makes sense that both IT and the wider enterprise collaborate to ensure it’s accurate, in context and available to the right people for the right purposes.

If you have an erwin story to share, we encourage you to join the Gartner Peer Insights crowd and weigh in.

Request a complimentary copy of the Gartner Peer Insights ‘Voice of the Customer’: Metadata Management Solutions (March 2020) report.


The GARTNER PEER INSIGHTS CUSTOMERS’ CHOICE badge is a trademark and service mark of Gartner, Inc., and/or its affiliates, and is used herein with permission. All rights reserved. Gartner Peer Insights Customers’ Choice constitute the subjective opinions of individual end-user reviews, ratings, and data applied against a documented methodology; they neither represent the views of, nor constitute an endorsement by, Gartner or its affiliates.



The Top 8 Benefits of Data Lineage

It’s important we recognize the benefits of data lineage.

As corporate data governance programs have matured, the inventory of agreed-to data policies has grown rapidly. These include guidelines for data quality assurance, regulatory compliance and data democratization, among other information utilization initiatives.

Organizations that are challenged to translate their defined data policies into implemented processes and procedures are starting to identify tools and technologies that support how organizational data policies are implemented and practiced.

One such technique, data lineage, is gaining prominence as a core operational business component of the data governance technology architecture. Data lineage encompasses processes and technology to provide full-spectrum visibility into the ways that data flow across the enterprise.

To data-driven businesses, the benefits of data lineage are significant. Data lineage tools are used to survey, document and enable data stewards to query and visualize the end-to-end flow of information units from their origination points through the series of transformation and processing stages to their final destination.


The Benefits of Data Lineage

Data stewards are attracted to data lineage because its benefits support a number of different governance practices, including:

1. Operational intelligence

At its core, data lineage captures the mappings of the rapidly growing number of data pipelines in the organization. Visualizing the information flow landscape provides insight into the “demographics” of data consumption and use, answering questions such as “what data sources feed the greatest number of downstream sources” or “which data analysts use data that is ingested from a specific data source.” Collecting this intelligence about the data landscape better positions the data stewards for enforcing governance policies.
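Questions like these become mechanical once lineage is treated as a directed graph. Below is a minimal sketch in plain Python with hypothetical pipeline names; the same traversal also underpins the impact analysis described later.

```python
# Lineage as a directed graph: edges point from a source to its consumers.
# Node names are hypothetical.
from collections import deque

edges = {
    "crm_db": ["customer_master"],
    "web_logs": ["clickstream_mart"],
    "customer_master": ["marketing_mart", "finance_mart"],
    "clickstream_mart": ["marketing_mart"],
    "marketing_mart": ["campaign_dashboard"],
    "finance_mart": ["board_report"],
}

def downstream(node):
    """All assets reachable from `node` (breadth-first traversal)."""
    seen, queue = set(), deque([node])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Which data sources feed the greatest number of downstream assets?
for source in ("crm_db", "web_logs"):
    print(source, "feeds", len(downstream(source)), "downstream assets")
```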

2. Business terminology consistency

One of the most confounding data governance challenges is understanding the semantics of business terminology within data management contexts. Because application development was traditionally isolated within each business function, the same (or similar) terms are used in different data models, even though the designers did not take the time to align definitions and meanings. Data lineage allows the data stewards to find common business terms, review their definitions, and determine where there are inconsistencies in the ways the terms are used.

3. Data incident root cause analysis

It has long been asserted that when a data consumer finds a data error, the error most likely was introduced into the environment at an earlier stage of processing. Yet without a “roadmap” that indicates the processing stages through which the data were processed, it is difficult to speculate where the error was actually introduced. Using data lineage, though, a data steward can insert validation probes within the information flow to validate data values and determine the stage in the data pipeline where an error originated.
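A minimal sketch of that idea: run the pipeline stage by stage, validate the output after each stage, and report the first stage whose output violates the rule. The stages, records and validation rule are illustrative.

```python
# Sketch of validation probes along a pipeline; stages are hypothetical.
def find_error_stage(records, stages, rule):
    """Apply each stage in order; return the first stage whose output
    violates the rule, i.e. where the error was introduced."""
    for name, transform in stages:
        records = [transform(r) for r in records]
        if not all(rule(r) for r in records):
            return name
    return None

stages = [
    ("standardize", lambda r: {**r, "amount": round(r["amount"], 2)}),
    ("currency_convert", lambda r: {**r, "amount": r["amount"] * -1.1}),  # bug
    ("aggregate_prep", lambda r: r),
]

def rule(r):
    return r["amount"] >= 0  # amounts must be non-negative

print(find_error_stage([{"amount": 10.0}], stages, rule))  # currency_convert
```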

4. Data quality remediation assessment

Root cause analysis is just the first part of the data quality process. Once the data steward has determined where the data flaw was introduced, the next step is to determine why the error occurred. Again, using a data lineage mapping, the steward can trace backward through the information flow to examine the standardizations and transformations applied to the data, validate that transformations were correctly performed, or identify one (or more) performed incorrectly, resulting in the data flaw.

5. Impact analysis

The enterprise is always subject to changes; externally-imposed requirements (such as regulatory compliance) evolve, internal business directives may affect user expectations, and ingested data source models may change unexpectedly. When there is a change to the environment, it is valuable to assess the impacts to the enterprise application landscape. In the event of a change in data expectations, data lineage provides a way to determine which downstream applications and processes are affected by the change and helps in planning for application updates.

6. Performance assessment

Not only does lineage provide a collection of mappings of data pipelines, it allows for the identification of potential performance bottlenecks. Data pipeline stages with many incoming paths are candidate bottlenecks. Using a set of data lineage mappings, the performance analyst can profile execution times across different pipelines and redistribute processing to eliminate bottlenecks.

7. Policy compliance

Data policies can be implemented through the specification of business rules. Compliance with these business rules can be facilitated using data lineage by embedding business rule validation controls across the data pipelines. These controls can generate alerts when there are noncompliant data instances.
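A minimal sketch of such a control: express each policy as a rule and emit an alert for every noncompliant record. The rules and records below are illustrative assumptions.

```python
# Sketch of embedded business-rule validation controls; rules are illustrative.
RULES = {
    "ssn_must_be_masked": lambda rec: rec.get("ssn", "").startswith("***"),
    "country_required": lambda rec: bool(rec.get("country")),
}

def validate(records):
    """Check every record against every rule; return alert messages."""
    alerts = []
    for i, rec in enumerate(records):
        for rule_name, check in RULES.items():
            if not check(rec):
                alerts.append(f"record {i}: violates {rule_name}")
    return alerts

records = [
    {"ssn": "***-**-1234", "country": "US"},
    {"ssn": "123-45-6789", "country": ""},  # noncompliant on both rules
]
print(validate(records))
```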

8. Auditability of data pipelines

In many cases, regulatory compliance is a combination of enforcing a set of defined data policies along with a capability for demonstrating that the overall process is compliant. Data lineage provides visibility into the data pipelines and information flows that can be audited, thereby supporting the compliance process.

Evaluating Enterprise Data Lineage Tools

While data lineage benefits are obvious, large organizations with complex data pipelines and data flows do face challenges in embracing the technology to document the enterprise data pipelines. These include:

  • Surveying the enterprise – Gathering information about the sources, flows and configurations of data pipelines.
  • Maintenance – Configuring a means to maintain an up-to-date view of the data pipelines.
  • Deliverability – Providing a way to give data consumers visibility to the lineage maps.
  • Sustainability – Ensuring sustainability of the processes for producing data lineage mappings.

Producing a collection of up-to-date data lineage mappings that are easily reviewed by different data consumers depends on addressing these challenges. When considering data lineage tools, keep these issues in mind when evaluating how well the tools can meet your data governance needs.

erwin Data Intelligence (erwin DI) helps organizations automate their data lineage initiatives. Learn more about data lineage with erwin DI.



Demystifying Data Lineage: Tracking Your Data’s DNA

Getting the most out of your data requires getting a handle on data lineage. That’s knowing what data you have, where it is, and where it came from – plus understanding its quality and value to the organization.

But you can’t understand your data in a business context, much less track its lineage and physical existence or maximize its security, quality and value, if it’s scattered across different silos in numerous applications.

Data lineage provides a way of tracking data from its origin to destination across its lifespan and all the processes it’s involved in. It also plays a vital role in data governance. Beyond the simple ability to know where the data came from and whether or not it can be trusted, there’s an element of statutory reporting and compliance that often requires a knowledge of how that same data (known or unknown, governed or not) has changed over time.

A platform that provides insights like data lineage, impact analysis, full-history capture, and other data management features serves as a central hub from which everything can be learned and discovered about the data – whether a data lake, a data vault or a traditional data warehouse.

In a traditional data management organization, Excel spreadsheets are used to manage the incoming data design, what’s known as the “pre-ETL” mapping documentation, but this does not provide any sort of visibility or auditability. In fact, each unit of work represented in these ‘mapping documents’ becomes an independent variable in the overall system development lifecycle, and therefore nearly impossible to learn from much less standardize.

The key to accuracy and integrity in any exercise is to eliminate the opportunity for human error – which does not mean eliminating humans from the process but incorporating the right tools to reduce the likelihood of error as the human beings apply their thought processes to the work.


Data Lineage: A Crucial First Step for Data Governance

Knowing what data you have, where it lives and where it came from is complicated. The lack of visibility and control around “data at rest” combined with “data in motion,” as well as difficulties with legacy architectures, means organizations spend more time finding the data they need than using it to produce meaningful business outcomes.

Organizations need to create and sustain an enterprise-wide view of and easy access to underlying metadata, but that’s a tall order with numerous data types and data sources that were never designed to work together and data infrastructures that have been cobbled together over time with disparate technologies, poor documentation and little thought for downstream integration. So the applications and initiatives that depend on a solid data infrastructure may be compromised, resulting in faulty analyses.

These issues can be addressed with a strong data management strategy underpinned by technology that enables the data quality the business requires, which encompasses data cataloging (integration of data sets from various sources), mapping, versioning, business rules and glossaries maintenance and metadata management (associations and lineage).

An automated, metadata-driven framework for cataloging data assets and their flows across the business provides an efficient, agile and dynamic way to generate data lineage from operational source systems (databases, data models, file-based systems, unstructured files and more) across the information management architecture; construct business glossaries; assess what data aligns with specific business rules and policies; and inform how that data is transformed, integrated and federated throughout business processes – complete with full documentation.

Centralized design, immediate lineage and impact analysis, and change-activity logging mean you will always have answers readily available, or just a few clicks away. Subsets of data can be identified and generated via predefined templates, generic designs can be generated from standard mapping documents, and the results can be pushed through ETL processes for faster processing via automation templates.
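As a simple illustration of generating work from a standard mapping document, the sketch below turns source-to-target column mappings into an INSERT-SELECT statement. The table names, columns and template are hypothetical.

```python
# Sketch: generate ETL SQL from a source-to-target mapping document.
# Table and column names are hypothetical.
mapping = {
    "source_table": "stg_orders",
    "target_table": "dw_orders",
    "columns": [          # (source_column, target_column)
        ("ord_id", "order_id"),
        ("cust_no", "customer_id"),
        ("ord_amt", "order_amount"),
    ],
}

def generate_sql(m):
    targets = ", ".join(target for _, target in m["columns"])
    sources = ", ".join(source for source, _ in m["columns"])
    return (f"INSERT INTO {m['target_table']} ({targets})\n"
            f"SELECT {sources} FROM {m['source_table']};")

print(generate_sql(mapping))
```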

With automation, data quality is systemically assured and the data pipeline is seamlessly governed and operationalized to the benefit of all stakeholders. Without such automation, business transformation will be stymied. Companies, especially large ones with thousands of systems, files and processes, will be particularly challenged by a manual approach. And outsourcing these data management efforts to professional services firms only increases costs and schedule delays.

With erwin Mapping Manager, organizations can automate enterprise data mapping and code generation for faster time-to-value and greater accuracy when it comes to data movement projects, as well as synchronize “data in motion” with data management and governance efforts.

Map data elements to their sources within a single repository to determine data lineage, deploy data warehouses and other Big Data solutions, and harmonize data integration across platforms. The web-based solution reduces the need for specialized, technical resources with knowledge of ETL and database procedural code, while making it easy for business analysts, data architects, ETL developers, testers and project managers to collaborate for faster decision-making.
