Data Governance Definition, Best Practices and Benefits

Any organization with a data-driven strategy should understand the definition of data governance. In fact, in light of increasingly stringent data regulations, any organization that uses or even stores data should understand the definition of data governance.

Organizations with a solid understanding of data governance (DG) are better equipped to keep pace with the speed of modern business.

In this post, the erwin Experts address:

  • The definition of data governance
  • Why data governance is important
  • What good data governance looks like
  • The key benefits of data governance
  • The best data governance solution

Data Governance Definition

Data governance’s definition is broad because it describes a process rather than a predetermined method. So an understanding of the process and the best practices associated with it is key to a successful data governance strategy.

Data governance is best defined as the strategic, ongoing and collaborative processes involved in managing data’s access, availability, usability, quality and security in line with established internal policies and relevant data regulations.

It’s often said that when we work together, we can achieve things greater than the sum of our parts. Collective, societal efforts have seen mankind move metaphorical mountains and land on the literal moon.

Such feats were made possible through effective government – or governance.

The same applies to data. A single unit of data in isolation can’t do much, but the sum of an organization’s data can prove invaluable.

Put simply, DG is about maximizing the potential of an organization’s data and minimizing the risk. In today’s data-driven climate, this dynamic is more important than ever.

That’s because data’s value depends on the context in which it exists: too much unstructured or poor-quality data and meaning is lost in a fog; too little insight into data’s lineage, where it is stored, or who has access, and the organization becomes an easy target for cybercriminals and/or non-compliance penalties.

So DG is, quite simply, about how an organization uses its data. That includes how it creates or collects data, as well as how its data is stored and accessed. It ensures that the right data of the right quality, regardless of where it is stored or what format it is stored in, is available for use – but only by the right people and for the right purpose.

With well governed data, organizations can get more out of their data by making it easier to manage, interpret and use.

Why Is Data Governance Important?

Although governing data is not a new practice, treating it as a strategic program is – and so are the expectations as to who is responsible for it.

Historically, governing data has been IT’s business because it primarily involved cataloging data to support search and discovery.

But now, governing data is everyone’s business. Both the data “keepers” in IT and the data users everywhere else within the organization have a role to play.

That makes sense, too. The sheer volume and importance of data the average organization now processes are too great to be effectively governed by a siloed IT department.

Think about it. If all the data you access as an employee of your organization had to be vetted by IT first, could you get anything done?

While the exponential increase in the volume and variety of data has provided unparalleled insights for some businesses, only those with the means to deal with the velocity of data have reaped the rewards.

By velocity, we mean the speed at which data can be processed and made useful. More on “The Three Vs of Data” here.

Data giants like Amazon, Netflix and Uber have reshaped whole industries, turning smart, proactive data governance into actionable and profitable insights.

And then, of course, there’s the regulatory side of things. The European Union’s General Data Protection Regulation (GDPR) mandates that organizations govern their data.

Poor data governance doesn’t just lead to breaches – although of course it does. Compliance audits also require an effective data governance initiative to pass.

Since non-compliance can be costly, good data governance not only helps organizations make money, it helps them save it too. And organizations are recognizing this fact.

In the lead-up to GDPR, studies found that the biggest driver for initiatives for governing data was regulatory compliance. Since GDPR’s implementation, however, better decision-making and analytics have become the top drivers for investing in data governance.

Other areas where well-governed data plays an important role include digital transformation, data standards and uniformity, self-service, and customer trust and satisfaction.

For the full list of drivers and deeper insight into the state of data governance, get the free 2020 State of DGA report here.

What Is Good Data Governance?

We’re constantly creating new data whether we’re aware of it or not. Every new sale, every new inquiry, every website interaction, every swipe on social media generates data.

This means the work of governing data is ongoing, and organizations without it can become overwhelmed quickly.

Therefore, good data governance is proactive, not reactive.

In addition, good data governance requires organizations to encourage a culture that stresses the importance of data with effective policies for its use.

An organization must know who should have access to what, both internally and externally, before any technical solutions can effectively compartmentalize the data.

So good data governance requires both technical solutions and policies to ensure organizations stay in control of their data.

But culture isn’t built on policies alone. An often-overlooked element of good data governance is arguably philosophical. Effectively communicating the benefits of well governed data to employees – like improving the discoverability of data – is just as important as any policy or technology.

And it shouldn’t be difficult. In fact, it should make data-oriented employees’ jobs easier, not harder.

What Are the Key Benefits of Data Governance?

Organizations with effectively governed data enjoy:

  • Better alignment with data regulations: Get a more holistic understanding of your data and any associated risks, plus improve data privacy and security through better data cataloging.
  • A greater ability to respond to compliance audits: Take the pain out of preparing reports and respond more quickly to audits with better documentation of data lineage.
  • Increased operational efficiency: Identify and eliminate redundancies and streamline operations.
  • Increased revenue: Uncover opportunities to both reduce expenses and discover/access new revenue streams.
  • More accurate analytics and improved decision-making: Be more confident in the quality of your data and the decisions you make based on it.
  • Improved employee data literacy: Consistent data standards help ensure employees are more data literate, and they reduce the risk of semantic misinterpretations of data.
  • Better customer satisfaction/trust and reputation management: Use data to provide a consistent, efficient and personalized customer experience, while avoiding the pitfalls and scandals of breaches and non-compliance.

For a more in-depth assessment of data governance benefits, check out The Top 6 Benefits of Data Governance.

The Best Data Governance Solution

Data has always been important to erwin; we’ve been a trusted data modeling brand for more than 30 years. But we’ve expanded our product portfolio to reflect customer needs and give them an edge, literally.

The erwin EDGE platform delivers an “enterprise data governance experience.” And at the heart of the erwin EDGE is the erwin Data Intelligence Suite (erwin DI).

erwin DI provides all the tools you need for the effective governance of your data. These include data catalog, data literacy and a host of built-in automation capabilities that take the pain out of data preparation.

With erwin DI, you can automatically harvest, transform and feed metadata from a wide array of data sources, operational processes, business applications and data models into a central data catalog and then make it accessible and understandable via role-based, contextual views.

With the broadest set of metadata connectors, erwin DI combines data management and DG processes to fuel an automated, real-time, high-quality data pipeline.

See for yourself why erwin DI is a DBTA 2020 Readers’ Choice Award winner for best data governance solution with your very own, very free demo of erwin DI.


What Is Data Literacy?

Today, data literacy is more important than ever.

Data is now being used to support business decisions few executives thought they’d be making even six months ago.

With your employees connected and armed with data that paints a clear picture of the business, your organization is better prepared to turn its attention to whatever your strategic priority may be – e.g., digital transformation, customer experience, or withstanding the current (or a future) crisis.

So, what is data literacy?


Data Literacy Definition

Gartner defines data literacy as the ability to read, write and communicate data in context, including an understanding of data sources and constructs, analytical methods and techniques applied — and the ability to describe the use case, application and resulting value.

Organizations use data literacy tools to improve data literacy across the organization. A good data literacy tool will include functionality such as business glossary management and self-service data discovery. The end result is an organization that’s more data fluent and more efficient in how it stores, discovers and uses its data.

What Is Data Literacy For?

For years, we’ve been saying that “we’re all data people.” When all stakeholders in an organization can effectively “speak data” they can:

  • Better understand and identify the data they require
  • Be more self-sufficient in accessing and preparing the data
  • Better articulate the gaps that exist in the data landscape
  • Share their data knowledge and experience with other consumers to contribute to the greater good
  • Collaborate more effectively with their partners in data (management and governance) for greater efficiency and higher quality outcomes

Why is Data Literacy Important?

Without good data, it’s difficult to make good decisions.

Data access, literacy and knowledge lead to sound decision-making, and that’s key to data governance and any other data-driven effort.

Data literacy enables collaboration and innovation. To determine if your organization is data literate you need to ask two questions:  

  1. Can your employees use data to effectively communicate with each other?
  2. Can you develop and circulate ideas that will help the business move forward?


The Data Literacy and Data Intelligence Connection

Businesses that invest in data intelligence and data literacy are better positioned to weather any storm and chart a path forward because they have accurate, trusted data at their disposal.

erwin helps customers turn their data from a burden into a benefit by fueling an accurate, real-time, high-quality data pipeline they can mine for insights that lead to smart decisions for operational excellence.

erwin Data Intelligence (erwin DI) combines data catalog and data literacy capabilities for greater awareness of and access to available data assets, guidance on their use, and guardrails to ensure data policies and best practices are followed.

erwin Data Literacy (DL) is founded on enriched business glossaries and socializing data so all stakeholders can view and understand it within the context of their roles.

It allows both IT and business users to discover the data available to them and understand what it means in common, standardized terms, and automates common data curation processes, such as name matching, categorization and association, to optimize governance of the data pipeline including preparation processes.

erwin DL provides self-service, role-based, contextual data views. It also provides a business glossary for the collaborative definition of enterprise data in business terms.

It also includes built-in accountability and workflows to enable data consumers to define and discover data relevant to their roles, facilitate the understanding and use of data within a business context, and ensure the organization is data literate.

With erwin DL, your organization can build glossaries of terms in taxonomies with descriptions, synonyms, acronyms and their associations to data policies, rules and other critical governance artifacts. Other advantages are:

  • Data Visibility & Governance: Visualize and navigate any data from anywhere within a business-centric data asset framework that provides organizational alignment and robust, sustainable data governance.
  • Data Context & Enrichment: Put data in business context and enable stakeholders to share best practices and build communities by tagging/commenting on data assets, enriching the metadata.
  • Enterprise Collaboration & Empowerment: Break down IT and business silos to provide broad access to approved organizational information.
  • Greater Productivity: Reduce the time it takes to find data assets and therefore reliance on technical resources, plus streamline workflows for faster analysis and decision-making.
  • Accountability & Regulatory Peace of Mind: Create an integrated ecosystem of people, processes and technology to manage and protect data, mitigating a wide range of data-related risks and improving compliance.
  • Effective Change Management: Better manage change with the ability to identify data linkages, implications and impacts across the enterprise.
  • Data Literacy, Fluency & Knowledge: Enhance stakeholder discovery and understanding of and trust in data assets to underpin analysis leading to actionable insights.

Learn more about the importance of data literacy by requesting a free demo of erwin Data Intelligence.


Four Steps to Building a Data-Driven Culture


Fostering organizational support for a data-driven strategy might require a change in the organization’s culture. But how?

Recently, I co-hosted a webinar with our client E.ON, a global energy company that reinvented how it conducts business from branding to customer engagement – with data as the conduit.

There’s no doubt E.ON, based in Essen, Germany, has established one of the most comprehensive and successful data governance programs in modern business.

For E.ON, data governance is not just about data management but also about using information to increase efficiencies. The company needed to help its data scientists and engineers improve their knowledge of the data, find the best data for use at the best time, and put the data in the most appropriate business context.

As an example, E.ON was able to improve data quality, detect redundancies, and create a needs-based, data-use environment by applying a common set of business terms across the enterprise.

Avoiding Hurdles

Businesses have not been able to get as much mileage out of their data governance efforts as hoped, chiefly because of how those efforts have been handled. Data governance initiatives often fail because organizations treat them as siloed IT programs rather than multi-stakeholder imperatives.

Even when business groups recognize the value of a data governance program and the potential benefits to be derived from it, the IT group traditionally has owned the effort and paid for it.

Despite enterprise-wide awareness of the importance of data governance, a troublingly large number of organizations continue to stumble because of a lack of executive support.

IT and the business will need to take responsibility for selling the benefits of data governance across the enterprise and ensure all stakeholders are properly educated about it.

IT may have to go it alone, at least initially, educating the business on the risks and rewards of data governance and the expectations and accountabilities in implementing it. The business needs to have a role in the justification.

Being a Change Agent

Becoming a data-driven enterprise means making decisions based on facts. It requires a clear vision, strategy and disciplined execution. It also must be well thought out, understood and communicated to others – from the C-suite on down.

For E.ON, the board supported and drove a lot of the thinking that data has to be at the center of everything to reimagine the company. But the data team still needed to convince the head of every one of the company’s hundreds of legal entities to support the digital transformation journey. As a result, the team went on a mission to spread the message.

“The biggest challenge was change management — convincing people to be part of the journey. It is very often underestimated,” said Romina Medici, E.ON’s Program Manager for Data Management and Governance. “Technology is logical, so you can always understand it. Culture is more complex and more diverse.”

She said that ultimately the “communication (across the organization) was bottom up and top down.”

Four Steps to Building a Data-Driven Culture

1. Accelerate Time to Value: Data governance isn’t a one-off project with a defined endpoint. It’s an on-going initiative that requires active engagement from executives and business leaders. The ability to make faster decisions based on data is one way to make the organization pay attention.

2. Ensure Company-Wide Compliance: Compliance isn’t just about government regulations. In today’s business environment, we’re all data people. Everyone in the organization needs to commit to data compliance to ensure high-quality data.

3. Demand Trusted Insights Based on Data Truths: To make smart decisions, you can’t have multiple sets of numbers. Everyone needs to be in lockstep, using and basing decisions on the same data.

4. Foster Data-Driven Collaboration: We call this “social data governance,” meaning you foster collaboration across the business, all the time. 

A data-driven approach has never been more valuable to addressing the complex yet foundational questions enterprises must answer. Organizations that have their data management, data governance and data intelligence houses in order are much better positioned to respond to challenges and thrive moving forward.

As demonstrated by E.ON, data-driven cultures start at the top – but need to proliferate up and down, even sideways.

Business transformation has to be based on accurate data assets within the right context, so organizations have a reliable source of truth on which to base their decisions.

erwin provides the data catalog, lineage, glossary and visualization capabilities needed to evaluate the business in its current state and then evolve it to serve new objectives.

Request a demo of the erwin Data Intelligence Suite.


Do I Need a Data Catalog?

If you’re serious about a data-driven strategy, you’re going to need a data catalog.

Organizations need a data catalog because it enables them to create a seamless way for employees to access and consume data and business assets in an organized manner.

Given the value this sort of data-driven insight can provide, the reason organizations need a data catalog should become clearer.

It’s no surprise that most organizations’ data is fragmented and siloed across numerous sources (e.g., legacy systems, data warehouses, flat files stored on individual desktops and laptops, and modern cloud-based repositories).

These fragmented data environments make data governance a challenge since business stakeholders, data analysts and other users are unable to discover data or run queries across an entire data set. This also diminishes the value of data as an asset.

In certain circumstances, data catalogs combine physical system catalogs, critical data elements and key performance measures with clearly defined product and sales goals.

They also help you manage the effectiveness of your business and understand which systems are critical for business continuity and for measuring corporate performance.

The data catalog is a searchable asset that enables all data – including even formerly siloed tribal knowledge – to be cataloged and more quickly exposed to users for analysis.

Organizations with particularly deep data stores might need a data catalog with advanced capabilities, such as automated metadata harvesting to speed up the data preparation process.

For example, before users can effectively and meaningfully engage with robust business intelligence (BI) platforms, they must have a way to ensure that the most relevant, important and valuable data sets are included in the analysis.

The most optimal and streamlined way to achieve this is by using a data catalog, which can provide a first stop for users ahead of working in BI platforms.

As a collective intelligent asset, a data catalog should include capabilities for collecting and continually enriching or curating the metadata associated with each data asset to make them easier to identify, evaluate and use properly.

Data Catalog Benefits

Three Types of Metadata in a Data Catalog

A data catalog uses metadata, data that describes or summarizes data, to create an informative and searchable inventory of all data assets in an organization.

These assets can include but are not limited to structured data, unstructured data (including documents, web pages, email, social media content, mobile data, images, audio, video and reports) and query results, etc. The metadata provides information about the asset that makes it easier to locate, understand and evaluate.

For example, Amazon handles millions of different products, and yet we, as consumers, can find almost anything about everything very quickly.

Beyond Amazon’s advanced search capabilities, the company also provides detailed information about each product, the seller’s information, shipping times, reviews, and a list of companion products. Sales are measured down to a zip code territory level across product categories.

Another classic example is the online or card catalog at a library. Each card or listing contains information about a book or publication (e.g., title, author, subject, publication date, edition, location) that makes the publication easier for a reader to find and to evaluate.

There are many types of metadata, but a data catalog deals primarily with three: technical metadata, operational or “process” metadata, and business metadata.

Technical Metadata

Technical metadata describes how the data is organized and stored, along with its transformations and lineage. It is structural and describes data objects such as tables, columns, rows, indexes and connections.

This aspect of the metadata guides data experts on how to work with the data (e.g. for analysis and integration purposes).

Operational Metadata

Operational metadata describes the systems that process data, the applications in those systems, and the rules in those applications. Also called “process” metadata, it describes the data asset’s creation and when, how and by whom it has been accessed, used, updated or changed.

Operational metadata provides information about the asset’s history and lineage, which can help an analyst decide if the asset is recent enough for the task at hand, if it comes from a reliable source, if it has been updated by trustworthy individuals, and so on.

As illustrated above, a data catalog is essential to business users because it synthesizes all the details about an organization’s data assets across multiple data sources. It organizes them into a simple, easy-to-digest format and then publishes them to data communities for knowledge-sharing and collaboration.

Business Metadata

Business metadata, sometimes referred to as external metadata, describes the business aspects of a data asset. It defines the data captured, the meaning of its elements, and how the data is used within the business.

This is the area that binds all users together in terms of consistency and usage of cataloged data assets.

Tools should be provided that enable data experts to explore the data catalogs, curate and enrich the metadata with tags, associations, ratings, annotations, and any other information and context that helps users find data faster and use it with confidence.
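
To make these three metadata types concrete, below is a minimal sketch of what a single catalog entry might hold. The record layout and field values are illustrative assumptions only – not erwin DI’s actual schema:

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class CatalogEntry:
        """One asset in a data catalog, carrying all three metadata types."""
        asset_name: str
        technical: Dict[str, str] = field(default_factory=dict)    # structure and storage
        operational: Dict[str, str] = field(default_factory=dict)  # history, processing, lineage
        business: Dict[str, str] = field(default_factory=dict)     # meaning and usage in business terms
        tags: List[str] = field(default_factory=list)              # curated enrichment added over time

    entry = CatalogEntry(
        asset_name="customer_orders",
        technical={"object": "table", "columns": "order_id, customer_id, total", "store": "sales_db"},
        operational={"created": "2020-01-15", "last_updated_by": "etl_nightly", "upstream": "crm.orders_raw"},
        business={"definition": "Confirmed customer orders", "owner": "Sales Ops", "sensitivity": "internal"},
        tags=["sales", "reporting"],
    )
    print(entry.business["definition"])  # Confirmed customer orders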

Why You Need a Data Catalog – Three Business Benefits of Data Catalogs

When data professionals can help themselves to the data they need—without IT intervention and having to rely on finding experts or colleagues for advice, limiting themselves to only the assets they know about, and having to worry about governance and compliance—the entire organization benefits.

  1. Catalogs and shares critical information assets

A data catalog lets you catalog critical systems and data elements, plus enable the calculation and evaluation of key performance measures. It’s also important to understand data lineage and be able to analyze the impacts to critical systems and essential business processes if a change occurs.

  2. Makes data accessible and usable, reducing operational costs while increasing time to value

Open your organization’s data door, making it easier to access, search and understand information assets. A data catalog is the core of data analysis for decision-making, so automating its curation and access with the associated business context will enable stakeholders to spend more time analyzing it for meaningful insights they can put into action.

Data assets need to be properly scanned, documented, tagged and annotated with their definitions, ownership, lineage and usage. Automating the cataloging of data assets saves initial development time and streamlines ongoing maintenance and governance.

Automating the curation of data assets also accelerates the time to value for analytics/insights reporting and significantly reduces operational costs.

  3. Ensures regulatory compliance

Regulations like the California Consumer Privacy Act (CCPA) and the European Union’s General Data Protection Regulation (GDPR) require organizations to know where all their customer, prospect and employee data resides to ensure its security and privacy.

Fines for noncompliance and reputational damage are the last things you need to worry about, so using a data catalog centralizes data management and the associated usage policies and guardrails.

See a Data Catalog in Action

The erwin Data Intelligence Suite (erwin DI) provides data catalog and data literacy capabilities with built-in automation so you can accomplish all the above and much more.

Request your own demo of erwin DI.


Overcoming the 80/20 Rule – Finding More Time with Data Intelligence

The 80/20 rule is well known. It describes an unfortunate reality for many data stewards, who spend 80 percent of their time finding, cleaning and reorganizing huge amounts of data, and only 20 percent of their time on actual data analysis.

That’s a lot of wasted time.

Earlier this year, erwin released its 2020 State of Data Governance and Automation (DGA) report. About 70 percent of the DGA report respondents – a combination of roles from data architects to executive managers – say they spend an average of 10 or more hours per week on data-related activities.

COVID-19 has changed the way we work – essentially overnight – and may change how companies work moving forward. Companies like Twitter, Shopify and Box have announced that they are moving to a permanent work-from-home status as their new normal.

For much of our time as data stewards, collecting, revising and building consensus around our metadata has meant balancing multiple calendars and competing priorities so we can pull the appropriate data stakeholders into a room to discuss term definitions, the rules for measuring “clean” data, and the processes and applications that use the data.


This style of data governance most often presents us with eight one-hour opportunities per day (40 one-hour opportunities per week) to meet.

As the 80/20 rule suggests, getting through hundreds, or perhaps thousands of individual business terms using this one-hour meeting model can take … a … long … time.

Now that pulling stakeholders into a room has been disrupted … what if we could use this as 40 opportunities to update the metadata PER DAY?

What if we could buck the trend, and overcome the 80/20 rule?

Overcoming the 80/20 Rule with Micro Governance for Metadata

Micro governance is a strategy that leverages native workflow functionality to break metadata stewardship into small, asynchronous tasks.

erwin Data Intelligence (DI) offers a Workflow Manager that creates persistent, reusable, role-based workflows so that edits to the metadata for any term can move, for example, from draft to under review to approved to published.

A defined workflow eliminates the need for hour-long meetings with multiple stakeholders in a room. Users can suggest edits, review changes and approve changes on their own schedule (see the sketch following the steps below). Using micro governance, these steps should take less than 10 minutes per term:

  • Log in to the DI Suite
  • Open your work queue to see items requiring your attention
  • Review and/or approve changes
  • Log out

That’s it!
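
Under the hood, a workflow like this is just a set of states and role-gated transitions. Here’s a minimal sketch in Python – the states, roles and rules are illustrative assumptions, not Workflow Manager’s actual configuration:

    # Illustrative role-based workflow: draft -> under_review -> approved -> published.
    ALLOWED_TRANSITIONS = {
        ("draft", "under_review"): {"author"},      # authors submit edits for review
        ("under_review", "draft"): {"steward"},     # stewards can send edits back for rework
        ("under_review", "approved"): {"steward"},  # or approve them
        ("approved", "published"): {"publisher"},   # publishers release terms to the glossary
    }

    def transition(current: str, target: str, role: str) -> str:
        """Move a term's metadata to a new state if the role is permitted to do so."""
        if role not in ALLOWED_TRANSITIONS.get((current, target), set()):
            raise PermissionError(f"{role} may not move a term from {current} to {target}")
        return target

    state = "draft"
    state = transition(state, "under_review", "author")
    state = transition(state, "approved", "steward")
    state = transition(state, "published", "publisher")
    print(state)  # published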

And as a bonus, where stakeholders may need to discuss the edits to achieve consensus, the Collaboration Center within the Business Glossary Manager facilitates conversations between stakeholders that persist and are attached directly to the business term. No more searching through months of email conversations or forgetting to cc a key stakeholder.

Using the DI Suite Workflow Manager and the Collaboration Center, and assuming an 8-hour workday, we should each have 48 opportunities for 10 minutes of micro-governance stewardship each day.
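
The arithmetic behind those numbers is simple enough to check:

    # 8-hour workday: one-hour meetings vs. 10-minute micro-governance sessions.
    workday_minutes = 8 * 60

    meeting_slots_per_week = 8 * 5               # 40 one-hour opportunities per week
    micro_slots_per_day = workday_minutes // 10  # 48 ten-minute opportunities per day
    micro_slots_per_week = micro_slots_per_day * 5

    print(meeting_slots_per_week, micro_slots_per_day, micro_slots_per_week)  # 40 48 240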

A Culture of Micro Governance

In these days when we are all working at home, and face-to-face meetings are all but impossible, we should see this time as an opportunity to develop a culture of micro governance around our metadata.

This new way of thinking and acting will help us continuously improve our transparency and semantic understanding of our data while staying connected and collaborating with each other.

When we finally get back into the office, the micro governance ethos we’ve built while at home will help make our data governance programs more flexible, responsive and agile. And ultimately, we’ll take up less of our colleagues’ precious time.

Request a free demo of erwin DI.


Modern Data Modeling: The Foundation of Enterprise Data Management and Data Governance

The role of data modeling (DM) has expanded to support enterprise data management, including data governance and intelligence efforts. After all, you can’t manage or govern what you can’t see, much less use it to make smart decisions.

Metadata management is the key to managing and governing your data and drawing intelligence from it. Beyond harvesting and cataloging metadata, it also must be visualized to break down the complexity of how data is organized and what data relationships there are so that meaning is explicit to all stakeholders in the data value chain.


Data models provide this visualization capability, create additional metadata and standardize the data design across the enterprise.

While modeling has always been the best way to understand complex data sources and automate design standards, modern data modeling goes well beyond these domains to ensure and accelerate the overall success of data governance in any organization.

It’s hard to overstate the importance of that success: data governance keeps the business in line with privacy mandates such as the General Data Protection Regulation (GDPR). It drives innovation too. Companies that want to advance AI initiatives, for instance, won’t get very far without quality data and well-defined data models.

Why Is Data Modeling the Building Block of Enterprise Data Management?

DM mitigates complexity and increases collaboration and literacy across a broad range of data stakeholders.

  • DM uncovers the connections between disparate data elements.

The DM process enables the creation and integration of business and semantic metadata to augment and accelerate data governance and intelligence efforts.

  • DM captures and shares how the business describes and uses data.

DM delivers design task automation and enforcement to ensure data integrity.

  • DM builds higher quality data sources with the appropriate structural veracity.

DM delivers design task standardization to improve business alignment and simplify integration.

  • DM builds a more agile and governable data architecture.

The DM process manages the design and maintenance lifecycle for data sources.

  • DM governs the design and deployment of data across the enterprise.

DM documents, standardizes and aligns any type of data no matter where it lives. 
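
As a toy illustration of how a model both standardizes design and generates catalog-ready metadata, consider the sketch below. The entities, attributes and output format are invented for the example – they aren’t drawn from erwin DM or any other tool:

    # A toy "data model" as a structured artifact, flattened into metadata records.
    MODEL = {
        "Customer": {
            "attributes": {"customer_id": "INTEGER", "name": "VARCHAR(100)"},
            "keys": ["customer_id"],
        },
        "Order": {
            "attributes": {"order_id": "INTEGER", "customer_id": "INTEGER", "total": "DECIMAL(10,2)"},
            "keys": ["order_id"],
            "references": {"customer_id": "Customer.customer_id"},
        },
    }

    def model_to_metadata(model: dict) -> list:
        """Flatten entities and attributes into records a data catalog could ingest."""
        records = []
        for entity, spec in model.items():
            for attr, dtype in spec["attributes"].items():
                records.append({
                    "entity": entity,
                    "attribute": attr,
                    "type": dtype,
                    "is_key": attr in spec.get("keys", []),
                    "references": spec.get("references", {}).get(attr),
                })
        return records

    for record in model_to_metadata(MODEL):
        print(record)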

Realizing the Data Governance Value from Data Modeling

Modeling becomes the point of true collaboration within an organization because it delivers a visual source of truth for everyone – data management and business professionals alike – to follow in conforming to governance requirements.

Information is readily available within intuitive business glossaries, accessible to user roles according to parameters set by the business. The metadata repository behind these glossaries, populated by information stored in data models, serves up the key terms that are understandable and meaningful to every party in the enterprise.

The stage, then, is equally set for improved data intelligence, because stakeholders now can use, understand and trust relevant data to enhance decision-making across the enterprise.

The enterprise is coming to the point where both business and IT co-own data modeling processes and data models. Business analysts and other power users start to understand data complexities because they can grasp terms and contribute to making the data in their organization accurate and complete, and modeling grows in importance in the eyes of business users.

Bringing data to the business and making it easy to access and understand increases the value of data assets, providing a return on investment and a return on opportunity. But neither would be possible without data modeling providing the backbone for metadata management and proper data governance.

For more information, check out our whitepaper, Drive Business Value and Underpin Data Governance with an Enterprise Data Model.

You also can take erwin DM, the world’s No. 1 data modeling software, for a free spin.


What is Data Lineage? Top 5 Benefits of Data Lineage

What is Data Lineage and Why is it Important?

Data lineage is the journey data takes from its creation through its transformations over time. It describes a certain dataset’s origin, movement, characteristics and quality.

Tracing the source of data is an arduous task.

Many large organizations, in their desire to modernize with technology, have acquired several different systems with various data entry points and transformation rules for data as it moves into and across the organization.


These systems and tools include enterprise service bus (ESB) products; data integration tools; extract, transform and load (ETL) tools; procedural code; application programming interfaces (APIs); file transfer protocol (FTP) processes; and even business intelligence (BI) reports that further aggregate and transform data.

With all these diverse data sources and integrated systems, it is difficult to understand the complicated web they form, much less get a simple visual flow. This is why data’s lineage must be tracked and why its role is so vital to business operations: it provides the ability to understand where data originates, how it is transformed, and how it moves into, across and outside a given organization.

Data Lineage Use Case: From Tracing COVID-19’s Origins to Data-Driven Business

A lot of theories have emerged about the origin of the coronavirus. A recent University of California San Francisco (UCSF) study conducted a genetic analysis of COVID-19 to determine how the virus was introduced specifically to California’s Bay Area.

It detected at least eight different viral lineages in 29 patients in February and early March, suggesting no regional patient zero but rather multiple independent introductions of the pathogen. The professor who directed the study said, “it’s like sparks entering California from various sources, causing multiple wildfires.”

Much like understanding viral lineage is key to stopping this and other potential pandemics, understanding the origin of data is key to a successful data-driven business.

Top Five Data Lineage Benefits

From my perspective in working with customers of various sizes across multiple industries, I’d like to highlight five data lineage benefits:

1. Business Impact

Data is crucial to every organization’s survival. For that reason, businesses must think about the flow of data across multiple systems that fuel organizational decision-making.

For example, the marketing department uses demographics and customer behavior to forecast sales. The CEO also makes decisions based on performance and growth statistics. An understanding of the data’s origins and history helps answer questions about the origin of data in Key Performance Indicator (KPI) reports, including:

  • How are the report tables and columns defined in the metadata?
  • Who are the data owners?
  • What are the transformation rules?

Without data lineage, these questions can’t be answered reliably, so it makes sense for a business to have a clear understanding of where data comes from, who uses it and how it is transformed. Also, when there is a change to the environment, it is valuable to assess the impacts to the enterprise application landscape.

In the event of a change in data expectations, data lineage provides a way to determine which downstream applications and processes are affected by the change and helps in planning for application updates.

2. Compliance & Auditability

Business terms and data policies should be implemented through standardized and documented business rules. Compliance with these business rules can be tracked through data lineage, incorporating auditability and validation controls across data transformations and pipelines to generate alerts when there are non-compliant data instances.

Regulatory compliance places greater transparency demands on firms when it comes to tracing and auditing data. For example, capital markets trading firms must understand their data’s origins and history to support risk management, data governance and reporting for various regulations such as BCBS 239 and MiFID II.

Also, different organizational stakeholders (customers, employees and auditors) need to be able to understand and trust reported data. Data lineage offers proof that the data provided is reflected accurately.

3. Data Governance

An automated data lineage solution stitches together metadata for understanding and validating data usage, as well as mitigating the associated risks.

It can auto-document end-to-end upstream and downstream data lineage, revealing any changes that have been made, by whom and when.

This data ownership, accountability and traceability is foundational to a sound data governance program.

See: The Benefits of Data Governance

4. Collaboration

Analytics and reporting are data-dependent, making collaboration among different business groups and/or departments crucial.

The visualization of data lineage can help business users spot the inherent connections of data flows and thus provide greater transparency and auditability.

Seeing data pipelines and information flows further supports compliance efforts.

5. Data Quality

Data quality is affected by data’s movement, transformation, interpretation and selection through people, process and technology.

Root-cause analysis is the first step in repairing data quality. Once a data steward determines where a data flaw was introduced, the reason for the error can be determined.

With data lineage and mapping, the data steward can trace the information flow backward to examine the standardizations and transformations applied to confirm whether they were performed correctly.
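
Both directions of traversal – backward for root-cause analysis and forward for impact analysis – fall out of the same lineage graph. Here’s a minimal sketch over a hypothetical, hand-built graph; real lineage tools operate on harvested metadata instead:

    # Hypothetical lineage graph: each asset lists the upstream assets it is derived from.
    UPSTREAM = {
        "kpi_report.revenue": ["warehouse.sales_fact"],
        "warehouse.sales_fact": ["staging.orders", "staging.refunds"],
        "staging.orders": ["crm.orders_raw"],
        "staging.refunds": ["erp.refunds_raw"],
    }

    def trace_upstream(asset: str) -> list:
        """Root-cause analysis: walk backward to every source feeding an asset."""
        sources = []
        for parent in UPSTREAM.get(asset, []):
            sources.append(parent)
            sources.extend(trace_upstream(parent))
        return sources

    def trace_downstream(asset: str) -> list:
        """Impact analysis: find every asset that depends, directly or indirectly, on this one."""
        dependents = [a for a, parents in UPSTREAM.items() if asset in parents]
        for dependent in list(dependents):
            dependents.extend(trace_downstream(dependent))
        return dependents

    print(trace_upstream("kpi_report.revenue"))  # every source behind the KPI
    print(trace_downstream("crm.orders_raw"))    # everything a change here would touch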

See Data Lineage in Action

Data lineage tools document the flow of data into and out of an organization’s systems. They capture end-to-end lineage and ensure proper impact analysis can be performed in the event of problems or changes to data assets as they move across pipelines.

The erwin Data Intelligence Suite (erwin DI) automatically generates end-to-end data lineage, down to the column level and between repositories. You can view data flows from source systems to the reporting layers, including intermediate transformation and business logic.

Join us for the next live demo of erwin Data Intelligence (DI) to see metadata-driven, automated data lineage in action.


What is a Data Catalog?

The easiest way to understand a data catalog is to look at how libraries catalog books and manuals in a hierarchical structure, making it easy for anyone to find exactly what they need.

Similarly, a data catalog enables businesses to create a seamless way for employees to access and consume data and business assets in an organized manner.

By combining physical system catalogs, critical data elements, and key performance measures with clearly defined product and sales goals, you can manage the effectiveness of your business and ensure you understand which systems are critical for business continuity and measuring corporate performance.

A data catalog is essential to business users because it synthesizes all the details about an organization’s data assets across multiple data sources. It organizes them into a simple, easy-to-digest format and then publishes them to data communities for knowledge-sharing and collaboration.

Another foundational purpose of a data catalog is to streamline, organize and process the thousands, if not millions, of an organization’s data assets to help consumers/users search for specific datasets and understand metadata, ownership, data lineage and usage.

Look at Amazon and how it handles millions of different products, and yet we, as consumers, can find almost anything about everything very quickly.

Beyond Amazon’s advanced search capabilities, the company also provides detailed information about each product, the seller’s information, shipping times, reviews and a list of companion products. It measures sales down to a zip-code territory level across product categories.

Data Catalog Use Case Example: Crisis Proof Your Business

One of the biggest lessons we’re learning from the global COVID-19 pandemic is the importance of data, specifically using a data catalog to comply, collaborate and innovate to crisis-proof our businesses.

As COVID-19 continues to spread, organizations are evaluating and adjusting their operations in terms of both risk management and business continuity. Data is critical to these decisions, such as how to ramp up and support remote employees, re-engineer processes, change entire business models, and adjust supply chains.

Think about the pandemic itself and the numerous global entities involved in identifying it, tracking its trajectory, and providing guidance to governments, healthcare systems and the general public. One example is the European Union (EU) Open Data Portal, which is used to document, catalog and govern EU data related to the pandemic. This information has helped:

  • Provide daily updates
  • Give guidance to governments, health professionals and the public
  • Support the development and approval of treatments and vaccines
  • Help with crisis coordination, including repatriation and humanitarian aid
  • Put border controls in place
  • Assist with supply chain control and consular coordination

So one of the biggest lessons we’re learning from COVID-19 is the need for data collection, management and governance. What’s the best way to organize data and ensure it is supported by business policies and well-defined, governed systems, data elements and performance measures?

According to Gartner, “organizations that offer a curated catalog of internal and external data to diverse users will realize twice the business value from their data and analytics investments than those that do not.”


5 Advantages of Using a Data Catalog for Crisis Preparedness & Business Continuity

The World Bank has been able to provide an array of real-time data, statistical indicators, and other types of data relevant to the coronavirus pandemic through its authoritative data catalogs. The World Bank data catalogs contain datasets, policies, critical data elements and measures useful for analysis and modeling the virus’ trajectory to help organizations measure the impact.

What can your organization learn from this example when it comes to crisis preparedness and business continuity? By developing and maintaining a data catalog as part of a larger data governance program supported by stakeholders across the organization, you can:

  1. Catalog and Share Information Assets

Catalog critical systems and data elements, plus enable the calculation and evaluation of key performance measures. It’s also important to understand data lineage and be able to analyze the impacts to critical systems and essential business processes if a change occurs.

  2. Clearly Document Data Policies and Rules

Managing a remote workforce creates new challenges and risks. Do employees have remote access to essential systems? Do they know what the company’s work-from-home policies are? Do employees understand how to handle sensitive data? Are they equipped to maintain data security and privacy? A data catalog with self-service access serves up the correct policies and procedures.

  3. Reduce Operational Costs While Increasing Time to Value

Datasets need to be properly scanned, documented, tagged and annotated with their definitions, ownership, lineage and usage. Automating the cataloging of data assets saves initial development time and streamlines its ongoing maintenance and governance. Automating the curation of data assets also accelerates the time to value for analytics/insights reporting and significantly reduces operational costs.

  4. Make Data Accessible & Usable

Open your organization’s data door, making it easier to access, search and understand information assets. A data catalog is the core of data analysis for decision-making, so automating its curation and access with the associated business context will enable stakeholders to spend more time analyzing it for meaningful insights they can put into action.

  5. Ensure Regulatory Compliance

Regulations like the California Consumer Privacy Act (CCPA) and the European Union’s General Data Protection Regulation (GDPR) require organizations to know where all their customer, prospect and employee data resides to ensure its security and privacy.

A fine for noncompliance is the last thing you need on top of everything else your organization is dealing with, so using a data catalog centralizes data management and the associated usage policies and guardrails.

See a Data Catalog in Action

The erwin Data Intelligence Suite (erwin DI) provides data catalog and data literacy capabilities with built-in automation so you can accomplish all of the above and more.

Join us for the next live demo of erwin DI.


Data Governance for Smart Data Distancing

Hello from my home office! I hope you and your family are staying safe, practicing social distancing, and of course, washing your hands.

These are indeed strange days. During this coronavirus emergency, we are all being deluged by data from politicians, government agencies, news outlets, social media and websites, including valid facts but also opinions and rumors.

Happily for us data geeks, the general public is being told how important our efforts and those of data scientists are to analyzing, mapping and ultimately shutting down this pandemic.

Yay, data geeks!

Unfortunately though, not all of the incoming information is of equal value, ethically sourced, rigorously prepared or even good.

As we work to protect the health and safety of those around us, we need to understand the nuances of meaning for the received information as well as the motivations of information sources to make good decisions.

On a very personal level, separating the good information from the bad becomes a matter of life and potential death. On a business level, decisions based on bad external data may have the potential to cause business failures.

In business, data is the food that feeds the body or enterprise. Better data makes the body stronger and provides a foundation for the use of analytics and data science tools to reduce errors in decision-making. Ultimately, it gives our businesses the strength to deliver better products and services to our customers.

How then, as a business, can we ensure that the data we consume is of good quality?

Distancing from Third-Party Data

Just as we are practicing social distancing in our personal lives, so too we must practice data distancing in our professional lives.

In regard to third-party data, we should ask ourselves: How was the data created? What formulas were used? Does the definition (description, classification, allowable range of values, etc.) of incoming, individual data elements match our internal definitions of those data elements?

If we reflect on the coronavirus example, we can ask: How do individual countries report their data? Do individual countries use the same testing protocols? Are infections universally defined the same way (based on widely administered tests or only hospital admissions)? Are asymptomatic infections reported? Are all countries using the same methods and formulas to collect and calculate infections, recoveries and deaths?

In our businesses, it is vital that we work to develop a deeper understanding of the sources, methods and quality of incoming third-party data. This deeper understanding will help us make better decisions about the risks and rewards of using that external data.

Data Governance Methods for Data Distancing

We’ve received lots of instructions lately about how to wash our hands to protect ourselves from coronavirus. Perhaps we thought we already knew how to wash our hands, but nonetheless, a refresher course has been worthwhile.

Similarly, perhaps we think we know how to protect our business data, but maybe a refresher would be useful here as well?

Here are a few steps you can take to protect your business:

  • Establish comprehensive third-party data sharing guidelines (for both inbound and outbound data). These guidelines should include communicating with third parties about how they make changes to collection and calculation methods.
  • Rationalize external data dictionaries against internal data dictionaries, and understand where differences occur and how to overcome them.
  • Ingest third-party data into a quarantined area where it can be profiled and measured for quality, completeness and correctness, and where necessary, cleansed (see the sketch after this list).
  • Periodically review all data ingestion or data-sharing policies, processes and procedures to ensure they remain aligned to business needs and goals.
  • Establish data-sharing training programs so all data stakeholders understand associated security considerations, contextual meaning, and when and when not to share and/or ingest third-party data.
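
Here is a minimal sketch of that quarantine-and-profile step, assuming a hypothetical internal data dictionary – the field names and rules are illustrative only:

    # Quarantine check: profile third-party records against an internal data dictionary.
    # The dictionary and its rules are illustrative assumptions, not a real standard.
    DATA_DICTIONARY = {
        "country_code": {"required": True, "type": str, "max_len": 3},
        "reported_cases": {"required": True, "type": int, "min": 0},
        "report_date": {"required": True, "type": str, "max_len": 10},  # YYYY-MM-DD
    }

    def profile_record(record: dict) -> list:
        """Return a list of quality issues; an empty list means the record can leave quarantine."""
        issues = []
        for field_name, rules in DATA_DICTIONARY.items():
            value = record.get(field_name)
            if value is None:
                if rules["required"]:
                    issues.append(f"missing required field: {field_name}")
                continue
            if not isinstance(value, rules["type"]):
                issues.append(f"wrong type for {field_name}: {type(value).__name__}")
                continue
            if "min" in rules and value < rules["min"]:
                issues.append(f"{field_name} below minimum: {value}")
            if "max_len" in rules and len(value) > rules["max_len"]:
                issues.append(f"{field_name} too long: {value!r}")
        return issues

    print(profile_record({"country_code": "DE", "reported_cases": 125, "report_date": "2020-04-01"}))  # []
    print(profile_record({"country_code": "DE", "reported_cases": -5}))  # two issues flagged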

erwin Data Intelligence for Data Governance and Distancing

With solutions like those in the erwin Data Intelligence Suite (erwin DI), organizations can auto-document their metadata; classify their data with respect to privacy, contractual and regulatory requirements; attach data-sharing and management policies; and implement an appropriate level of data security.

If you believe the management of your third-party data interfaces could benefit from a review or tune-up, feel free to reach out to me and my colleagues here at erwin.

We’d be happy to provide a demo of how to use erwin DI for data distancing.


Data Intelligence and Its Role in Combating COVID-19

Data intelligence has a critical role to play in the supercomputing battle against COVID-19.

Last week, The White House announced the launch of the COVID-19 High Performance Computing Consortium, a public-private partnership to provide COVID-19 researchers worldwide with access to the world’s most powerful high performance computing resources that can significantly advance the pace of scientific discovery in the fight to stop the virus.

Rensselaer Polytechnic Institute (RPI) is one of the organizations that has joined the consortium to provide computing resources to help fight the pandemic.


While leveraging supercomputing power is a tremendous asset in our fight to combat this global pandemic, in order to deliver life-saving insights, you really have to understand what data you have and where it came from. Answering these questions is at the heart of data intelligence.

Managing and Governing Data From Lots of Disparate Sources

Collecting and managing data from many disparate sources for the COVID-19 High Performance Computing Consortium is an effort on a scale that, quite frankly, boggles the mind.

To feed the supercomputers with epidemiological data, the information will flow in from many different and heavily regulated data sources, including population health, demographics, outbreak hotspots and economic impacts.

This data will be collected from organizations such as the World Health Organization (WHO), the Centers for Disease Control (CDC), and state and local governments across the globe.

Privately, it will come from hospitals, labs, pharmaceutical companies, doctors and private health insurers. It also will come from HL7 hospital data, claims administration systems, care management systems, the Medicaid Management Information System, etc.

These numerous data types and data sources most definitely weren’t designed to work together. As a result, the data may be compromised, rendering faulty analyses and insights.

Marrying the epidemiological data to the population data will require a tremendous amount of data intelligence about the:

  • Source of the data;
  • Currency of the data;
  • Quality of the data; and
  • How it can be used from an interoperability standpoint.

To do this, the consortium will need the ability to automatically scan and catalog the data sources and apply strict data governance and quality practices.

Unraveling Data Complexities with Metadata Management

Collecting and understanding this vast amount of epidemiological data in the fight against COVID-19 will require data governance oversight and data intelligence to unravel the complexities of the underlying data sources. To be successful and generate quality results, this consortium will need to adhere to strict disciplines around managing the data that comes into the study.

Metadata management will be critical to the process for cataloging data via automated scans. Essentially, metadata management is the administration of data that describes other data, with an emphasis on associations and lineage. It involves establishing policies and processes to ensure information can be integrated, accessed, shared, linked, analyzed and maintained.

While supercomputing can be used to process incredible amounts of data, a comprehensive data governance strategy plus technology will enable the consortium to determine master data sets, discover the impact of potential glossary changes, audit and score adherence to rules and data quality, discover risks, and appropriately apply security to data flows, as well as publish data to the right people.

Metadata management delivers the following capabilities, which are essential in building an automated, real-time, high-quality data pipeline:

  • Reference data management for capturing and harmonizing shared reference data domains (sketched in code after this list)
  • Data profiling for data assessment, metadata discovery and data validation
  • Data quality management for data validation and assurance
  • Data mapping management to capture the data flows, reconstruct data pipelines, and visualize data lineage
  • Data lineage to support impact analysis
  • Data pipeline automation to help develop and implement new data pipelines
  • Data cataloging to capture object metadata for identified data assets
  • Data discovery facilitated via a shared environment allowing data consumers to understand the use of data from a wide array of sources
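
As one small illustration of the first capability on that list, here’s a sketch of harmonizing each source’s local codes into a shared reference domain. The sources and mappings are hypothetical:

    # Reference data harmonization: translate source-local codes into one shared domain.
    SHARED_DOMAIN = {"US", "DE", "CN"}  # the harmonized country-code domain

    SOURCE_MAPPINGS = {
        "who_feed": {"United States of America": "US", "Germany": "DE", "China": "CN"},
        "cdc_feed": {"USA": "US", "GER": "DE", "CHN": "CN"},
    }

    def harmonize(source: str, local_code: str) -> str:
        """Translate a source-local code into the shared reference domain."""
        code = SOURCE_MAPPINGS.get(source, {}).get(local_code)
        if code is None or code not in SHARED_DOMAIN:
            raise ValueError(f"unmapped reference value {local_code!r} from {source}")
        return code

    print(harmonize("who_feed", "Germany"))  # DE
    print(harmonize("cdc_feed", "USA"))      # US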

Supercomputing will be very powerful in helping fight the COVID-19 virus. However, to deliver critical insights and actionable intelligence, data scientists need access to quality data harvested from many disparate data sources that weren’t designed to work together.

Automated metadata harvesting, data cataloging, data mapping and data lineage combined with integrated business glossary management and self-service data discovery can give this important consortium data asset visibility and context, so it has the relevant information needed to help stop this virus affecting all of us around the globe.

To learn more about metadata management capabilities, download this white paper, Metadata Management: The Hero in Unleashing Enterprise Data’s Value.
