erwin Expert Blog Data Governance Data Intelligence

Do I Need a Data Catalog?

If you’re serious about a data-driven strategy, you’re going to need a data catalog.

Organizations need a data catalog because it enables them to create a seamless way for employees to access and consume data and business assets in an organized manner.

Given the value this sort of data-driven insight can provide, the reason organizations need a data catalog should become clearer.

It’s no surprise that most organizations’ data is often fragmented and siloed across numerous sources (e.g., legacy systems, data warehouses, flat files stored on individual desktops and laptops, and modern, cloud-based repositories).

These fragmented data environments make data governance a challenge since business stakeholders, data analysts and other users are unable to discover data or run queries across an entire data set. This also diminishes the value of data as an asset.

Data catalogs combine physical system catalogs, critical data elements, and key performance measures with clearly defined product and sales goals in certain circumstances.

You also can manage the effectiveness of your business and ensure you understand which systems are critical for business continuity and for measuring corporate performance.

The data catalog is a searchable asset that enables all data – including even formerly siloed tribal knowledge – to be cataloged and more quickly exposed to users for analysis.

Organizations with particularly deep data stores might need a data catalog with advanced capabilities, such as automated metadata harvesting to speed up the data preparation process.
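To make "automated metadata harvesting" concrete, here is a minimal sketch that reads table and column definitions straight from a database instead of documenting them by hand. It uses Python's built-in `sqlite3` module and an in-memory database purely for illustration; the table names and the `harvest_technical_metadata` function are assumptions, not part of any particular catalog product.

```python
import sqlite3


def harvest_technical_metadata(conn: sqlite3.Connection) -> dict[str, list[dict[str, str]]]:
    """Return {table_name: [{"column": ..., "type": ...}, ...]} for every user table."""
    harvested = {}
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    for (table_name,) in rows:
        if table_name.startswith("sqlite_"):
            continue  # skip SQLite's internal bookkeeping tables
        # PRAGMA table_info yields one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        harvested[table_name] = [{"column": col[1], "type": col[2]} for col in columns]
    return harvested


# Hypothetical source system with two tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, region TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

harvested = harvest_technical_metadata(conn)
for table, cols in sorted(harvested.items()):
    print(table, [c["column"] for c in cols])
```

A real harvester would connect to many source types and run on a schedule, but the principle is the same: let the systems describe themselves.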

For example, before users can effectively and meaningfully engage with robust business intelligence (BI) platforms, they must have a way to ensure that the most relevant, important and valuable data sets are included in analysis.

The most streamlined way to achieve this is by using a data catalog, which can provide a first stop for users ahead of working in BI platforms.

As a collective intelligent asset, a data catalog should include capabilities for collecting and continually enriching or curating the metadata associated with each data asset to make it easier to identify, evaluate and use properly.

Three Types of Metadata in a Data Catalog

A data catalog uses metadata, data that describes or summarizes data, to create an informative and searchable inventory of all data assets in an organization.

These assets can include but are not limited to structured data, unstructured data (including documents, web pages, email, social media content, mobile data, images, audio, video and reports) and query results, etc. The metadata provides information about the asset that makes it easier to locate, understand and evaluate.
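As a rough illustration of a "searchable inventory," the sketch below matches a query against the metadata of each asset rather than against the data itself. The `CatalogEntry` class, the `search` function and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    description: str
    tags: set[str] = field(default_factory=set)


def search(entries: list[CatalogEntry], query: str) -> list[str]:
    """Return the names of entries whose metadata contains every query term."""
    terms = query.lower().split()
    matches = []
    for entry in entries:
        # Search only the metadata: name, description and tags.
        searchable = " ".join([entry.name, entry.description, *entry.tags]).lower()
        if all(term in searchable for term in terms):
            matches.append(entry.name)
    return matches


# Hypothetical inventory mixing structured data, unstructured data and a query result.
entries = [
    CatalogEntry("sales_2020.csv", "Monthly sales by region", {"sales", "finance"}),
    CatalogEntry("support_emails", "Unstructured customer email threads", {"customer", "text"}),
    CatalogEntry("churn_query_result", "Saved query result: customer churn by region",
                 {"customer", "analytics"}),
]

print(search(entries, "customer region"))
```

Production catalogs use proper search indexes and ranking, but even this simple substring match shows why good descriptions and tags matter.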

For example, Amazon handles millions of different products, and yet we, as consumers, can find almost anything about everything very quickly.

Beyond Amazon’s advanced search capabilities, the company also provides detailed information about each product, the seller’s information, shipping times, reviews, and a list of companion products. Sales are measured down to a zip code territory level across product categories.

Another classic example is the online or card catalog at a library. Each card or listing contains information about a book or publication (e.g., title, author, subject, publication date, edition, location) that makes the publication easier for a reader to find and to evaluate.

There are many types of metadata, but a data catalog deals primarily with three: technical metadata, operational or “process” metadata, and business metadata.

Technical Metadata

Technical metadata describes how the data is organized and stored, as well as its transformation and lineage. It is structural and describes data objects such as tables, columns, rows, indexes and connections.

This aspect of the metadata guides data experts on how to work with the data (e.g. for analysis and integration purposes).

Operational Metadata

Operational metadata describes systems that process data, the applications in those systems, and the rules in those applications. It is also called “process” metadata, and it describes the data asset’s creation as well as when, how and by whom it has been accessed, used, updated or changed.

Operational metadata provides information about the asset’s history and lineage, which can help an analyst decide if the asset is recent enough for the task at hand, if it comes from a reliable source, if it has been updated by trustworthy individuals, and so on.
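The kind of judgment described here can be sketched as a simple check against operational metadata. Everything below is an assumption for illustration: the field names, the 30-day threshold, and the lists of trusted sources and users.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OperationalMetadata:
    source_system: str
    last_updated: datetime
    updated_by: str


def is_fit_for_use(meta: OperationalMetadata, now: datetime, max_age: timedelta,
                   trusted_sources: set[str], trusted_users: set[str]) -> bool:
    """True if the asset is recent enough and comes from a trusted source and editor."""
    recent_enough = now - meta.last_updated <= max_age
    return (recent_enough
            and meta.source_system in trusted_sources
            and meta.updated_by in trusted_users)


# Hypothetical asset history and trust lists.
now = datetime(2020, 6, 1)
meta = OperationalMetadata("warehouse", datetime(2020, 5, 20), "data.steward")

print(is_fit_for_use(meta, now, timedelta(days=30), {"warehouse"}, {"data.steward"}))
```

In practice an analyst makes this call by reading the catalog entry, but encoding the rule shows which pieces of operational metadata the decision depends on.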

As illustrated above, a data catalog is essential to business users because it synthesizes all the details about an organization’s data assets across multiple data sources. It organizes them into a simple, easy-to-digest format and then publishes them to data communities for knowledge-sharing and collaboration.

Business Metadata

Business metadata is sometimes referred to as external metadata attributed to the business aspects of a data asset. It defines the functionality of the data captured, definition of the data, definition of the elements, and definition of how the data is used within the business.

This is the area that binds all users together in terms of consistency and usage of cataloged data assets.

Tools should be provided that enable data experts to explore the data catalogs, curate and enrich the metadata with tags, associations, ratings, annotations, and any other information and context that helps users find data faster and use it with confidence.
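One way to picture how the three metadata types and the enrichment step fit together is a single catalog record with three parts, where curation adds to the business part. This is a minimal sketch; every class and field name is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetadata:
    table: str
    columns: list[str]


@dataclass
class OperationalMetadata:
    created_by: str
    last_accessed_by: str


@dataclass
class BusinessMetadata:
    definition: str
    tags: set[str] = field(default_factory=set)
    ratings: list[int] = field(default_factory=list)


@dataclass
class CatalogAsset:
    technical: TechnicalMetadata
    operational: OperationalMetadata
    business: BusinessMetadata

    def add_tags(self, *tags: str) -> None:
        """Curation step: attach searchable tags to the asset."""
        self.business.tags.update(tags)

    def rate(self, stars: int) -> None:
        """Curation step: record a 1-5 usefulness rating."""
        if not 1 <= stars <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.business.ratings.append(stars)

    def average_rating(self) -> float:
        ratings = self.business.ratings
        return sum(ratings) / len(ratings) if ratings else 0.0


asset = CatalogAsset(
    TechnicalMetadata("customers", ["id", "email", "region"]),
    OperationalMetadata("etl.service", "jane.analyst"),
    BusinessMetadata("One row per active customer account"),
)
asset.add_tags("customer", "master-data")
asset.rate(5)
asset.rate(4)
print(sorted(asset.business.tags), asset.average_rating())
```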

Why You Need a Data Catalog – Three Business Benefits of Data Catalogs

When data professionals can help themselves to the data they need, the entire organization benefits. That means working without IT intervention, without having to rely on finding experts or colleagues for advice, without limiting themselves to only the assets they know about, and without having to worry about governance and compliance.

Catalog critical systems and data elements, plus enable the calculation and evaluation of key performance measures. It is also important to understand data lineage and be able to analyze the impacts to critical systems and essential business processes if a change occurs.
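Impact analysis over lineage can be sketched as a graph traversal: start at the asset that changes and follow the lineage edges to everything downstream. The lineage map and asset names below are invented for illustration.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that consume it directly.
LINEAGE = {
    "crm.customers": ["warehouse.dim_customer"],
    "erp.orders": ["warehouse.fact_sales"],
    "warehouse.dim_customer": ["warehouse.fact_sales", "report.churn"],
    "warehouse.fact_sales": ["report.revenue_kpi"],
}


def downstream_impact(lineage: dict[str, list[str]], changed: str) -> list[str]:
    """Breadth-first walk returning every asset affected by a change to `changed`."""
    affected = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in lineage.get(node, []):
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return sorted(affected)


print(downstream_impact(LINEAGE, "crm.customers"))
```

Here a change to the CRM customer table reaches two warehouse tables and two reports, which is exactly the kind of answer a catalog with lineage should give before the change is made.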

  1. Makes data accessible and usable, reducing operational costs while accelerating time to value

Open your organization’s data door, making it easier to access, search and understand information assets. A data catalog is the core of data analysis for decision-making, so automating its curation and access with the associated business context will enable stakeholders to spend more time analyzing it for meaningful insights they can put into action.

Data assets need to be properly scanned, documented, tagged and annotated with their definitions, ownership, lineage and usage. Automating the cataloging of data assets saves initial development time and streamlines their ongoing maintenance and governance.

Automating the curation of data assets also accelerates the time to value for analytics/insights reporting and significantly reduces operational costs.

  2. Ensures regulatory compliance

Regulations like the California Consumer Privacy Act (CCPA) and the European Union’s General Data Protection Regulation (GDPR) require organizations to know where all their customer, prospect and employee data resides to ensure its security and privacy.
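Knowing "where the data resides" becomes a simple lookup once columns in the catalog carry a classification. The sketch below assumes a hypothetical set of classification labels and a toy catalog; it is not a legal definition of personal data under CCPA or GDPR.

```python
# Hypothetical classification labels that mark a column as personal data.
PERSONAL_DATA_LABELS = {"email", "name", "address", "phone"}

# Hypothetical catalog: each asset lists its columns and their classification (or None).
catalog = [
    {"asset": "crm.customers", "columns": {"id": None, "email": "email", "full_name": "name"}},
    {"asset": "hr.employees", "columns": {"employee_id": None, "home_address": "address"}},
    {"asset": "warehouse.fact_sales", "columns": {"order_id": None, "total": None}},
]


def assets_with_personal_data(catalog: list[dict]) -> dict[str, list[str]]:
    """Map each asset to its columns classified as personal data; omit assets with none."""
    found = {}
    for entry in catalog:
        flagged = sorted(
            column for column, label in entry["columns"].items()
            if label in PERSONAL_DATA_LABELS
        )
        if flagged:
            found[entry["asset"]] = flagged
    return found


for asset, columns in assets_with_personal_data(catalog).items():
    print(asset, columns)
```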

A fine for noncompliance or reputational damage is the last thing you need to worry about, so use a data catalog to centralize data management and the associated usage policies and guardrails.

See a Data Catalog in Action

The erwin Data Intelligence Suite (erwin DI) provides data catalog and data literacy capabilities with built-in automation so you can accomplish all the above and much more.

Request your own demo of erwin DI.

erwin Expert Blog Data Governance Data Intelligence

Overcoming the 80/20 Rule – Finding More Time with Data Intelligence

The 80/20 rule is well known. It describes an unfortunate reality for many data stewards, who spend 80 percent of their time finding, cleaning and reorganizing huge amounts of data, and only 20 percent of their time on actual data analysis.

That’s a lot of wasted time.

Earlier this year, erwin released its 2020 State of Data Governance and Automation (DGA) report. About 70 percent of the DGA report respondents – a combination of roles from data architects to executive managers – say they spend an average of 10 or more hours per week on data-related activities.

COVID-19 has changed the way we work – essentially overnight – and may change how companies work moving forward. Companies like Twitter, Shopify and Box have announced that they are moving to a permanent work-from-home status as their new normal.

For much of our time as data stewards, collecting, revising and building consensus around our metadata has meant finding time on multiple calendars, against multiple competing priorities, so that we can pull the appropriate data stakeholders into a room to discuss term definitions, the rules for measuring “clean” data, and the processes and applications that use the data.

This style of data governance most often presents us with eight one-hour opportunities per day (40 one-hour opportunities per week) to meet.

As the 80/20 rule suggests, getting through hundreds, or perhaps thousands of individual business terms using this one-hour meeting model can take … a … long … time.

Now that pulling stakeholders into a room has been disrupted …  what if we could use this as 40 opportunities to update the metadata PER DAY?

What if we could buck the trend, and overcome the 80/20 rule?

Overcoming the 80/20 Rule with Micro Governance for Metadata

Micro governance is a strategy that leverages the native functionality around workflows.

erwin Data Intelligence (DI) offers Workflow Manager, which creates a persistent, reusable, role-based workflow so that edits to the metadata for any term can move from, for example, draft to under review to approved to published.
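To show the idea of a role-based workflow in general terms, here is a small state machine built around the four example statuses. It is only a sketch and does not represent how erwin DI is implemented; the role names, the "send back to draft" transition and the `TermWorkflow` class are all assumptions.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    UNDER_REVIEW = "under review"
    APPROVED = "approved"
    PUBLISHED = "published"


# Allowed moves between statuses (the return to draft is an assumed extra).
ALLOWED = {
    Status.DRAFT: {Status.UNDER_REVIEW},
    Status.UNDER_REVIEW: {Status.APPROVED, Status.DRAFT},
    Status.APPROVED: {Status.PUBLISHED},
    Status.PUBLISHED: set(),
}

# Hypothetical role required to move a term INTO each status.
REQUIRED_ROLE = {
    Status.UNDER_REVIEW: "steward",
    Status.DRAFT: "reviewer",
    Status.APPROVED: "reviewer",
    Status.PUBLISHED: "publisher",
}


class TermWorkflow:
    def __init__(self, term: str) -> None:
        self.term = term
        self.status = Status.DRAFT
        self.history = [Status.DRAFT]

    def advance(self, target: Status, role: str) -> None:
        """Move the term to `target` if the transition and the role are both valid."""
        if target not in ALLOWED[self.status]:
            raise ValueError(f"cannot move from {self.status.value} to {target.value}")
        if REQUIRED_ROLE[target] != role:
            raise PermissionError(f"role {role!r} cannot move a term to {target.value}")
        self.status = target
        self.history.append(target)


workflow = TermWorkflow("Customer")
workflow.advance(Status.UNDER_REVIEW, "steward")
workflow.advance(Status.APPROVED, "reviewer")
workflow.advance(Status.PUBLISHED, "publisher")
print([status.value for status in workflow.history])
```

Because each step is a small, independent action by one role, nobody has to be in the same room at the same time.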

A defined workflow can eliminate the need for hour-long meetings with multiple stakeholders in a room. Now users can suggest edits, review changes, and approve changes on their own schedule! Using micro governance, these steps should take less than 10 minutes per term:

  • Log on to the DI Suite
  • Open your work queue to see items requiring your attention
  • Review and/or approve changes
  • Log out

That’s it!

And as a bonus, where stakeholders may need to discuss the edits to achieve consensus, the Collaboration Center within the Business Glossary Manager facilitates conversations between stakeholders that are persistent and attached directly to the business term. No more searching through months of email conversations or forgetting to cc a key stakeholder.

Using the DI Suite Workflow Manager and the Collaboration Center, and assuming an 8-hour workday, we should each have 48 opportunities for 10 minutes of micro-governance stewardship each day.
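The arithmetic behind these figures is easy to check. The sketch below assumes a five-day week (implied by eight meetings a day and 40 a week) and treats the whole workday as available, so the results are upper bounds rather than realistic targets.

```python
WORKDAY_HOURS = 8
WORKDAYS_PER_WEEK = 5          # assumption: implied by 8 per day and 40 per week
MEETING_MINUTES = 60           # the one-hour meeting model
MICRO_TASK_MINUTES = 10        # one micro-governance action

minutes_per_day = WORKDAY_HOURS * 60

meeting_slots_per_day = minutes_per_day // MEETING_MINUTES
meeting_slots_per_week = meeting_slots_per_day * WORKDAYS_PER_WEEK

micro_slots_per_day = minutes_per_day // MICRO_TASK_MINUTES
micro_slots_per_week = micro_slots_per_day * WORKDAYS_PER_WEEK

print(f"Meetings: {meeting_slots_per_day} per day, {meeting_slots_per_week} per week")
print(f"Micro governance: {micro_slots_per_day} per day, {micro_slots_per_week} per week")
```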

A Culture of Micro Governance

In these days when we are all working at home, and face-to-face meetings are all but impossible, we should see this time as an opportunity to develop a culture of micro governance around our metadata.

This new way of thinking and acting will help us continuously improve our transparency and semantic understanding of our data while staying connected and collaborating with each other.

When we finally get back into the office, the micro governance ethos we’ve built while at home will help make our data governance programs more flexible, responsive and agile. And ultimately, we’ll take up less of our colleagues’ precious time.

Request a free demo of erwin DI.

erwin Expert Blog Data Intelligence

The Top Five Data Intelligence Benefits

Data intelligence benefits data-driven organizations immensely. Primarily, it’s about helping organizations make more intelligent decisions based on their data.

It does this by affording organizations greater visibility and control over “data at rest” in databases, data lakes and data warehouses and “data in motion” as it’s integrated with and used by key applications.

For more context, see: What is Data Intelligence?

The Top 5 Data Intelligence Benefits

Through a better understanding of what data an organization has available – including its lineage, associated metadata and access permissions – an organization’s data-driven decisions are afforded more context and, ultimately, a greater likelihood of successful implementation.

Considering this, the benefits of data intelligence are huge, and include:

1. Improved consumer profiling and segmentation

Customer profiling and segmentation enable businesses and marketers to better understand their target consumers and group them according to common characteristics and behavior.

Businesses will be able to cluster and classify consumers according to demographics, purchasing behavior, experience with products and services, and so much more. Having a holistic view of customers’ preferences, transactions, and purchasing behavior enables businesses to make better decisions regarding the products and services they provide. Great examples are BMW Mini, Comfort Keepers, and Teleflora.
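As a deliberately simple illustration of segmentation, the sketch below groups customers by two shared characteristics using fixed rules. The customer records, the age cut-off and the spend threshold are invented, and a rule-based grouping like this is a stand-in for the statistical clustering a real analytics tool would apply.

```python
from collections import defaultdict

# Hypothetical customer records.
customers = [
    {"id": 1, "age": 24, "annual_spend": 150.0},
    {"id": 2, "age": 41, "annual_spend": 2300.0},
    {"id": 3, "age": 29, "annual_spend": 1800.0},
    {"id": 4, "age": 52, "annual_spend": 400.0},
    {"id": 5, "age": 33, "annual_spend": 90.0},
]


def segment(customer: dict) -> str:
    """Assign a segment label from a demographic rule and a purchasing rule."""
    age_band = "under_35" if customer["age"] < 35 else "35_plus"
    spend_tier = "high_spend" if customer["annual_spend"] >= 1000 else "low_spend"
    return f"{age_band}/{spend_tier}"


segments = defaultdict(list)
for customer in customers:
    segments[segment(customer)].append(customer["id"])

for label in sorted(segments):
    print(label, segments[label])
```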

2. A greater understanding of company investments

Data intelligence can give businesses greater context on the progress and effectiveness of their investments. Businesses that partner with IT companies can develop data intelligence that is tailored to monitoring and evaluating their current investments, as well as forecasting potential future investments.

If a business’s current investments are not effective, then data intelligence tools can provide guidance on the best avenues to invest in. Big IT companies even have off-the-shelf data analytics software ready to be configured by a company to its needs.

3. The ability to apply real-time data in marketing strategies

With real-time analytics, businesses can use information such as regional or local sales patterns, inventory level summaries, local event trends, sales history, or seasonal factors to revise marketing models and strategies and direct them to better serve their customers.

Real-time data analytics can be used by businesses to better meet customer needs as they arise and improve customer satisfaction. Dickey’s BBQ Pit was able to utilize data analytics across all its stores and, using the resulting information, adjust its promotions strategy from weekly to around every 12 to 24 hours.

4. A greater opportunity to enhance logistical and operational planning

Data intelligence can also enable businesses to enhance their operational and logistical planning. Insights on things such as delivery times, optimal event dates, potential external factors, potential route roadblocks, and optimal warehousing locations can help optimize operations and logistics.

Data intelligence can take raw, untimely, and incomprehensible data and present it as aggregated, condensed, digestible, and usable information. UPS employed the Orion route optimization system and was able to cut 364 million miles from its routes globally.

5. An enhanced capacity to improve customer experience

To keep pace with technology, businesses have been employing more tools and methods that incorporate modern technology, like machine learning and the Internet of Things (IoT), to enhance the consumer experience.

Information derived from tools like customer profiling analyses can provide insight into consumer purchasing behavior, which the business then uses to tailor its products and services to match the needs of its target consumers. Businesses are also able to use such information to provide customers with a user-centric experience.

Transforming Industries with Data Intelligence

With big data, and tools such as Artificial Intelligence, Machine Learning, and Data Mining, organizations collect and analyze large amounts of data reliably and more efficiently. From Amazon to Airbnb, over the last decade we’ve seen organizations that take advantage of the aforementioned data intelligence benefits to manage large data volumes rise to the pole position in their industry.

Now, in 2020, the benefits of data intelligence are enjoyed by organizations from a plethora of different markets and industries.

Data intelligence transforms the way industries operate by presenting derived information as understandable models and aggregated trends, which speeds up the process of analyzing and understanding it.

Here’s how data intelligence is benefiting some of the most common industries:


The travel industry has enhanced the quality and range of products and services it provides to travelers, and has optimized pricing strategies for future travel offerings.

Businesses in the travel industry can analyze historical trends on peak travel seasons and customer Key Performance Indicators (KPIs) and can adjust services, amenities, and packages to match customer needs.


Educators can provide a more valuable learning experience and environment for students. With the use of data intelligence tools, educational institutes can provide teachers with a more holistic view of a student’s academic performance.

Teachers can spot avenues for academic improvement and provide their students with support in the areas where they need help.


Several hospitals have also employed data intelligence tools in their services and operational processes. These hospitals are making use of dashboards that provide summary information on hospital patient trends, treatment costs, and waiting times.

In addition, these data intelligence tools provide healthcare institutions with an encompassing view of hospital and care-critical data, which hospitals can use to improve the quality and level of service and increase their economic efficiency.


The retail industry has also employed data intelligence in developing tools to better forecast and plan according to supply and demand trends and consumer Key Performance Indicators (KPIs).

Businesses, both small and large, have made use of dashboards to monitor and illustrate transaction trends and product consumption rates. Tools such as these dashboards provide insight into customer purchasing patterns and transaction value that businesses such as Teleflora are leveraging to provide better products and services.

Data Intelligence Trends

With its rate of success evident among many of the most successful organizations in history, data intelligence is clearly no fad. Therefore, it’s important to keep an eye on both the current and upcoming data intelligence trends:

Real-time enterprise is the market.

Businesses, small and big, will be employing real-time data analytics and data-driven products and services as it will be what consumers will demand from businesses going forward.

Expanding big data.

The trend is not to move away from big data but to expand it, incorporating more multifaceted data and data analytic methods and tools for more well-rounded insights and information.

Graph analytics and associative technology for better results.

This is where businesses and IT companies move forward with using natural associations within the data and use associative technology to derive better data for decision making.

DataOps and self-service.

DataOps will make business data processes more efficient and agile. This in turn will let a business’s customer engagement and communication channels offer self-service interactions for transactions and services.

Data literacy as a service.

Even more businesses will be integrating data intelligence, hence the increasing demand for skills and for experienced, dedicated development teams. Data literacy and data intelligence will further become an in-demand service.

Expanding search to multiform interaction.

Simple searches will be expanded to incorporate multifaceted search technology, from analyzing human expressions to transaction pattern analysis, and provide more robust search capabilities.

Ethical computing becomes crucial.

As technology becomes more ingrained in our day-to-day activities and consumes even more personal data, ethics and responsible computing will become essential in safeguarding consumer privacy and rights.

Incorporating blockchain technology into more industries.

Blockchain enables more secure and complex transaction record-keeping for businesses. More businesses employing data intelligence will be incorporating blockchain to support their processes.

Data quality management.

As exponentially growing amounts of data are consumed and processed, quality data governance and management will be essential. It will be important to oversee data collection and processing and to govern both.

Enhanced data discovery and visualization.

With improved tools to process large volumes of data, numerous tools geared towards transforming this data into understandable and digestible information will be highly coveted.

As a Data-Driven Global Society, We Must Adapt

Data is what drives all of our actions, from individually trying to decide what to eat in the morning to entire global enterprises deciding what the next big global product will be. How we collect, process, and use the data is what differs. Businesses will eventually move towards data-driven strategies and business models, and with that will come increased partnership with IT companies or the hiring of in-house dedicated development teams.

With a global market at hand, businesses can also employ a remote team and be assured that the same quality work will be provided. How businesses go about it may be diverse, but the direction is towards data-driven enterprises providing consumer-centric products and services.

This is a guest post from IT companies in Ukraine, a Ukraine-based software development company that provides top-level outsourcing services. 

erwin Expert Blog Data Modeling

Modern Data Modeling: The Foundation of Enterprise Data Management and Data Governance

The role of data modeling (DM) has expanded to support enterprise data management, including data governance and intelligence efforts. After all, you can’t manage or govern what you can’t see, much less use it to make smart decisions.

Metadata management is the key to managing and governing your data and drawing intelligence from it. Beyond harvesting and cataloging metadata, it also must be visualized to break down the complexity of how data is organized and what data relationships there are so that meaning is explicit to all stakeholders in the data value chain.

Data models provide this visualization capability, create additional metadata and standardize the data design across the enterprise.

While modeling has always been the best way to understand complex data sources and automate design standards, modern data modeling goes well beyond these domains to ensure and accelerate the overall success of data governance in any organization.

You can’t overstate the importance of that success, as data governance keeps the business in line with privacy mandates such as the General Data Protection Regulation (GDPR). It drives innovation too. Companies that want to advance AI initiatives, for instance, won’t get very far without quality data and well-defined data models.

Why Is Data Modeling the Building Block of Enterprise Data Management?

DM mitigates complexity and increases collaboration and literacy across a broad range of data stakeholders.

  • DM uncovers the connections between disparate data elements.

The DM process enables the creation and integration of business and semantic metadata to augment and accelerate data governance and intelligence efforts.

  • DM captures and shares how the business describes and uses data.

DM delivers design task automation and enforcement to ensure data integrity.

  • DM builds higher quality data sources with the appropriate structural veracity.

DM delivers design task standardization to improve business alignment and simplify integration.

  • DM builds a more agile and governable data architecture.

The DM process manages the design and maintenance lifecycle for data sources.

  • DM governs the design and deployment of data across the enterprise.

DM documents, standardizes and aligns any type of data no matter where it lives. 

Realizing the Data Governance Value from Data Modeling

Modeling becomes the point of true collaboration within an organization because it delivers a visual source of truth for everyone to follow – data management and business professionals – to conform to governance requirements.

Information is readily available within intuitive business glossaries, accessible to user roles according to parameters set by the business. The metadata repository behind these glossaries, populated by information stored in data models, serves up the key terms that are understandable and meaningful to every party in the enterprise.

The stage, then, is equally set for improved data intelligence, because stakeholders now can use, understand and trust relevant data to enhance decision-making across the enterprise.

The enterprise is coming to the point where both business and IT co-own data modeling processes and data models. Business analysts and other power users start to understand data complexities because they can grasp terms and contribute to making the data in their organization accurate and complete, and modeling grows in importance in the eyes of business users.

Bringing data to the business and making it easy to access and understand increases the value of data assets, providing a return on investment and a return on opportunity. But neither would be possible without data modeling providing the backbone for metadata management and proper data governance.

For more information, check out our whitepaper, Drive Business Value and Underpin Data Governance with an Enterprise Data Model.

You also can take erwin DM, the world’s No. 1 data modeling software, for a free spin.

erwin Expert Blog Data Intelligence

What is a Data Catalog?

The easiest way to understand a data catalog is to look at how libraries catalog books and manuals in a hierarchical structure, making it easy for anyone to find exactly what they need.

Similarly, a data catalog enables businesses to create a seamless way for employees to access and consume data and business assets in an organized manner.

By combining physical system catalogs, critical data elements, and key performance measures with clearly defined product and sales goals, you can manage the effectiveness of your business and ensure you understand what critical systems are for business continuity and measuring corporate performance.

As illustrated above, a data catalog is essential to business users because it synthesizes all the details about an organization’s data assets across multiple data sources. It organizes them into a simple, easy-to-digest format and then publishes them to data communities for knowledge-sharing and collaboration.

Another foundational purpose of a data catalog is to streamline, organize and process the thousands, if not millions, of an organization’s data assets to help consumers/users search for specific datasets and understand metadata, ownership, data lineage and usage.

Look at Amazon and how it handles millions of different products, and yet we, as consumers, can find almost anything about everything very quickly.

Beyond Amazon’s advanced search capabilities, the company also gives detailed information about each product, the seller’s information, shipping times, reviews and a list of companion products. It measures sales down to a zip-code territory level across product categories.

Data Catalog Use Case Example: Crisis Proof Your Business

One of the biggest lessons we’re learning from the global COVID-19 pandemic is the importance of data, specifically using a data catalog to comply, collaborate and innovate to crisis-proof our businesses.

As COVID-19 continues to spread, organizations are evaluating and adjusting their operations in terms of both risk management and business continuity. Data is critical to these decisions, such as how to ramp up and support remote employees, re-engineer processes, change entire business models, and adjust supply chains.

Think about the pandemic itself and the numerous global entities involved in identifying it, tracking its trajectory, and providing guidance to governments, healthcare systems and the general public. One example is the European Union (EU) Open Data Portal, which is used to document, catalog and govern EU data related to the pandemic. This information has helped:

  • Provide daily updates
  • Give guidance to governments, health professionals and the public
  • Support the development and approval of treatments and vaccines
  • Help with crisis coordination, including repatriation and humanitarian aid
  • Put border controls in place
  • Assist with supply chain control and consular coordination

So one of the biggest lessons we’re learning from COVID-19 is the need for data collection, management and governance. What’s the best way to organize data and ensure it is supported by business policies and well-defined, governed systems, data elements and performance measures?

According to Gartner, “organizations that offer a curated catalog of internal and external data to diverse users will realize twice the business value from their data and analytics investments than those that do not.”

5 Advantages of Using a Data Catalog for Crisis Preparedness & Business Continuity

The World Bank has been able to provide an array of real-time data, statistical indicators, and other types of data relevant to the coronavirus pandemic through its authoritative data catalogs. The World Bank data catalogs contain datasets, policies, critical data elements and measures useful for analysis and modeling the virus’ trajectory to help organizations measure the impact.

What can your organization learn from this example when it comes to crisis preparedness and business continuity? By developing and maintaining a data catalog as part of a larger data governance program supported by stakeholders across the organization, you can:

  1. Catalog and Share Information Assets

Catalog critical systems and data elements, plus enable the calculation and evaluation of key performance measures. It’s also important to understand data lineage and be able to analyze the impacts to critical systems and essential business processes if a change occurs.

  2. Clearly Document Data Policies and Rules

Managing a remote workforce creates new challenges and risks. Do employees have remote access to essential systems? Do they know what the company’s work-from-home policies are? Do employees understand how to handle sensitive data? Are they equipped to maintain data security and privacy? A data catalog with self-service access serves up the correct policies and procedures.

  3. Reduce Operational Costs While Accelerating Time to Value

Datasets need to be properly scanned, documented, tagged and annotated with their definitions, ownership, lineage and usage. Automating the cataloging of data assets saves initial development time and streamlines their ongoing maintenance and governance. Automating the curation of data assets also accelerates the time to value for analytics/insights reporting and significantly reduces operational costs.

  4. Make Data Accessible & Usable

Open your organization’s data door, making it easier to access, search and understand information assets. A data catalog is the core of data analysis for decision-making, so automating its curation and access with the associated business context will enable stakeholders to spend more time analyzing it for meaningful insights they can put into action.

  5. Ensure Regulatory Compliance

Regulations like the California Consumer Privacy Act (CCPA) and the European Union’s General Data Protection Regulation (GDPR) require organizations to know where all their customer, prospect and employee data resides to ensure its security and privacy.

A fine for noncompliance is the last thing you need on top of everything else your organization is dealing with, so use a data catalog to centralize data management and the associated usage policies and guardrails.
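To make the idea of a tagged, annotated and searchable catalog a little more concrete, here is a minimal Python sketch. It is not how erwin DI or any particular product works; the `CatalogEntry` fields, the `search` helper and the sample datasets are all hypothetical, chosen only to mirror the definitions, ownership, lineage and tags discussed above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one dataset (illustrative fields only)."""
    name: str
    definition: str
    owner: str
    tags: set[str] = field(default_factory=set)
    upstream: list[str] = field(default_factory=list)  # simple lineage pointer


def search(entries: list[CatalogEntry], term: str) -> list[str]:
    """Return names of entries whose name, definition or tags mention the term."""
    needle = term.lower()
    return [
        e.name
        for e in entries
        if needle in e.name.lower()
        or needle in e.definition.lower()
        or any(needle in t.lower() for t in e.tags)
    ]


# Made-up sample entries, assumed for illustration.
catalog = [
    CatalogEntry("sales.orders", "One row per customer order", "Sales Ops",
                 {"revenue", "kpi"}, ["crm.orders"]),
    CatalogEntry("hr.employees", "Current employee roster", "HR", {"pii"}),
]

kpi_datasets = search(catalog, "kpi")
pii_datasets = search(catalog, "pii")
```

Even a structure this small shows why tags and definitions matter: a stakeholder can find datasets by business meaning rather than by knowing the physical table name in advance.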

See a Data Catalog in Action

The erwin Data Intelligence Suite (erwin DI) provides data catalog and data literacy capabilities with built-in automation so you can accomplish all of the above and more.

Join us for the next live demo of erwin DI.

Data Governance for Smart Data Distancing

Hello from my home office! I hope you and your family are staying safe, practicing social distancing, and of course, washing your hands.

These are indeed strange days. During this coronavirus emergency, we are all being deluged by data from politicians, government agencies, news outlets, social media and websites, including valid facts but also opinions and rumors.

Happily for us data geeks, the general public is being told how important our efforts and those of data scientists are to analyzing, mapping and ultimately shutting down this pandemic.

Yay, data geeks!

Unfortunately though, not all of the incoming information is of equal value, ethically sourced, rigorously prepared or even good.

As we work to protect the health and safety of those around us, we need to understand the nuances of meaning for the received information as well as the motivations of information sources to make good decisions.

On a very personal level, separating the good information from the bad becomes a matter of life and potential death. On a business level, decisions based on bad external data may have the potential to cause business failures.

In business, data is the food that feeds the body or enterprise. Better data makes the body stronger and provides a foundation for the use of analytics and data science tools to reduce errors in decision-making. Ultimately, it gives our businesses the strength to deliver better products and services to our customers.

How then, as a business, can we ensure that the data we consume is of good quality?

Distancing from Third-Party Data

Just as we are practicing social distancing in our personal lives, so too we must practice data distancing in our professional lives.

In regard to third-party data, we should ask ourselves: How was the data created? What formulas were used? Does the definition (description, classification, allowable range of values, etc.) of incoming, individual data elements match our internal definitions of those data elements?

If we reflect on the coronavirus example, we can ask: How do individual countries report their data? Do individual countries use the same testing protocols? Are infections universally defined the same way (based on widely administered tests or only hospital admissions)? Are asymptomatic infections reported? Are all countries using the same methods and formulas to collect and calculate infections, recoveries and deaths?

In our businesses, it is vital that we work to develop a deeper understanding of the sources, methods and quality of incoming third-party data. This deeper understanding will help us make better decisions about the risks and rewards of using that external data.

Data Governance Methods for Data Distancing

We’ve received lots of instructions lately about how to wash our hands to protect ourselves from coronavirus. Perhaps we thought we already knew how to wash our hands, but nonetheless, a refresher course has been worthwhile.

Similarly, perhaps we think we know how to protect our business data, but maybe a refresher would be useful here as well?

Here are a few steps you can take to protect your business:

  • Establish comprehensive third-party data sharing guidelines (for both inbound and outbound data). These guidelines should include communicating with third parties about how they make changes to collection and calculation methods.
  • Rationalize external data dictionaries against your internal data dictionaries, understanding where differences occur and how you will overcome them.
  • Ingest third-party data into a quarantined area where it can be profiled and measured for quality, completeness and correctness and, where necessary, cleansed.
  • Periodically review all data ingestion or data-sharing policies, processes and procedures to ensure they remain aligned to business needs and goals.
  • Establish data-sharing training programs so all data stakeholders understand associated security considerations, contextual meaning, and when and when not to share and/or ingest third-party data.
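The quarantine step above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the `ElementDefinition` class stands in for an internal data dictionary entry, and the field names, sample rows and 5 percent error threshold are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ElementDefinition:
    """Hypothetical internal data-dictionary entry for one data element."""
    name: str
    dtype: type
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    required: bool = True


def profile_batch(rows: list[dict[str, Any]],
                  dictionary: list[ElementDefinition]) -> dict[str, dict[str, int]]:
    """Count missing, wrongly typed and out-of-range values per data element."""
    report = {d.name: {"missing": 0, "wrong_type": 0, "out_of_range": 0}
              for d in dictionary}
    for row in rows:
        for d in dictionary:
            value = row.get(d.name)
            if value is None:
                if d.required:
                    report[d.name]["missing"] += 1
                continue
            if not isinstance(value, d.dtype):
                report[d.name]["wrong_type"] += 1
                continue
            if d.min_value is not None and value < d.min_value:
                report[d.name]["out_of_range"] += 1
            elif d.max_value is not None and value > d.max_value:
                report[d.name]["out_of_range"] += 1
    return report


def release_from_quarantine(report: dict[str, dict[str, int]],
                            total_rows: int,
                            max_error_rate: float = 0.05) -> bool:
    """Release only if every element stays under the (assumed) error threshold."""
    if total_rows == 0:
        return False
    return all(sum(counts.values()) / total_rows <= max_error_rate
               for counts in report.values())


# Made-up third-party rows sitting in the quarantine area.
internal_dictionary = [
    ElementDefinition("country", str),
    ElementDefinition("confirmed_cases", int, min_value=0),
]
quarantined_rows = [
    {"country": "A", "confirmed_cases": 120},
    {"country": "B", "confirmed_cases": -5},
    {"country": None, "confirmed_cases": "n/a"},
]

report = profile_batch(quarantined_rows, internal_dictionary)
ok_to_ingest = release_from_quarantine(report, len(quarantined_rows))
```

In this made-up batch, one country is missing, one case count is negative and one is text rather than a number, so the batch stays in quarantine until it is cleansed or the supplier is asked about the discrepancies.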

erwin Data Intelligence for Data Governance and Distancing

With solutions like those in the erwin Data Intelligence Suite (erwin DI), organizations can auto-document their metadata; classify their data with respect to privacy, contractual and regulatory requirements; attach data-sharing and management policies; and implement an appropriate level of data security.

If you believe the management of your third-party data interfaces could benefit from a review or tune-up, feel free to reach out to me and my colleagues here at erwin.

We’d be happy to provide a demo of how to use erwin DI for data distancing.

Data Intelligence and Its Role in Combating Covid-19

Data intelligence has a critical role to play in the supercomputing battle against Covid-19.

Last week, The White House announced the launch of the COVID-19 High Performance Computing Consortium, a public-private partnership to provide COVID-19 researchers worldwide with access to the world’s most powerful high performance computing resources that can significantly advance the pace of scientific discovery in the fight to stop the virus.

Rensselaer Polytechnic Institute (RPI) is one of the organizations that has joined the consortium to provide computing resources to help fight the pandemic.

While leveraging supercomputing power is a tremendous asset in our fight to combat this global pandemic, in order to deliver life-saving insights, you really have to understand what data you have and where it came from. Answering these questions is at the heart of data intelligence.

Managing and Governing Data From Lots of Disparate Sources

Collecting and managing data from many disparate sources for the Covid-19 High Performance Computing Consortium is on a scale beyond comprehension and, quite frankly, it boggles the mind to even think about it.

To feed the supercomputers with epidemiological data, the information will flow in from many different and heavily regulated data sources, including population health, demographics, outbreak hotspots and economic impacts.

This data will be collected from organizations such as the World Health Organization (WHO) and the Centers for Disease Control and Prevention (CDC), as well as state and local governments across the globe.

Privately it will come from hospitals, labs, pharmaceutical companies, doctors and private health insurers. It also will come from HL7 hospital data, claims administration systems, care management systems, the Medicaid Management Information System, etc.

These numerous data types and data sources most definitely weren’t designed to work together. As a result, the data may be compromised, rendering faulty analyses and insights.

Marrying the epidemiological data to the population data will require a tremendous amount of data intelligence about the:

  • Source of the data;
  • Currency of the data;
  • Quality of the data; and
  • Usability of the data from an interoperability standpoint.

To do this, the consortium will need the ability to automatically scan and catalog the data sources and apply strict data governance and quality practices.

Unraveling Data Complexities with Metadata Management

Collecting and understanding this vast amount of epidemiological data in the fight against Covid-19 will require data governance oversight and data intelligence to unravel the complexities of the underlying data sources. To be successful and generate quality results, this consortium will need to adhere to strict disciplines around managing the data that comes into the study.

Metadata management will be critical to the process for cataloging data via automated scans. Essentially, metadata management is the administration of data that describes other data, with an emphasis on associations and lineage. It involves establishing policies and processes to ensure information can be integrated, accessed, shared, linked, analyzed and maintained.

While supercomputing can be used to process incredible amounts of data, a comprehensive data governance strategy plus technology will enable the consortium to determine master data sets, discover the impact of potential glossary changes, audit and score adherence to rules and data quality, discover risks, and appropriately apply security to data flows, as well as publish data to the right people.

Metadata management delivers the following capabilities, which are essential in building an automated, real-time, high-quality data pipeline:

  • Reference data management for capturing and harmonizing shared reference data domains
  • Data profiling for data assessment, metadata discovery and data validation
  • Data quality management for data validation and assurance
  • Data mapping management to capture the data flows, reconstruct data pipelines, and visualize data lineage
  • Data lineage to support impact analysis
  • Data pipeline automation to help develop and implement new data pipelines
  • Data cataloging to capture object metadata for identified data assets
  • Data discovery facilitated via a shared environment allowing data consumers to understand the use of data from a wide array of sources
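Of the capabilities listed above, lineage-driven impact analysis is perhaps the easiest to illustrate. The following Python sketch treats lineage as a directed graph and walks it breadth-first to find everything downstream of a changed asset. The asset names and the `LINEAGE` dictionary are invented for illustration; a real metadata repository would harvest these edges automatically.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets in its list.
LINEAGE = {
    "lab_results_feed": ["staging.test_results"],
    "hospital_admissions_feed": ["staging.admissions"],
    "staging.test_results": ["warehouse.case_counts"],
    "staging.admissions": ["warehouse.case_counts", "warehouse.bed_usage"],
    "warehouse.case_counts": ["report.outbreak_hotspots"],
    "warehouse.bed_usage": [],
    "report.outbreak_hotspots": [],
}


def downstream_impact(lineage, changed_asset):
    """Return every asset reachable from changed_asset, in breadth-first order."""
    impacted = []
    seen = {changed_asset}
    queue = deque([changed_asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # guard against revisiting shared targets
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


impacted = downstream_impact(LINEAGE, "staging.admissions")
```

Here a change to the admissions staging table touches both warehouse tables and the hotspot report, which is exactly the kind of answer a data steward needs before approving the change.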

Supercomputing will be very powerful in helping fight the COVID-19 virus. However, data scientists need access to quality data harvested from many disparate data sources that weren’t designed to work together to deliver critical insights and actionable intelligence.

Automated metadata harvesting, data cataloging, data mapping and data lineage combined with integrated business glossary management and self-service data discovery can give this important consortium data asset visibility and context so they have the relevant information they need to help us stop this virus affecting all of us around the globe.

To learn more about metadata management capabilities, download this white paper, Metadata Management: The Hero in Unleashing Enterprise Data’s Value.

Talk Data to Me: Why Employee Data Literacy Matters  

Organizations are flooded with data, so they’re scrambling to find ways to derive meaningful insights from it – and then act on them to improve the bottom line.

In today’s data-driven business, enabling employees to access and understand the data that’s relevant to their roles allows them to use data and put those insights into action. To do this, employees need to be able to “talk data,” that is, to be data literate.

However, Gartner predicts that this year 50 percent of organizations will lack sufficient AI and data literacy skills to achieve business value. This requires organizations to invest in ensuring their employees are data literate.

Data Literacy & the Rise of the Citizen Analyst

According to Gartner, “data literacy is the ability to read, write and communicate data in context, including an understanding of data sources and constructs, analytical methods and techniques applied — and the ability to describe the use case, application and resulting value.”

Today, your employees are essentially data consumers. Three technological advances are driving this data consumption and, in turn, the ability of employees to leverage this data to deliver business value: 1) exploding data production, 2) scalable big data computation, and 3) the accessibility of advanced analytics, machine learning (ML) and artificial intelligence (AI).

The confluence of these advances has created a fertile environment for data innovation and transformation. As a result, we’re seeing the rise of the “citizen analyst,” who brings business knowledge and subject-matter expertise to data-driven insights.

Some examples of citizen analysts include the VP of finance who may be looking for opportunities to optimize the top- and bottom-line results for growth and profitability. Or the product line manager who wants to understand enterprise impact of pricing changes.

David Loshin explores this concept in an erwin-sponsored whitepaper, Data Intelligence: Empowering the Citizen Analyst with Democratized Data.

In the whitepaper, he states that the priority of the citizen analyst is straightforward: find the right data to develop reports and analyses that support a larger business case. However, some practical data management issues contribute to a growing need for enterprise data governance, including:

  • Increasing data volumes that challenge the traditional enterprise’s ability to store, manage and ultimately find data
  • Increased data variety, balancing structured, semi-structured and unstructured data, as well as data originating from a widening array of external sources
  • Reducing the IT bottleneck that creates barriers to data accessibility
  • Desire for self-service to free the data consumers from strict predefined data transformations and organizations
  • Hybrid on-premises/cloud environments that complicate data integration and preparation
  • Privacy and data protection laws from many countries that influence the ways data assets may be accessed and used

Data Democratization Requires Data Intelligence

According to Loshin, organizations need to empower their citizen analysts. A fundamental component of data literacy involves data democratization, sharing data assets with a broad set of data consumer communities in a governed way.

The objectives of governed data democratization include:

  • Raising data awareness
  • Improving data literacy
  • Supporting observance of data policies to support regulatory compliance
  • Simplifying data accessibility and use

Effective data democratization requires data intelligence. This is dependent on accumulating, documenting and publishing information about the data assets used across the entire enterprise data landscape.

Here are the steps to effective data intelligence:

  • Reconnaissance: Understanding the data environment and the corresponding business contexts and collecting as much information as possible
  • Surveillance: Monitoring the environment for changes to data sources
  • Logistics and Planning: Mapping the collected information production flows and mapping how data moves across the enterprise
  • Impact Assessment: Using what you have learned to assess how external changes impact the environment
  • Synthesis: Empowering data consumers by providing a holistic perspective associated with specific business terms
  • Sustainability: Embracing automation to always provide up-to-date and correct intelligence
  • Auditability: Providing oversight and being able to explain what you have learned and why

Data Literacy: The Heart of Data-Driven Innovation

Data literacy is at the heart of successful data-driven innovation and accelerating the realization of actionable data-driven insights.

It can reduce data source discovery and analysis cycles, improve accuracy in results, reduce the reliance on expensive technical resources, and ensure the “right” data is used the first time, reducing deployed errors and the need for expensive rework.

Ultimately, a successful data literacy program will empower your employees to:

  • Better understand and identify the data they require
  • Be more self-sufficient in accessing and preparing the data they require
  • Better articulate the gaps that exist in the data landscape when it comes to fulfilling their data needs
  • Share their knowledge and experience with data with other consumers to contribute to the greater good
  • Collaborate more effectively with their partners in data (management and governance) for greater efficiency and higher quality outcomes

erwin offers a data intelligence software suite combining the capabilities of erwin Data Catalog with erwin Data Literacy to fuel an automated, real-time, high-quality data pipeline.

Then all enterprise stakeholders – data scientists, data stewards, ETL developers, enterprise architects, business analysts, compliance officers, citizen analysts, CDOs and CEOs – can access data relevant to their roles for insights they can put into action.

Click here to request a demo of erwin Data Intelligence.

Automation Gives DevOps More Horsepower

Almost 70 percent of CEOs say they expect their companies to change their business models in the next three years, and 62 percent report they have management initiatives or transformation programs underway to make their businesses more digital, according to Gartner.

Wouldn’t it be advantageous for these organizations to accelerate these digital transformation efforts? They have that option with automation, shifting DevOps away from dependence on manual processes. Just like with cars, more horsepower in DevOps translates to greater speed.

Doing More with Less

We have clients looking to do more with existing resources, and others looking to reduce full-time employee count on their DevOps teams. With metadata-driven automation, many DevOps processes can be automated, adding more “horsepower” to increase their speed and accuracy. For example:

Auto-documentation of data mappings and lineage: By using data harvesting templates, organizations can eliminate time spent updating and maintaining data mappings, creating them directly from code written by the ETL staff. Such automation can save close to 100 percent of the time usually spent on this type of documentation.

  • Data lineage and impact analysis views for ‘data in motion’ also stay up to date with no additional effort.
  • Human errors are eliminated, leading to higher quality documentation and output.

Automatic updates/changes reflected throughout each release cycle: Updates can be picked up and the ETL job/package generated with 100-percent accuracy. An ETL developer is not required to ‘hand code’ mappings from a spreadsheet – greatly reducing the time spent on the ETL process, and perhaps the total number of resources required to manage that process month over month.

  • ETL skills are still necessary for validation and to compile and execute the automated jobs, but the overall quality of these jobs (machine-generated code) will be much higher, also eliminating churn and rework.

Auto-scanning of source and target data assets with synchronized mappings: This automation eliminates the need for a resource or several resources dealing with manual updates to the design mappings, creating additional time savings and cost reductions associated with data preparation.

  • A change in the source-column header may impact 1,500 design mappings. Managed manually, this process – opening the mapping document, making the change, saving the file with a new version, and placing it into a shared folder for development – could take an analyst several days. With synchronization, the mappings are updated instantly and versioned correctly, and they can be picked up and packaged into an ETL job/package within the same hour. Whether using agile or classic waterfall development, these processes will see substantial improvement and time reduction.
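To show what synchronized mappings might look like in principle, here is a small Python sketch that propagates a source-column rename across every affected mapping and bumps each mapping's version. The `Mapping` class, the table and column names and the versioning scheme are all assumptions made for illustration, not a description of erwin's implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mapping:
    """Hypothetical design mapping from one source column to one target column."""
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    version: int = 1


def synchronize_rename(mappings, table, old_column, new_column):
    """Return (updated mappings, number changed) after a source column rename."""
    updated = []
    changed = 0
    for m in mappings:
        if m.source_table == table and m.source_column == old_column:
            # Create a new, re-versioned mapping instead of editing in place.
            updated.append(replace(m, source_column=new_column, version=m.version + 1))
            changed += 1
        else:
            updated.append(m)
    return updated, changed


# Made-up mappings standing in for the 1,500 mentioned above.
mappings = [
    Mapping("crm.customers", "cust_nm", "dw.dim_customer", "customer_name"),
    Mapping("crm.customers", "cust_nm", "mart.mailing_list", "full_name"),
    Mapping("crm.orders", "order_dt", "dw.fact_orders", "order_date"),
]

synced, changed = synchronize_rename(mappings, "crm.customers", "cust_nm", "customer_name")
```

Because the mappings are immutable and re-versioned rather than overwritten, the previous version stays available for audit, which is the part that is easy to get wrong when the same change is made by hand in a spreadsheet.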

Data Intelligence: Speed and Quality Without Compromise

Our clients often understand that incredible DevOps improvements are possible, but they fear the “work” it will take to get there.

It really comes down to deciding to embrace change a la automation or continue down the same path. But isn’t the definition of insanity doing the same thing over and over, expecting but never realizing different results?

With traditional means, you may improve speed but sacrifice quality. On the flipside, you may improve quality but sacrifice speed.

However, erwin’s technology shifts this paradigm. You can have both speed and quality.

The erwin Data Intelligence Suite (erwin DI) combines the capabilities of erwin Data Catalog with erwin Data Literacy to fuel an automated, real-time, high-quality data pipeline.

Then all enterprise stakeholders – data scientists, data stewards, ETL developers, enterprise architects, business analysts, compliance officers, CDOs and CEOs – can access data relevant to their roles for insights they can put into action.

It creates the fastest path to value, with an automation framework and metadata connectors configured by our team to deliver the data harvesting and preparation features that make capturing enterprise data assets fast and accurate.

Click here to request a free demo of erwin DI.

Data Governance Makes Data Security Less Scary

Happy Halloween!

Do you know where your data is? What data you have? Who has had access to it?

These can be frightening questions for an organization to answer.

Add to the mix the potential for a data breach followed by non-compliance, reputational damage and financial penalties and a real horror story could unfold.

In fact, we’ve seen some frightening ones play out already:

  1. Google’s record GDPR fine – France’s data privacy enforcement agency hit the tech giant with a $57 million penalty in early 2019 – more than 80 times the steepest fine the U.K.’s Information Commissioner’s Office had levied against both Facebook and Equifax for their data breaches.
  2. In July 2019, the U.K.’s Information Commissioner’s Office announced its intention to fine British Airways $229 million, the biggest GDPR penalty proposed to date, because the data of more than 500,000 customers was compromised.
  3. Marriott International was fined $123 million, or 1.5 percent of its global annual revenue, because 330 million hotel guests were affected by a breach in 2018.

Now, as Cybersecurity Awareness Month comes to a close – and ghosts and goblins roam the streets – we thought it a good time to resurrect some guidance on how data governance can make data security less scary.

We don’t want you to be caught off guard when it comes to protecting sensitive data and staying compliant with data regulations.

Don’t Scream; You Can Protect Your Sensitive Data

It’s easier to protect sensitive data when you know what it is, where it’s stored and how it needs to be governed.

Data security incidents may be the result of not having a true data governance foundation that makes it possible to understand the context of data – what assets exist and where, the relationship between them and enterprise systems and processes, and how and by what authorized parties data is used.

That knowledge is critical to supporting efforts to keep relevant data secure and private.

Without data governance, organizations don’t have visibility of the full data landscape – linkages, processes, people and so on – to propel more context-sensitive security architectures that can better assure expectations around user and corporate data privacy. In sum, they lack the ability to connect the dots across governance, security and privacy – and to act accordingly.

Data governance addresses these fundamental questions:

  1. What private data do we store and how is it used?
  2. Who has access and permissions to the data?
  3. What data do we have and where is it?

Where Are the Skeletons?

Data is a critical asset used to operate, manage and grow a business. While some of it sits at rest in databases, data lakes and data warehouses, a large percentage is federated and integrated across the enterprise, introducing governance, manageability and risk issues that must be managed.

Knowing where sensitive data is located and properly governing it with policy rules, impact analysis and lineage views is critical for risk management, data audits and regulatory compliance.

However, when key data isn’t discovered, harvested, cataloged, defined and standardized as part of integration processes, audits may be flawed, putting your organization at risk.

Sensitive data – at rest or in motion – that exists in various forms across multiple systems must be automatically tagged, its lineage automatically documented, and its flows depicted so that it is easily found and its usage across workflows easily traced.

Thankfully, tools are available to help automate the scanning, detection and tagging of sensitive data by:

  • Monitoring and controlling sensitive data: Better visibility and control across the enterprise to identify data security threats and reduce associated risks
  • Enriching business data elements for sensitive data discovery: Comprehensively defining business data elements for PII, PHI and PCI across database systems, cloud and Big Data stores to easily identify sensitive data based on a set of algorithms and data patterns
  • Providing metadata and value-based analysis: Discovery and classification of sensitive data based on metadata and data value patterns and algorithms. Organizations can define business data elements and rules to identify and locate sensitive data including PII, PHI, PCI and other sensitive information.
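As a rough illustration of the metadata and value-based discovery described above, the Python sketch below tags a column as sensitive if either its name or most of its sampled values match a known pattern. The patterns, the `classify_column` function and the 80 percent match threshold are deliberately simplistic assumptions; it covers only PII and PCI, and production classifiers need far more rigor.

```python
import re

# Illustrative patterns only (assumed, not taken from any product).
COLUMN_NAME_HINTS = {
    "PII": re.compile(r"(ssn|social_security|email|phone|birth)", re.IGNORECASE),
    "PCI": re.compile(r"(card_number|credit_card|cvv)", re.IGNORECASE),
}
VALUE_PATTERNS = {
    # U.S. Social Security number format or a loosely shaped email address
    "PII": re.compile(r"^(\d{3}-\d{2}-\d{4}|[^@\s]+@[^@\s]+\.[^@\s]+)$"),
    # Sixteen digits, optionally grouped in fours by spaces or hyphens
    "PCI": re.compile(r"^\d{4}([ -]?\d{4}){3}$"),
}


def classify_column(name, sample_values, min_match_ratio=0.8):
    """Return the sensitivity tags suggested by the column name or its values."""
    tags = set()
    # Metadata-based check: does the column name hint at sensitive content?
    for tag, pattern in COLUMN_NAME_HINTS.items():
        if pattern.search(name):
            tags.add(tag)
    # Value-based check: do most sampled values fit a sensitive pattern?
    values = [str(v) for v in sample_values if v is not None]
    if values:
        for tag, pattern in VALUE_PATTERNS.items():
            matches = sum(1 for v in values if pattern.match(v))
            if matches / len(values) >= min_match_ratio:
                tags.add(tag)
    return tags


contact_tags = classify_column("contact", ["a@example.com", "b@example.org"])
card_tags = classify_column("card_number", ["4111 1111 1111 1111"])
total_tags = classify_column("order_total", [19.99, 5.00])
```

Note that the `contact` column is caught by its values alone, even though its name gives nothing away. That is the argument for combining metadata checks with value-pattern checks rather than relying on either one.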

No Hocus Pocus

Truly understanding an organization’s data, including its value and quality, requires a harmonized approach embedded in business processes and enterprise architecture.

Such an integrated enterprise data governance experience helps organizations understand what data they have, where it is, where it came from, its value, its quality and how it’s used and accessed by people and applications.

An ounce of prevention is worth a pound of cure, and the cure here ranges from the painstaking process of identifying what happened and why to notifying customers that their data, and thus their trust in your organization, has been compromised.

A well-formed security architecture that is driven by and aligned with data intelligence is your best defense. However, if there is nefarious intent, a hacker will find a way. So being prepared means you can minimize your risk exposure and the damage to your reputation.

Multiple components must be considered to effectively support a data governance, security and privacy trinity. They are:

  1. Data models
  2. Enterprise architecture
  3. Business process models

Creating policies for data handling and accountability and driving culture change so people understand how to properly work with data are two important components of a data governance initiative, as is the technology for proactively managing data assets.

Without the ability to harvest metadata schemas and business terms; analyze data attributes and relationships; impose structure on definitions; and view all data in one place according to each user’s role within the enterprise, businesses will be hard pressed to stay in step with governance standards and best practices around security and privacy.

As a consequence, the private information held within organizations will continue to be at risk.

Organizations suffering data breaches will be deprived of the benefits they had hoped to realize from the money spent on security technologies and the time invested in developing data privacy classifications.

They also may face heavy fines and other financial, not to mention PR, penalties.
