7 Benefits of Metadata Management

Metadata management is key to wringing all the value possible from data assets.

However, most organizations don’t use all the data at their disposal to reach deeper conclusions about how to drive revenue, achieve regulatory compliance or accomplish other strategic objectives.

What Is Metadata?

Analyst firm Gartner defines metadata as “information that describes various facets of an information asset to improve its usability throughout its life cycle. It is metadata that turns information into an asset.”

Quite simply, metadata is data about data. It’s generated every time data is captured at a source, accessed by users, moved through an organization, integrated or augmented with other data from other sources, profiled, cleansed and analyzed.

It’s valuable because it provides information about the attributes of data elements that can be used to guide strategic and operational decision-making. Metadata management is the administration of data that describes other data, with an emphasis on associations and lineage. It involves establishing policies and processes to ensure information can be integrated, accessed, shared, linked, analyzed and maintained across an organization.

Metadata Answers Key Questions

A strong data management strategy and supporting technology enables the data quality the business requires, including data cataloging (integration of data sets from various sources), mapping, versioning, maintenance of business rules and glossaries, and metadata management (associations and lineage).

Metadata answers a lot of important questions:

  • What data do we have?
  • Where did it come from?
  • Where is it now?
  • How has it changed since it was originally created or captured?
  • Who is authorized to use it and how?
  • Is it sensitive or are there any risks associated with it?

Metadata also helps your organization to:

  • Discover data. Identify and interrogate metadata from various data management silos.
  • Harvest data. Automate the collection of metadata from various data management silos and consolidate it into a single source (see the sketch after this list).
  • Structure and deploy data sources. Connect physical metadata to specific data models, business terms, definitions and reusable design standards.
  • Analyze metadata. Understand how data relates to the business and what attributes it has.
  • Map data flows. Identify where to integrate data and track how it moves and transforms.
  • Govern data. Develop a governance model to manage standards, policies and best practices and associate them with physical assets.
  • Socialize data. Empower stakeholders to see data in one place and in the context of their roles.
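
To make the “harvest” step above concrete, here is a minimal sketch of what automated metadata collection from a silo can look like, using Python’s built-in sqlite3 module to introspect a database’s system catalog. The silo file names and the shape of the consolidated entries are hypothetical illustrations, not a depiction of any particular product.

```python
import sqlite3

def harvest_metadata(db_path: str) -> list[dict]:
    """Collect technical metadata (tables and columns) from one silo."""
    conn = sqlite3.connect(db_path)
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, col_type, notnull, _default, pk in conn.execute(
            f"PRAGMA table_info({table})"
        ):
            entries.append({
                "source": db_path,        # where the data lives
                "table": table,
                "column": name,
                "type": col_type,
                "nullable": not notnull,
                "primary_key": bool(pk),
            })
    conn.close()
    return entries

# Consolidate metadata from several silos into a single catalog list.
catalog = []
for silo in ("sales.db", "crm.db"):       # hypothetical silo databases
    catalog.extend(harvest_metadata(silo))
```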

The Benefits of Metadata Management

1. Better data quality. With automation, data quality is systemically assured, with the data pipeline seamlessly governed and operationalized to the benefit of all stakeholders. Data issues and inconsistencies within integrated data sources or targets are identified in real time, improving overall data quality by reducing the time to insight and/or repair. It’s also easier to map, move and test data, whether for regular maintenance of existing structures or for migration from legacy systems to new systems during a merger, acquisition or modernization effort.

2. Quicker project delivery. Automated enterprise metadata management provides greater accuracy and up to 70 percent acceleration in project delivery for data movement and/or deployment projects. It harvests metadata from various data sources, maps any data element from source to target, and harmonizes data integration across platforms. With this accurate picture of your metadata landscape, you can accelerate Big Data deployments, Data Vaults, data warehouse modernization, cloud migration and more.

3. Faster speed to insights. High-paid knowledge workers like data scientists spend up to 80 percent of their time finding and understanding source data and resolving errors or inconsistencies, rather than analyzing it for real value. That equation can be reversed with stronger data operations and analytics: with access and connectivity to the underlying metadata and its lineage, insights come more quickly. Technical resources are freed to concentrate on the highest-value projects, while business analysts, data architects, ETL developers, testers and project managers can collaborate more easily for faster decision-making.

4. Greater productivity & reduced costs. Relying on automated and repeatable metadata management processes results in greater productivity. For example, one erwin DI customer has experienced a productivity improvement of more than 85 percent because manually intensive and complex coding efforts have been automated, and a further 70-plus percent gain because of seamless access to and visibility of all metadata, including end-to-end lineage. Significant data design and conversion savings, up to 50 percent and 70 percent respectively, also are possible, with data mapping costs going down as much as 80 percent.

5. Regulatory compliance. Regulations such as the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), Basel Committee on Banking Supervision (BCBS) standards and the California Consumer Privacy Act (CCPA) particularly affect sectors such as finance, retail, healthcare and pharmaceutical/life sciences. When key data isn’t discovered, harvested, cataloged, defined and standardized as part of integration processes, audits may be flawed. With automated metadata management, sensitive data is automatically tagged, its lineage automatically documented, and its flows depicted so that it is easily found and its use across workflows easily traced.

6. Digital transformation. Knowing what data exists and its value potential promotes digital transformation by 1) improving digital experiences because you understand how the organization interacts with and supports customers, 2) enhancing digital operations because data preparation and analysis projects happen faster, 3) driving digital innovation because data can be used to deliver new products and services, and 4) building digital ecosystems because organizations need to establish platforms and partnerships to scale and grow.

7. An enterprise data governance experience. Stakeholders include both IT and business users in collaborative relationships, so that makes data governance everyone’s business. Modern, strategic data governance must be an ongoing initiative, and it requires everyone from executives on down to rethink their data duties and assume new levels of cooperation and accountability. With business data stakeholders driving alignment between data governance and strategic enterprise goals and IT handling the technical mechanics of data management, the door opens to finding, trusting and using data to effectively meet any organizational objective.

An Automated Solution

When approached manually, metadata management is expensive, time-consuming, error-prone and can’t keep pace with a dynamic enterprise data management infrastructure.

And while integrating and automating data management and data governance is still a new concept for many organizations, its advantages are clear.

erwin’s metadata management offering, the erwin Data Intelligence Suite (erwin DI), includes data catalog, data literacy and automation capabilities for greater awareness of and access to data assets, guidance on their use, and guardrails to ensure data policies and best practices are followed. Its automated, metadata-driven framework gives organizations visibility and control over their disparate data streams – from harvesting to aggregation and integration, including transformation with complete upstream and downstream lineage and all the associated documentation.

erwin has been named a leader in the Gartner 2020 “Magic Quadrant for Metadata Management Solutions” for two consecutive years. Click here to download the full Gartner 2020 “Magic Quadrant for Metadata Management Solutions” report.

Are Data Governance Bottlenecks Holding You Back?

Better decision-making has now topped compliance as the primary driver of data governance. However, organizations still encounter a number of bottlenecks that may hold them back from fully realizing the value of their data in producing timely and relevant business insights.

While acknowledging that data governance is about more than risk management and regulatory compliance may indicate that companies are more confident in their data, the data governance practice is nonetheless growing in complexity because of more:

  • Data to handle, much of it unstructured
  • Sources, like IoT
  • Points of integration
  • Regulations

Without an accurate, high-quality, real-time enterprise data pipeline, it will be difficult to uncover the necessary intelligence to make optimal business decisions.

So what’s holding organizations back from fully using their data to make better, smarter business decisions?

Data Governance Bottlenecks

erwin’s 2020 State of Data Governance and Automation report, based on a survey of business and technology professionals at organizations of various sizes and across numerous industries, examined the role of automation in data governance and intelligence efforts. It uncovered a number of obstacles that organizations have to overcome to improve their data operations.

The No.1 bottleneck, according to 62 percent of respondents, was documenting complete data lineage. Understanding the quality of source data is the next most serious bottleneck (58 percent); followed by finding, identifying, and harvesting data (55 percent); and curating assets with business context (52 percent).

The report revealed that all but two of the possible bottlenecks were marked by more than 50 percent of respondents. Clearly, there’s a massive need for a data governance framework to keep these obstacles from stymying enterprise innovation.

As we zeroed in on the bottlenecks of day-to-day operations, 25 percent of respondents said length of project/delivery time was the most significant challenge, followed by data quality/accuracy at 24 percent, time to value at 16 percent, and reliance on developer and other technical resources at 13 percent.

Overcoming Data Governance Bottlenecks

The 80/20 rule describes the unfortunate reality for many data stewards: they spend 80 percent of their time finding, cleaning and reorganizing huge amounts of data and only 20 percent on actual data analysis.

In fact, we found that close to 70 percent of our survey respondents spent an average of 10 or more hours per week on data-related activities, most of it searching for and preparing data.

What can you do to reverse the 80/20 rule and subsequently overcome data governance bottlenecks?

1. Don’t ignore the complexity of data lineage: Supporting data lineage with a manual approach is a risky endeavor, and businesses that attempt it will find it isn’t sustainable given data’s constant movement from one place to another via multiple routes – especially when lineage must be documented correctly down to the column level. Adopting automated end-to-end lineage makes it possible to view data movement from source to reporting structures, providing a comprehensive and detailed view of data in motion.

2. Automate code generation: Alleviate the need for developers to hand code connections from data sources to target schema. Mapping data elements to their sources within a single repository to determine data lineage and harmonize data integration across platforms reduces the need for specialized, technical resources with knowledge of ETL and database procedural code. It also makes it easier for business analysts, data architects, ETL developers, testers and project managers to collaborate for faster decision-making.
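
A rough illustration of the idea: the sketch below generates the INSERT … SELECT statement an ETL developer would otherwise hand code, starting from a source-to-target mapping held in a repository. The mapping format and table names are hypothetical, and production code generators also handle transformations, joins and multiple sources.

```python
# A hypothetical source-to-target mapping of the kind a metadata
# repository would hold: target column -> (source table, source column).
mapping = {
    "customer_id": ("crm.contacts", "contact_id"),
    "full_name":   ("crm.contacts", "display_name"),
    "signup_date": ("crm.contacts", "created_at"),
}

def generate_insert_select(target_table: str, mapping: dict) -> str:
    """Generate the INSERT ... SELECT an ETL developer would hand-code."""
    source_table = next(iter(mapping.values()))[0]  # single-source example
    targets = ", ".join(mapping)
    sources = ", ".join(src_col for _, src_col in mapping.values())
    return (f"INSERT INTO {target_table} ({targets})\n"
            f"SELECT {sources}\n"
            f"FROM {source_table};")

print(generate_insert_select("dw.dim_customer", mapping))
```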

3. Use an integrated impact analysis solution: By automating data due diligence for IT you can deliver operational intelligence to the business. Business users benefit from automating impact analysis to better examine value and prioritize individual data sets. Impact analysis has equal importance to IT for automatically tracking changes and understanding how data from one system feeds other systems and reports. This is an aspect of data lineage, created from technical metadata, ensuring nothing “breaks” along the change train.
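
At its core, automated impact analysis is a traversal of the lineage edges captured in technical metadata. The sketch below finds everything downstream of a changed asset; the “feeds” relationships are a hypothetical, hand-written stand-in for edges a tool would harvest automatically.

```python
from collections import deque

# Hypothetical lineage edges: each asset feeds the assets listed.
feeds = {
    "crm.contacts":     ["staging.contacts"],
    "staging.contacts": ["dw.dim_customer"],
    "dw.dim_customer":  ["report.churn", "report.revenue"],
}

def impact_of(asset: str) -> set[str]:
    """Everything downstream that a change to `asset` could break."""
    impacted, queue = set(), deque([asset])
    while queue:
        for child in feeds.get(queue.popleft(), []):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted

print(impact_of("crm.contacts"))
# {'staging.contacts', 'dw.dim_customer', 'report.churn', 'report.revenue'}
```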

4. Put data quality first: Users must have confidence in the data they use for analytics. Automating and matching business terms with data assets and documenting lineage down to the column level are critical to good decision-making. If this approach hasn’t been the case to date, enterprises should take a few steps back to review data quality measures before jumping into automating data analytics.

5. Catalog data using a solution with a broad set of metadata connectors: Ensure all data sources can be leveraged, including big data, ETL platforms, BI reports, modeling tools, mainframe and relational data, as well as data from many other types of systems. Don’t settle for a data catalog from an emerging vendor that only supports a narrow swath of newer technologies, and don’t rely on a catalog from a legacy provider that may supply only connectors for standard, more mature data sources.

6. Stress data literacy: You want to ensure that data assets are used strategically. Automation expedites the benefits of data cataloging. Curating internal and external datasets for a range of content authors doubles business benefits and ensures effective management and monetization of data assets in the long term if linked to broader data governance, data quality and metadata management initiatives. There’s a clear connection to data literacy here because of its foundation in business glossaries and socializing data so all stakeholders can view and understand it within the context of their roles.

7. Make automation the norm across all data governance processes: Too many companies still live in a world where data governance is a high-level mandate, not practically implemented. To fully realize the advantages of data governance and the power of data intelligence, data operations must be automated across the board. Without automated data management, the governance housekeeping load on the business will be so great that data quality will inevitably suffer. Being able to account for all enterprise data and resolve disparity in data sources and silos using manual approaches is wishful thinking.

8. Craft your data governance strategy before making any investments: Gather multiple stakeholders – both business and IT – with multiple viewpoints to discover where their needs mesh, where they diverge, and what represents the greatest pain points to the business. Solve for these first, but build buy-in by creating a layered, comprehensive strategy that ultimately will address most issues. From there, it’s on to matching your needs to an automated data governance solution that squares with business and IT – both for immediate requirements and future plans.

Register now for “The What & Why of Data Governance,” the first webinar in a new, six-part series on the practice of data governance and how to proactively deal with its complexities. The webinar takes place on Tuesday, Feb. 23 at 3 p.m. GMT/10 a.m. ET.

Do I Need a Data Catalog?

If you’re serious about a data-driven strategy, you’re going to need a data catalog.

Organizations need a data catalog because it enables them to create a seamless way for employees to access and consume data and business assets in an organized manner.

Given the value this sort of organized, data-driven insight can provide, the reason organizations need a data catalog becomes clearer.

Most organizations’ data is fragmented and siloed across numerous sources (e.g., legacy systems, data warehouses, flat files stored on individual desktops and laptops, and modern, cloud-based repositories).

These fragmented data environments make data governance a challenge since business stakeholders, data analysts and other users are unable to discover data or run queries across an entire data set. This also diminishes the value of data as an asset.

In certain circumstances, data catalogs also combine physical system catalogs, critical data elements and key performance measures with clearly defined product and sales goals.

You also can manage the effectiveness of your business and ensure you understand what critical systems are for business continuity and measuring corporate performance.

The data catalog is a searchable asset that enables all data – including even formerly siloed tribal knowledge – to be cataloged and more quickly exposed to users for analysis.

Organizations with particularly deep data stores might need a data catalog with advanced capabilities, such as automated metadata harvesting to speed up the data preparation process.

For example, before users can effectively and meaningfully engage with robust business intelligence (BI) platforms, they must have a way to ensure that the most relevant, important and valuable data sets are included in analysis.

The optimal and most streamlined way to achieve this is by using a data catalog, which can provide a first stop for users ahead of working in BI platforms.

As a collective intelligent asset, a data catalog should include capabilities for collecting and continually enriching or curating the metadata associated with each data asset to make them easier to identify, evaluate and use properly.

Three Types of Metadata in a Data Catalog

A data catalog uses metadata, data that describes or summarizes data, to create an informative and searchable inventory of all data assets in an organization.

These assets can include structured data, unstructured data (documents, web pages, email, social media content, mobile data, images, audio, video and reports), query results and more. The metadata provides information about the asset that makes it easier to locate, understand and evaluate.

For example, Amazon handles millions of different products, and yet we, as consumers, can find almost anything about everything very quickly.

Beyond Amazon’s advanced search capabilities, the company also provides detailed information about each product, the seller’s information, shipping times, reviews, and a list of companion products. Sales are measured down to a zip code territory level across product categories.

Another classic example is the online or card catalog at a library. Each card or listing contains information about a book or publication (e.g., title, author, subject, publication date, edition, location) that makes the publication easier for a reader to find and to evaluate.

There are many types of metadata, but a data catalog deals primarily with three: technical metadata, operational or “process” metadata, and business metadata.
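
Before looking at each type, here is a sketch of how a single catalog entry might bundle all three. The field names are illustrative, not any product’s actual schema.

```python
# One catalog entry for a single data asset, split into the three
# metadata types discussed below. All names are illustrative.
catalog_entry = {
    "asset": "dw.dim_customer",
    "technical": {    # how the data is organized and stored
        "columns": ["customer_id", "full_name", "signup_date"],
        "indexes": ["pk_customer_id"],
        "source":  "crm.contacts",
    },
    "operational": {  # how the asset was produced and who has touched it
        "created_by":  "etl.load_dim_customer",
        "last_loaded": "2020-11-02T04:15:00Z",
        "accessed_by": ["analyst_team", "bi_service"],
    },
    "business": {     # what the asset means to the business
        "glossary_term": "Customer",
        "definition": "A person or organization that has purchased from us.",
        "owner": "sales_ops",
    },
}
```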

Technical Metadata

Technical metadata describes how data is organized and stored, as well as its transformations and lineage. It is structural and describes data objects such as tables, columns, rows, indexes and connections.

This aspect of the metadata guides data experts on how to work with the data (e.g. for analysis and integration purposes).

Operational Metadata

Operational metadata describes the systems that process data, the applications in those systems, and the rules in those applications. Also called “process” metadata, it describes a data asset’s creation and when, how and by whom it has been accessed, used, updated or changed.

Operational metadata provides information about the asset’s history and lineage, which can help an analyst decide if the asset is recent enough for the task at hand, if it comes from a reliable source, if it has been updated by trustworthy individuals, and so on.

A data catalog is essential to business users because it synthesizes all the details about an organization’s data assets across multiple data sources. It organizes them into a simple, easy-to-digest format and then publishes them to data communities for knowledge-sharing and collaboration.

Business Metadata

Business metadata, sometimes referred to as external metadata, covers the business aspects of a data asset. It defines the functionality of the data captured, the definition of the data and its elements, and how the data is used within the business.

This is the area that binds all users together in terms of consistency and usage of cataloged data assets.

Tools should be provided that enable data experts to explore the data catalogs, curate and enrich the metadata with tags, associations, ratings, annotations, and any other information and context that helps users find data faster and use it with confidence.

Why You Need a Data Catalog – Three Business Benefits of Data Catalogs

When data professionals can help themselves to the data they need – without IT intervention, without having to rely on finding experts or colleagues for advice, without limiting themselves to only the assets they know about, and without having to worry about governance and compliance – the entire organization benefits.

A data catalog lets you catalog critical systems and data elements, plus enable the calculation and evaluation of key performance measures. It is also important to understand data lineage and be able to analyze the impacts to critical systems and essential business processes if a change occurs.

  1. Makes data accessible and usable, reducing operational costs while accelerating time to value

Open your organization’s data door, making it easier to access, search and understand information assets. A data catalog is the core of data analysis for decision-making, so automating its curation and access with the associated business context will enable stakeholders to spend more time analyzing it for meaningful insights they can put into action.

Data assets need to be properly scanned, documented, tagged and annotated with their definitions, ownership, lineage and usage. Automating the cataloging of data assets saves initial development time and streamlines ongoing maintenance and governance.

Automating the curation of data assets also accelerates the time to value for analytics/insights reporting and significantly reduces operational costs.

  2. Ensures regulatory compliance

Regulations like the California Consumer Privacy Act (CCPA) and the European Union’s General Data Protection Regulation (GDPR) require organizations to know where all their customer, prospect and employee data resides to ensure its security and privacy.

A fine for noncompliance or reputational damage is the last thing you need to worry about, so use a data catalog to centralize data management and the associated usage policies and guardrails.

See a Data Catalog in Action

The erwin Data Intelligence Suite (erwin DI) provides data catalog and data literacy capabilities with built-in automation so you can accomplish all the above and much more.

Request your own demo of erwin DI.

What is a Data Catalog?

The easiest way to understand a data catalog is to look at how libraries catalog books and manuals in a hierarchical structure, making it easy for anyone to find exactly what they need.

Similarly, a data catalog enables businesses to create a seamless way for employees to access and consume data and business assets in an organized manner.

By combining physical system catalogs, critical data elements, and key performance measures with clearly defined product and sales goals, you can manage the effectiveness of your business and ensure you understand what critical systems are for business continuity and measuring corporate performance.

A data catalog is essential to business users because it synthesizes all the details about an organization’s data assets across multiple data sources. It organizes them into a simple, easy-to-digest format and then publishes them to data communities for knowledge-sharing and collaboration.

Another foundational purpose of a data catalog is to streamline, organize and process the thousands, if not millions, of an organization’s data assets to help consumers/users search for specific datasets and understand metadata, ownership, data lineage and usage.

Look at Amazon and how it handles millions of different products, and yet we, as consumers, can find almost anything about everything very quickly.

Beyond Amazon’s advanced search capabilities, the company also gives detailed information about each product, the seller’s information, shipping times, reviews and a list of companion products. It measures sales down to a zip-code territory level across product categories.

Data Catalog Use Case Example: Crisis Proof Your Business

One of the biggest lessons we’re learning from the global COVID-19 pandemic is the importance of data, specifically using a data catalog to comply, collaborate and innovate to crisis-proof our businesses.

As COVID-19 continues to spread, organizations are evaluating and adjusting their operations in terms of both risk management and business continuity. Data is critical to these decisions, such as how to ramp up and support remote employees, re-engineer processes, change entire business models, and adjust supply chains.

Think about the pandemic itself and the numerous global entities involved in identifying it, tracking its trajectory, and providing guidance to governments, healthcare systems and the general public. One example is the European Union (EU) Open Data Portal, which is used to document, catalog and govern EU data related to the pandemic. This information has helped:

  • Provide daily updates
  • Give guidance to governments, health professionals and the public
  • Support the development and approval of treatments and vaccines
  • Help with crisis coordination, including repatriation and humanitarian aid
  • Put border controls in place
  • Assist with supply chain control and consular coordination

So one of the biggest lessons we’re learning from COVID-19 is the need for data collection, management and governance. What’s the best way to organize data and ensure it is supported by business policies and well-defined, governed systems, data elements and performance measures?

According to Gartner, “organizations that offer a curated catalog of internal and external data to diverse users will realize twice the business value from their data and analytics investments than those that do not.”

5 Advantages of Using a Data Catalog for Crisis Preparedness & Business Continuity

The World Bank has been able to provide an array of real-time data, statistical indicators, and other types of data relevant to the coronavirus pandemic through its authoritative data catalogs. The World Bank data catalogs contain datasets, policies, critical data elements and measures useful for analysis and modeling the virus’ trajectory to help organizations measure the impact.

What can your organization learn from this example when it comes to crisis preparedness and business continuity? By developing and maintaining a data catalog as part of a larger data governance program supported by stakeholders across the organization, you can:

  1. Catalog and Share Information Assets

Catalog critical systems and data elements, plus enable the calculation and evaluation of key performance measures. It’s also important to understand data lineage and be able to analyze the impacts to critical systems and essential business processes if a change occurs.

  2. Clearly Document Data Policies and Rules

Managing a remote workforce creates new challenges and risks. Do employees have remote access to essential systems? Do they know what the company’s work-from-home policies are? Do employees understand how to handle sensitive data? Are they equipped to maintain data security and privacy? A data catalog with self-service access serves up the correct policies and procedures.

  3. Reduce Operational Costs While Accelerating Time to Value

Datasets need to be properly scanned, documented, tagged and annotated with their definitions, ownership, lineage and usage. Automating the cataloging of data assets saves initial development time and streamlines its ongoing maintenance and governance. Automating the curation of data assets also accelerates the time to value for analytics/insights reporting and significantly reduces operational costs.

  4. Make Data Accessible & Usable

Open your organization’s data door, making it easier to access, search and understand information assets. A data catalog is the core of data analysis for decision-making, so automating its curation and access with the associated business context will enable stakeholders to spend more time analyzing it for meaningful insights they can put into action.

  5. Ensure Regulatory Compliance

Regulations like the California Consumer Privacy Act (CCPA) and the European Union’s General Data Protection Regulation (GDPR) require organizations to know where all their customer, prospect and employee data resides to ensure its security and privacy.

A fine for noncompliance is the last thing you need on top of everything else your organization is dealing with, so using a data catalog centralizes data management and the associated usage policies and guardrails.

See a Data Catalog in Action

The erwin Data Intelligence Suite (erwin DI) provides data catalog and data literacy capabilities with built-in automation so you can accomplish all of the above and more.

Join us for the next live demo of erwin DI.

Automation Gives DevOps More Horsepower

Almost 70 percent of CEOs say they expect their companies to change their business models in the next three years, and 62 percent report they have management initiatives or transformation programs underway to make their businesses more digital, according to Gartner.

Wouldn’t it be advantageous for these organizations to accelerate these digital transformation efforts? They have that option with automation, shifting DevOps away from dependence on manual processes. Just like with cars, more horsepower in DevOps translates to greater speed.

Doing More with Less

We have clients looking to do more with existing resources, and others looking to reduce full-time employee count on their DevOps teams. With metadata-driven automation, many DevOps processes can be automated, adding more “horsepower” to increase their speed and accuracy. For example:

Auto-documentation of data mappings and lineage: By using data harvesting templates, organizations can eliminate time spent updating and maintaining data mappings, creating them directly from code written by the ETL staff. Such automation can save close to 100 percent of the time usually spent on this type of documentation (a minimal sketch of this parsing step follows the list below).

  • Data lineage and impact analysis views for ‘data in motion’ also stay up to date with no additional effort.
  • Human errors are eliminated, leading to higher quality documentation and output.
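
Here is the promised minimal sketch of the auto-documentation idea: parse an INSERT … SELECT statement out of ETL code and emit source-to-target mappings. Real harvesting templates cover far richer ETL dialects; the regular expression below handles only this deliberately simple case.

```python
import re

etl_code = """
INSERT INTO dw.dim_customer (customer_id, full_name)
SELECT contact_id, display_name FROM crm.contacts;
"""

# Match the target table, target columns, source columns and source
# table of a plain INSERT ... SELECT (a deliberately tiny grammar).
pattern = re.compile(
    r"INSERT INTO\s+(\S+)\s*\(([^)]+)\)\s*"
    r"SELECT\s+(.+?)\s+FROM\s+(\S+)",
    re.IGNORECASE | re.DOTALL,
)

m = pattern.search(etl_code)
target_table, targets, sources, source_table = m.groups()
mappings = [
    {"source": f"{source_table.rstrip(';')}.{s.strip()}",
     "target": f"{target_table}.{t.strip()}"}
    for t, s in zip(targets.split(","), sources.split(","))
]
print(mappings)
# [{'source': 'crm.contacts.contact_id', 'target': 'dw.dim_customer.customer_id'},
#  {'source': 'crm.contacts.display_name', 'target': 'dw.dim_customer.full_name'}]
```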

Automatic updates/changes reflected throughout each release cycle: Updates can be picked up and the ETL job/package generated with 100-percent accuracy. An ETL developer is not required to ‘hand code’ mappings from a spreadsheet – greatly reducing the time spent on the ETL process, and perhaps the total number of resources required to manage that process month over month.

  • ETL skills are still necessary for validation and to compile and execute the automated jobs, but the overall quality of these jobs (machine-generated code) will be much higher, also eliminating churn and rework.

Auto-scanning of source and target data assets with synchronized mappings: This automation eliminates the need for a resource or several resources dealing with manual updates to the design mappings, creating additional time savings and cost reductions associated with data preparation.

  • A change in the source-column header may impact 1,500 design mappings. Managed manually, this process – opening the mapping document, making the change, saving the file with a new version, and placing it into a shared folder for development – could take an analyst several days. But synchronization instantly updates the mappings, correctly versioned, and can be picked up and packaged into an ETL job/package within the same hour. Whether using agile or classic waterfall development, these processes will see exponential improvement and time reduction. 
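
That scenario boils down to propagating one rename across every mapping that references the column. A toy sketch with a hypothetical in-memory mapping store follows; a real tool would also version each change and regenerate the affected ETL packages.

```python
# Hypothetical mapping store: each mapping pairs a source column with
# a target column. A source rename is applied everywhere in one pass.
mappings = [
    {"source": "crm.contacts.display_name",
     "target": "dw.dim_customer.full_name"},
    {"source": "crm.contacts.display_name",
     "target": "mart.cust_report.name"},
    # ... imagine 1,500 of these
]

def rename_source_column(mappings: list, old: str, new: str) -> int:
    """Propagate a source-column rename; return how many mappings changed."""
    changed = 0
    for m in mappings:
        if m["source"] == old:
            m["source"] = new
            changed += 1
    return changed

n = rename_source_column(mappings,
                         "crm.contacts.display_name",
                         "crm.contacts.preferred_name")
print(f"{n} mappings updated in one pass")  # 2 mappings updated in one pass
```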

Data Intelligence: Speed and Quality Without Compromise

Our clients often understand that incredible DevOps improvements are possible, but they fear the “work” it will take to get there.

It really comes down to deciding to embrace change a la automation or continue down the same path. But isn’t the definition of insanity doing the same thing over and over, expecting but never realizing different results?

With traditional means, you may improve speed but sacrifice quality. On the flipside, you may improve quality but sacrifice speed.

However, erwin’s technology shifts this paradigm. You can have both speed and quality.

The erwin Data Intelligence Suite (erwin DI) combines the capabilities of erwin Data Catalog with erwin Data Literacy to fuel an automated, real-time, high-quality data pipeline.

Then all enterprise stakeholders – data scientists, data stewards, ETL developers, enterprise architects, business analysts, compliance officers, CDOs and CEOs – can access data relevant to their roles for insights they can put into action.

It creates the fastest path to value, with an automation framework and metadata connectors configured by our team to deliver the data harvesting and preparation features that make capturing enterprise data assets fast and accurate.

Click here to request a free demo of erwin DI.

Data Governance Makes Data Security Less Scary

Happy Halloween!

Do you know where your data is? What data you have? Who has had access to it?

These can be frightening questions for an organization to answer.

Add to the mix the potential for a data breach followed by non-compliance, reputational damage and financial penalties and a real horror story could unfold.

In fact, we’ve seen some frightening ones play out already:

  1. Google’s record GDPR fine – France’s data privacy enforcement agency hit the tech giant with a $57 million penalty in early 2019 – more than 80 times the steepest fine the U.K.’s Information Commissioner’s Office had levied against both Facebook and Equifax for their data breaches.
  2. In July 2019, British Airways received the biggest GDPR fine to date ($229 million) because the data of more than 500,000 customers was compromised.
  3. Marriott International was fined $123 million, or 1.5 percent of its global annual revenue, because 330 million hotel guests were affected by a breach in 2018.

Now, as Cybersecurity Awareness Month comes to a close – and ghosts and goblins roam the streets – we thought it a good time to resurrect some guidance on how data governance can make data security less scary.

We don’t want you to be caught off guard when it comes to protecting sensitive data and staying compliant with data regulations.

Don’t Scream; You Can Protect Your Sensitive Data

It’s easier to protect sensitive data when you know what it is, where it’s stored and how it needs to be governed.

Data security incidents may be the result of not having a true data governance foundation that makes it possible to understand the context of data – what assets exist and where, the relationship between them and enterprise systems and processes, and how and by what authorized parties data is used.

That knowledge is critical to supporting efforts to keep relevant data secure and private.

Without data governance, organizations don’t have visibility of the full data landscape – linkages, processes, people and so on – to propel more context-sensitive security architectures that can better assure expectations around user and corporate data privacy. In sum, they lack the ability to connect the dots across governance, security and privacy – and to act accordingly.

Data governance addresses these fundamental questions:

  1. What private data do we store and how is it used?
  2. Who has access and permissions to the data?
  3. What data do we have and where is it?

Where Are the Skeletons?

Data is a critical asset used to operate, manage and grow a business. While data is sometimes at rest in databases, data lakes and data warehouses, a large percentage is federated and integrated across the enterprise, introducing governance, manageability and risk issues that must be managed.

Knowing where sensitive data is located and properly governing it with policy rules, impact analysis and lineage views is critical for risk management, data audits and regulatory compliance.

However, when key data isn’t discovered, harvested, cataloged, defined and standardized as part of integration processes, audits may be flawed and therefore your organization is at risk.

Sensitive data – at rest or in motion – that exists in various forms across multiple systems must be automatically tagged, its lineage automatically documented, and its flows depicted so that it is easily found and its usage across workflows easily traced.

Thankfully, tools are available to help automate the scanning, detection and tagging of sensitive data by:

  • Monitoring and controlling sensitive data: Better visibility and control across the enterprise to identify data security threats and reduce associated risks
  • Enriching business data elements for sensitive data discovery: Comprehensively defining business data elements for PII, PHI and PCI across database systems, cloud and Big Data stores to easily identify sensitive data based on a set of algorithms and data patterns
  • Providing metadata and value-based analysis: Discovery and classification of sensitive data based on metadata and data value patterns and algorithms. Organizations can define business data elements and rules to identify and locate sensitive data including PII, PHI, PCI and other sensitive information.
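
To illustrate value-based discovery, the sketch below tags columns whose sampled values match simple PII patterns. The regular expressions are deliberately naive stand-ins for the much richer algorithms and rule sets such tools ship with.

```python
import re

# Illustrative value patterns; real products use far richer rule sets.
PII_PATTERNS = {
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def tag_sensitive_columns(column_samples: dict) -> dict:
    """Tag each column with the PII types its sample values match."""
    tags = {}
    for column, samples in column_samples.items():
        hits = {name for name, rx in PII_PATTERNS.items()
                if any(rx.search(value) for value in samples)}
        if hits:
            tags[column] = sorted(hits)
    return tags

samples = {
    "crm.contacts.email":  ["jane@example.com", "bob@example.org"],
    "crm.contacts.notes":  ["called 555-867-5309 on Tuesday"],
    "crm.contacts.status": ["active", "lapsed"],
}
print(tag_sensitive_columns(samples))
# {'crm.contacts.email': ['email'], 'crm.contacts.notes': ['phone']}
```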

No Hocus Pocus

Truly understanding an organization’s data, including its value and quality, requires a harmonized approach embedded in business processes and enterprise architecture.

Such an integrated enterprise data governance experience helps organizations understand what data they have, where it is, where it came from, its value, its quality and how it’s used and accessed by people and applications.

An ounce of prevention is worth a pound of cure – from the painstaking process of identifying what happened and why to notifying customers that their data, and thus their trust in your organization, has been compromised.

A well-formed security architecture that is driven by and aligned with data intelligence is your best defense. However, if there is nefarious intent, a hacker will find a way. So being prepared means you can minimize your risk exposure and the damage to your reputation.

Multiple components must be considered to effectively support a data governance, security and privacy trinity. They are:

  1. Data models
  2. Enterprise architecture
  3. Business process models

Creating policies for data handling and accountability and driving culture change so people understand how to properly work with data are two important components of a data governance initiative, as is the technology for proactively managing data assets.

Without the ability to harvest metadata schemas and business terms; analyze data attributes and relationships; impose structure on definitions; and view all data in one place according to each user’s role within the enterprise, businesses will be hard pressed to stay in step with governance standards and best practices around security and privacy.

As a consequence, the private information held within organizations will continue to be at risk.

Organizations suffering data breaches will be deprived of the benefits they had hoped to realize from the money spent on security technologies and the time invested in developing data privacy classifications.

They also may face heavy fines and other financial, not to mention PR, penalties.

Benefits of Data Vault Automation

The benefits of Data Vault automation range from the more abstract – like improving data integrity – to the tangible – such as clearly identifiable savings in cost and time.

So Seriously … You Should Automate Your Data Vault

 By Danny Sandwell

Data Vault is a methodology for architecting and managing data warehouses in complex data environments where new data types and structures are constantly introduced.

Without Data Vault, data warehouses are difficult and time-consuming to change, causing latency issues and slowing time to value. In addition, the queries required to maintain historical integrity are complex to design and run slowly, causing performance issues and potentially incorrect results, because the ability to understand relationships between historical snapshots of data is lacking.

In his blog, Dan Linstedt, the creator of Data Vault methodology, explains that Data Vaults “are extremely scalable, flexible architectures” enabling the business to grow and change without “the agony and pain of high costs, long implementation and test cycles, and long lists of impacts across the enterprise warehouse.”

With a Data Vault, new functional areas typically are added quickly and easily, with changes to existing architecture taking less than half the traditional time with much less impact on the downstream systems, he notes.

Astonishingly, nearly 20 years since the methodology’s creation, most Data Vault design, development and deployment phases are still handled manually. But why?

Traditional manual efforts to define the Data Vault population and create ETL code from scratch can take weeks or even months. The entire process is time-consuming, slows down the data pipeline and is often riddled with human errors.

On the flip side, automating the development and deployment of design changes and the resulting data movement processing code ensures companies can accelerate development and deployment in a timely and cost-effective manner.

Benefits of Data Vault Automation – A Case Study …

Global Pharma Company Saves Considerable Time and Money with Data Vault Automation

Let’s take a look at a large global pharmaceutical company that switched to Data Vault automation with staggering results.

Like many pharmaceutical companies, it manages a massive data warehouse combining clinical trial, supply chain and other mission-critical data. The company had chosen a Data Vault schema for its flexibility in handling change but found creating the hub and satellite structures incredibly laborious.

They needed to accelerate development, as well as aggregate data from different systems for internal customers to access and share. Additionally, the company needed lineage and traceability for regulatory compliance efforts.

With this ability, they can identify data sources, transformations and usage to safeguard protected health information (PHI) for clinical trials.

After an initial proof of concept, they deployed erwin Data Vault Automation and generated more than 200 tables, jobs and processes with 10 to 12 scripts. The highly schematic structure of the models enabled large portions of the modeling process to be automated, dramatically accelerating Data Vault projects and optimizing data warehouse management.
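
That highly schematic structure is what makes generation possible: hubs and satellites follow fixed column conventions, so their DDL can be derived mechanically from a source entity definition. A minimal sketch, using common Data Vault conventions with illustrative names and types rather than erwin’s actual output:

```python
# Generate hub and satellite DDL for one source entity. The hash key,
# load date and record source columns follow common Data Vault practice.
entity = {
    "name": "customer",
    "business_key": "customer_number",
    "attributes": {"full_name": "VARCHAR(200)", "region": "VARCHAR(50)"},
}

def generate_data_vault_ddl(entity: dict) -> str:
    name, bk = entity["name"], entity["business_key"]
    hub = (f"CREATE TABLE hub_{name} (\n"
           f"  hub_{name}_key CHAR(32) PRIMARY KEY,\n"
           f"  {bk} VARCHAR(100) NOT NULL,\n"
           f"  load_date TIMESTAMP NOT NULL,\n"
           f"  record_source VARCHAR(100) NOT NULL\n);")
    attr_lines = ",\n".join(f"  {col} {typ}"
                            for col, typ in entity["attributes"].items())
    sat = (f"CREATE TABLE sat_{name} (\n"
           f"  hub_{name}_key CHAR(32) NOT NULL,\n"
           f"  load_date TIMESTAMP NOT NULL,\n"
           f"{attr_lines},\n"
           f"  PRIMARY KEY (hub_{name}_key, load_date)\n);")
    return hub + "\n\n" + sat

print(generate_data_vault_ddl(entity))
```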

erwin Data Vault Automation helped this pharma customer automate the complete lifecycle – accelerating development while increasing consistency, simplicity and flexibility – to save considerable time and money.

For this customer, the benefits of Data Vault automation were as follows:

  • Saving an estimated 70% of the costs of manual development
  • Generating 95% of the production code with “zero touch,” improving time to business value and significantly reducing the costly rework associated with error-prone manual processes
  • Increasing data integrity, including for new requirements and use cases, regardless of changes to the warehouse structure, because legacy source data doesn’t degrade
  • Creating a sustainable approach to Data Vault deployment, ensuring the agile, adaptable and timely delivery of actionable insights to the business in a well-governed facility for regulatory compliance, including full transparency and ease of auditability

Homegrown Tools Never Provide True Data Vault Automation

Many organizations use some form of homegrown tool or standalone applications. However, they don’t integrate with other tools and components of the architecture, they’re expensive, and quite frankly, they make it difficult to derive any meaningful results.

erwin Data Vault Automation centralizes the specification and deployment of Data Vault architectures for better control and visibility of the software development lifecycle. erwin Data Catalog makes it easy to discover, organize, curate and govern data being sourced for and managed in the warehouse.

With this solution, users select data sets to be included in the warehouse and fully automate the loading of Data Vault structures and ETL operations.

erwin Data Vault Smart Connectors eliminate the need for a business analyst and ETL developers to repeat mundane tasks, so they can focus on choosing and using the desired data instead. This saves considerable development time and effort plus delivers a high level of standardization and reuse.

After the Data Vault processes have been automated, the warehouse is well documented with traceability from the marts back to the operational data to speed the investigation of issues and analyze the impact of changes.

Bottom line: if your Data Vault integration is not automated, you’re already behind.

If you’d like to get started with erwin Data Vault Automation or request a quote, you can email consulting@erwin.com.

Business Process Can Make or Break Data Governance

Data governance isn’t a one-off project with a defined endpoint. It’s an on-going initiative that requires active engagement from executives and business leaders.

Data governance, today, comes back to the ability to understand critical enterprise data within a business context, track its physical existence and lineage, and maximize its value while ensuring quality and security.

Historically, little attention has focused on what can literally make or break any data governance initiative — turning it from a launchpad for competitive advantage to a recipe for disaster. Data governance success hinges on business process modeling and enterprise architecture.

To put it even more bluntly, successful data governance* must start with business process modeling and analysis.

*See: Three Steps to Successful & Sustainable Data Governance Implementation

Passing the Data Governance Ball

For years, data governance was the volleyball passed back and forth over the net between IT and the business, with neither side truly owning it. However, once an organization understands that IT and the business are both responsible for data, it needs to develop a comprehensive, holistic strategy for data governance that is capable of four things:

  1. Reaching every stakeholder in the process
  2. Providing a platform for understanding and governing trusted data assets
  3. Delivering the greatest benefit from data wherever it lives, while minimizing risk
  4. Helping users understand the impact of changes made to a specific data element across the enterprise.

To accomplish this, a modern data governance strategy needs to be interdisciplinary to break down traditional silos. Enterprise architecture is important because it aligns IT and the business, mapping a company’s applications and the associated technologies and data to the business functions and value streams they enable.

The business process and analysis component is vital because it defines how the business operates and ensures employees understand and are accountable for carrying out the processes for which they are responsible. Enterprises can clearly define, map and analyze workflows and build models to drive process improvement, as well as identify business practices susceptible to the greatest security, compliance or other risks and where controls are most needed to mitigate exposures.

Slow Down, Ask Questions

In a rush to implement a data governance methodology and system, organizations can forget that a system must serve a process – and be governed/controlled by one.

To choose the correct system and implement it effectively and efficiently, you must know – in every detail – all the processes it will impact. You need to ask these important questions:

  1. How will it impact them?
  2. Who needs to be involved?
  3. When do they need to be involved?

These questions are the same ones we ask in data governance. They involve impact analysis, ownership and accountability, control and traceability – all of which effectively documented and managed business processes enable.

Data sets are not important in and of themselves. They become important in terms of how they are used, who uses them and for what purpose – and all of this information is described in the processes that generate, manipulate and use them. So unless we know what those processes are, how can any data governance implementation be complete or successful?

Processes need to be open and shared in a concise, consistent way so all parts of the organization can investigate, ask questions, and then add their feedback and information layers. In other words, processes need to be alive and central to the organization because only then will the use of data and data governance be truly effective.

A Failure to Communicate

Consider this scenario: We’ve perfectly captured our data lineage, so we know what our data sets mean, how they’re connected, and who’s responsible for them – not a simple task but a massive win for any organization. Now a breach occurs. Will any of the above information tell us why it happened? Or where? No! It will tell us what else is affected and who can manage the data layer(s), but unless we find and address the process failure that led to the breach, it is guaranteed to happen again.

By knowing where data is used – the processes that use and manage it – we can quickly, even instantly, identify where a failure occurs. Starting with data lineage (meaning our forensic analysis starts from our data governance system), we can identify the source and destination processes and the associated impacts throughout the organization.

We can know which processes need to change and how. We can anticipate the pending disruptions to our operations and, more to the point, the costs involved in mitigating and/or addressing them.

But knowing all the above requires that our processes – our essential and operational business architecture – be accurately captured and modeled. Instituting data governance without processes is like building a castle on sand.

Rethinking Business Process Modeling and Analysis

Modern organizations need a business process modeling and analysis tool with easy access to all the operational layers across the organization – from high-level business architecture all the way down to data.

Such a system should be flexible, adjustable, easy-to-use and capable of supporting multiple layers simultaneously, allowing users to start in their comfort zones and mature as they work toward their organization’s goals.

The erwin EDGE is one of the most comprehensive software platforms for managing an organization’s data governance and business process initiatives, as well as its entire data architecture. It allows natural, organic growth throughout the organization, and its assimilation of data governance and business process management under the same platform provides a unique data governance experience because of its integrated, collaborative approach.

Start your free, cloud-based trial of erwin Business Process and see how some of the world’s largest enterprises have benefited from its centralized repository and integrated, role-based views.

We’d also be happy to show you our data governance software, which includes data cataloging and data literacy capabilities.

Top 5 Data Catalog Benefits

A data catalog benefits organizations in a myriad of ways. With the right data catalog tool, organizations can automate enterprise metadata management – including data cataloging, data mapping, data quality and code generation for faster time to value and greater accuracy for data movement and/or deployment projects.

Data cataloging helps curate internal and external datasets for a range of content authors. Gartner says this doubles business benefits and ensures effective management and monetization of data assets in the long-term if linked to broader data governance, data quality and metadata management initiatives.

With this in mind, the importance of data cataloging continues to grow. In the regulated data world (GDPR, HIPAA, etc.), organizations need a good understanding of their data lineage – and the data catalog benefits to data lineage are substantial.

Data lineage is a core operational business component of data governance technology architecture, encompassing the processes and technology to provide full-spectrum visibility into the ways data flows across an enterprise.

There are a number of different approaches to data lineage. Here, I outline the common approach, and the approach incorporating data cataloging – including the top 5 data catalog benefits for understanding your organization’s data lineage.

Data Lineage – The Common Approach

The most common approach for assembling a collection of data lineage mappings traces data flows in reverse. The process begins with the target or data endpoint and then traverses the processes, applications and ETL tasks in reverse from the target.

For example, to determine the mappings for the data pipelines populating a data warehouse, a data lineage tool might begin with the data warehouse and examine the ETL tasks that immediately precede the loading of the data into the target warehouse.

The data sources that feed the ETL process are added to a “task list,” and the process is repeated for each of those sources. At each stage, the discovered pieces of lineage are documented. At the end of the sequence, the process will have reverse-mapped the pipelines for populating that warehouse.
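
In code, this reverse mapping is a simple worklist traversal over an inventory of ETL tasks. The sketch below uses a hypothetical target-to-sources inventory and documents each discovered piece of lineage as it goes:

```python
from collections import deque

# Hypothetical ETL inventory: target -> the sources its load job reads.
etl_sources = {
    "dw.warehouse":      ["staging.orders", "staging.customers"],
    "staging.orders":    ["erp.order_lines"],
    "staging.customers": ["crm.contacts"],
}

def reverse_map_lineage(target: str) -> list[tuple[str, str]]:
    """Walk backward from the target, documenting each discovered edge."""
    edges, queue, seen = [], deque([target]), {target}
    while queue:
        node = queue.popleft()
        for source in etl_sources.get(node, []):
            edges.append((source, node))  # document this piece of lineage
            if source not in seen:        # add the source to the task list
                seen.add(source)
                queue.append(source)
    return edges

for src, dst in reverse_map_lineage("dw.warehouse"):
    print(f"{src} -> {dst}")
```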

While this approach does produce a collection of data lineage maps for selected target systems, there are some drawbacks.

  • First, this approach focuses only on assembling the data pipelines populating the selected target system but does not necessarily provide a comprehensive view of all the information flows and how they interact.
  • Second, this process produces the information that can be used for a static view of the data pipelines, but the process needs to be executed on a regular basis to account for changes to the environment or data sources.
  • Third, and probably most important, this process produces a technical view of the information flow, but it does not necessarily provide any deeper insights into the semantic lineage, or how the data assets map to the corresponding business usage models.

A Data Catalog Offers an Alternate Data Lineage Approach

An alternate approach to data lineage combines data discovery and the use of a data catalog that captures data asset metadata with a data mapping framework that documents connections between the data assets.

This data catalog approach also takes advantage of automation, but in a different way: using platform-specific data connectors, the tool scans the environments storing each data asset and imports that asset’s metadata into the data catalog.

When data asset structures are similar, the tool can compare data element domains and value sets, and automatically create the data mapping.
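
A toy version of that comparison step: propose mappings between two similarly structured assets by pairing unclaimed columns whose types agree. Real tools compare names, domains and value sets with far more sophistication.

```python
# Column-to-type maps for a source and a target asset (illustrative).
source_columns = {"contact_id": "INTEGER", "display_name": "TEXT",
                  "created_at": "DATE"}
target_columns = {"customer_id": "INTEGER", "full_name": "TEXT",
                  "signup_date": "DATE"}

def propose_mappings(source: dict, target: dict) -> list[tuple[str, str]]:
    """Naive matcher: pair unclaimed columns whose types agree."""
    proposals, used = [], set()
    for t_col, t_type in target.items():
        for s_col, s_type in source.items():
            if s_col not in used and s_type == t_type:
                proposals.append((s_col, t_col))
                used.add(s_col)
                break
    return proposals

print(propose_mappings(source_columns, target_columns))
# [('contact_id', 'customer_id'), ('display_name', 'full_name'),
#  ('created_at', 'signup_date')]
```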

In turn, the data catalog approach performs data discovery using the same data connectors to parse the code involved in data movement, such as major ETL environments and procedural code – basically any executable task that moves data.

The information collected through this process is reverse engineered to create mappings from source data sets to target data sets based on what was discovered.

For example, you can map the databases used for transaction processing, determine that subsets of the transaction processing database are extracted and moved to a staging area, and then parse the ETL code to infer the mappings.
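Here is a heavily simplified version of that parsing step, using a toy INSERT ... SELECT statement. Real tools rely on full SQL and ETL parsers rather than regular expressions, but the principle is the same:

```python
import re

# Hypothetical ETL task: a simple INSERT ... SELECT that moves data.
etl_sql = """
INSERT INTO staging.sales (order_id, amount)
SELECT order_id, amount FROM oltp.orders;
"""

# Recover the target and source tables from the code that moves the data.
target = re.search(r"INSERT\s+INTO\s+([\w.]+)", etl_sql, re.I).group(1)
source = re.search(r"FROM\s+([\w.]+)", etl_sql, re.I).group(1)

print((source, target))  # ('oltp.orders', 'staging.sales') -> inferred mapping
```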

These direct mappings are also documented in the data catalog. In cases where the mappings are not obvious, a tool can help a data steward manually map data assets into the catalog.

The result is a data catalog that incorporates the structural and semantic metadata associated with each data asset as well as the direct mappings for how that data set is populated.


This is a powerful paradigm: instead of capturing a static view of specific data pipelines, it allows a data consumer to request a dynamically assembled lineage from the documented mappings.

By interrogating the catalog, the current view of any specific data lineage can be rendered on the fly, showing every point along the way: the origination points, the processing stages, the sequences of transformations, and the final destination.

Materializing the “current active lineage” dynamically reduces the risk of relying on an older version of the lineage that is no longer relevant or correct. When new information is added to the data catalog (such as a newly added data source or a modification to the ETL code), dynamically generated views of the lineage are kept up to date automatically.
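A minimal sketch of this dynamic materialization, again with hypothetical names: the lineage is recomputed from the documented mappings on every request, so a newly added source shows up automatically.

```python
# Documented mappings held in the catalog: (source, task, target) triples.
mappings = [
    ("oltp.orders",   "extract_sales", "staging.sales"),
    ("staging.sales", "load_sales",    "warehouse.sales"),
]

def lineage_for(target):
    """Materialize the current lineage for a target on the fly."""
    edges = []
    for source, task, tgt in mappings:
        if tgt == target:
            edges += lineage_for(source)     # recurse to origination points
            edges.append((source, task, tgt))
    return edges

print(lineage_for("warehouse.sales"))

# Adding a data source is reflected in the next rendered view: no stale artifact.
mappings.append(("oltp.refunds", "extract_sales", "staging.sales"))
print(lineage_for("warehouse.sales"))
```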

Top 5 Data Catalog Benefits for Understanding Data Lineage

A data catalog benefits data lineage in the following five distinct ways:

1. Accessibility

The data catalog approach allows the data consumer to query the tool to materialize specific data lineage mappings on demand.

2. Currency

The data lineage is rendered from the most current data in the data catalog.

3. Breadth

As the number of data assets documented in the data catalog increases, the scope of the materializable lineage expands accordingly. With all corporate data assets cataloged, any (or all!) data lineage mappings can be produced on demand.

4. Maintainability and Sustainability

Since the data lineage mappings are not managed as distinct artifacts, there are no additional requirements for maintenance. As long as the data catalog is kept up to date, the data lineage mappings can be materialized.

5. Semantic Visibility

In addition to visualizing the physical movement of data across the enterprise, the data catalog approach allows the data steward to associate business glossary terms, data element definitions, data models, and other semantic details with the different mappings. Additional visualization methods can demonstrate where business terms are used, how they are mapped to different data elements in different systems, and the relationships among these different usage points.

You can also impose additional data governance controls with project management oversight, designating data lineage mappings in terms of the project life cycle (such as development, test or production).
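One way to picture this semantic layer: glossary terms link to the physical elements that implement them, and individual mappings can carry a life-cycle designation. A brief sketch, with all names hypothetical:

```python
# Hypothetical semantic metadata: business terms linked to the physical
# data elements that implement them across systems.
glossary = {
    "Customer ID": ["oltp.customers.cust_id", "crm.contacts.cust_id",
                    "warehouse.dim_customer.customer_key"],
}

# Governance control: lineage mappings tagged by project life-cycle stage.
mapping_stage = {("staging.sales", "warehouse.sales"): "production",
                 ("oltp.refunds", "staging.sales"):    "development"}

def where_used(term):
    """Show every physical data element a business term maps to."""
    return glossary.get(term, [])

print(where_used("Customer ID"))
```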

Aside from these data catalog benefits, this approach reduces the manual effort required to accumulate data lineage information and continually review the data landscape for consistency, thus providing a greater return on investment for your data intelligence budget.

Learn more about data cataloging.


Using Strategic Data Governance to Manage GDPR/CCPA Complexity

In light of recent, high-profile data breaches, it’s past time we re-examined strategic data governance and its role in managing regulatory requirements.

News broke earlier this week of British Airways being fined 183 million pounds – or $228 million – by the U.K. for alleged violations of the European Union’s General Data Protection Regulation (GDPR). While not the first, it is the largest penalty levied since the GDPR went into effect in May 2018.

Given this, Oppenheimer & Co. cautions:

“European regulators could accelerate the crackdown on GDPR violators, which in turn could accelerate demand for GDPR readiness. Although the CCPA [California Consumer Privacy Act, the U.S. equivalent of GDPR] will not become effective until 2020, we believe that new developments in GDPR enforcement may influence the regulatory framework of the still fluid CCPA.”

With all the advance notice and significant chatter about GDPR/CCPA, why aren’t organizations more prepared to deal with data regulations?

In a word? Complexity.

The complexity of the regulatory requirements themselves is aggravated by the complexity of the business and data landscapes within most enterprises.

So it’s important to understand how to use strategic data governance to manage the complexity of regulatory compliance and other business objectives …

Designing and Operationalizing Regulatory Compliance Strategy

It’s not easy to design and deploy compliance in an environment that’s not well understood and is difficult to maneuver in. First you need to analyze and design your compliance strategy and tactics, and then you need to operationalize them.

Modern, strategic data governance, which involves both IT and the business, enables organizations to plan and document how they will discover and understand their data within context, track its physical existence and lineage, and maximize its security, quality and value. It also helps enterprises put these strategic capabilities into action by:

  • Understanding their business, technology and data architectures and their inter-relationships, aligning them with their goals and defining the people, processes and technologies required to achieve compliance.
  • Creating and automating a curated enterprise data catalog, complete with physical assets, data models, data movement, data quality and on-demand lineage.
  • Activating their metadata to drive agile data preparation and governance through integrated data glossaries and dictionaries that associate policies to enable stakeholder data literacy.


Five Steps to GDPR/CCPA Compliance

With the right technology, GDPR/CCPA compliance can be automated and accelerated in these five steps:

  1. Catalog systems

Harvest, enrich/transform and catalog data from a wide array of sources to enable any stakeholder to see the interrelationships of data assets across the organization.

  2. Govern PII “at rest”

Classify, flag and socialize the use and governance of personally identifiable information regardless of where it is stored (a minimal classification sketch follows this list).

  3. Govern PII “in motion”

Scan, catalog and map personally identifiable information to understand how it moves inside and outside the organization and how it changes along the way.

  4. Manage policies and rules

Govern business terminology in addition to data policies and rules, depicting relationships to physical data catalogs and the applications that use them with lineage and impact analysis views.

  5. Strengthen data security

Identify regulatory risks and guide the fortification of network and encryption security standards and policies by understanding where all personally identifiable information is stored, processed and used.
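For step 2, the classification itself is typically automated with pattern- and profile-based scans. Here’s a minimal, hypothetical sketch of the idea; a real governance tool applies far richer rules at scale:

```python
import re

# Hypothetical pattern-based PII classifier for sampled column values.
PII_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "ssn":   re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def classify_column(sample_values):
    """Flag a column as PII when most sampled values match a known pattern."""
    for label, pattern in PII_PATTERNS.items():
        hits = sum(1 for value in sample_values if pattern.match(value))
        if hits / max(len(sample_values), 1) > 0.8:
            return label
    return None

print(classify_column(["ann@example.com", "bob@example.org"]))  # 'email'
```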

How erwin Can Help

erwin is the only software provider with a complete, metadata-driven approach to data governance through our integrated enterprise modeling and data intelligence suites. We help customers overcome their data governance challenges, with risk management and regulatory compliance being primary concerns.

However, the erwin EDGE also delivers an “enterprise data governance experience” in terms of agile innovation and business transformation – from creating new products and services to keeping customers happy to generating more revenue.

Whatever your organization’s key drivers are, a strategic data governance approach – through business process, enterprise architecture and data modeling combined with data cataloging and data literacy – is key to success in our modern, digital world.

If you’d like to get a handle on your data, you can sign up for a free, one-on-one demo of erwin Data Intelligence.

For more information on GDPR/CCPA, we’ve also published a white paper on the Regulatory Rationale for Integrating Data Management and Data Governance.
