Categories
erwin Expert Blog

Top 5 Data Catalog Benefits

A data catalog benefits organizations in a myriad of ways. With the right data catalog tool, organizations can automate enterprise metadata management – including data cataloging, data mapping, data quality and code generation for faster time to value and greater accuracy for data movement and/or deployment projects.

Data cataloging helps curate internal and external datasets for a range of content authors. Gartner says this doubles business benefits and ensures effective management and monetization of data assets in the long-term if linked to broader data governance, data quality and metadata management initiatives.

But even with this in mind, the importance of data cataloging is growing. In the regulated data world (GDPR, HIPAA etc) organizations need to have a good understanding of their data lineage – and the data catalog benefits to data lineage are substantial.

Data lineage is a core operational business component of data governance technology architecture, encompassing the processes and technology to provide full-spectrum visibility into the ways data flows across an enterprise.

There are a number of different approaches to data lineage. Here, I outline the common approach, and the approach incorporating data cataloging – including the top 5 data catalog benefits for understanding your organization’s data lineage.

Data Catalog Benefits

Data Lineage – The Common Approach

The most common approach for assembling a collection of data lineage mappings traces data flows in a reverse manner. The process begins with the target or data end-point, and then traversing the processes, applications, and ETL tasks in reverse from the target.

For example, to determine the mappings for the data pipelines populating a data warehouse, a data lineage tool might begin with the data warehouse and examine the ETL tasks that immediately proceed the loading of the data into the target warehouse.

The data sources that feed the ETL process are added to a “task list,” and the process is repeated for each of those sources. At each stage, the discovered pieces of lineage are documented. At the end of the sequence, the process will have reverse-mapped the pipelines for populating that warehouse.

While this approach does produce a collection of data lineage maps for selected target systems, there are some drawbacks.

  • First, this approach focuses only on assembling the data pipelines populating the selected target system but does not necessarily provide a comprehensive view of all the information flows and how they interact.
  • Second, this process produces the information that can be used for a static view of the data pipelines, but the process needs to be executed on a regular basis to account for changes to the environment or data sources.
  • Third, and probably most important, this process produces a technical view of the information flow, but it does not necessarily provide any deeper insights into the semantic lineage, or how the data assets map to the corresponding business usage models.

A Data Catalog Offers an Alternate Data Lineage Approach

An alternate approach to data lineage combines data discovery and the use of a data catalog that captures data asset metadata with a data mapping framework that documents connections between the data assets.

This data catalog approach also takes advantage of automation, but in a different way: using platform-specific data connectors, the tool scans the environment for storing each data asset and imports data asset metadata into the data catalog.

When data asset structures are similar, the tool can compare data element domains and value sets, and automatically create the data mapping.

In turn, the data catalog approach performs data discovery using the same data connectors to parse the code involved in data movement, such as major ETL environments and procedural code – basically any executable task that moves data.

The information collected through this process is reverse engineered to create mappings from source data sets to target data sets based on what was discovered.

For example, you can map the databases used for transaction processing, determine that subsets of the transaction processing database are extracted and moved to a staging area, and then parse the ETL code to infer the mappings.

These direct mappings also are documented in the data catalog. In cases where the mappings are not obvious, a tool can help a data steward manually map data assets into the catalog.

The result is a data catalog that incorporates the structural and semantic metadata associated with each data asset as well as the direct mappings for how that data set is populated.

Learn more about data cataloging.

Value of Data Intelligence IDC Report

And this is a powerful representative paradigm – instead of capturing a static view of specific data pipelines, it allows a data consumer to request a dynamically-assembled lineage from the documented mappings.

By interrogating the catalog, the current view of any specific data lineage can be rendered on the fly that shows all points of the data lineage: the origination points, the processing stages, the sequences of transformations, and the final destination.

Materializing the “current active lineage” dynamically reduces the risk of having an older version of the lineage that is no longer relevant or correct. When new information is added to the data catalog (such as a newly-added data source of a modification to the ETL code), dynamically-generated views of the lineage will be kept up-to-date automatically.

Top 5 Data Catalog Benefits for Understanding Data Lineage

A data catalog benefits data lineage in the following five distinct ways:

1. Accessibility

The data catalog approach allows the data consumer to query the tool to materialize specific data lineage mappings on demand.

2. Currency

The data lineage is rendered from the most current data in the data catalog.

3. Breadth

As the number of data assets documented in the data catalog increases, the scope of the materializable lineage expands accordingly. With all corporate data assets cataloged, any (or all!) data lineage mappings can be produced on demand.

4. Maintainability and Sustainability

Since the data lineage mappings are not managed as distinct artifacts, there are no additional requirements for maintenance. As long as the data catalog is kept up to date, the data lineage mappings can be materialized.

5. Semantic Visibility

In addition to visualizing the physical movement of data across the enterprise, the data catalog approach allows the data steward to associate business glossary terms, data element definitions, data models, and other semantic details with the different mappings. Additional visualization methods can demonstrate where business terms are used, how they are mapped to different data elements in different systems, and the relationships among these different usage points.

One can impose additional data governance controls with project management oversight, which allows you to designate data lineage mappings in terms of the project life cycle (such as development, test or production).

Aside from these data catalog benefits, this approach allows you to reduce the amount of manual effort for accumulating the information for data lineage and continually reviewing the data landscape to maintain consistency, thus providing a greater return on investment for your data intelligence budget.

Learn more about data cataloging.

Categories
erwin Expert Blog

Choosing the Right Data Modeling Tool

The need for an effective data modeling tool is more significant than ever.

For decades, data modeling has provided the optimal way to design and deploy new relational databases with high-quality data sources and support application development. But it provides even greater value for modern enterprises where critical data exists in both structured and unstructured formats and lives both on premise and in the cloud.

In today’s hyper-competitive, data-driven business landscape, organizations are awash with data and the applications, databases and schema required to manage it.

For example, an organization may have 300 applications, with 50 different databases and a different schema for each. Additional challenges, such as increasing regulatory pressures – from the General Data Protection Regulation (GDPR) to the Health Insurance Privacy and Portability Act (HIPPA) – and growing stores of unstructured data also underscore the increasing importance of a data modeling tool.

Data modeling, quite simply, describes the process of discovering, analyzing, representing and communicating data requirements in a precise form called the data model. There’s an expression: measure twice, cut once. Data modeling is the upfront “measuring tool” that helps organizations reduce time and avoid guesswork in a low-cost environment.

From a business-outcome perspective, a data modeling tool is used to help organizations:

  • Effectively manage and govern massive volumes of data
  • Consolidate and build applications with hybrid architectures, including traditional, Big Data, cloud and on premise
  • Support expanding regulatory requirements, such as GDPR and the California Consumer Privacy Act (CCPA)
  • Simplify collaboration across key roles and improve information alignment
  • Improve business processes for operational efficiency and compliance
  • Empower employees with self-service access for enterprise data capability, fluency and accountability

Data Modeling Tool

Evaluating a Data Modeling Tool – Key Features

Organizations seeking to invest in a new data modeling tool should consider these four key features.

  1. Ability to visualize business and technical database structures through an integrated, graphical model.

Due to the amount of database platforms available, it’s important that an organization’s data modeling tool supports a sufficient (to your organization) array of platforms. The chosen data modeling tool should be able to read the technical formats of each of these platforms and translate them into highly graphical models rich in metadata. Schema can be deployed from models in an automated fashion and iteratively updated so that new development can take place via model-driven design.

  1. Empowering of end-user BI/analytics by data source discovery, analysis and integration. 

A data modeling tool should give business users confidence in the information they use to make decisions. Such confidence comes from the ability to provide a common, contextual, easily accessible source of data element definitions to ensure they are able to draw upon the correct data; understand what it represents, including where it comes from; and know how it’s connected to other entities.

A data modeling tool can also be used to pull in data sources via self-service BI and analytics dashboards. The data modeling tool should also have the ability to integrate its models into whatever format is required for downstream consumption.

  1. The ability to store business definitions and data-centric business rules in the model along with technical database schemas, procedures and other information.

With business definitions and rules on board, technical implementations can be better aligned with the needs of the organization. Using an advanced design layer architecture, model “layers” can be created with one or more models focused on the business requirements that then can be linked to one or more database implementations. Design-layer metadata can also be connected from conceptual through logical to physical data models.

  1. Rationalize platform inconsistencies and deliver a single source of truth for all enterprise business data.

Many organizations struggle to breakdown data silos and unify data into a single source of truth, due in large part to varying data sources and difficulty managing unstructured data. Being able to model any data from anywhere accounts for this with on-demand modeling for non-relational databases that offer speed, horizontal scalability and other real-time application advantages.

With NoSQL support, model structures from non-relational databases, such as Couchbase and MongoDB can be created automatically. Existing Couchbase and MongoDB data sources can be easily discovered, understood and documented through modeling and visualization. Existing entity-relationship diagrams and SQL databases can be migrated to Couchbase and MongoDB too. Relational schema also will be transformed to query-optimized NoSQL constructs.

Other considerations include the ability to:

  • Compare models and databases.
  • Increase enterprise collaboration.
  • Perform impact analysis.
  • Enable business and IT infrastructure interoperability.

When it comes to data modeling, no one knows it better. For more than 30 years, erwin Data Modeler has been the market leader. It is built on the vision and experience of data modelers worldwide and is the de-facto standard in data model integration.

You can learn more about driving business value and underpinning governance with erwin DM in this free white paper.

Data Modeling Drives Business Value

Categories
erwin Expert Blog

Keeping Up with New Data Protection Regulations

Keeping up with new data protection regulations can be difficult, and the latest – the General Data Protection Regulation (GDPR) – isn’t the only new data protection regulation organizations should be aware of.

California recently passed a law that gives residents the right to control the data companies collect about them. Some suggest the California Consumer Privacy Act (CCPA), which takes effect January 1, 2020, sets a precedent other states will follow by empowering consumers to set limits on how companies can use their personal information.

In fact, organizations should expect increasing pressure on lawmakers to introduce new data protection regulations. A number of high-profile data breaches and scandals have increased public awareness of the issue.

Facebook was in the news again last week for another major problem around the transparency of its user data, and the tech-giant also is reportedly facing 10 GDPR investigations in Ireland – along with Apple, LinkedIn and Twitter.

Some industries, such as healthcare and financial services, have been subject to stringent data regulations for years: GDPR now joins the Health Insurance Portability and Accountability Act (HIPAA), the Payment Card Industry Data Security Standard (PCI DSS) and the Basel Committee on Banking Supervision (BCBS).

Due to these pre-existing regulations, organizations operating within these sectors, as well as insurance, had some of the GDPR compliance bases covered in advance.

Other industries had their own levels of preparedness, based on the nature of their operations. For example, many retailers have robust, data-driven e-commerce operations that are international. Such businesses are bound to comply with varying local standards, especially when dealing with personally identifiable information (PII).

Smaller, more brick-and-mortar-focussed retailers may have had to start from scratch.

But starting position aside, every data-driven organization should strive for a better standard of data management — and not just for compliance sake. After all, organizations are now realizing that data is one of their most valuable assets.

New Data Protection Regulations – Always Be Prepared

When it comes to new data protection regulations in the face of constant data-driven change, it’s a matter of when, not if.

As they say, the best defense is a good offense. Fortunately, whenever the time comes, the first point of call will always be data governance, so organizations can prepare.

Effective compliance with new data protection regulations requires a robust understanding of the “what, where and who” in terms of data and the stakeholders with access to it (i.e., employees).

The Regulatory Rationale for Integrating Data Management & Data Governance

This is also true for existing data regulations. Compliance is an on-going requirement, so efforts to become compliant should not be treated as static events.

Less than four months before GDPR came into effect, only 6 percent of enterprises claimed they were prepared for it. Many of these organizations will recall a number of stressful weeks – or even months – tidying up their databases and their data management processes and policies.

This time and money was spent reactionarily, at the behest of proactive efforts to grow the business.

The implementation and subsequent observation of a strong data governance initiative ensures organizations won’t be put on the spot going forward. Should an audit come up, current projects aren’t suddenly derailed as they reenact pre-GDPR panic.

New Data Regulations

Data Governance: The Foundation for Compliance

The first step to compliance with new – or old – data protection regulations is data governance.

A robust and effective data governance initiative ensures an organization understands where security should be focussed.

By adopting a data governance platform that enables you to automatically tag sensitive data and track its lineage, you can ensure nothing falls through the cracks.

Your chosen data governance solution should enable you to automate the scanning, detection and tagging of sensitive data by:

  • Monitoring and controlling sensitive data – Gain better visibility and control across the enterprise to identify data security threats and reduce associated risks.
  • Enriching business data elements for sensitive data discovery – By leveraging a comprehensive mechanism to define business data elements for PII, PHI and PCI across database systems, cloud and Big Data stores, you can easily identify sensitive data based on a set of algorithms and data patterns.
  • Providing metadata and value-based analysis – Simplify the discovery and classification of sensitive data based on metadata and data value patterns and algorithms. Organizations can define business data elements and rules to identify and locate sensitive data, including PII, PHI and PCI.

With these precautionary steps, organizations are primed to respond if a data breach occurs. Having a well governed data ecosystem with data lineage capabilities means issues can be quickly identified.

Additionally, if any follow-up is necessary –  such as with GDPR’s data breach reporting time requirements – it can be handles swiftly and in accordance with regulations.

It’s also important to understand that the benefits of data governance don’t stop with regulatory compliance.

A better understanding of what data you have, where it’s stored and the history of its use and access isn’t only beneficial in fending off non-compliance repercussions. In fact, such an understanding is arguably better put to use proactively.

Data governance improves data quality standards, it enables better decision-making and ensures businesses can have more confidence in the data informing those decisions.

The same mechanisms that protect data by controlling its access also can be leveraged to make data more easily discoverable to approved parties – improving operational efficiency.

All in all, the cumulative result of data governance’s influence on data-driven businesses both drives revenue (through greater efficiency) and reduces costs (less errors, false starts, etc.).

To learn more about data governance and the regulatory rationale for its implementation, get our free guide here.

DG RediChek

Categories
erwin Expert Blog

Google’s Record GDPR Fine: Avoiding This Fate with Data Governance

The General Data Protection Regulation (GDPR) made its first real impact as Google’s record GDPR fine dominated news cycles.

Historically, fines had peaked at six figures with the U.K.’s Information Commissioner’s Office (ICO) fines of 500,000 pounds ($650,000 USD) against both Facebook and Equifax for their data protection breaches.

Experts predicted an uptick in GDPR enforcement in 2019, and Google’s recent record GDPR fine has brought that to fruition. France’s data privacy enforcement agency hit the tech giant with a $57 million penalty – more than 80 times the steepest ICO fine.

If it can happen to Google, no organization is safe. Many in fact still lag in the GDPR compliance department. Cisco’s 2019 Data Privacy Benchmark Study reveals that only 59 percent of organizations are meeting “all or most” of GDPR’s requirements.

So many more GDPR violations are likely to come to light. And even organizations that are currently compliant can’t afford to let their data governance standards slip.

Data Governance for GDPR

Google’s record GDPR fine makes the rationale for better data governance clear enough. However, the Cisco report offers even more insight into the value of achieving and maintaining compliance.

Organizations with GDPR-compliant security measures are not only less likely to suffer a breach (74 percent vs. 89 percent), but the breaches suffered are less costly too, with fewer records affected.

However, applying such GDPR-compliant provisions can’t be done on a whim; organizations must expand their data governance practices to include compliance.

GDPR White Paper

A robust data governance initiative provides a comprehensive picture of an organization’s systems and the units of data contained or used within them. This understanding encompasses not only the original instance of a data unit but also its lineage and how it has been handled and processed across an organization’s ecosystem.

With this information, organizations can apply the relevant degrees of security where necessary, ensuring expansive and efficient protection from external (i.e., breaches) and internal (i.e., mismanaged permissions) data security threats.

Although data security cannot be wholly guaranteed, these measures can help identify and contain breaches to minimize the fallout.

Looking at Google’s Record GDPR Fine as An Opportunity

The tertiary benefits of GDPR compliance include greater agility and innovation and better data discovery and management. So arguably, the “tertiary” benefits of data governance should take center stage.

While once exploited by such innovators as Amazon and Netflix, data optimization and governance is now on everyone’s radar.

So organization’s need another competitive differentiator.

An enterprise data governance experience (EDGE) provides just that.

THE REGULATORY RATIONALE FOR INTEGRATING DATA MANAGEMENT & DATA GOVERNANCE

This approach unifies data management and data governance, ensuring that the data landscape, policies, procedures and metrics stem from a central source of truth so data can be trusted at any point throughout its enterprise journey.

With an EDGE, the Any2 (any data from anywhere) data management philosophy applies – whether structured or unstructured, in the cloud or on premise. An organization’s data preparation (data mapping), enterprise modeling (business, enterprise and data) and data governance practices all draw from a single metadata repository.

In fact, metadata from a multitude of enterprise systems can be harvested and cataloged automatically. And with intelligent data discovery, sensitive data can be tagged and governed automatically as well – think GDPR as well as HIPAA, BCBS and CCPA.

Organizations without an EDGE can still achieve regulatory compliance, but data silos and the associated bottlenecks are unavoidable without integration and automation – not to mention longer timeframes and higher costs.

To get an “edge” on your competition, consider the erwin EDGE platform for greater control over and value from your data assets.

Data preparation/mapping is a great starting point and a key component of the software portfolio. Join us for a weekly demo.

Automate Data Mapping

Categories
erwin Expert Blog

Who to Follow in 2019 for Big Data, Data Governance and GDPR Advice

Experts are predicting a surge in GDPR enforcement in 2019 as regulators begin to crackdown on organizations still lagging behind compliance standards.

With this in mind, the erwin team has compiled a list of the most valuable data governance, GDPR and Big data blogs and news sources for data management and data governance best practice advice from around the web.

From regulatory compliance (GDPR, HIPPA, etc.,) to driving revenue through proactive data governance initiatives and Big Data strategies, these accounts cover it all.

Top 7 Data Governance, GDPR and Big Data Blogs and News Sources from Around the Web

Honorable Mention: @BigDataBatman

The Twitter account data professionals deserve, but probably not the one you need right now.

This quirky Twitter bot trawls the web for big data tweets and news stories, and substitutes “big data” for “Batman”. If data is the Bane of your existence, this account will serve up some light relief.

 

1. The erwin Expert Network

Twitter| LinkedIn | Facebook | Blog

For anything data management and data governance related, the erwin Experts should be your first point of call.

The team behind the most connected data management and data governance solutions on the market regularly share best practice advice in guide, whitepaper, blog and social media update form.

 

2. GDPR For Online Entrepreneurs (UK, US, CA, AU)

This community-driven Facebook group is a consistent source of insightful information for data-driven businesses.

In addition to sharing data and GDPR-focused articles from around the web, GDPR For Online Entrepreneurs encourages members to seek GDPR advice from its community’s members.

 

3. GDPR General Data Protection Regulation Technology

LinkedIn also has its own community-driven GDPR advice groups. The most active of these is the, “GDPR General Data Protection Regulation Technology”.

The group aims to be an information hub for anybody responsible for company data, including company leaders, GDPR specialists and consultants, business analysts and process experts. 

Data governance, GDPR, big data blogs

 

 

4. DBTA

Twitter | LinkedIn | Facebook

Database Trends and Applications is a publication that should be on every data professionals’ radar. Alongside news and editorials covering big data, database management, data integrations and more, DBTA is also a great source of advice for professionals looking to research buying options.

Their yearly “Trend-Setting Products in Data and Information Management” list and Product Spotlight featurettes can help data professionals put together proposals, and help give decision makers piece of mind.

 

5. Dataversity

Twitter | LinkedIn

Dataversity is another excellent source for data management and data governance related best practices and think-pieces.

In addition to hosting and sponsoring a number of live events throughout the year, the platform is a regular provider of data leadership webinars and training with a library full of webinars available on-demand.

 

6. WIRED

Twitter | LinkedIn | Facebook

Wired is a physical and digital tech magazine that covers all the bases.

For data professionals that are after the latest news and editorials pertaining to data security and a little extra – from innovations in transport to the applications of Blockchain – Wired is a great publication to keep on your radar.

 

7. TDAN

Twitter | LinkedIn | Facebook

For those looking for something a little more focused, check out TDAN. A subsidiary of Dataversity, TDAN regularly publish new editorial content covering data governance, data management, data modeling and Big Data.

Categories
erwin Expert Blog

Healthcare Data Governance: What’s the Prognosis?

Healthcare data governance has far more applications than just meeting compliance standards. Healthcare costs are always a topic of discussion, as is the state of health insurance and policies like the Affordable Care Act (ACA).

Costs and policy are among a number of significant trends called out in the executive summary of the Stanford Medicine 2017 Health Trend Report. But the summary also included a common thread that connects them all:

“Behind these trends is one fundamental force driving health care transformation: the power of data.”

Indeed, data is essential to healthcare in areas like:

  • Medical research – Collecting and reviewing increasingly large data sets has the potential to introduce new levels of speed and efficiency into what has been an often slow and laborious process.
  • Preventative care – Wearable devices help consumers track exercise, diet, weight and nutrition, as well as clinical applications like genetic sequencing.
  • The patient experience – Healthcare is not immune to issues of customer service and the need to provide timely, accurate responses to questions or complaints.
  • Disease and outbreak prevention – Data and analysis can help spot patterns, so clinicians get ahead of big problems before they become epidemics.

Data Management and Data Governance

Data Vulnerabilities in Healthcare

Data is valuable to the healthcare industry. But it also carries risks because of the volume and velocity with which it is collected and stored. Foremost among these are regulatory compliance and security.

Because healthcare data is so sensitive, the ways in which it is secured and shared are watched closely by regulators. HIPAA (Health Information Portability and Accountability Act) is probably the most recognized regulation governing data in healthcare, but it is not the only one.

In addition to privacy and security policies, other challenges that prevent the healthcare industry from maximizing the ways it puts data to work include:

  • High costs, which are further exacerbated by expected lower commercial health insurance payouts and higher payouts from low-margin services like Medicare, as well as rising labor costs. Data and analytics can potentially help hospitals better plan for these challenges, but thin margins might prevent the investments necessary in this area.
  • Electronic medical records, which the Stanford report cited as a cause of frustration that negatively impacts relationships between patients and healthcare providers.
  • Silos of data, which often are caused by mergers and acquisitions within the industry, but that are also emblematic of the number of platforms and applications used by providers, insurers and other players in the healthcare market.

Early 2018 saw a number of mergers and acquisitions in the healthcare industry, including hospital systems in New England, as well as in the Philadelphia area of the United States. The $69 billion dollar merger of Aetna and CVS also was approved by shareholders in early 2018, making it one of the most significant deals of the past decade.

Each merger and acquisition requires careful and difficult decisions concerning the application portfolio and data of each organization. Redundancies need to identified, as do gaps, so the patient experience and care continues without serious disruption.

Truly understanding healthcare data requires a holistic approach to data governance that is embedded in business processes and enterprise architecture. When implemented properly, data governance initiatives help healthcare organizations understand what data they have, where it is, where it came from, its value, its quality and how it’s used and accessed by people and applications.

Healthcare Data Governance

Improving Healthcare Analytics and Patient Care with Healthcare Data Governance

Data governance plays a vital role in compliance because data is easier to protect when you know where it is stored, what it is, and how it needs to be governed. According to a 2017 survey by erwin, Inc. and UBM, 60 percent of organizations said compliance was driving their data governance initiatives.

With a solid understand of their data and the ways it is collected and consumed throughout their organizations, healthcare players are better positioned to reap the benefits of analytics. As Deloitte pointed out in a perspectives piece about healthcare analytics, the shift to value-based care makes analytics within the industry more essential than ever.

With increasing pressure on margins, the combination of data governance and analytics is critical to creating value and finding efficiencies. Investments in analytics are only as valuable as the data they are fed, however.

Poor decisions based on poor data will lead to bad outcomes, but they also diminish trust in the analytics platform, which will ruin the ROI as it is used less and less.

Most important, healthcare data governance plays a critical role in helping improve patient outcomes and value. In healthcare, the ability to make timely, accurate decisions based on quality data can be a matter of life or death.

In areas like preventative care and the patient experience, good data can mean better advice to patients, more accurate programs for follow-up care, and the ability to meet their medical and lifestyle needs within a healthcare facility or beyond.

As healthcare organizations look to improve efficiencies, lower costs and provide quality, value-based care, healthcare data governance will be essential to better outcomes for patients, providers and the industry at large.

For more information, please download our latest whitepaper, The Regulatory Rationale for Integrating Data Management and Data Governance.

If you’re interested in healthcare data governance, or evaluating new data governance technologies for another industry, you can schedule a demo of erwin’s data mapping and data governance solutions.

Data Mapping Demo CTA

Michael Pastore is the Director, Content Services at QuinStreet B2B Tech.