
Data Modeling Best Practices for Data-Driven Organizations

As data-driven business becomes increasingly prominent, an understanding of data modeling and data modeling best practices is crucial. This post outlines just that, along with other key questions related to data modeling, such as “SQL vs. NoSQL.”

What is Data Modeling?

Data modeling is a process that enables organizations to discover, design, visualize, standardize and deploy high-quality data assets through an intuitive, graphical interface.

Data models provide visualization, create additional metadata and standardize data design across the enterprise.

As the value of data and the way it is used by organizations has changed over the years, so too has data modeling.

In the modern context, data modeling is a function of data governance.

While data modeling has always been the best way to understand complex data sources and automate design standards, modern data modeling goes well beyond these domains to accelerate and ensure the overall success of data governance in any organization.

As well as keeping the business in compliance with data regulations, data governance – and data modeling – also drive innovation.

Companies that want to advance artificial intelligence (AI) initiatives, for instance, won’t get very far without quality data and well-defined data models.

With the right approach, data modeling promotes greater cohesion and success in organizations’ data strategies.

But what is the right data modeling approach?

Data Modeling Data Governance

Data Modeling Best Practices

The right approach to data modeling is one in which organizations can make the right data available at the right time to the right people. Otherwise, data-driven initiatives can stall.

Thanks to organizations like Amazon, Netflix and Uber, businesses have changed how they leverage their data and are transforming their business models to innovate – or risk becoming obsolete.

According to a 2018 Tech Pro Research survey, 70 percent of respondents said their companies either have a digital transformation strategy in place or are working on one. And 60 percent of companies that have undertaken digital transformation have created new business models.

But data-driven business success doesn’t happen by accident. Organizations that adopt that strategy without the necessary processes, platforms and solutions quickly realize that data creates a lot of noise but not necessarily the right insights.

This phenomenon is perhaps best articulated through the lens of the “three Vs” of data: volume, variety and velocity.

Data Modeling Tool

Any2 Data Modeling and Navigating Data Chaos

The three Vs describe the volume (amount), variety (type) and velocity (speed at which it must be processed) of data.

Data’s value grows with context, and that context comes from other data. This creates an incentive to generate and store ever-higher volumes of data.

Typically, an increase in the volume of data leads to more data sources and types. And higher volumes and varieties of data become increasingly difficult to manage in a way that provides insight.

Without due diligence, the above factors can lead to a chaotic environment for data-driven organizations.

Therefore, the data modeling best practice is one that allows users to view any data from anywhere – a data governance and management best practice we dub “any-squared” (Any2).

Organizations that adopt the Any2 approach can expect greater consistency, clarity and artifact reuse across large-scale data integration, master data management, metadata management, Big Data and business intelligence/analytics initiatives.

SQL or NoSQL? The Advantages of NoSQL Data Modeling

For the most part, databases use Structured Query Language (SQL) for maintaining and manipulating data. This structured approach and its proficiency in handling complex queries have led to its widespread use.

But despite the advantages of such structure, its inherently sequential nature (“this,” then “this”) means it can be hard to operate on data holistically and deal with large amounts of data at once.

Additionally, as alluded to earlier, the nature of modern, data-driven business and the three Vs means organizations are dealing with increasing amounts of unstructured data.

As such, in a modern business context, the three Vs have become somewhat of an Achilles’ heel for SQL databases.

The sheer rate at which businesses collect and store data – as well as the various types of data stored – mean organizations have to adapt and adopt databases that can be maintained with greater agility.

That’s where NoSQL comes in.

Benefits of NoSQL

Despite what many might assume, adopting a NoSQL database doesn’t mean abandoning SQL databases altogether. In fact, NoSQL is actually a contraction of “not only SQL.”

The NoSQL approach builds on the traditional SQL approach, bringing old (but still relevant) ideas in line with modern needs.

NoSQL databases are scalable, promote greater agility, and handle changes to data and the storing of new data more easily.

They’re also better at dealing with non-relational data: NoSQL supports JavaScript Object Notation (JSON), log messages, XML and unstructured documents.
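To illustrate why document stores suit this kind of data, here is a minimal sketch in Python (with hypothetical field names) of a nested record that a document database can store as-is, whereas a relational design would normally split it across several joined tables.

```python
import json

# A hypothetical customer record as it might be stored in a document
# database: nested orders and an open-ended set of attributes live in
# one document, with no fixed relational schema required up front.
customer = {
    "customer_id": "C-1001",
    "name": "Acme Corp",
    "orders": [
        {"order_id": "O-1", "total": 249.99, "items": ["widget", "gear"]},
        {"order_id": "O-2", "total": 54.50, "items": ["sprocket"]},
    ],
    "attributes": {"industry": "manufacturing", "region": "EMEA"},
}

# Serialized as JSON, the same structure can be loaded directly into most
# document stores; a relational design would typically split it across
# customer, order and order_item tables joined by keys.
print(json.dumps(customer, indent=2))
```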

Data Modeling Is Different for Every Organization

It perhaps goes without saying, but different organizations have different needs.

For some, the legacy approach to databases meets the needs of their current data strategy and maturity level.

For others, the greater flexibility offered by NoSQL databases makes NoSQL databases, and by extension NoSQL data modeling, a necessity.

Some organizations may require an approach to data modeling that promotes collaboration.

Bringing data to the business and making it easy to access and understand increases the value of data assets, providing a return-on-investment and a return-on-opportunity. But neither would be possible without data modeling providing the backbone for metadata management and proper data governance.

Whatever the data modeling need, erwin can help you address it.

erwin DM is available in several versions, including erwin DM NoSQL, with additional options to improve the quality and agility of data capabilities.

And we just announced a new version of erwin DM with a modern and customizable modeling environment, support for Amazon Redshift, updated support for the latest DB2 releases, time-saving modeling task automation and more.

New to erwin DM? You can try the new erwin Data Modeler for yourself for free!

erwin Data Modeler Free Trial - Data Modeling


Choosing the Right Data Modeling Tool

The need for an effective data modeling tool is more significant than ever.

For decades, data modeling has provided the optimal way to design and deploy new relational databases with high-quality data sources and support application development. But it provides even greater value for modern enterprises where critical data exists in both structured and unstructured formats and lives both on premise and in the cloud.

In today’s hyper-competitive, data-driven business landscape, organizations are awash with data and the applications, databases and schema required to manage it.

For example, an organization may have 300 applications, with 50 different databases and a different schema for each. Additional challenges, such as increasing regulatory pressures – from the General Data Protection Regulation (GDPR) to the Health Insurance Portability and Accountability Act (HIPAA) – and growing stores of unstructured data also underscore the increasing importance of a data modeling tool.

Data modeling, quite simply, describes the process of discovering, analyzing, representing and communicating data requirements in a precise form called the data model. There’s an expression: measure twice, cut once. Data modeling is the upfront “measuring tool” that helps organizations reduce time and avoid guesswork in a low-cost environment.

From a business-outcome perspective, a data modeling tool is used to help organizations:

  • Effectively manage and govern massive volumes of data
  • Consolidate and build applications with hybrid architectures, including traditional, Big Data, cloud and on premise
  • Support expanding regulatory requirements, such as GDPR and the California Consumer Privacy Act (CCPA)
  • Simplify collaboration across key roles and improve information alignment
  • Improve business processes for operational efficiency and compliance
  • Empower employees with self-service access for enterprise data capability, fluency and accountability

Data Modeling Tool

Evaluating a Data Modeling Tool – Key Features

Organizations seeking to invest in a new data modeling tool should consider these four key features.

  1. Ability to visualize business and technical database structures through an integrated, graphical model.

Given the number of database platforms available, it’s important that an organization’s data modeling tool supports an array of platforms sufficient for its needs. The chosen data modeling tool should be able to read the technical formats of each of these platforms and translate them into highly graphical models rich in metadata. Schema can be deployed from models in an automated fashion and iteratively updated so that new development can take place via model-driven design.
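As a rough illustration of model-driven design, the sketch below (a simplified, hypothetical example, not how any particular tool works internally) generates DDL from a logical model captured as a plain data structure, so the schema is derived from the model rather than hand-written.

```python
# A minimal, hypothetical illustration of model-driven design: a logical
# model captured as plain data structures is translated into DDL for a
# target platform. Real modeling tools do far more (indexes, constraints,
# platform dialects), but the principle is the same.
model = {
    "Customer": {
        "customer_id": "INTEGER NOT NULL PRIMARY KEY",
        "name": "VARCHAR(100) NOT NULL",
        "region": "VARCHAR(50)",
    }
}

def generate_ddl(model: dict) -> str:
    statements = []
    for table, columns in model.items():
        cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns.items())
        statements.append(f"CREATE TABLE {table} (\n  {cols}\n);")
    return "\n\n".join(statements)

print(generate_ddl(model))
```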

  2. Empowering end-user BI/analytics through data source discovery, analysis and integration.

A data modeling tool should give business users confidence in the information they use to make decisions. Such confidence comes from the ability to provide a common, contextual, easily accessible source of data element definitions to ensure they are able to draw upon the correct data; understand what it represents, including where it comes from; and know how it’s connected to other entities.

A data modeling tool can also be used to pull in data sources via self-service BI and analytics dashboards. The data modeling tool should also have the ability to integrate its models into whatever format is required for downstream consumption.

  3. The ability to store business definitions and data-centric business rules in the model along with technical database schemas, procedures and other information.

With business definitions and rules on board, technical implementations can be better aligned with the needs of the organization. Using an advanced design layer architecture, model “layers” can be created with one or more models focused on the business requirements that then can be linked to one or more database implementations. Design-layer metadata can also be connected from conceptual through logical to physical data models.

  4. The ability to rationalize platform inconsistencies and deliver a single source of truth for all enterprise business data.

Many organizations struggle to break down data silos and unify data into a single source of truth, due in large part to varying data sources and difficulty managing unstructured data. Being able to model any data from anywhere accounts for this with on-demand modeling for non-relational databases that offer speed, horizontal scalability and other real-time application advantages.

With NoSQL support, model structures from non-relational databases, such as Couchbase and MongoDB, can be created automatically. Existing Couchbase and MongoDB data sources can be easily discovered, understood and documented through modeling and visualization. Existing entity-relationship diagrams and SQL databases can be migrated to Couchbase and MongoDB too, with relational schema transformed into query-optimized NoSQL constructs.
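To make that last point concrete, here is a simplified, hypothetical sketch of that kind of transformation: related relational rows are embedded into a single document so that common queries can be answered with one read. The field names and the embedding choice are illustrative only.

```python
# A simplified, hypothetical sketch of transforming relational rows into a
# query-optimized document, as one might when migrating an entity-relationship
# design to a document database such as MongoDB or Couchbase.
customers = [{"customer_id": 1, "name": "Acme Corp"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 249.99},
    {"order_id": 11, "customer_id": 1, "total": 54.50},
]

def to_documents(customers, orders):
    """Embed each customer's orders so a single read answers common queries."""
    docs = []
    for c in customers:
        docs.append({
            **c,
            "orders": [o for o in orders if o["customer_id"] == c["customer_id"]],
        })
    return docs

for doc in to_documents(customers, orders):
    print(doc)
```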

Other considerations include the ability to:

  • Compare models and databases.
  • Increase enterprise collaboration.
  • Perform impact analysis.
  • Enable business and IT infrastructure interoperability.

When it comes to data modeling, no one knows it better. For more than 30 years, erwin Data Modeler has been the market leader. It is built on the vision and experience of data modelers worldwide and is the de facto standard in data model integration.

You can learn more about driving business value and underpinning governance with erwin DM in this free white paper.

Data Modeling Drives Business Value


Data Governance Stock Check: Using Data Governance to Take Stock of Your Data Assets

For regulatory compliance (e.g., GDPR) and to ensure peak business performance, organizations often bring consultants on board to help take stock of their data assets. This sort of data governance “stock check” is important but can be arduous without the right approach and technology. That’s where data governance comes in …

While most companies hold the lion’s share of operational data within relational databases, it also can live in many other places and various other formats. Therefore, organizations need the ability to manage any data from anywhere, what we call our “any-squared” (Any2) approach to data governance.

Any2 first requires an understanding of the three Vs of data – volume, variety and velocity – especially in the context of the data lifecycle, as well as knowing how to leverage the key capabilities of data governance – data cataloging, data literacy, business process, enterprise architecture and data modeling – that enable data to be leveraged at different stages for optimum security, quality and value.

Following are two examples that illustrate the data governance stock check, including the Any2 approach in action, based on real consulting engagements.

Data Governance Stock Check

Data Governance “Stock Check” Case 1: The Data Broker

This client trades in information. Therefore, the organization needed to catalog the data it acquires from suppliers, ensure its quality, classify it, and then sell it to customers. The company wanted to assemble the data in a data warehouse and then provide controlled access to it.

The first step in helping this client involved taking stock of its existing data. We set up a portal so data assets could be registered via a form with basic questions, and then a central team received the registrations, reviewed and prioritized them. Entitlement attributes also were set up to identify and profile high-priority assets.

A number of best practices and technology solutions were used to establish what was required for managing the registration and classification of data feeds:

1. The underlying metadata is harvested followed by an initial quality check. Then the metadata is classified against a semantic model held in a business glossary.

2. After this classification, a second data quality check is performed based on the best-practice rules associated with the semantic model.

3. Profiled assets are loaded into a historical data store within the warehouse, with data governance tools generating its structure and data movement operations for data loading.

4. We developed a change management program to make all staff aware of the information brokerage portal and the importance of using it. It uses a catalog of data assets, all classified against a semantic model with data quality metrics to easily understand where data assets are located within the data warehouse.

5. Adopting this portal, where data is registered and classified against an ontology, enables the client’s customers to shop for data by asset or by meaning (e.g., “what data do you have on X topic?”) and then drill down through the taxonomy or across an ontology. Next, they raise a request to purchase the desired data.
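As a rough sketch of steps 1 and 2 above (hypothetical names, deliberately naive matching, and no real profiling tool behind it), the following shows metadata being harvested from a registered feed, given a basic completeness check and classified against a business glossary:

```python
# A highly simplified, hypothetical sketch: harvest column-level metadata from
# a registered feed, record completeness as a basic quality measure, then
# classify each column against a semantic model (business glossary) using
# naive name matching. Real implementations use profiling tools and richer
# matching logic.
glossary = {"email": "Customer Email", "dob": "Date of Birth", "cust_name": "Customer Name"}

def harvest_metadata(feed):
    """Return simple column metadata for a feed supplied as list-of-dict rows."""
    columns = {}
    for row in feed:
        for col, value in row.items():
            info = columns.setdefault(col, {"non_null": 0, "rows": 0})
            info["rows"] += 1
            info["non_null"] += value is not None
    return columns

def classify(columns):
    return {col: glossary.get(col, "UNCLASSIFIED") for col in columns}

feed = [{"cust_name": "Ada", "email": "ada@example.com", "dob": None}]
meta = harvest_metadata(feed)
print(meta)
print(classify(meta))
```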

This consulting engagement and technology implementation increased data accessibility and capitalization. Information is registered within a central portal through an approved workflow, and then customers shop for data either from a list of physical assets or by information content, with purchase requests also going through an approval workflow. This, among other safeguards, ensures data quality.

Benefits of Data Governance

Data Governance “Stock Check” Case 2: Tracking Rogue Data

This client is a geographically dispersed organization that stored many of its key processes in Microsoft Excel™ spreadsheets. It was planning to move to Office 365™ and was concerned about regulatory compliance, including GDPR mandates.

Knowing that electronic documents are heavily used in key business processes and distributed across the organization, this company needed to replace risky manual processes with centralized, automated systems.

A key part of the consulting engagement was to understand what data assets were in circulation and how they were used by the organization. Process chains could then be prioritized for automation, with specifications outlined for the systems that would replace them.

This organization also adopted a central portal that allowed employees to register data assets. The associated change management program raised awareness of data governance across the organization and the importance of data registration.

For each asset, information was captured and reviewed as part of a workflow. Prioritized assets were then chosen for profiling, enabling metadata to be reverse-engineered before being classified against the business glossary.

Additionally, assets that were part of a process chain were gathered and modeled with enterprise architecture (EA) and business process (BP) modeling tools for impact analysis.

High-level requirements for new systems could then be defined, again in the EA/BP tools, and prioritized on a project list. For the remaining assets, decisions could be made on whether they could safely be placed in the cloud and whether macros would be required.

In this case, the adoption of purpose-built data governance solutions helped build an understanding of the data assets in play, including information about their usage and content to aid in decision-making.

This client then had a good handle on the “what” and “where” in terms of sensitive data stored in its systems. It also better understood how this sensitive data was being used and by whom, helping reduce regulatory risks like those associated with GDPR.

In both scenarios, we cataloged data assets and mapped them to a business glossary. The glossary acts as a classification scheme that helps govern and locate data, making it both more accessible and more valuable. This governance framework reduces risk and protects an organization’s most valuable or sensitive data assets.

Focused on producing meaningful business outcomes, the erwin EDGE platform was pivotal in achieving these two clients’ data governance goals – including the infrastructure to undertake a data governance stock check. They used it to create an “enterprise data governance experience” not just for cataloging data and other foundational tasks, but also for a competitive “EDGE” in maximizing the value of their data while reducing data-related risks.

To learn more about the erwin EDGE data governance platform and how it aids in undertaking a data governance stock check, register for our free, 30-minute demonstration here.


Google’s Record GDPR Fine: Avoiding This Fate with Data Governance

The General Data Protection Regulation (GDPR) made its first real impact as Google’s record GDPR fine dominated news cycles.

Historically, fines had peaked at six figures with the U.K.’s Information Commissioner’s Office (ICO) fines of 500,000 pounds ($650,000 USD) against both Facebook and Equifax for their data protection breaches.

Experts predicted an uptick in GDPR enforcement in 2019, and Google’s recent record GDPR fine has brought that to fruition. France’s data privacy enforcement agency hit the tech giant with a $57 million penalty – more than 80 times the steepest ICO fine.

If it can happen to Google, no organization is safe. Many in fact still lag in the GDPR compliance department. Cisco’s 2019 Data Privacy Benchmark Study reveals that only 59 percent of organizations are meeting “all or most” of GDPR’s requirements.

So, many more GDPR violations are likely to come to light. And even organizations that are currently compliant can’t afford to let their data governance standards slip.

Data Governance for GDPR

Google’s record GDPR fine makes the rationale for better data governance clear enough. However, the Cisco report offers even more insight into the value of achieving and maintaining compliance.

Organizations with GDPR-compliant security measures are not only less likely to suffer a breach (74 percent vs. 89 percent), but the breaches suffered are less costly too, with fewer records affected.

However, applying such GDPR-compliant provisions can’t be done on a whim; organizations must expand their data governance practices to include compliance.

GDPR White Paper

A robust data governance initiative provides a comprehensive picture of an organization’s systems and the units of data contained or used within them. This understanding encompasses not only the original instance of a data unit but also its lineage and how it has been handled and processed across an organization’s ecosystem.

With this information, organizations can apply the relevant degrees of security where necessary, ensuring expansive and efficient protection from external (i.e., breaches) and internal (i.e., mismanaged permissions) data security threats.

Although data security cannot be wholly guaranteed, these measures can help identify and contain breaches to minimize the fallout.

Looking at Google’s Record GDPR Fine as An Opportunity

The tertiary benefits of GDPR compliance include greater agility and innovation and better data discovery and management. So arguably, the “tertiary” benefits of data governance should take center stage.

While once exploited mainly by such innovators as Amazon and Netflix, data optimization and governance are now on everyone’s radar.

So organizations need another competitive differentiator.

An enterprise data governance experience (EDGE) provides just that.

THE REGULATORY RATIONALE FOR INTEGRATING DATA MANAGEMENT & DATA GOVERNANCE

This approach unifies data management and data governance, ensuring that the data landscape, policies, procedures and metrics stem from a central source of truth so data can be trusted at any point throughout its enterprise journey.

With an EDGE, the Any2 (any data from anywhere) data management philosophy applies – whether structured or unstructured, in the cloud or on premise. An organization’s data preparation (data mapping), enterprise modeling (business, enterprise and data) and data governance practices all draw from a single metadata repository.

In fact, metadata from a multitude of enterprise systems can be harvested and cataloged automatically. And with intelligent data discovery, sensitive data can be tagged and governed automatically as well – think GDPR as well as HIPAA, BCBS and CCPA.
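As a simple illustration of what automatic tagging of sensitive data can look like (a deliberately minimal, hypothetical sketch; real discovery tools combine patterns with dictionaries, statistics and machine learning), sample values are scanned and columns that look like personally identifiable information are tagged:

```python
import re

# A deliberately simple, hypothetical sketch of sensitive-data discovery:
# scan sample values with regular expressions and tag columns that look like
# personally identifiable information.
PII_PATTERNS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def tag_sensitive(columns: dict) -> dict:
    """columns maps column name -> list of sample values."""
    tags = {}
    for name, samples in columns.items():
        for label, pattern in PII_PATTERNS.items():
            if any(pattern.match(str(v)) for v in samples):
                tags[name] = label
                break
    return tags

sample = {"contact": ["jane@example.com"], "notes": ["renewal due"], "ssn": ["123-45-6789"]}
print(tag_sensitive(sample))  # {'contact': 'EMAIL', 'ssn': 'US_SSN'}
```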

Organizations without an EDGE can still achieve regulatory compliance, but data silos and the associated bottlenecks are unavoidable without integration and automation – not to mention longer timeframes and higher costs.

To get an “edge” on your competition, consider the erwin EDGE platform for greater control over and value from your data assets.

Data preparation/mapping is a great starting point and a key component of the software portfolio. Join us for a weekly demo.

Automate Data Mapping


Data Modeling and Data Mapping: Results from Any Data Anywhere

A unified approach to data modeling and data mapping could be the breakthrough that many data-driven organizations need.

In most of the conversations I have with clients, they express the need for a viable solution to model their data, as well as the ability to capture and document the metadata within their environments.

Data modeling is an integral part of any data management initiative. Organizations use data models to tame “data at rest” for business use, governance and technical management of databases of all types.

However, once an organization understands what data it has and how it’s structured via data models, it needs answers to other critical questions: Where did it come from? Did it change along the journey? Where does it go from here?

Data Mapping: Taming “Data in Motion”

Knowing how data moves throughout technical and business data architectures is key for true visibility, context and control of all data assets.

Managing data in motion has been a difficult, time-consuming task that involves mapping source elements to the data model, defining the required transformations, and/or providing the same for downstream targets.

Historically, this work has either been outsourced to ETL/ELT developers, who often create a siloed, technical infrastructure opaque to the business, or business-friendly mappings have been kept in an assortment of unwieldy spreadsheets that are difficult to consolidate and reuse, much less capable of accommodating new requirements.
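For contrast, here is a hypothetical sketch of what those spreadsheet mappings typically capture, expressed instead as structured mapping rules (source element, target element, optional transformation) from which lineage can be derived. The element names and the transformation are illustrative only.

```python
# A hypothetical sketch of capturing source-to-target mappings as structured
# data rather than spreadsheets: each rule names a source element, a target
# element and an optional transformation, so mappings can be validated,
# versioned and reused, and lineage can be derived from them.
mappings = [
    {"source": "crm.customer.cust_nm", "target": "dw.dim_customer.customer_name",
     "transform": lambda v: v.strip().title()},
    {"source": "crm.customer.cust_id", "target": "dw.dim_customer.customer_key",
     "transform": None},
]

def apply_mappings(source_record: dict) -> dict:
    target = {}
    for rule in mappings:
        value = source_record.get(rule["source"])
        target[rule["target"]] = rule["transform"](value) if rule["transform"] else value
    return target

def lineage(target_element: str):
    """Return the source elements feeding a given target element."""
    return [r["source"] for r in mappings if r["target"] == target_element]

print(apply_mappings({"crm.customer.cust_nm": "  acme corp ", "crm.customer.cust_id": 42}))
print(lineage("dw.dim_customer.customer_name"))
```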

What if you could combine data at rest and data in motion to create an efficient, accurate and real-time data pipeline that also includes lineage? Then you can spend your time finding the data you need and using it to produce meaningful business outcomes.

Good news … you can.

erwin Mapping Manager: Connected Data Platform

Automated Data Mapping

Your data modelers can continue to use erwin Data Modeler (DM) as the foundation of your database management system, documenting, enforcing and improving those standards. But instead of relying on data models to disseminate metadata information, you can scan and integrate any data source and present it to all interested parties – automatically.

erwin Mapping Manager (MM) shifts the management of metadata away from data models to a dedicated, automated platform. It can collect metadata from any source, including JSON documents, erwin data models, databases and ERP systems, out of the box.

This functionality underscores our Any2 data approach by collecting any data from anywhere. And erwin MM can schedule data collection and create versions for comparison to clearly identify any changes.
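As a minimal illustration of version comparison on harvested metadata (a hypothetical sketch, not erwin MM’s actual mechanism), two snapshots of a table’s columns are diffed to surface added, removed and changed elements:

```python
# A minimal, hypothetical sketch of version comparison on harvested metadata:
# two snapshots of a table's columns are diffed to surface added, removed and
# changed elements, which is the essence of identifying schema drift between
# collection runs.
v1 = {"customer_id": "INTEGER", "name": "VARCHAR(50)", "region": "VARCHAR(20)"}
v2 = {"customer_id": "INTEGER", "name": "VARCHAR(100)", "segment": "VARCHAR(20)"}

def diff_versions(old: dict, new: dict) -> dict:
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }

print(diff_versions(v1, v2))
# {'added': ['segment'], 'changed': ['name'], 'removed': ['region']}
```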

Metadata definitions can be enhanced using extended data properties, and detailed data lineages can be created based on collected metadata. End users can quickly search for information and see specific data in the context of business processes.

To summarize, here are the key features current data modeling customers seem most excited about:

  • Easy import of legacy mappings, plus share and reuse mappings and transformations
  • Metadata catalog to automatically harvest any data from anywhere
  • Comprehensive upstream and downstream data lineage
  • Versioning with comparison features
  • Impact analysis

And all of these features support and can be integrated with erwin Data Governance. The end result is knowing what data you have and where it is so you can fuel a fast, high-quality and complete pipeline of any data from anywhere to accomplish your organizational objectives.

Want to learn more about a unified approach to data modeling and data mapping? Join us for our weekly demo to see erwin MM in action for yourself.

erwin Mapping Manager


Data Governance Tackles the Top Three Reasons for Bad Data

In modern, data-driven business, it’s essential that organizations understand the reasons for bad data and how best to address them. Data has revolutionized how organizations operate, from customer relationships to strategic decision-making and everything in between. And with more emphasis on automation and artificial intelligence, the need for data/digital trust also has risen. Even minor errors in an organization’s data can cause massive headaches because the inaccuracies don’t involve just one corrupt data unit.

Inaccurate or “bad” data also affects relationships to other units of data, making the business context difficult or impossible to determine. For example, are data units tagged according to their sensitivity [i.e., personally identifiable information subject to the General Data Protection Regulation (GDPR)], and are data ownership and lineage discernible (i.e., who has access, where did it originate)?

Relying on inaccurate data will hamper decisions, decrease productivity, and yield suboptimal results. Given these risks, organizations must increase their data’s integrity. But how?

Integrated Data Governance

Modern, data-driven organizations are essentially data production lines. And like physical production lines, their associated systems and processes must run smoothly to produce the desired results. Sound data governance provides the framework to address data quality at its source, ensuring any data recorded and stored is done so correctly, securely and in line with organizational requirements. But it needs to integrate all the data disciplines.

By integrating data governance with enterprise architecture, businesses can define application capabilities and interdependencies within the context of their connection to enterprise strategy to prioritize technology investments so they align with business goals and strategies to produce the desired outcomes. A business process and analysis component enables an organization to clearly define, map and analyze workflows and build models to drive process improvement, as well as identify business practices susceptible to the greatest security, compliance or other risks and where controls are most needed to mitigate exposures.

And data modeling remains the best way to design and deploy new relational databases with high-quality data sources and support application development. Being able to cost-effectively and efficiently discover, visualize and analyze “any data” from “anywhere” underpins large-scale data integration, master data management, Big Data and business intelligence/analytics with the ability to synthesize, standardize and store data sources from a single design, as well as reuse artifacts across projects.

Let’s look at some of the main reasons for bad data and how data governance helps confront these issues …

Reasons for Bad Data

Reasons for Bad Data: Data Entry

The concept of “garbage in, garbage out” explains the most common cause of inaccurate data: mistakes made at data entry. While this concept is easy to understand, totally eliminating errors isn’t feasible, so organizations need standards and systems to limit the extent of the damage.

With the right data governance approach, organizations can ensure the right people aren’t left out of the cataloging process, so the right context is applied. Organizations also can ensure critical fields are not left blank, so data is recorded with as much context as possible.

With the business process integration discussed above, you’ll also have a single metadata repository.

All of this ensures sensitive data doesn’t fall through the cracks.
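A simple, hypothetical illustration of that idea follows: reject records whose critical fields are blank and stamp each record with who entered it, so every data unit carries minimum context from the start. The field names are illustrative only.

```python
# A simple, hypothetical illustration of limiting data-entry damage at the
# source: reject records whose critical fields are blank and record who
# captured the data, so every unit carries minimum context from the start.
REQUIRED_FIELDS = ["customer_id", "email", "consent_status"]

def validate_entry(record: dict, entered_by: str) -> dict:
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if missing:
        raise ValueError(f"Required fields left blank: {', '.join(missing)}")
    return {**record, "entered_by": entered_by}  # simple context/ownership stamp

try:
    validate_entry({"customer_id": "C-1", "email": ""}, entered_by="jsmith")
except ValueError as err:
    print(err)  # Required fields left blank: email, consent_status
```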

Reasons for Bad Data: Data Migration

Data migration is another key reason for bad data. Modern organizations often juggle a plethora of data systems that process data from an abundance of disparate sources, creating a melting pot for potential issues as data moves through the pipeline, from tool to tool and system to system.

The solution is to introduce a predetermined standard of accuracy through a centralized metadata repository with data governance at the helm. In essence, metadata is “data about data,” ensuring that no matter where data is in the pipeline, it retains the necessary context to be deciphered, analyzed and then used strategically.

The potential fallout of using inaccurate data has become even more severe with the GDPR’s implementation. A simple case of tagging and subsequently storing personally identifiable information incorrectly could lead to a serious breach in compliance and significant fines.

Such fines must be considered along with the costs resulting from any PR fallout.

Reasons for Bad Data: Data Integration

The proliferation of data sources, types and stores increases the challenge of combining data into meaningful, valuable information. While companies are investing heavily in initiatives to increase the amount of data at their disposal, most information workers are spending more time finding the data they need than putting it to work, according to Database Trends and Applications (DBTA). erwin is co-sponsoring a DBTA webinar on this topic on July 17. To register, click here.

The need for faster and smarter data integration capabilities is growing. At the same time, to deliver business value, people need information they can trust to act on, so balancing governance is absolutely critical, especially with new regulations.

Organizations often invest heavily in individual software development tools for managing projects, requirements, designs, development, testing, deployment, releases, etc. Tools lacking interoperability often result in cumbersome manual processes and heavy time investments to synchronize data or processes between these disparate tools.

Data integration combines data from various sources into a unified view, making it more actionable and valuable to those accessing it.

Getting the Data Governance “EDGE”

The benefits of integrated data governance discussed above won’t be realized if it is isolated within IT with no input from other stakeholders, the day-to-day data users – from sales and customer service to the C-suite. Every data citizen has DG roles and responsibilities to ensure data units have context, meaning they are labeled, cataloged and secured correctly so they can be analyzed and used properly. In other words, the data can be trusted.

Once an organization understands that IT and the business are both responsible for data, it can develop comprehensive, holistic data governance capable of:

  • Reaching every stakeholder in the process
  • Providing a platform for understanding and governing trusted data assets
  • Delivering the greatest benefit from data wherever it lives, while minimizing risk
  • Helping users understand the impact of changes made to a specific data element across the enterprise.

To reduce the risks of and tackle the reasons for bad data and realize larger organizational objectives, organizations must make data governance everyone’s business.

To learn more about the collaborative approach to data governance and how it helps compliance in addition to adding value and reducing costs, get the free e-book here.

Data governance is everyone's business


The Role of An Effective Data Governance Initiative in Customer Purchase Decisions

A data governance initiative will maximize the security, quality and value of data, all of which build customer trust.

Without data, modern business would cease to function. Data helps guide decisions about products and services, makes it easier to identify customers, and serves as the foundation for everything businesses do today. The problem for many organizations is that data enters from any number of angles and gets stored in different places by different people and different applications.

Getting the most out of your data requires that you know what you have, where you have it, and that you understand its quality and value to the organization. This is where data governance comes into play. You can’t optimize your data if it’s scattered across different silos and lurking in various applications.

For about 150 years, manufacturers relied on their machinery and its ability to run reliably, properly and safely, to keep customers happy and revenue flowing. A data governance initiative has a similar role today, except its aim is to maximize the security, quality and value of data instead of machinery.

Customers are increasingly concerned about the safety and privacy of their data. According to a survey by Research+Data Insights, 85 percent of respondents worry about technology compromising their personal privacy. In a survey of 2,000 U.S. adults in 2016, researchers from Vanson Bourne found that 76 percent of respondents said they would move away from companies with a high record of data breaches.

For years, buying decisions were driven mainly by cost and quality, says Danny Sandwell, director of product marketing at erwin, Inc. But today’s businesses must consider their reputations in terms of both cost/quality and how well they protect their customers’ data when trying to win business.

Once the reputation is tarnished because of a breach or misuse of data, customers will question those relationships.

Unfortunately for consumers, examples of companies failing to properly govern their data aren’t difficult to find. Look no further than Under Armour, which announced this spring that 150 million accounts at its MyFitnessPal diet and exercise tracking app were breached, and Facebook, where the data of millions of users was harvested by third parties hoping to influence the 2016 presidential election in the United States.

Customers Hate Breaches, But They Love Data

While consumers are quick to report concerns about data privacy, customers also yearn for (and increasingly expect) efficient, personalized and relevant experiences when they interact with businesses. These experiences are, of course, built on data.

In this area, customers and businesses are on the same page. Businesses want to collect data that helps them build the omnichannel, 360-degree customer views that make their customers happy.

These experiences allow businesses to connect with their customers and demonstrate how well they understand them and know their preferences, likes and dislikes – essentially taking the personalized service of the neighborhood market to the internet.

The only way to manage that effectively at scale is to properly govern your data.

Delivering personalized service is also valuable to businesses because it helps turn customers into brand ambassadors, and it’s a fact that it’s much easier to build on existing customer relationships than to find new customers.

Here’s the upshot: If your organization is doing data governance right, it’s helping create happy, loyal customers, while at the same time avoiding the bad press and financial penalties associated with poor data practices.

Putting A Data Governance Initiative Into Action

The good news is that 76 percent of respondents to a November 2017 survey we conducted with UBM said understanding and governing the data assets in the organization was either important or very important to the executives in their organization. Nearly half (49 percent) of respondents said that customer trust/satisfaction was driving their data governance initiatives.

Importance of a data governance initiative

What stops organizations from creating an effective data governance initiative? At some businesses, it’s a cultural issue. Both the business and IT sides of the organization play important roles in data, with the IT side storing and protecting it, and the business side consuming data and analyzing it.

For years, however, data governance was the volleyball passed back and forth over the net between IT and the business, with neither side truly owning it. Our study found signs this is changing. More than half (57 percent) of the respondents said both IT and the business/corporate teams were responsible for data in their organization.

Who's responsible for a data governance initiative

Once an organization understands that IT and the business are both responsible for data, it still needs to develop a comprehensive, holistic strategy for data governance that is capable of:

  • Reaching every stakeholder in the process
  • Providing a platform for understanding and governing trusted data assets
  • Delivering the greatest benefit from data wherever it lives, while minimizing risk
  • Helping users understand the impact of changes made to a specific data element across the enterprise.

To accomplish this, a modern data governance initiative needs to be interdisciplinary. It should include not only data governance, which is ongoing because organizations are constantly changing and transforming, but other disciplines as well.

Enterprise architecture is important because it aligns IT and the business, mapping a company’s applications and the associated technologies and data to the business functions they enable.

By integrating data governance with enterprise architecture, businesses can define application capabilities and interdependencies within the context of their connection to enterprise strategy to prioritize technology investments so they align with business goals and strategies to produce the desired outcomes.

A business process and analysis component is also vital to modern data governance. It defines how the business operates and ensures employees understand and are accountable for carrying out the processes for which they are responsible.

Enterprises can clearly define, map and analyze workflows and build models to drive process improvement, as well as identify business practices susceptible to the greatest security, compliance or other risks and where controls are most needed to mitigate exposures.

Finally, data modeling remains the best way to design and deploy new relational databases with high-quality data sources and support application development.

Being able to cost-effectively and efficiently discover, visualize and analyze “any data” from “anywhere” underpins large-scale data integration, master data management, Big Data and business intelligence/analytics with the ability to synthesize, standardize and store data sources from a single design, as well as reuse artifacts across projects.

Michael Pastore is the Director, Content Services at QuinStreet B2B Tech. This content originally appeared as a sponsored post on http://www.eweek.com/.

Read the previous post on how compliance concerns and the EU’s GDPR are driving businesses to implement data governance.

Determine how effective your current data governance initiative is by taking our DG RediChek.

Take the DG RediChek


Defining Data Governance: What Is Data Governance?

Data governance (DG) is one of the fastest growing disciplines, yet when it comes to defining data governance many organizations struggle.

Dataversity says DG is “the practices and processes which help to ensure the formal management of data assets within an organization.” These practices and processes can vary, depending on an organization’s needs. Therefore, when defining data governance for your organization, it’s important to consider the factors driving its adoption.

The General Data Protection Regulation (GDPR) has contributed significantly to data governance’s escalating prominence. In fact, erwin’s 2018 State of Data Governance Report found that 60% of organizations consider regulatory compliance to be their biggest driver of data governance.

Defining data governance: DG Drivers

Other significant drivers include improving customer trust/satisfaction and encouraging better decision-making, but they trail behind regulatory compliance at 49% and 45% respectively. Reputation management (30%), analytics (27%) and Big Data (21%) also are factors.

But data governance’s adoption is of little benefit without understanding how DG should be applied within these contexts. This is arguably one of the issues that’s held data governance back in the past.

With no set definition, and the historical practice of isolating data governance within IT, organizations often have had different ideas of what data governance is, even between departments. With this inter-departmental disconnect, it’s not hard to imagine why data governance has historically left a lot to be desired.

However, with the mandate for DG within GDPR, organizations must work on defining data governance organization-wide to manage its successful implementation, or face GDPR’s penalties.

Defining Data Governance: Desired Outcomes

A great place to start when defining an organization-wide DG initiative is to consider the desired business outcomes. This approach ensures that all parties involved have a common goal.

Past examples of Data Governance 1.0 were mainly concerned with cataloging data to support search and discovery. The nature of this approach, coupled with the fact that DG initiatives were typically siloed within IT departments without input from the wider business, meant the practice often struggled to add value.

Without input from the wider business, the data cataloging process suffered from a lack of context. By neglecting to include the organization’s primary data citizens – those that manage and/or leverage data on a day-to-day basis for analysis and insight – organizational data was often plagued by duplications, inconsistencies and poor quality.

The nature of modern data-driven business means that such data citizens are spread throughout the organization. Furthermore, many of the key data citizens (think value-adding approaches to data use such as data-driven marketing) aren’t actively involved with IT departments.

Because of this, Data Governance 1.0 initiatives fizzled out with discouraging frequency.

This is, of course, problematic for organizations that identify regulatory compliance as a driver of data governance. Considering the nature of data-driven business – with new data being constantly captured, stored and leveraged – meeting compliance standards can’t be viewed as a one-time fix, so data governance can’t be de-prioritized and left to fizzle out.

Even businesses that manage to maintain the level of input data governance needs on an indefinite basis will find the Data Governance 1.0 approach wanting. In terms of regulatory compliance, the lack of context associated with Data Governance 1.0, and the inaccuracies it leads to, mean that potentially serious data governance issues could go undetected and result in repercussions for non-compliance.

We recommend organizations look beyond just data cataloging and compliance as desired outcomes when implementing DG. In the data-driven business landscape, data governance finds its true potential as a value-added initiative.

Organizations that identify value creation as a desired business outcome of data governance should also consider Data Governance 1.0’s shortcomings, and any organization that hasn’t identified value creation as a business outcome should ask itself why.

Many of the biggest market disruptors of the 21st century have been digitally savvy start-ups with robust data strategies – think Airbnb, Amazon and Netflix. Without high data governance standards, such companies would not have the level of trust in their data needed to confidently execute such digital-first strategies, which would make those strategies difficult to manage.

Therefore, in the data-driven business era, organizations should consider a Data Governance 2.0 strategy, with DG becoming an organization-wide, strategic initiative that de-silos the practice from the confines of IT.

This collaborative take on data governance intrinsically involves data’s biggest beneficiaries and users in the governance process, meaning functions like data cataloging benefit from greater context, accuracy and consistency.

It also means that organizations can have greater trust in their data and be more assured of meeting the standards set for regulatory compliance. It means that organizations can better respond to customer needs through more accurate methods of profiling and analysis, improving rates of satisfaction. And it means that organizations are less likely to suffer data breaches and their associated damages.

Defining Data Governance: The Enterprise Data Governance Experience (EDGE)

The EDGE is the erwin approach to Data Governance 2.0, empowering an organization to:

  • Manage any data, anywhere (Any2)
  • Instill a culture of collaboration and organizational empowerment
  • Introduce an integrated ecosystem for data management that draws from one central repository and ensures data (including real-time changes) is consistent throughout the organization
  • Have visibility across domains by breaking down silos between business and IT and introducing a common data vocabulary
  • Have regulatory peace of mind through mitigation of a wide range of risks, from GDPR to cybersecurity. 

To learn more about implementing data governance, click here.

Take the DG RediChek


Data Modeling in a Jargon-filled World – The Logical Data Warehouse

There’s debate surrounding the term “logical data warehouse.” Some argue that it is a new concept, while others argue that all well-designed data warehouses are logical and so the term is meaningless. This is a key point I’ll address in this post.

I’ll also discuss data warehousing that incorporates some of the technologies and approaches we’ve covered in previous installments of this series (1, 2, 3, 4, 5, 6 ) but with a different architecture that embraces “any data, anywhere.”

So what is a “logical data warehouse?”

Bill Inmon and Barry Devlin provide two oft-quoted definitions of a “data warehouse.” Inmon says “a data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process.”

Devlin stripped down the definition, saying “a data warehouse is simply a single, complete and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in a business context.”

Although these definitions are widely adopted, there is some disparity in their interpretation. Some insist that such definitions imply a single repository, and thus a limitation.

On the other hand, some argue that a “collection of data” or a “single, complete and consistent store” could just as easily be virtual and therefore not inherently singular. They argue that the language simply reflects the fact that most early implementations were single, physical data stores due to technology limitations.

Mark Beyer of Gartner is a prominent name in the former, singular-repository camp. In 2011, he said “the logical data warehouse (LDW) is a new data management architecture for analytics which combines the strengths of traditional repository warehouses with alternative data management and access strategy,” and the work has since been widely circulated.

So proponents of the “logical data warehouse,” as defined by Mark Beyer, don’t disagree with the value of an integrated collection of data. They just feel that if said collection is managed and accessed as something other than a monolithic, single physical database, then it is something different and should be called a “logical data warehouse” instead of just a “data warehouse.”

As the author of a series of posts about a jargon-filled [data] world, who am I to argue with the introduction of more new jargon?

In fact, I’d be remiss if I didn’t point out that the notion of a logical data warehouse has numerous jargon-rich enabling technologies and synonyms, including Service Oriented Architecture (SOA), Enterprise Services Bus (ESB), Virtualization Layer, and Data Fabric, though the latter term also has other unrelated uses.

So the essence of a logical data warehouse approach is to integrate diverse data assets into a single virtual data warehouse, without the traditional batch ETL or ELT processes required to copy data into a single, integrated physical data warehouse.

One of the key attractions to proponents of the approach is the avoidance of recurring batch extraction, transformation and loading activities that, typically argued, cause delays and lead to decisions being made based on data that is not as current as it could be.

The idea is to use caching and other technologies to create a virtualization layer that enables information consumers to ask a question as though they were interrogating a single, integrated physical data warehouse. The virtualization layer (which, together with the data resident in some combination of underlying application systems, IoT data streams, external data sources, blockchains, data lakes, data warehouses and data marts, constitutes the logical data warehouse) responds correctly with more current data, without having to extract, transform and load data into a centralized physical store.
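To make the principle concrete, here is a toy, hypothetical sketch of federation: one query function fans out to several underlying sources and assembles a single result set, so the consumer never needs to know where the data physically lives. Real logical data warehouse platforms add caching, query pushdown and a governing data model on top of this idea.

```python
# A toy, hypothetical sketch of the virtualization idea: one query function
# asks each underlying source the same question and synthesizes one answer,
# with no data copied into a central physical store.
def query_crm(region):
    return [{"customer": "Acme Corp", "region": region, "source": "crm"}]

def query_data_lake(region):
    return [{"customer": "Globex", "region": region, "source": "data_lake"}]

SOURCES = [query_crm, query_data_lake]

def federated_query(region: str) -> list:
    """Fan the request out to every source and merge the result sets."""
    results = []
    for source in SOURCES:
        results.extend(source(region))
    return results

print(federated_query("EMEA"))
```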

Logical Data Warehouse

While the moniker may be new, bringing the query to the data set(s) and then assembling an integrated result is not a new idea. There have been numerous successful implementations in the past, though they often required custom coding and rigorous governance to ensure response times and logical correctness.

Some would argue that such previous implementations were also not at the leading edge of data warehousing in terms of data volume or scope.

What is generating renewed interest in this approach is the continued frustration on the part of numerous stakeholders with delays attributed to ETL/ELT in traditional data warehouse implementations.

It’s not hard to see why when you compound this frustration with the often high costs of large (physical) data warehouse implementations – especially those based on MPP hardware – juxtaposed against the promise of new solutions from vendors like Denodo and Cisco that capitalize on the increasing prevalence of technologies such as the cloud and in-memory computing.

One topic that quickly becomes clear as one learns more about the various logical data warehouse vendor solutions is that metadata is a very important component. However, this shouldn’t be a surprise, as the objective is still to present a single, integrated view to the information consumer.

So a well-architected, comprehensive and easily understood data model is as important as ever. It ensures that information consumers can easily access properly integrated data, and the virtualization technology itself depends on a properly architected data model to accurately transform an information request into queries against multiple data sources and then correctly synthesize the result sets into an appropriate response to the original information request.

We hope you’ve enjoyed our series, Data Modeling in a Jargon-filled World, learning something from this post or one of the previous posts in the series (1, 2, 3, 4, 5, 6 ).

The underlying theme, as you’ve probably deduced, is that data modeling remains critical in a world in which the volume, variety and velocity of data continue to grow while information consumers find it difficult to synthesize the right data in the right context to help them draw the right conclusions.

We encourage you to read other blog posts on this site by erwin staff members and other guest bloggers and to participate in ongoing events and webinars.

If you’d like to know more about accelerating your data modeling efforts for specific industries, while reducing risk and benefiting from best practices and lessons learned by other similar organizations in your industry, please visit erwin partner ADRM Software.

Data-Driven Business Transformation


erwin Brings NoSQL into the Enterprise Data Modeling and Governance Fold

“NoSQL is not an option — it has become a necessity to support next-generation applications.”