
Demystifying Data Lineage: Tracking Your Data’s DNA

Getting the most out of your data requires getting a handle on data lineage. That’s knowing what data you have, where it is, and where it came from – plus understanding its quality and value to the organization.

But you can’t understand your data in a business context, much less track its lineage and physical existence or maximize its security, quality and value, if it’s scattered across different silos in numerous applications.

Data lineage provides a way of tracking data from its origin to destination across its lifespan and all the processes it’s involved in. It also plays a vital role in data governance. Beyond the simple ability to know where the data came from and whether or not it can be trusted, there’s an element of statutory reporting and compliance that often requires knowledge of how that same data (known or unknown, governed or not) has changed over time.
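
To make that concrete, here’s a minimal sketch in Python of lineage modeled as a directed graph, where each edge records which process moved or transformed the data and when. The asset and process names are hypothetical, and this illustrates the concept only, not any particular product’s implementation.

```python
# A minimal, illustrative lineage model: a directed graph in which each edge
# records the process that moved or transformed data, and when. This supports
# both origin-to-destination tracing and "how has this data changed over time?"
from dataclasses import dataclass

@dataclass
class LineageEdge:
    source: str      # upstream asset, e.g. "crm.customers" (hypothetical name)
    target: str      # downstream asset, e.g. "warehouse.dim_customer"
    process: str     # the ETL job or transformation involved
    changed_at: str  # when the target was (re)built

class LineageGraph:
    def __init__(self):
        self.edges = []

    def record(self, source, target, process, changed_at):
        self.edges.append(LineageEdge(source, target, process, changed_at))

    def origin_trail(self, asset):
        """Walk upstream from an asset back toward its origins."""
        trail, frontier = [], [asset]
        while frontier:
            current = frontier.pop()
            for e in self.edges:
                if e.target == current:
                    trail.append(e)
                    frontier.append(e.source)
        return trail

g = LineageGraph()
g.record("crm.customers", "staging.customers", "etl.extract_crm", "2018-09-01")
g.record("staging.customers", "warehouse.dim_customer", "etl.build_dims", "2018-09-02")
for e in g.origin_trail("warehouse.dim_customer"):
    print(f"{e.target} <- {e.source} via {e.process} at {e.changed_at}")
```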

A platform that provides insights like data lineage, impact analysis, full-history capture, and other data management features serves as a central hub from which everything can be learned and discovered about the data – whether a data lake, a data vault or a traditional data warehouse.

In a traditional data management organization, Excel spreadsheets are used to manage the incoming data design, what’s known as the “pre-ETL” mapping documentation, but this does not provide any sort of visibility or auditability. In fact, each unit of work represented in these ‘mapping documents’ becomes an independent variable in the overall system development lifecycle, and therefore nearly impossible to learn from much less standardize.

The key to accuracy and integrity in any exercise is to eliminate the opportunity for human error – which does not mean eliminating humans from the process but incorporating the right tools to reduce the likelihood of error as the human beings apply their thought processes to the work.


Data Lineage: A Crucial First Step for Data Governance

Knowing what data you have, where it lives and where it came from is complicated. The lack of visibility and control around “data at rest” combined with “data in motion,” as well as difficulties with legacy architectures, means organizations spend more time finding the data they need than using it to produce meaningful business outcomes.

Organizations need to create and sustain an enterprise-wide view of and easy access to underlying metadata, but that’s a tall order with numerous data types and data sources that were never designed to work together and data infrastructures that have been cobbled together over time with disparate technologies, poor documentation and little thought for downstream integration. So the applications and initiatives that depend on a solid data infrastructure may be compromised, resulting in faulty analyses.

These issues can be addressed with a strong data management strategy underpinned by technology that enables the data quality the business requires, which encompasses data cataloging (integration of data sets from various sources), mapping, versioning, maintenance of business rules and glossaries, and metadata management (associations and lineage).

An automated, metadata-driven framework for cataloging data assets and their flows across the business provides an efficient, agile and dynamic way to generate data lineage from operational source systems (databases, data models, file-based systems, unstructured files and more) across the information management architecture; construct business glossaries; assess what data aligns with specific business rules and policies; and inform how that data is transformed, integrated and federated throughout business processes – complete with full documentation.

Centralized design, immediate lineage and impact analysis, and change-activity logging mean you will always have answers readily available, or just a few clicks away. Subsets of data can be identified and generated from predefined templates or from generic designs derived from standard mapping documents, then pushed through the ETL process via automation templates for faster processing.
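
As an illustration of what generating ETL from standard mapping documents can look like, here’s a minimal sketch, with hypothetical table and column names, of a mapping captured as structured, versionable data from which the boilerplate SQL is rendered by a template instead of being hand-coded from a spreadsheet:

```python
# A "pre-ETL" mapping as structured data (hypothetical schema and names)
# rather than an Excel worksheet. A template turns it into executable SQL,
# so the mapping itself stays auditable and reusable.
mapping = {
    "source_table": "staging.orders",
    "target_table": "warehouse.fact_orders",
    "columns": [
        {"source": "order_id",  "target": "order_key",    "transform": None},
        {"source": "amount",    "target": "order_amount", "transform": "CAST({col} AS DECIMAL(18,2))"},
        {"source": "placed_at", "target": "order_date",   "transform": "DATE({col})"},
    ],
}

def generate_insert_sql(m):
    """Render the mapping as an INSERT ... SELECT statement."""
    select_exprs = []
    for c in m["columns"]:
        expr = c["transform"].format(col=c["source"]) if c["transform"] else c["source"]
        select_exprs.append(f"{expr} AS {c['target']}")
    target_cols = ", ".join(c["target"] for c in m["columns"])
    return (
        f"INSERT INTO {m['target_table']} ({target_cols})\n"
        f"SELECT {', '.join(select_exprs)}\n"
        f"FROM {m['source_table']};"
    )

print(generate_insert_sql(mapping))
```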

With automation, data quality is systemically assured and the data pipeline is seamlessly governed and operationalized to the benefit of all stakeholders. Without such automation, business transformation will be stymied. Companies, especially large ones with thousands of systems, files and processes, will be particularly challenged by a manual approach. And outsourcing these data management efforts to professional services firms only increases costs and schedule delays.

With erwin Mapping Manager, organizations can automate enterprise data mapping and code generation for faster time-to-value and greater accuracy when it comes to data movement projects, as well as synchronize “data in motion” with data management and governance efforts.

Map data elements to their sources within a single repository to determine data lineage, deploy data warehouses and other Big Data solutions, and harmonize data integration across platforms. The web-based solution reduces the need for specialized, technical resources with knowledge of ETL and database procedural code, while making it easy for business analysts, data architects, ETL developers, testers and project managers to collaborate for faster decision-making.



Top 10 Reasons to Automate Data Mapping and Data Preparation

Data preparation is notorious for being the most time-consuming area of data management. It’s also expensive.

“Surveys show the vast majority of time is spent on this repetitive task, with some estimates showing it takes up as much as 80% of a data professional’s time,” according to Information Week. And a Trifacta study notes that overreliance on IT resources for data preparation costs organizations billions.

Collecting your data can take a variety of forms, but most often in IT shops around the world, it comes in a spreadsheet – or rather a collection of spreadsheets, often numbering in the hundreds or thousands.

Most organizations, especially those competing in the digital economy, don’t have enough time or money for data management using manual processes. And outsourcing is also expensive, with inevitable delays because these vendors are dependent on manual processes too.


Taking the Time and Pain Out of Data Preparation: 10 Reasons to Automate Data Preparation/Data Mapping

  1. Governance and Infrastructure

Data governance and a strong IT infrastructure are critical in the valuation, creation, storage, use, archival and deletion of data. Beyond the simple ability to know where the data came from and whether or not it can be trusted, there is an element of statutory reporting and compliance that often requires knowledge of how that same data (known or unknown, governed or not) has changed over time.

A design platform that allows for insights like data lineage, impact analysis, full history capture, and other data management features can provide a central hub from which everything can be learned and discovered about the data – whether a data lake, a data vault, or a traditional warehouse.

  2. Eliminating Human Error

In the traditional data management organization, Excel spreadsheets are used to manage the incoming data design, or what is known as the “pre-ETL” mapping documentation – this does not lend itself to any sort of visibility or auditability. In fact, each unit of work represented in these ‘mapping documents’ becomes an independent variable in the overall system development lifecycle, and therefore nearly impossible to learn from, much less standardize.

The key to creating accuracy and integrity in any exercise is to eliminate the opportunity for human error – which does not mean eliminating humans from the process but incorporating the right tools to reduce the likelihood of error as the human beings apply their thought processes to the work.  

  3. Completeness

The ability to scan and import from a broad range of sources and formats, as well as automated change tracking, means that you will always be able to import your data from wherever it lives and track all of the changes to that data over time.
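
One illustrative way to implement that change tracking, assuming a simple fingerprinting scheme and hypothetical source names, is to hash each imported source’s metadata and log a new version whenever the fingerprint changes:

```python
# A sketch of automated change tracking: fingerprint each source's metadata
# on import and record a new version whenever the fingerprint changes.
import hashlib
import json
from datetime import datetime, timezone

history = {}  # source name -> list of (timestamp, fingerprint) versions

def track_import(source_name, metadata):
    """Return True if this import changed versus the last recorded version."""
    fingerprint = hashlib.sha256(
        json.dumps(metadata, sort_keys=True).encode()
    ).hexdigest()
    versions = history.setdefault(source_name, [])
    changed = not versions or versions[-1][1] != fingerprint
    if changed:
        versions.append((datetime.now(timezone.utc).isoformat(), fingerprint))
    return changed

print(track_import("crm.customers", {"columns": ["id", "email"]}))           # True (first import)
print(track_import("crm.customers", {"columns": ["id", "email"]}))           # False (no change)
print(track_import("crm.customers", {"columns": ["id", "email", "phone"]}))  # True (schema changed)
```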

  4. Adaptability

Centralized design, immediate lineage and impact analysis, and change activity logging mean that you will always have the answer readily available, or a few clicks away. Subsets of data can be identified and generated from predefined templates or from generic designs derived from standard mapping documents, then pushed through the ETL process via automation templates for faster processing.

  5. Accuracy

Out-of-the-box capabilities to map your data from source to report make reconciliation and validation a snap, with auditability and traceability built in. Build a full array of validation rules that can be cross-checked against the design mappings in a centralized repository.
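
Here’s a minimal sketch of that cross-check, assuming a simple rule format and hypothetical column names: with mappings and validation rules in one repository, gaps on either side surface automatically.

```python
# Cross-check validation rules against design mappings held in one place:
# every rule should reference a mapped column, and every mapped column
# should carry at least one rule.
design_mappings = {"order_key": "order_id", "order_amount": "amount"}

validation_rules = [
    {"column": "order_key",    "rule": "not_null"},
    {"column": "order_amount", "rule": "non_negative"},
    {"column": "discount_pct", "rule": "between_0_and_1"},  # no mapping -> flagged
]

def cross_check(mappings, rules):
    """Report rules without a mapped column and mapped columns without a rule."""
    mapped = set(mappings)
    ruled = {r["column"] for r in rules}
    return {
        "rules_without_mapping": sorted(ruled - mapped),
        "mappings_without_rule": sorted(mapped - ruled),
    }

print(cross_check(design_mappings, validation_rules))
# {'rules_without_mapping': ['discount_pct'], 'mappings_without_rule': []}
```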

  6. Timeliness

The ability to be agile and reactive is important – being good at being reactive doesn’t sound like a quality that deserves a pat on the back, but in the case of regulatory requirements, it is paramount.

  7. Comprehensiveness

With access to all of the underlying metadata, source-to-report design mappings, and source and target repositories, you have the power to create reports within your reporting layer that have a traceable origin and can be easily explained to IT, business and regulatory stakeholders alike.

  8. Clarity

The requirements inform the design, the design platform puts those to action, and the reporting structures are fed the right data to create the right information at the right time via nearly any reporting platform, whether mainstream commercial or homegrown.

  9. Frequency

Adaptation is the key to meeting any frequency interval. Centralized designs and automated ETL patterns that feed your database schemas and reporting structures allow cyclical changes to be made and implemented in half the time of conventional means. Getting beyond the spreadsheet, enabling pattern-based ETL, and schema population are ways to ensure you will be ready, whenever the need arises, to show an audit trail of the change process and clearly articulate who did what and when through the system development lifecycle.

  10. Business-Friendly

A user interface designed to be business-friendly means there’s no need to be a data integration specialist to review the common practices outlined and “passively enforced” throughout the tool. Once a process is defined, rules implemented, and templates established, there is little opportunity for error or deviation from the overall process. A diverse set of role-based security options means that everyone can collaborate, learn and audit while maintaining the integrity of the underlying process components.

Faster, More Accurate Analysis with Fewer People

What if you could get more accurate data preparation 50% faster and double your analysis with fewer people?

erwin Mapping Manager (MM) is a patented solution that automates data mapping throughout the enterprise data integration lifecycle, providing data visibility, lineage and governance – freeing up that 80% of a data professional’s time to put that data to work.

With erwin MM, data integration engineers can design and reverse-engineer the movement of data implemented as ETL/ELT operations and stored procedures, building mappings between source and target data assets and designing the transformation logic between them. These designs then can be exported to most ETL and data asset technologies for implementation.

erwin MM is 100% metadata-driven and used to define and drive standards across enterprise integration projects, enable data and process audits, improve data quality, streamline downstream workflows, increase productivity (especially across geographically dispersed teams) and give project teams, IT leadership and management visibility into the ‘real’ status of integration and ETL migration projects.

If an automated data preparation/mapping solution sounds good to you, please check out erwin MM here.



Compliance First: How to Protect Sensitive Data

The ability to more efficiently govern, discover and protect sensitive data is something that all prospering data-driven organizations are constantly striving for.

It’s been almost four months since the European Union’s General Data Protection Regulation (GDPR) took effect. While no fines have been issued yet, the Information Commissioner’s Office has received upwards of 500 calls per week since the May 25 effective date.

However, the fine-free streak may be ending soon, with British Airways (BA) poised to become the first large company to pay a GDPR penalty because of a data breach. The hack at BA in August and early September lasted for more than two weeks, with intruders getting away with account numbers and personal information of customers making reservations on the carrier’s website and mobile app. If regulators conclude that BA failed to take measures to prevent the incident, a significant fine may follow.

Additionally, complaints against Google in the EU have started. For example, internet browser provider Brave claims that Google and other advertising companies expose user data during a process called “bid request.” A data breach occurs because a bid request fails to protect sensitive data against unauthorized access, which is unlawful under the GDPR.

Per Brave’s announcement, bid request data can include the following personal data:

  • What you are reading or watching
  • Your location
  • Description of your device
  • Unique tracking IDs or a “cookie match,” which allows advertising technology companies to try to identify you the next time you are seen, so that a long-term profile can be built or consolidated with offline data about you
  • Your IP address, depending on the version of the “real-time bidding” system
  • Data broker segment ID, if available, which could denote things like your income bracket, age and gender, habits, social media influence, ethnicity, sexual orientation, religion, political leaning, etc., depending on the version of bidding system

Obviously, GDPR isn’t the only regulation that organizations need to comply with. From HIPAA in healthcare to FINRA, PII and BCBS in financial services to the upcoming California Consumer Privacy Act (CCPA) taking effect January 1, 2020, regulatory compliance is part of running – and staying in business.

The common denominator in compliance across all industry sectors is the ability to protect sensitive data. But if organizations are struggling to understand what data they have and where it’s located, how do they protect it? Where do they begin?


Discover and Protect Sensitive Data

Data is a critical asset used to operate, manage and grow a business. While some of it is at rest in databases, data lakes and data warehouses, a large percentage is federated and integrated across the enterprise, introducing governance, manageability and risk issues that must be managed.

Knowing where sensitive data is located and properly governing it with policy rules, impact analysis and lineage views is critical for risk management, data audits and regulatory compliance.

However, when key data isn’t discovered, harvested, cataloged, defined and standardized as part of integration processes, audits may be flawed, putting your organization at risk.

Sensitive data – at rest or in motion – that exists in various forms across multiple systems must be automatically tagged, its lineage automatically documented, and its flows depicted so that it is easily found and its usage across workflows easily traced.

Thankfully, tools are available to help automate the scanning, detection and tagging of sensitive data by:

  • Monitoring and controlling sensitive data: Better visibility and control across the enterprise to identify data security threats and reduce associated risks
  • Enriching business data elements for sensitive data discovery: Comprehensive mechanism to define business data elements for PII, PHI and PCI across database systems, cloud and Big Data stores to easily identify sensitive data based on a set of algorithms and data patterns
  • Providing metadata and value-based analysis: Discovery and classification of sensitive data based on metadata and data value patterns and algorithms. Organizations can define business data elements and rules to identify and locate sensitive data including PII, PHI, PCI and other sensitive information.
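
As a simplified illustration of the metadata and value-based analysis described above, here’s a sketch of pattern-based tagging in Python. The patterns and tags are assumptions for the example; production tools combine many more algorithms and data patterns.

```python
# Pattern-based detection and tagging of sensitive values. The regexes below
# are deliberately simple, illustrative stand-ins for real detection logic.
import re

PATTERNS = {
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn":      re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_value(value):
    """Return the sensitivity tags whose pattern matches the value."""
    return [tag for tag, rx in PATTERNS.items() if rx.search(str(value))]

def tag_records(records):
    """Attach sensitivity tags to each field of each record."""
    tagged = []
    for rec in records:
        tags = {field: scan_value(val) for field, val in rec.items()}
        tagged.append({"record": rec, "tags": {f: t for f, t in tags.items() if t}})
    return tagged

rows = [{"name": "Ada", "contact": "ada@example.com", "ssn": "123-45-6789"}]
print(tag_records(rows))
# [{'record': {...}, 'tags': {'contact': ['email'], 'ssn': ['us_ssn']}}]
```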


A Regulatory Rationale for Integrating Data Management and Data Governance

Data management and data governance, together, play a vital role in compliance. It’s easier to protect sensitive data when you know where it’s stored, what it is, and how it needs to be governed.

Truly understanding an organization’s data, including the data’s value and quality, requires a harmonized approach embedded in business processes and enterprise architecture. Such an integrated enterprise data governance experience helps organizations understand what data they have, where it is, where it came from, its value, its quality and how it’s used and accessed by people and applications.

But how is all this possible? Again, it comes back to the right technology for IT and business collaboration that will enable you to:

  • Discover data: Identify and interrogate metadata from various data management silos
  • Harvest data: Automate the collection of metadata from various data management silos and consolidate it into a single source
  • Structure data: Connect physical metadata to specific business terms and definitions and reusable design standards
  • Analyze data: Understand how data relates to the business and what attributes it has
  • Map data flows: Identify where to integrate data and track how it moves and transforms
  • Govern data: Develop a governance model to manage standards and policies and set best practices
  • Socialize data: Enable all stakeholders to see data in one place in their own context
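
For a flavor of the “discover” and “harvest” steps above, here’s a minimal sketch that pulls physical metadata from a database’s own catalog into a consolidated store. It uses SQLite purely so the example is self-contained; a real harvester would apply the same idea across many silos.

```python
# Harvest table/column metadata from a database catalog into one store.
import sqlite3

def harvest_sqlite_metadata(conn, source_name):
    """Collect table and column metadata from a SQLite database."""
    catalog = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        for _, col, col_type, notnull, _, pk in conn.execute(
            f"PRAGMA table_info({table})"
        ):
            catalog.append({
                "source": source_name,   # which silo this came from
                "table": table,
                "column": col,
                "type": col_type,
                "nullable": not notnull,
                "primary_key": bool(pk),
                "business_term": None,   # linked later, in the "structure" step
            })
    return catalog

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
for entry in harvest_sqlite_metadata(conn, "demo_db"):
    print(entry)
```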

Automated Data Management: Stop Drowning in Your Data 

Given the wealth of data that data-driven organizations are tasked with handling, they are increasingly adopting automated data management.

There are 2.5 quintillion bytes of data being created every day, and that figure is increasing in tandem with the production of and demand for Internet of Things (IoT) devices. However, Forrester reports that between 60 and 73 percent of all data within an enterprise goes unused.

Collecting all that data is pointless if it’s not going to be used to deliver accurate and actionable insights.

But the reality is there’s not enough time, people and/or money for effective data management using manual processes. Organizations won’t be able to take advantage of analytics tools to become data-driven unless they establish a foundation for agile and complete data management. And organizations that don’t employ automated data management risk being left behind.

In addition to taking the burden off already stretched internal teams, automated data management’s most obvious benefit is that it’s a key enabler of data-driven business. Without it, a truly data-driven approach to business is either ineffective or impossible, depending on the scale of data you’re working with.

This is because there’s either too much data left unaccounted for and too much potential revenue left on the table for the strategy to be considered effective. Or it’s because there’s so much disparity among the data sources and silos where data is stored that data quality suffers to an insurmountable degree, rendering any analysis fundamentally flawed.

But simply enabling the strategy isn’t the most compelling use case, or organizations across the board would have implemented it already.

The Case for Automated Data Management

Business leaders and decision-makers want a business case for automated data management.

So here it is …

Without automation, business transformation will be stymied. Companies, especially large ones with thousands of systems, files and processes, will be particularly challenged by taking a manual approach. And outsourcing these data management efforts to professional services firms only delays schedules and increases cost.

By automating data cataloging and data mapping inclusive of data at rest and data in motion through the integration lifecycle process, organizations will benefit from:

  • A metadata-driven automated framework for cataloging data assets and their flows across the business
  • An efficient, agile and dynamic way to generate data lineage from operational systems (databases, data models, file-based systems, unstructured files and more) across the information management architecture
  • Easy access to what data aligns with specific business rules and policies
  • The ability to inform how data is transformed, integrated and federated throughout business processes – complete with full documentation
  • Faster project delivery and lower costs because data is managed internally, without the need to outsource data management efforts
  • Assurance of data quality, so analysis is reliable and new initiatives aren’t beleaguered with false starts
  • A seamlessly governed data pipeline, operationalized to the benefit of all stakeholders



Data Governance Helps Build a Solid Foundation for Analytics

If your business is like many, it’s heavily invested in analytics. We’re living in a data-driven world. Data drives the recommendations we get from retailers, the coupons we get from grocers, and the decisions behind the products and services we’ll build and support at work.

None of the insights we draw from data are possible without analytics. We routinely slice, dice, measure and (try to) predict almost everything today because data is available to be analyzed. In theory, all this analysis should be helping the business. It should ensure we’re creating the right products and services, marketing them to the right people, and charging the right price. It should build a loyal base of customers who become brand ambassadors, amplifying existing marketing efforts to fuel more sales.

We hope all these things happen because all this analysis is expensive. It’s not just the cost of software licenses for the analytics software, but it’s also the people. Estimates for the average salary of data scientists, for example, can be upwards of $118,000 (Glassdoor) to $131,000 (Indeed). Many businesses also are exploring or already use next-generation analytics technology like predictive analytics or analytics supported by artificial intelligence or machine learning, which require even more investment.

If the underlying data your business is analyzing is bad, you’re throwing all this investment away. There’s a saying that scares everyone involved in analytics today: “Garbage in, garbage out.” When bad data is used to drive your strategic and operational decisions, your bad data suddenly becomes a huge problem for the business.

The goal, when it comes to the data you feed your analytics platforms, is what’s often referred to as the “single source of truth,” otherwise known as the data you can trust to analyze and create conclusions that drive your business forward.

“One source of truth means serving up consistent, high-quality data,” says Danny Sandwell, director of product marketing at erwin, Inc.

Despite all of the talk in the industry about data and analytics in recent years, many businesses still fail to reap the rewards of their analytics investments. In fact, Gartner reports that more than 60 percent of data and analytics projects fail. As with any software deployment, there are a number of reasons these projects don’t turn out as planned. With analytics, however, bad data can turn even a smooth deployment on the technology side into a disaster for the business.

What is bad data? It’s data that isn’t helping your business make the right decisions because it is:

  • Poor quality
  • Misunderstood
  • Incomplete
  • Misused

How Data Governance Helps Organizations Improve Their Analytics

More than one-quarter of the respondents to a November 2017 survey by erwin Inc. and UBM said analytics was one of the factors driving their data governance initiatives.


Data governance helps businesses understand what data they have, how good it is, where it is, and how it’s used. A lot of people are talking about data governance today, and some are putting that talk into action. The erwin-UBM survey found that 52 percent of respondents say data is critically important to their organization and they have a formal data governance strategy in place. But almost as many respondents (46 percent) say they recognize the value of data to their organizations but don’t have a formal governance strategy.


When data governance helps your organization develop high-quality data with demonstrated value, your IT organizations can build better analytics platforms for the business. Data governance helps enable self-service, which is an important part of analytics for many businesses today because it puts the power of data and analysis into the hands of the people who use the data on a daily basis. A well-functioning data governance program creates that single version of the truth by helping IT organizations identify and present the right data to users and eliminate confusion about the source or quality of the data.

Data governance also enables a system of best practices, subject matter experts, and collaboration that are the hallmarks of today’s analytics-driven businesses.

Like analytics, many early attempts at instituting data governance failed to deliver the expected results. They were narrowly focused, and their advocates often had difficulty articulating the value of data governance to the organization, which made it difficult to secure budget. Some organizations even viewed data governance as part of data security, securing their data to the point where the people who wanted to use it had trouble getting access.

Issues of ownership also hurt early data governance efforts, as IT and the business couldn’t agree on which side was responsible for a process that affects both on a regular basis. Today, organizations are better equipped to resolve these issues of ownership because many are adopting a new corporate structure that recognizes how important data is to modern businesses. Roles like chief data officer (CDO), which increasingly sits on the business side, and the data protection officer (DPO), are more common than they were a few years ago.

A modern data governance strategy weaves itself into the business and its infrastructure. It is present in the enterprise architecture, the business processes, and it helps organizations better understand the relationships between data assets using techniques like visualization. Perhaps most important, a modern approach to data governance is ongoing because organizations and their data are constantly changing and transforming, so their approach to data governance needs to adjust as they go.

When it comes to analytics, data governance is the best way to ensure you’re using the right data to drive your strategic and operational decisions. It’s easier said than done, especially when you consider all the data that’s flowing into a modern organization and how you’re going to sort through it all to find the good, the bad, and the ugly. But once you do, you’re on the way to using analytics to draw conclusions you can trust.


You can determine how effective your current data governance initiative is by taking erwin’s DG RediChek.


Why Data Governance and Business Process Management Must Be Linked

Data governance and business process management must be linked.

Following the boom in data-driven business, data governance (DG) has taken the modern enterprise by storm, garnering the attention of both the business and technical realms with an explosion of methodologies, targeted systems and training courses. That’s because a major gap needs to be addressed.

But despite all the admonitions and cautionary tales, little attention has focused on what can literally make or break any data governance initiative, turning it from a springboard for competitive advantage to a recipe for waste, anger and ultimately failure. The two key pivot points on which success hinges are business process management (BPM) and enterprise architecture. This article focuses on the critical connections between data governance and business process management.

Based on a True Story: Data Governance Without Process Is Not Data Governance

The following is based on a true story about a global pharmaceutical company implementing a cloud-based, enterprise-wide CRM system with a third-party provider.

Given the system’s nature, the data it would process, and the scope of the deployment, data security and governance were front and center. There were countless meetings – some with more than 50 participants – with protocols sent, reviewed, adjusted and so on. In fact, more than half a dozen outside security companies and advisors (and yes, data governance experts) came in to help design the perfect data protection system around which the CRM system would be implemented.

The framework was truly mind-boggling: hundreds of security measures, dozens of different file management protocols, and data security software at every step of the way. To an external observer, it appeared to be an ironclad net of absolute safety and effective governance.

But as the CRM implementation progressed, holes began to appear. They were small at first but quickly grew to the size of trucks, effectively rendering months of preparatory work pointless.

Detailed data transfer protocols were subverted daily by consultants and company employees who thought speed was more important than safety. Software locks and systems were overridden with passwords freely communicated through emails and even written on Post-It Notes. And a two-factor authentication principle was reduced to one person entering half a password, with a piece of paper taped over half the computer screen, while another person entered the other half of the password before a third person read the entire password and pressed enter.

While these examples of security holes might seem funny – in a sad way – when you read them here, they represent a $500,000 failure that potentially could lead to a multi-billion-dollar security breach.

Why? Because there were no simple, effective and clearly defined processes to govern the immense investment in security protocols and software to ensure employees would follow them and management could audit and control them. Furthermore, the organization failed to realize how complex this implementation was and that process changes would be paramount.

Both such failures could have been avoided if the organization had a simple system of managing, adjusting and monitoring its processes. More to the point, the implementation of the entire security and governance framework would have cost less and been completed in half the time. Furthermore, if a failure or breach were discovered, it would be easy to trace and correct.


Data Governance Starts with BPM

In a rush to implement a data governance methodology and system, you can forget that a system must serve a process – and be governed/controlled by one.

To choose the correct system and implement it effectively and efficiently, you must know – in every detail – all the processes it will impact, how it will impact them, who needs to be involved and when. Do these questions sound familiar? They should because they are the same ones we ask in data governance. They involve impact analysis, ownership and accountability, control and traceability – all of which effectively documented and managed business processes enable.

Data sets are not important in and of themselves. Data sets become important in terms of how they are used, who uses them and for what purpose – and all this information is described in the processes that generate, manipulate and use them. So, unless we know what those processes are, how can any data governance implementation be complete or successful?

Consider this scenario: We’ve perfectly captured our data lineage, so we know what our data sets mean, how they’re connected, and who’s responsible for them – not a simple task but a massive win for any organization.  Now a breach occurs. Will any of the above information tell us why it happened? Or where? No! It will tell us what else is affected and who can manage the data layer(s), but unless we find and address the process failure that led to the breach, it is guaranteed to happen again.

By knowing where data is used – the processes that use and manage it – we can quickly, even instantly, identify where a failure occurs. Starting with data lineage (meaning our forensic analysis starts from our data governance system), we can identify the source and destination processes and the associated impacts throughout the organization. We can know which processes need to change and how. We can anticipate the pending disruptions to our operations and, more to the point, the costs involved in mitigating and/or addressing them.

But knowing all the above requires that our processes – our essential and operational business architecture – be accurately captured and modeled. Instituting data governance without processes is like building a castle on sand.

Rethinking Business Process Management

Modern organizations need a simple and easy-to-use BPM system with easy access to all the operational layers across the organization – from high-level business architecture all the way down to data. Sure, most organizations already have various solutions here and there, some with claims of being able to provide a comprehensive picture. But chances are they don’t, so you probably need to rethink your approach.

Modern BPM ecosystems are flexible, adjustable, easy-to-use and can support multiple layers simultaneously, allowing users to start in their comfort zones and mature as they work toward the organization’s goals.

Processes need to be open and shared in a concise, consistent way so all parts of the organization can investigate, ask questions, and then add their feedback and information layers. In other words, processes need to be alive and central to the organization because only then will the use of data and data governance be truly effective.

Are you willing to think outside the traditional boxes or silos that your organization’s processes and data live in?

The erwin EDGE is one of the most comprehensive software platforms for managing an organization’s data governance and business process initiatives, as well as the whole data architecture. It allows natural, organic growth throughout the organization, and its assimilation of data governance and business process management under the same platform provides a unique data governance experience because of its integrated, collaborative approach.

To learn more about erwin EDGE, and how data governance underpins and ensures data quality throughout the wider data management suite, download our resource: Data Governance Is Everyone’s Business.



Data Governance Tackles the Top Three Reasons for Bad Data

In modern, data-driven business, it’s integral that organizations understand the reasons for bad data and how best to address them. Data has revolutionized how organizations operate, from customer relationships to strategic decision-making and everything in between. And with more emphasis on automation and artificial intelligence, the need for data/digital trust also has risen. Even minor errors in an organization’s data can cause massive headaches because the inaccuracies don’t involve just one corrupt data unit.

Inaccurate or “bad” data also affects relationships to other units of data, making the business context difficult or impossible to determine. For example, are data units tagged according to their sensitivity [i.e., personally identifiable information subject to the General Data Protection Regulation (GDPR)], and is data ownership and lineage discernable (i.e., who has access, where did it originate)?

Relying on inaccurate data will hamper decisions, decrease productivity, and yield suboptimal results. Given these risks, organizations must increase their data’s integrity. But how?

Integrated Data Governance

Modern, data-driven organizations are essentially data production lines. And like physical production lines, their associated systems and processes must run smoothly to produce the desired results. Sound data governance provides the framework to address data quality at its source, ensuring any data recorded and stored is done so correctly, securely and in line with organizational requirements. But it needs to integrate all the data disciplines.

By integrating data governance with enterprise architecture, businesses can define application capabilities and interdependencies within the context of their connection to enterprise strategy to prioritize technology investments so they align with business goals and strategies to produce the desired outcomes. A business process and analysis component enables an organization to clearly define, map and analyze workflows and build models to drive process improvement, as well as identify business practices susceptible to the greatest security, compliance or other risks and where controls are most needed to mitigate exposures.

And data modeling remains the best way to design and deploy new relational databases with high-quality data sources and support application development. Being able to cost-effectively and efficiently discover, visualize and analyze “any data” from “anywhere” underpins large-scale data integration, master data management, Big Data and business intelligence/analytics with the ability to synthesize, standardize and store data sources from a single design, as well as reuse artifacts across projects.

Let’s look at some of the main reasons for bad data and how data governance helps confront these issues …


Reasons for Bad Data: Data Entry

The concept of “garbage in, garbage out” explains the most common cause of inaccurate data: mistakes made at data entry. While this concept is easy to understand, totally eliminating errors isn’t feasible, so organizations need standards and systems to limit the extent of their damage.

With the right data governance approach, organizations can ensure the right people aren’t left out of the cataloging process, so the right context is applied. Plus, they can ensure critical fields are not left blank, so data is recorded with as much context as possible.

With the business process integration discussed above, you’ll also have a single metadata repository.

All of this ensures sensitive data doesn’t fall through the cracks.

Reasons for Bad Data: Data Migration

Data migration is another key reason for bad data. Modern organizations often juggle a plethora of data systems that process data from an abundance of disparate sources, creating a melting pot for potential issues as data moves through the pipeline, from tool to tool and system to system.

The solution is to introduce a predetermined standard of accuracy through a centralized metadata repository with data governance at the helm. In essence, metadata is data that describes other data, ensuring that no matter where data is in relation to the pipeline, it still has the necessary context to be deciphered, analyzed and then used strategically.
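
One illustrative way to keep that context attached as data moves is to wrap each record in a metadata “envelope” that travels with it through the pipeline. The field names below are hypothetical:

```python
# Wrap records in a metadata envelope so context (origin, classification,
# migration history) survives every hop between systems.
from datetime import datetime, timezone

def envelope(record, source_system, classification):
    return {
        "payload": record,
        "meta": {
            "source_system": source_system,    # where the record originated
            "classification": classification,  # e.g. "PII" for GDPR handling
            "migrated_at": datetime.now(timezone.utc).isoformat(),
            "hops": [source_system],           # appended at each stage
        },
    }

def hop(env, system):
    """Record each pipeline stage the data passes through."""
    env["meta"]["hops"].append(system)
    return env

env = envelope({"email": "ada@example.com"}, "crm", "PII")
env = hop(env, "staging")
env = hop(env, "warehouse")
print(env["meta"]["hops"])  # ['crm', 'staging', 'warehouse']
```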

The potential fallout of using inaccurate data has become even more severe with the GDPR’s implementation. A simple case of tagging and subsequently storing personally identifiable information incorrectly could lead to a serious breach in compliance and significant fines.

Such fines must be considered along with the costs resulting from any PR fallout.

Reasons for Bad Data: Data Integration

The proliferation of data sources, types, and stores increases the challenge of combining data into meaningful, valuable information. While companies are investing heavily in initiatives to increase the amount of data at their disposal, most information workers are spending more time finding the data they need than putting it to work, according to Database Trends and Applications (DBTA). erwin is co-sponsoring a DBTA webinar on this topic on July 17. To register, click here.

The need for faster and smarter data integration capabilities is growing. At the same time, to deliver business value, people need information they can trust to act on, so balancing governance is absolutely critical, especially with new regulations.

Organizations often invest heavily in individual software development tools for managing projects, requirements, designs, development, testing, deployment, releases, etc. Tools lacking interoperability often result in cumbersome manual processes and heavy time investments to synchronize data or processes between these disparate tools.

Data integration combines data from several various sources into a unified view, making it more actionable and valuable to those accessing it.

Getting the Data Governance “EDGE”

The benefits of integrated data governance discussed above won’t be realized if it is isolated within IT with no input from other stakeholders, the day-to-day data users – from sales and customer service to the C-suite. Every data citizen has DG roles and responsibilities to ensure data units have context, meaning they are labeled, cataloged and secured correctly so they can be analyzed and used properly. In other words, the data can be trusted.

Once an organization understands that IT and the business are both responsible for data, it can develop comprehensive, holistic data governance capable of:

  • Reaching every stakeholder in the process
  • Providing a platform for understanding and governing trusted data assets
  • Delivering the greatest benefit from data wherever it lives, while minimizing risk
  • Helping users understand the impact of changes made to a specific data element across the enterprise.

To reduce the risks of and tackle the reasons for bad data and realize larger organizational objectives, organizations must make data governance everyone’s business.

To learn more about the collaborative approach to data governance and how it helps compliance in addition to adding value and reducing costs, get the free e-book here.



The Role of an Effective Data Governance Initiative in Customer Purchase Decisions

A data governance initiative will maximize the security, quality and value of data, all of which build customer trust.

Without data, modern business would cease to function. Data helps guide decisions about products and services, makes it easier to identify customers, and serves as the foundation for everything businesses do today. The problem for many organizations is that data enters from any number of angles and gets stored in different places by different people and different applications.

Getting the most out of your data requires that you know what you have, where you have it, and that you understand its quality and value to the organization. This is where data governance comes into play. You can’t optimize your data if it’s scattered across different silos and lurking in various applications.

For about 150 years, manufacturers relied on their machinery and its ability to run reliably, properly and safely, to keep customers happy and revenue flowing. A data governance initiative has a similar role today, except its aim is to maximize the security, quality and value of data instead of machinery.

Customers are increasingly concerned about the safety and privacy of their data. According to a survey by Research+Data Insights, 85 percent of respondents worry about technology compromising their personal privacy. In a survey of 2,000 U.S. adults in 2016, researchers from Vanson Bourne found that 76 percent of respondents said they would move away from companies with a high record of data breaches.

For years, buying decisions were driven mainly by cost and quality, says Danny Sandwell, director of product marketing at erwin, Inc. But today’s businesses must consider their reputations in terms of both cost/quality and how well they protect their customers’ data when trying to win business.

Once the reputation is tarnished because of a breach or misuse of data, customers will question those relationships.

Unfortunately for consumers, examples of companies failing to properly govern their data aren’t difficult to find. Look no further than Under Armour, which announced this spring that 150 million accounts at its MyFitnessPal diet and exercise tracking app were breached, and Facebook, where the data of millions of users was harvested by third parties hoping to influence the 2016 presidential election in the United States.

Customers Hate Breaches, But They Love Data

While consumers are quick to report concerns about data privacy, customers also yearn for (and increasingly expect) efficient, personalized and relevant experiences when they interact with businesses. These experiences are, of course, built on data.

In this area, customers and businesses are on the same page. Businesses want to collect data that helps them build the omnichannel, 360-degree customer views that make their customers happy.

These experiences allow businesses to connect with their customers and demonstrate how well they understand them and know their preferences, likes and dislikes – essentially taking the personalized service of the neighborhood market to the internet.

The only way to manage that effectively at scale is to properly govern your data.

Delivering personalized service is also valuable to businesses because it helps turn customers into brand ambassadors, and it’s much easier to build on existing customer relationships than to find new customers.

Here’s the upshot: If your organization is doing data governance right, it’s helping create happy, loyal customers, while at the same time avoiding the bad press and financial penalties associated with poor data practices.

Putting A Data Governance Initiative Into Action

The good news is that 76 percent of respondents to a November 2017 survey we conducted with UBM said understanding and governing the data assets in the organization was either important or very important to the executives in their organization. Nearly half (49 percent) of respondents said that customer trust/satisfaction was driving their data governance initiatives.


What stops organizations from creating an effective data governance initiative? At some businesses, it’s a cultural issue. Both the business and IT sides of the organization play important roles in data, with the IT side storing and protecting it, and the business side consuming data and analyzing it.

For years, however, data governance was the volleyball passed back and forth over the net between IT and the business, with neither side truly owning it. Our study found signs this is changing. More than half (57 percent) of the respondents said both IT and the business/corporate teams were responsible for data in their organization.


Once an organization understands that IT and the business are both responsible for data, it still needs to develop a comprehensive, holistic strategy for data governance that is capable of:

  • Reaching every stakeholder in the process
  • Providing a platform for understanding and governing trusted data assets
  • Delivering the greatest benefit from data wherever it lives, while minimizing risk
  • Helping users understand the impact of changes made to a specific data element across the enterprise.

To accomplish this, a modern data governance initiative needs to be interdisciplinary. It should include not only data governance, which is ongoing because organizations are constantly changing and transforming, but other disciplines as well.

Enterprise architecture is important because it aligns IT and the business, mapping a company’s applications and the associated technologies and data to the business functions they enable.

By integrating data governance with enterprise architecture, businesses can define application capabilities and interdependencies within the context of their connection to enterprise strategy to prioritize technology investments so they align with business goals and strategies to produce the desired outcomes.

A business process and analysis component is also vital to modern data governance. It defines how the business operates and ensures employees understand and are accountable for carrying out the processes for which they are responsible.

Enterprises can clearly define, map and analyze workflows and build models to drive process improvement, as well as identify business practices susceptible to the greatest security, compliance or other risks and where controls are most needed to mitigate exposures.

Finally, data modeling remains the best way to design and deploy new relational databases with high-quality data sources and support application development.

Being able to cost-effectively and efficiently discover, visualize and analyze “any data” from “anywhere” underpins large-scale data integration, master data management, Big Data and business intelligence/analytics with the ability to synthesize, standardize and store data sources from a single design, as well as reuse artifacts across projects.

Michael Pastore is the Director, Content Services at QuinStreet B2B Tech. This content originally appeared as a sponsored post on http://www.eweek.com/.

Read the previous post on how compliance concerns and the EU’s GDPR are driving businesses to implement data governance.

Determine how effective your current data governance initiative is by taking our DG RediChek.



Data Discovery Fire Drill: Why Isn’t My Executive Business Intelligence Report Correct?

Executive business intelligence (BI) reporting can be incomplete, inconsistent and/or inaccurate, becoming a critical concern for the executive management team trying to make informed business decisions. When issues arise, it is up to the IT department to figure out what the problem is, where it occurred, and how to fix it. This is not a trivial task.

Take the following scenario in which a CEO receives two reports supposedly from the same set of data, but each report shows different results. Which report is correct?  If this is something your organization has experienced, then you know what happens next – the data discovery fire drill.

A flurry of activity takes place, suspending all other top priorities. A special team is quickly assembled to delve into each report. They review the data sources, ETL processes and data marts in an effort to trace the events that affected the data. Fire drills like this can consume days if not weeks of effort to locate the error.

In the above situation it turns out there was a new update to one ETL process that was implemented in only one report. When you multiply the number of data discovery fire drills by the number of data quality concerns for any executive business intelligence report, the costs continue to mount.

Data can arrive from multiple systems at the same time, often occurring rapidly and in parallel. In some cases, the ETL load itself may generate new data. Through all of this, IT still has to answer two fundamental questions: where did this data come from, and how did it get here?

Accurate Executive Business Intelligence Reporting Requires Data Governance

As the volume of data rapidly increases, BI data environments are becoming more complex. To manage this complexity, organizations invest in a multitude of elaborate and expensive tools. But despite this investment, IT is still overwhelmed trying to track the vast collection of data within their BI environment. Is more technology the answer?

Perhaps the better question we should look to answer is: how can we avoid these data discovery fires in the future?

We believe it’s possible to prevent data discovery fires, and that starts with proper data governance and a strong data lineage capability.


Why is data governance important?

  • Governed data promotes data sharing.
  • Data standards make data more reusable.
  • Greater context in data definitions assist in more accurate analytics.
  • A clear set of data policies and procedures support data security.

Why is data lineage important?

  • Data trust is built by establishing its origins.
  • The troubleshooting process is simplified by enabling data to be traced.
  • The risk of ETL data loss is reduced by exposing potential problems in the process.
  • Business rules, which otherwise would be buried in an ETL process, are visible.
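
To illustrate the troubleshooting point, here’s a minimal sketch (asset and process names hypothetical): once lineage is captured as source–process–target edges, finding every report downstream of a suspect ETL step becomes a simple graph walk rather than a multi-week fire drill.

```python
# Downstream impact analysis over captured lineage edges:
# (source asset, process, target asset)
edges = [
    ("crm.orders",            "etl.load_orders", "staging.orders"),
    ("staging.orders",        "etl.build_facts", "warehouse.fact_orders"),
    ("warehouse.fact_orders", "bi.revenue_job",  "report.exec_revenue"),
    ("warehouse.fact_orders", "bi.margin_job",   "report.exec_margin"),
]

def downstream_impact(asset, edges):
    """Everything reachable from `asset`, with the process on each step."""
    impacted, frontier = [], [asset]
    while frontier:
        current = frontier.pop()
        for src, process, dst in edges:
            if src == current:
                impacted.append((process, dst))
                frontier.append(dst)
    return impacted

for process, asset in downstream_impact("staging.orders", edges):
    print(f"{asset} (via {process})")
# warehouse.fact_orders (via etl.build_facts)
# report.exec_revenue (via bi.revenue_job)
# report.exec_margin (via bi.margin_job)
```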

Data Governance Enables Data-Driven Business

In the context of modern, data-driven business, in which organizations are essentially production lines of information, data governance is responsible for the health and maintenance of that production line.

It’s the enabling factor of the enterprise data management suite that ensures data quality,  so organizations can have greater trust in their data. It ensures that any data created is properly stored, tagged and assigned the context needed to prevent corruption or loss as it moves through the production line – greatly enhancing data discovery.

Alongside improving data quality, aiding in regulatory compliance, and making practices like tracing data lineage easier, sound data governance also helps organizations be proactive with their data, using it to drive revenue. They can make better decisions faster and reduce the likelihood of costly mistakes and data breaches that would eat into their bottom lines.

For more information about how data governance supports executive business intelligence and the rest of the enterprise data management suite, click here.



Defining Data Governance: What Is Data Governance?

Data governance (DG) is one of the fastest growing disciplines, yet when it comes to defining data governance many organizations struggle.

Dataversity says DG is “the practices and processes which help to ensure the formal management of data assets within an organization.” These practices and processes can vary, depending on an organization’s needs. Therefore, when defining data governance for your organization, it’s important to consider the factors driving its adoption.

The General Data Protection Regulation (GDPR) has contributed significantly to data governance’s escalating prominence. In fact, erwin’s 2018 State of Data Governance Report found that 60% of organizations consider regulatory compliance to be their biggest driver of data governance.


Other significant drivers include improving customer trust/satisfaction and encouraging better decision-making, but they trail behind regulatory compliance at 49% and 45% respectively. Reputation management (30%), analytics (27%) and Big Data (21%) also are factors.

But data governance’s adoption is of little benefit without understanding how DG should be applied within these contexts. This is arguably one of the issues that’s held data governance back in the past.

With no set definition, and the historical practice of isolating data governance within IT, organizations often have had different ideas of what data governance is, even between departments. With this inter-departmental disconnect, it’s not hard to imagine why data governance has historically left a lot to be desired.

However, with the mandate for DG within GDPR, organizations must work on defining data governance organization-wide to manage its successful implementation, or face GDPR’s penalties.

Defining Data Governance: Desired Outcomes

A great place to start when defining an organization-wide DG initiative is to consider the desired business outcomes. This approach ensures that all parties involved have a common goal.

Past examples of Data Governance 1.0 were mainly concerned with cataloging data to support search and discovery. The nature of this approach, coupled with the fact that DG initiatives were typically siloed within IT departments without input from the wider business, meant the practice often struggled to add value.

Without input from the wider business, the data cataloging process suffered from a lack of context. By neglecting to include the organization’s primary data citizens – those who manage and/or leverage data on a day-to-day basis for analysis and insight – organizational data was often plagued by duplications, inconsistencies and poor quality.

The nature of modern data-driven business means that such data citizens are spread throughout the organization. Furthermore, many of the key data citizens (think value-adding approaches to data use such as data-driven marketing) aren’t actively involved with IT departments.

Because of this, Data Governance 1.0 initiatives fizzled out with discouraging frequency.

This is, of course, problematic for organizations that identify regulatory compliance as a driver of data governance. Considering the nature of data-driven business – with new data being constantly captured, stored and leveraged – meeting compliance standards can’t be viewed as a one-time fix, so data governance can’t be de-prioritized and left to fizzle out.

Even those businesses that manage to maintain the level of input data governance needs on an indefinite basis will find the Data Governance 1.0 approach wanting. In terms of regulatory compliance, the lack of context associated with Data Governance 1.0, and the inaccuracies it leads to, mean that potentially serious data governance issues could go unnoticed and result in repercussions for non-compliance.

We recommend organizations look beyond just data cataloging and compliance as desired outcomes when implementing DG. In the data-driven business landscape, data governance finds its true potential as a value-added initiative.

Organizations that identify the desired business outcome of data governance as a value-added initiative should also consider Data Governance 1.0’s shortcomings, and any organization that hasn’t identified value-adding as a business outcome should ask itself, “why?”

Many of the biggest market disruptors of the 21st Century have been digitally savvy start-ups with robust data strategies – think Airbnb, Amazon and Netflix. Without high data governance standards, such companies would not have the level of trust in their data needed to confidently execute such digital-first strategies, which would otherwise be difficult to manage.

Therefore, in the data-driven business era, organizations should consider a Data Governance 2.0 strategy, with DG becoming an organization-wide, strategic initiative that de-silos the practice from the confines of IT.

This collaborative take on data governance intrinsically involves data’s biggest beneficiaries and users in the governance process, meaning functions like data cataloging benefit from greater context, accuracy and consistency.

It also means that organizations can have greater trust in their data and be more assured of meeting the standards set for regulatory compliance. It means that organizations can better respond to customer needs through more accurate methods of profiling and analysis, improving rates of satisfaction. And it means that organizations are less likely to suffer data breaches and their associated damages.

Defining Data Governance: The Enterprise Data Governance Experience (EDGE)

The EDGE is the erwin approach to Data Governance 2.0, empowering an organization to:

  • Manage any data, anywhere (Any2)
  • Instill a culture of collaboration and organizational empowerment
  • Introduce an integrated ecosystem for data management that draws from one central repository and ensures data (including real-time changes) is consistent throughout the organization
  • Have visibility across domains by breaking down silos between business and IT and introducing a common data vocabulary
  • Have regulatory peace of mind through mitigation of a wide range of risks, from GDPR to cybersecurity. 

To learn more about implementing data governance, click here.
