
Data Modeling and Data Mapping: Results from Any Data Anywhere

A unified approach to data modeling and data mapping could be the breakthrough that many data-driven organizations need.

In most of the conversations I have with clients, they express the need for a viable solution to model their data, as well as the ability to capture and document the metadata within their environments.

Data modeling is an integral part of any data management initiative. Organizations use data models to tame “data at rest” for business use, governance and technical management of databases of all types.

However, once an organization understands what data it has and how it’s structured via data models, it needs answers to other critical questions: Where did it come from? Did it change along the journey? Where does it go from here?

Data Mapping: Taming “Data in Motion”

Knowing how data moves throughout technical and business data architectures is key for true visibility, context and control of all data assets.

Managing data in motion has been a difficult, time-consuming task that involves mapping source elements to the data model, defining the required transformations, and/or providing the same for downstream targets.

Historically, this work has either been outsourced to ETL/ELT developers, who often create a siloed, technical infrastructure that is opaque to the business, or business-friendly mappings have been kept in an assortment of unwieldy spreadsheets that are difficult to consolidate and reuse, much less capable of accommodating new requirements.

What if you could combine data at rest and data in motion to create an efficient, accurate and real-time data pipeline that also includes lineage? Then you can spend your time finding the data you need and using it to produce meaningful business outcomes.

Good news … you can.

erwin Mapping Manager: Connected Data Platform

Automated Data Mapping

Your data modelers can continue to use erwin Data Modeler (DM) as the foundation of your database management practice, documenting, enforcing and improving data standards. But instead of relying on data models to disseminate metadata, you can scan and integrate any data source and present it to all interested parties – automatically.

erwin Mapping Manager (MM) shifts the management of metadata away from data models to a dedicated, automated platform. It can collect metadata from any source, including JSON documents, erwin data models, databases and ERP systems, out of the box.

This functionality underscores our Any2 data approach by collecting any data from anywhere. And erwin MM can schedule data collection and create versions for comparison to clearly identify any changes.

Metadata definitions can be enhanced using extended data properties, and detailed data lineages can be created based on collected metadata. End users can quickly search for information and see specific data in the context of business processes.

To summarize, here are the key features current data modeling customers are most excited about:

  • Easy import of legacy mappings, plus share and reuse mappings and transformations
  • Metadata catalog to automatically harvest any data from anywhere
  • Comprehensive upstream and downstream data lineage
  • Versioning with comparison features
  • Impact analysis

And all of these features support and can be integrated with erwin Data Governance. The end result is knowing what data you have and where it is so you can fuel a fast, high-quality and complete pipeline of any data from anywhere to accomplish your organizational objectives.

Want to learn more about a unified approach to data modeling and data mapping? Join us for our weekly demo to see erwin MM in action for yourself.



Demystifying Data Lineage: Tracking Your Data’s DNA

Getting the most out of your data requires getting a handle on data lineage. That’s knowing what data you have, where it is, and where it came from – plus understanding its quality and value to the organization.

But you can’t understand your data in a business context – much less track its lineage and physical existence or maximize its security, quality and value – if it’s scattered across different silos in numerous applications.

Data lineage provides a way of tracking data from its origin to destination across its lifespan and all the processes it’s involved in. It also plays a vital role in data governance. Beyond the simple ability to know where the data came from and whether or not it can be trusted, there’s an element of statutory reporting and compliance that often requires a knowledge of how that same data (known or unknown, governed or not) has changed over time.
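
To make this concrete, here is a minimal illustrative sketch of lineage represented as a directed graph of data assets. The asset names are invented for the example, and this is not any vendor's implementation; the point is that the same structure answers both the lineage question (where did this report's data come from?) and the impact-analysis question (what is affected if a source changes?).

```python
from collections import defaultdict

# Hypothetical lineage edges: each pair says "source feeds target".
edges = [
    ("crm.customer", "staging.customer"),
    ("erp.orders", "staging.orders"),
    ("staging.customer", "warehouse.dim_customer"),
    ("staging.orders", "warehouse.fact_sales"),
    ("warehouse.dim_customer", "report.quarterly_revenue"),
    ("warehouse.fact_sales", "report.quarterly_revenue"),
]

downstream = defaultdict(set)  # asset -> assets it feeds
upstream = defaultdict(set)    # asset -> assets it is derived from
for src, tgt in edges:
    downstream[src].add(tgt)
    upstream[tgt].add(src)

def trace(asset, graph):
    """Return every asset reachable from `asset` by walking the graph transitively."""
    seen, stack = set(), [asset]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Lineage: where did the report's data originate?
print(trace("report.quarterly_revenue", upstream))
# Impact analysis: what is affected if crm.customer changes?
print(trace("crm.customer", downstream))
```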

A platform that provides insights like data lineage, impact analysis, full-history capture, and other data management features serves as a central hub from which everything can be learned and discovered about the data – whether a data lake, a data vault or a traditional data warehouse.

In a traditional data management organization, Excel spreadsheets are used to manage the incoming data design – what’s known as the “pre-ETL” mapping documentation – but this provides no visibility or auditability. In fact, each unit of work represented in these ‘mapping documents’ becomes an independent variable in the overall system development lifecycle, and therefore nearly impossible to learn from, much less standardize.
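
For illustration only, here is what one row of that “pre-ETL” mapping documentation looks like when captured as a structured, versionable record rather than a worksheet cell. The field names are assumptions for the sketch, not a standard, but a record like this can be searched, versioned, audited and reused in ways a spreadsheet cannot.

```python
from dataclasses import dataclass, field

@dataclass
class MappingRule:
    """One source-to-target mapping: the unit of work usually buried in a spreadsheet row."""
    source_system: str
    source_column: str
    target_table: str
    target_column: str
    transformation: str = ""              # e.g. a SQL expression or rule description
    version: int = 1                      # bump on change so history stays auditable
    tags: list = field(default_factory=list)

rule = MappingRule(
    source_system="crm",
    source_column="customer.birth_dt",
    target_table="dim_customer",
    target_column="birth_date",
    transformation="CAST(birth_dt AS DATE)",
    tags=["PII"],
)
print(rule)
```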

The key to accuracy and integrity in any exercise is to eliminate the opportunity for human error – which does not mean eliminating humans from the process but incorporating the right tools to reduce the likelihood of error as the human beings apply their thought processes to the work.


Data Lineage: A Crucial First Step for Data Governance

Knowing what data you have, where it lives and where it came from is complicated. The lack of visibility and control around “data at rest” combined with “data in motion,” as well as difficulties with legacy architectures, means organizations spend more time finding the data they need than using it to produce meaningful business outcomes.

Organizations need to create and sustain an enterprise-wide view of and easy access to underlying metadata, but that’s a tall order with numerous data types and data sources that were never designed to work together and data infrastructures that have been cobbled together over time with disparate technologies, poor documentation and little thought for downstream integration. So the applications and initiatives that depend on a solid data infrastructure may be compromised, resulting in faulty analyses.

These issues can be addressed with a strong data management strategy underpinned by technology that enables the data quality the business requires, which encompasses data cataloging (integration of data sets from various sources), mapping, versioning, business rules and glossaries maintenance and metadata management (associations and lineage).

An automated, metadata-driven framework for cataloging data assets and their flows across the business provides an efficient, agile and dynamic way to generate data lineage from operational source systems (databases, data models, file-based systems, unstructured files and more) across the information management architecture; construct business glossaries; assess what data aligns with specific business rules and policies; and inform how that data is transformed, integrated and federated throughout business processes – complete with full documentation.
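
Below is a minimal, hedged sketch of the harvesting step in such a framework. It uses an in-memory SQLite database only so the example is self-contained and runnable; a real framework repeats the same pattern against each source type's own catalog (RDBMS system tables, file headers, model files and so on).

```python
import sqlite3

# Stand-in source system: an in-memory SQLite database with one table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, birth_date TEXT)")

# Harvest technical metadata from the source's own catalog into a simple list of entries.
catalog = []
for (table,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"):
    for cid, name, col_type, notnull, default, pk in conn.execute(f"PRAGMA table_info({table})"):
        catalog.append({"table": table, "column": name, "type": col_type, "primary_key": bool(pk)})

for entry in catalog:
    print(entry)
```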

Centralized design, immediate lineage and impact analysis, and change-activity logging mean you will always have answers readily available, or just a few clicks away. Subsets of data can be identified and generated via predefined templates, generic designs can be generated from standard mapping documents, and both can be pushed through the ETL process via automation templates for faster processing.

With automation, data quality is systemically assured and the data pipeline is seamlessly governed and operationalized to the benefit of all stakeholders. Without such automation, business transformation will be stymied. Companies, especially large ones with thousands of systems, files and processes, will be particularly challenged by a manual approach. And outsourcing these data management efforts to professional services firms only increases costs and schedule delays.

With erwin Mapping Manager, organizations can automate enterprise data mapping and code generation for faster time-to-value and greater accuracy when it comes to data movement projects, as well as synchronize “data in motion” with data management and governance efforts.

Map data elements to their sources within a single repository to determine data lineage, deploy data warehouses and other Big Data solutions, and harmonize data integration across platforms. The web-based solution reduces the need for specialized, technical resources with knowledge of ETL and database procedural code, while making it easy for business analysts, data architects, ETL developers, testers and project managers to collaborate for faster decision-making.



Top 10 Reasons to Automate Data Mapping and Data Preparation

Data preparation is notorious for being the most time-consuming area of data management. It’s also expensive.

“Surveys show the vast majority of time is spent on this repetitive task, with some estimates showing it takes up as much as 80% of a data professional’s time,” according to Information Week. And a Trifacta study notes that overreliance on IT resources for data preparation costs organizations billions.

The work of collecting your data can take a variety of forms, but most often in IT shops around the world, it lives in a spreadsheet – or rather a collection of spreadsheets, often numbering in the hundreds or thousands.

Most organizations, especially those competing in the digital economy, don’t have enough time or money for data management using manual processes. And outsourcing is also expensive, with inevitable delays because these vendors are dependent on manual processes too.


Taking the Time and Pain Out of Data Preparation: 10 Reasons to Automate Data Preparation/Data Mapping

  1. Governance and Infrastructure

Data governance and a strong IT infrastructure are critical in the valuation, creation, storage, use, archival and deletion of data. Beyond the simple ability to know where the data came from and whether or not it can be trusted, there is an element of statutory reporting and compliance that often requires a knowledge of how that same data (known or unknown, governed or not) has changed over time.

A design platform that allows for insights like data lineage, impact analysis, full history capture, and other data management features can provide a central hub from which everything can be learned and discovered about the data – whether a data lake, a data vault, or a traditional warehouse.

  2. Eliminating Human Error

In the traditional data management organization, Excel spreadsheets are used to manage the incoming data design, or what is known as the “pre-ETL” mapping documentation – which does not lend itself to any sort of visibility or auditability. In fact, each unit of work represented in these ‘mapping documents’ becomes an independent variable in the overall system development lifecycle, and therefore nearly impossible to learn from, much less standardize.

The key to creating accuracy and integrity in any exercise is to eliminate the opportunity for human error – which does not mean eliminating humans from the process but incorporating the right tools to reduce the likelihood of error as the human beings apply their thought processes to the work.  

  3. Completeness

The ability to scan and import from a broad range of sources and formats, as well as automated change tracking, means that you will always be able to import your data from wherever it lives and track all of the changes to that data over time.

  4. Adaptability

Centralized design, immediate lineage and impact analysis, and change-activity logging mean that you will always have the answer readily available, or a few clicks away. Subsets of data can be identified and generated via predefined templates, generic designs can be generated from standard mapping documents, and both can be pushed through the ETL process via automation templates for faster processing.

  5. Accuracy

Out-of-the-box capabilities to map your data from source to report make reconciliation and validation a snap, with auditability and traceability built in. Build a full array of validation rules that can be cross-checked against the design mappings in a centralized repository.
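
As a simple illustration of the idea (not any product's rule engine), validation rules stored alongside the design mappings can be applied automatically to the rows that land in the target, turning reconciliation into a repeatable check rather than a manual review.

```python
import re

# Hypothetical validation rules keyed by target column, kept with the mappings
# so design and delivered data can be reconciled automatically.
rules = {
    "dim_customer.birth_date": lambda v: v is None or re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
    "fact_sales.amount":       lambda v: v is not None and v >= 0,
}

loaded_rows = [
    {"dim_customer.birth_date": "1980-02-29", "fact_sales.amount": 125.0},
    {"dim_customer.birth_date": "29/02/1980", "fact_sales.amount": -4.0},
]

for i, row in enumerate(loaded_rows):
    failures = [col for col, check in rules.items() if not check(row.get(col))]
    print(f"row {i}: {'OK' if not failures else 'failed ' + ', '.join(failures)}")
```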

  6. Timeliness

The ability to be agile and reactive is important – being good at being reactive doesn’t sound like a quality that deserves a pat on the back, but in the case of regulatory requirements, it is paramount.

  7. Comprehensiveness

With access to all of the underlying metadata, source-to-report design mappings, and source and target repositories, you have the power to create reports within your reporting layer that have a traceable origin and can be easily explained to IT, business and regulatory stakeholders.

  8. Clarity

The requirements inform the design, the design platform puts those to action, and the reporting structures are fed the right data to create the right information at the right time via nearly any reporting platform, whether mainstream commercial or homegrown.

  9. Frequency

Adaptation is the key to meeting any frequency interval. Centralized designs and automated ETL patterns that feed your database schemas and reporting structures allow cyclical changes to be made and implemented in half the time of conventional means. Getting beyond the spreadsheet, enabling pattern-based ETL and automating schema population are ways to ensure you will be ready, whenever the need arises, to show an audit trail of the change process and clearly articulate who did what and when throughout the system development lifecycle.
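
To show what pattern-based ETL and schema population can look like in miniature (the structures below are assumptions for the sketch, not any vendor's internals), the same target-design metadata can drive generation of both the DDL and the load statement, so a design change regenerates the code rather than being re-keyed by hand.

```python
# One target's design, expressed as metadata rather than hand-written DDL.
target = {
    "table": "dim_customer",
    "columns": [("customer_id", "INTEGER"), ("name", "VARCHAR(100)"), ("birth_date", "DATE")],
}

# Pattern-based generation: the same templates are reused for every target.
ddl = "CREATE TABLE {t} (\n  {cols}\n);".format(
    t=target["table"],
    cols=",\n  ".join(f"{name} {ctype}" for name, ctype in target["columns"]),
)
load = "INSERT INTO {t} ({names}) SELECT {names} FROM staging_{t};".format(
    t=target["table"],
    names=", ".join(name for name, _ in target["columns"]),
)
print(ddl)
print(load)
```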

  10. Business-Friendly

A user interface designed to be business-friendly means there’s no need to be a data integration specialist to review the common practices outlined and “passively enforced” throughout the tool. Once a process is defined, rules implemented, and templates established, there is little opportunity for error or deviation from the overall process. A diverse set of role-based security options means that everyone can collaborate, learn and audit while maintaining the integrity of the underlying process components.

Faster, More Accurate Analysis with Fewer People

What if you could get more accurate data preparation 50% faster and double your analysis with fewer people?

erwin Mapping Manager (MM) is a patented solution that automates data mapping throughout the enterprise data integration lifecycle, providing data visibility, lineage and governance – freeing up that 80% of a data professional’s time to put that data to work.

With erwin MM, data integration engineers can design and reverse-engineer the movement of data implemented as ETL/ELT operations and stored procedures, building mappings between source and target data assets and designing the transformation logic between them. These designs then can be exported to most ETL and data asset technologies for implementation.

erwin MM is 100% metadata-driven and used to define and drive standards across enterprise integration projects, enable data and process audits, improve data quality, streamline downstream workflows, increase productivity (especially across geographically dispersed teams) and give project teams, IT leadership and management visibility into the ‘real’ status of integration and ETL migration projects.

If an automated data preparation/mapping solution sounds good to you, please check out erwin MM here.



Healthy Co-Dependency: Data Management and Data Governance

Data management and data governance are now more important than ever. The hyper-competitive nature of data-driven business means organizations need to get more out of their data than ever before – and fast.

A few data-driven exemplars have led the way, turning data into actionable insights that influence everything from corporate structure to new products and pricing. “Few” being the operative word.

It’s true, data-driven business is big business. Huge actually. But it’s dominated by a handful of organizations that realized early on what a powerful and disruptive force data can be.

The benefits of such data-driven strategies speak for themselves: Netflix has replaced Blockbuster, and Uber continues to shake up the taxi business. Organizations across industries are following suit, fighting to become the next big, disruptive players.

But in many cases, these attempts have failed or are on the verge of doing so.

Now with the General Data Protection Regulation (GDPR) in effect, data that is unaccounted for is a potential data disaster waiting to happen.

So organizations need to understand that getting more out of their data isn’t necessarily about collecting more data. It’s about unlocking the value of the data they already have.


The Enterprise Data Dilemma

However, most organizations don’t know exactly what data they have or even where some of it is. And some of the data they can account for is going to waste because they don’t have the means to process it. This is especially true of unstructured data types, which organizations are collecting more frequently.

Considering that 73 percent of company data goes unused, it’s safe to assume your organization is dealing with some if not all of these issues.

Big picture, this means your enterprise is missing out on thousands, perhaps millions in revenue.

The smaller picture? You’re struggling to establish a single source of data truth, which contributes to a host of problems:

  • Inaccurate analysis and discrepancies in departmental reporting
  • Inability to manage the amount and variety of data your organization collects
  • Duplications and redundancies in processes
  • Issues determining data ownership, lineage and access
  • Difficulty achieving and sustaining compliance

To avoid such circumstances and get more value out of data, organizations need to harmonize their approach to data management and data governance, using a platform of established tools that work in tandem while also enabling collaboration across the enterprise.

Data management drives the design, deployment and operation of systems that deliver operational data assets for analytics purposes.

Data governance delivers these data assets within a business context, tracking their physical existence and lineage, and maximizing their security, quality and value.

Although these two disciplines approach data from different perspectives (IT-driven and business-oriented), they depend on each other. And this co-dependency helps an organization make the most of its data.

The P-M-G Hub

Together, data management and data governance form a critical hub for data preparation, modeling and governance. How?

It starts with a real-time, accurate picture of the data landscape, including “data at rest” in databases, data warehouses and data lakes and “data in motion” as it is integrated with and used by key applications. That landscape also must be controlled to facilitate collaboration and limit risk.

But knowing what data you have and where it lives is complicated, so you need to create and sustain an enterprise-wide view of and easy access to underlying metadata. That’s a tall order with numerous data types and data sources that were never designed to work together and data infrastructures that have been cobbled together over time with disparate technologies, poor documentation and little thought for downstream integration. So the applications and initiatives that depend on a solid data infrastructure may be compromised, with data analysis based on faulty insights.

However, these issues can be addressed with a strong data management strategy and technology to enable the data quality required by the business, which encompasses data cataloging (integration of data sets from various sources), mapping, versioning, business rules and glossaries maintenance and metadata management (associations and lineage).

Being able to pinpoint what data exists and where must be accompanied by an agreed-upon business understanding of what it all means in common terms that are adopted across the enterprise. Having that consistency is the only way to assure that insights generated by analyses are useful and actionable, regardless of business department or user exploring a question. Additionally, policies, processes and tools that define and control access to data by roles and across workflows are critical for security purposes.

These issues can be addressed with a comprehensive data governance strategy and technology to determine master data sets, discover the impact of potential glossary changes across the enterprise, audit and score adherence to rules, discover risks, and appropriately and cost-effectively apply security to data flows, as well as publish data to people/roles in ways that are meaningful to them.

Data Management and Data Governance: Play Together, Stay Together

When data management and data governance work in concert, empowered by the right technology, they inform, guide and optimize each other. The result for an organization that takes such a harmonized approach is an automated, real-time, high-quality data pipeline.

Then all stakeholders – data scientists, data stewards, ETL developers, enterprise architects, business analysts, compliance officers, CDOs and CEOs – can access the data they’re authorized to use and base strategic decisions on what is now a full inventory of reliable information.

The erwin EDGE creates an “enterprise data governance experience” through integrated data mapping, business process modeling, enterprise architecture modeling, data modeling and data governance. No other software platform on the market touches every aspect of the data management and data governance lifecycle to automate and accelerate the speed to actionable business insights.


Data Governance Tackles the Top Three Reasons for Bad Data

In modern, data-driven business, it’s integral that organizations understand the reasons for bad data and how best to address them. Data has revolutionized how organizations operate, from customer relationships to strategic decision-making and everything in between. And with more emphasis on automation and artificial intelligence, the need for data/digital trust also has risen. Even minor errors in an organization’s data can cause massive headaches because the inaccuracies don’t involve just one corrupt data unit.

Inaccurate or “bad” data also affects relationships to other units of data, making the business context difficult or impossible to determine. For example, are data units tagged according to their sensitivity [i.e., personally identifiable information subject to the General Data Protection Regulation (GDPR)], and is data ownership and lineage discernable (i.e., who has access, where did it originate)?

Relying on inaccurate data will hamper decisions, decrease productivity, and yield suboptimal results. Given these risks, organizations must increase their data’s integrity. But how?

Integrated Data Governance

Modern, data-driven organizations are essentially data production lines. And like physical production lines, their associated systems and processes must run smoothly to produce the desired results. Sound data governance provides the framework to address data quality at its source, ensuring any data recorded and stored is done so correctly, securely and in line with organizational requirements. But it needs to integrate all the data disciplines.

By integrating data governance with enterprise architecture, businesses can define application capabilities and interdependencies within the context of their connection to enterprise strategy, prioritizing technology investments so they align with business goals and produce the desired outcomes. A business process and analysis component enables an organization to clearly define, map and analyze workflows and build models to drive process improvement, as well as identify business practices susceptible to the greatest security, compliance or other risks and where controls are most needed to mitigate exposures.

And data modeling remains the best way to design and deploy new relational databases with high-quality data sources and support application development. Being able to cost-effectively and efficiently discover, visualize and analyze “any data” from “anywhere” underpins large-scale data integration, master data management, Big Data and business intelligence/analytics with the ability to synthesize, standardize and store data sources from a single design, as well as reuse artifacts across projects.

Let’s look at some of the main reasons for bad data and how data governance helps confront these issues …


Reasons for Bad Data: Data Entry

The concept of “garbage in, garbage out” explains the most common cause of inaccurate data: mistakes made at data entry. While this concept is easy to understand, totally eliminating errors isn’t feasible, so organizations need standards and systems to limit the extent of their damage.

With the right data governance approach, organizations can ensure the right people aren’t left out of the cataloging process, so the right context is applied. Plus you can ensure critical fields are not left blank, so data is recorded with as much context as possible.

With the business process integration discussed above, you’ll also have a single metadata repository.

All of this ensures sensitive data doesn’t fall through the cracks.

Reasons for Bad Data: Data Migration

Data migration is another key reason for bad data. Modern organizations often juggle a plethora of data systems that process data from an abundance of disparate sources, creating a melting pot for potential issues as data moves through the pipeline, from tool to tool and system to system.

The solution is to introduce a predetermined standard of accuracy through a centralized metadata repository with data governance at the helm. In essence, metadata is data about data; it ensures that no matter where data is in relation to the pipeline, it still has the necessary context to be deciphered, analyzed and then used strategically.
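
As a rough sketch of that idea, imagine every dataset traveling with a small metadata envelope. The field names below are illustrative rather than a standard, but they show how context such as sensitivity and schema can survive each hop between systems.

```python
import json
from datetime import datetime, timezone

# Illustrative metadata envelope attached to a dataset as it moves between systems.
payload = [{"customer_id": 42, "email": "a@example.com"}]
envelope = {
    "dataset": "crm.customer_extract",
    "source_system": "crm",
    "extracted_at": datetime.now(timezone.utc).isoformat(),
    "sensitivity": "PII",  # drives GDPR handling downstream
    "schema": {"customer_id": "int", "email": "string"},
    "record_count": len(payload),
}

message = json.dumps({"metadata": envelope, "data": payload})
# Any downstream tool can read the context before touching the data itself.
print(json.loads(message)["metadata"]["sensitivity"])
```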

The potential fallout of using inaccurate data has become even more severe with the GDPR’s implementation. A simple case of tagging and subsequently storing personally identifiable information incorrectly could lead to a serious breach in compliance and significant fines.

Such fines must be considered along with the costs resulting from any PR fallout.

Reasons for Bad Data: Data Integration

The proliferation of data sources, types, and stores increases the challenge of combining data into meaningful, valuable information. While companies are investing heavily in initiatives to increase the amount of data at their disposal, most information workers are spending more time finding the data they need rather than putting it to work, according to Database Trends and Applications (DBTA). erwin is co-sponsoring a DBTA webinar on this topic on July 17. To register, click here.

The need for faster and smarter data integration capabilities is growing. At the same time, to deliver business value, people need information they can trust to act on, so balancing governance is absolutely critical, especially with new regulations.

Organizations often invest heavily in individual software development tools for managing projects, requirements, designs, development, testing, deployment, releases, etc. Tools lacking interoperability often result in cumbersome manual processes and heavy time investments to synchronize data or processes between these disparate tools.

Data integration combines data from various sources into a unified view, making it more actionable and valuable to those accessing it.
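
In its simplest form, that unified view is a merge of records from disparate sources on a shared key. The sketch below uses made-up CRM and billing data purely for illustration.

```python
# Two sources describing the same customers, merged into one unified view keyed by customer ID.
crm = {101: {"name": "Acme Ltd", "segment": "enterprise"}}
billing = {101: {"outstanding": 2500.0}, 102: {"outstanding": 90.0}}

unified = {}
for source in (crm, billing):
    for key, attrs in source.items():
        unified.setdefault(key, {"customer_id": key}).update(attrs)

for record in unified.values():
    print(record)
```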

Getting the Data Governance “EDGE”

The benefits of integrated data governance discussed above won’t be realized if it is isolated within IT with no input from other stakeholders, the day-to-day data users – from sales and customer service to the C-suite. Every data citizen has DG roles and responsibilities to ensure data units have context, meaning they are labeled, cataloged and secured correctly so they can be analyzed and used properly. In other words, the data can be trusted.

Once an organization understands that IT and the business are both responsible for data, it can develop comprehensive, holistic data governance capable of:

  • Reaching every stakeholder in the process
  • Providing a platform for understanding and governing trusted data assets
  • Delivering the greatest benefit from data wherever it lives, while minimizing risk
  • Helping users understand the impact of changes made to a specific data element across the enterprise.

To reduce the risks of and tackle the reasons for bad data and realize larger organizational objectives, organizations must make data governance everyone’s business.

To learn more about the collaborative approach to data governance and how it helps compliance in addition to adding value and reducing costs, get the free e-book here.



Data Discovery Fire Drill: Why Isn’t My Executive Business Intelligence Report Correct?

Executive business intelligence (BI) reporting can be incomplete, inconsistent and/or inaccurate, becoming a critical concern for the executive management team trying to make informed business decisions. When issues arise, it is up to the IT department to figure out what the problem is, where it occurred, and how to fix it. This is not a trivial task.

Take the following scenario in which a CEO receives two reports supposedly based on the same set of data, but each report shows different results. Which report is correct? If this is something your organization has experienced, then you know what happens next – the data discovery fire drill.

A flurry of activity takes place, suspending all other top priorities. A special team is quickly assembled to delve into each report. They review the data sources, ETL processes and data marts in an effort to trace the events that affected the data. Fire drills like this can consume days if not weeks of effort to locate the error.

In the above situation it turns out there was a new update to one ETL process that was implemented in only one report. When you multiply the number of data discovery fire drills by the number of data quality concerns for any executive business intelligence report, the costs continue to mount.

Data can arrive from multiple systems at the same time, often occurring rapidly and in parallel. In some cases, the ETL load itself may generate new data. Through all of this, IT still has to answer two fundamental questions: where did this data come from, and how did it get here?

Accurate Executive Business Intelligence Reporting Requires Data Governance

As the volume of data rapidly increases, BI data environments are becoming more complex. To manage this complexity, organizations invest in a multitude of elaborate and expensive tools. But despite this investment, IT is still overwhelmed trying to track the vast collection of data within their BI environment. Is more technology the answer?

Perhaps the better question we should look to answer is: how can we avoid these data discovery fires in the future?

We believe it’s possible to prevent data discovery fires, and that starts with proper data governance and a strong data lineage capability.


Why is data governance important?

  • Governed data promotes data sharing.
  • Data standards make data more reusable.
  • Greater context in data definitions assists in more accurate analytics.
  • A clear set of data policies and procedures support data security.

Why is data lineage important?

  • Data trust is built by establishing its origins.
  • The troubleshooting process is simplified by enabling data to be traced.
  • The risk of ETL data loss is reduced by exposing potential problems in the process.
  • Business rules, which otherwise would be buried in an ETL process, are visible.

Data Governance Enables Data-Driven Business

In the context of modern, data-driven business – in which organizations are essentially production lines of information – data governance is responsible for the health and maintenance of that production line.

It’s the enabling factor of the enterprise data management suite that ensures data quality, so organizations can have greater trust in their data. It ensures that any data created is properly stored, tagged and assigned the context needed to prevent corruption or loss as it moves through the production line – greatly enhancing data discovery.

Alongside improving data quality, aiding in regulatory compliance, and making practices like tracing data lineage easier, sound data governance also helps organizations be proactive with their data, using it to drive revenue. They can make better decisions faster and reduce the likelihood of costly mistakes and data breaches that would eat into their bottom lines.

For more information about how data governance supports executive business intelligence and the rest of the enterprise data management suite, click here.



Pillars of Data Governance Readiness: Enterprise Data Management Methodology

Facebook’s data woes continue to dominate the headlines and further highlight the importance of having an enterprise-wide view of data assets. The high-profile case is somewhat different than other prominent data scandals as it wasn’t a “breach,” per se. But questions of negligence persist, and in all cases, data governance is an issue.

This week, the Wall Street Journal ran a story titled “Companies Should Beware Public’s Rising Anxiety Over Data.” It discusses an IBM poll of 10,000 consumers in which 78% of U.S. respondents say a company’s ability to keep their data private is extremely important, yet only 20% completely trust organizations they interact with to maintain data privacy. In fact, 60% indicate they’re more concerned about cybersecurity than a potential war.

The piece concludes with a clear lesson for CIOs: “they must make data governance and compliance with regulations such as the EU’s General Data Protection Regulation [GDPR] an even greater priority, keeping track of data and making sure that the corporation has the ability to monitor its use, and should the need arise, delete it.”

With a more thorough data governance initiative and a better understanding of data assets, their lineage and useful shelf-life, and the privileges behind their access, Facebook likely could have gotten ahead of the problem and quelled it before it became an issue.  Sometimes erasure is the best approach if the reward from keeping data onboard is outweighed by the risk.

But perhaps Facebook is lucky the issue arose when it did. Once the GDPR goes into effect, this type of data snare would make the company non-compliant, as the regulation requires direct consent from the data owner (as well as notification within 72 hours if there is an actual breach).

Five Pillars of DG: Enterprise Data Management Methodology

Considering GDPR, as well as the gargantuan PR fallout and governmental inquiries Facebook faced, companies can’t afford such data governance mistakes.

During the past few weeks, we’ve been exploring each of the five pillars of data governance readiness in detail and how they come together to provide a full view of an organization’s data assets. In this blog, we’ll look at enterprise data management methodology as the fourth key pillar.

Enterprise Data Management in Four Steps

Enterprise data management methodology addresses the need for data governance within the wider data management suite, with all components and solutions working together for maximum benefits.

A successful data governance initiative should both improve a business’ understanding of data lineage/history and install a working system of permissions to prevent access by the wrong people. On the flip side, successful data governance makes data more discoverable, with better context so the right people can make better use of it.

This is the nature of Data Governance 2.0 – helping organizations better understand their data assets and making them easier to manage and capitalize on – and it succeeds where Data Governance 1.0 stumbled.

Enterprise Data Management: So where do you start?

  1. Metadata management provides the organization with the contextual information concerning its data assets. Without it, data governance essentially runs blind.

The value of metadata management is the ability to govern common and reference data used across the organization with cross-departmental standards and definitions, allowing data sharing and reuse, reducing data redundancy and storage, avoiding data errors due to incorrect choices or duplications, and supporting data quality and analytics capabilities.

  2. Your organization also needs to understand enterprise data architecture and enterprise data modeling. Without them, enterprise data governance will be hard to support.

Enterprise data architecture supports data governance through concepts such as data movement, data transformation and data integration – since data governance develops policies and standards for these activities.

Data modeling, a vital component of data architecture, is also critical to data governance. By providing insights into the use cases satisfied by the data, organizations can do a better job of proactively analyzing the required shelf-life and better measure the risk/reward of keeping that data around.

Data stewards serve as subject matter experts (SMEs) in the development and refinement of data models and assist in the creation of data standards that are represented by data models. These artifacts allow your organization to achieve its business goals using enterprise data architecture.

  3. Let’s face it: most organizations implement data governance because they want high-quality data. Enterprise data governance is foundational to the success of data quality management.

Data governance supports data quality efforts through the development of standard policies, practices, data standards, common definitions, etc. Data stewards implement these data standards and policies, supporting the data quality professionals.

These standards, policies, and practices lead to effective and sustainable data governance.

  4. Finally, without business intelligence (BI) and analytics, data governance will not add any value. The value of data governance to BI and analytics is the ability to govern data from its sources to destinations in warehouses/marts, define standards for data across those stages, and promote common algorithms and calculations where appropriate. These benefits allow the organization to achieve its business goals with BI and analytics.

Gaining an EDGE on the Competition

Old-school data governance is one-sided, mainly concerned with cataloging data to support search and discovery. The lack of short-term value here often caused executive support to dwindle, so the task of DG was siloed within IT.

These issues are circumvented by using the collaborative Data Governance 2.0 approach, spreading the responsibility of DG among those who use the data. This means that data assets are recorded with more context and are of greater use to an organization.

It also means executive-level employees are more aware of how data governance is working because they’re involved in it, and they can see the extra revenue potential in optimizing data analysis streams and the resulting improvements in time to market.

We refer to this enterprise-wide, collaborative, 2.0 take on data governance as the enterprise data governance experience (EDGE). But organizational collaboration aside, the real EDGE is arguably the collaboration it facilitates between solutions. The EDGE platform recognizes the fundamental reliance data governance has on the enterprise data management methodology suite and unifies them.

By existing on one platform, and sharing one repository, organizations can guarantee their data is uniform across the organization, regardless of department.

Additionally, it drastically improves workflows by allowing for real-time updates across the platform. For example, a change to a term in the data dictionary (data governance) will be automatically reflected in all connected data models (data modeling).
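
Conceptually, this works because models reference a glossary term by identifier rather than copying its text, so an edit made in one place is visible everywhere the term is used. The sketch below illustrates the shared-repository idea only, not the platform's internals.

```python
# Shared repository: glossary terms stored once and referenced by ID everywhere.
glossary = {"T-001": {"name": "Customer", "definition": "A party that has purchased at least once."}}

# A data model attribute points at the term instead of copying its text.
model_attribute = {"entity": "dim_customer", "column": "customer_id", "term_id": "T-001"}

def describe(attr):
    term = glossary[attr["term_id"]]
    return f"{attr['entity']}.{attr['column']}: {term['name']} - {term['definition']}"

print(describe(model_attribute))

# A governance-side edit to the term is immediately visible to every model that references it.
glossary["T-001"]["definition"] = "A party with at least one completed order in the last 24 months."
print(describe(model_attribute))
```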

Further, the EDGE integrates enterprise architecture to define application capabilities and interdependencies within the context of their connection to enterprise strategy, enabling technology investments to be prioritized in line with business goals.

Business process also is included so enterprises can clearly define, map and analyze workflows and build models to drive process improvement, as well as identify business practices susceptible to the greatest security, compliance or other risks and where controls are most needed to mitigate exposures.

Essentially, it’s the approach data governance needs to become a value-adding strategic initiative instead of an isolated effort that peters out.

To learn more about enterprise data management and getting an EDGE on GDPR and the competition, click here.

To assess your data governance readiness ahead of the GDPR, click here.



Five Pillars of Data Governance Readiness: Team Resources

The Facebook scandal has highlighted the need for organizations to understand and apply the five pillars of data governance readiness.

All eyes were on Mark Zuckerberg this week as he testified before the U.S. Congress on Facebook’s recent data drama.

A statement from Facebook indicates that the data snare was created due to permission settings leveraged by the Facebook-linked third-party app ‘thisisyourdigitallife.’

Although the method used by Cambridge Analytica to amass personal data from 87 million Facebook users didn’t constitute a “data breach,” it’s still a major data governance (DG) issue that is now creating more than a headache for the company.

The #DeleteFacebook movement is gaining momentum, not to mention the company’s stock dip.

With Facebook’s DG woes a mainstay in global news cycles, and the General Data Protection Regulation’s (GDPR) implementation just around the corner, organizations need to get DG-ready.

During the past few weeks, the erwin Expert Blog has been exploring the five pillars of data governance readiness. So far, we’ve covered initiative sponsorship and organizational support. Today, we talk team resources.

Facebook and the Data Governance Awakening

Most organizations lack the enterprise-level experience required to advance a data governance initiative.

While this function may be called by another name (e.g., data management, information management or enterprise data management), a successful organization recognizes the need to manage data as an enterprise asset.

Data governance, as a foundational component of enterprise data management, would reside within such a group.

You would think an organization like Facebook would have this covered. However, it doesn’t appear that they did.

The reason Facebook is in hot water is that the platform allowed ‘thisisyourdigitallife’ to capture personal data from the Facebook friends of those who used the app, increasing the scope of the data snare by an order of magnitude.


For context, it took only 53 Australian ‘thisisyourdigitallife’ users to capture 310,000 Australian citizens’ data.

Facebook’s permission settings essentially enabled ‘thisisyourdigitallife’ users to consent on behalf of their friends. Had GDPR been in effect, Facebook would have been non-compliant.

Even so, the extent of the PR fallout demonstrates that regulatory compliance shouldn’t be the only driver for implementing data governance.

Understanding who has access to data and what that data can be used for is a key use case for data governance. With this in mind, it’s not difficult to imagine how a more robust DG program could have covered Facebook’s back.

Data governance is concerned with units of data – what are they used for, what are the associated risks, and what value do they have to the business? In addition, DG asks who is responsible for the data – who has access? And what is the data lineage?

It acts as the filter that makes data more discoverable to those who need it, while shutting out those without the required permissions.

The Five Pillars of Data Governance: #3 Team Resources

Data governance can’t be executed as a short-term fix. It must be an on-going, strategic initiative that the entire organization supports and is part of. But ideally, a fixed and formal data management group needs to oversee it.

As such, we consider team resources one of the key pillars of data governance readiness.

Data governance requires leadership with experience to ensure the initiative is a value-adding success, not the stifled, siloed programs associated with data governance of old (Data Governance 1.0).

Without experienced leadership, different arms of the organization will likely pull in different directions, undermining the uniformity of data that DG aims to introduce. If such experience doesn’t exist within the organization, then outside consultants should be tapped for their expertise.

As the main technical enabler of the practice, IT should be a key DG participant and even house the aforementioned data management group to oversee it. The key word here is “participant,” as the inclination to leave data governance to IT and IT alone has been a common reason for Data Governance 1.0’s struggles.

With good leadership, organizations can implement Data Governance 2.0: the collaborative, outcome-driven approach more suited to the data-driven business landscape. DG 2.0 avoids the pitfalls of its predecessor by expanding the practice beyond IT and traditional data stewards to make it an enterprise-wide responsibility.

By approaching data governance in this manner, organizations ensure those with a stake in data quality (e.g., anyone who uses data) are involved in its discovery, understanding, governance and socialization.

This leads to data with greater context, accuracy and trust. It also hastens decision-making and times to market, resulting in fewer bottlenecks in data analysis.

We refer to this collaborative approach to data governance as the enterprise data governance experience (EDGE).

Back to Facebook: had the company had a more robust data governance program, it could have discovered the data snare exploited by Cambridge Analytica and circumvented the entire scandal (and all its consequences).

But for data governance to be successful, organizations must consider team resources as well as enterprise data management methodology and delivery capability (we’ll cover the latter two in the coming weeks).

To determine your organization’s current state of data governance readiness, take the erwin DG RediChek.

To learn more about how to leverage data governance for GDPR compliance and an EDGE on the competition, click here.
