Tag: metadata management

Healthy Co-Dependency: Data Management and Data Governance

Post author By Bunny Tharpe
Post date September 7, 2018
No Comments on Healthy Co-Dependency: Data Management and Data Governance

Data management and data governance are now more important than ever before. The hyper competitive nature of data-driven business means organizations need to get more out of their data than ever before – and fast.

A few data-driven exemplars have led the way, turning data into actionable insights that influence everything from corporate structure to new products and pricing. “Few” being the operative word.

It’s true, data-driven business is big business. Huge actually. But it’s dominated by a handful of organizations that realized early on what a powerful and disruptive force data can be.

The benefits of such data-driven strategies speak for themselves: Netflix has replaced Blockbuster, and Uber continues to shake up the taxi business. Organizations indiscriminate of industry are following suit, fighting to become the next big, disruptive players.

But in many cases, these attempts have failed or are on the verge of doing so.

Now with the General Data Protection Regulation (GDPR) in effect, data that is unaccounted for is a potential data disaster waiting to happen.

So organizations need to understand that getting more out of their data isn’t necessarily about collecting more data. It’s about unlocking the value of the data they already have.

The Enterprise Data Dilemma

However, most organizations don’t know exactly what data they have or even where some of it is. And some of the data they can account for is going to waste because they don’t have the means to process it. This is especially true of unstructured data types, which organizations are collecting more frequently.

Considering that 73 percent of company data goes unused, it’s safe to assume your organization is dealing with some if not all of these issues.

Big picture, this means your enterprise is missing out on thousands, perhaps millions in revenue.

The smaller picture? You’re struggling to establish a single source of data truth, which contributes to a host of problems:

Inaccurate analysis and discrepancies in departmental reporting
Inability to manage the amount and variety of data your organization collects
Duplications and redundancies in processes
Issues determining data ownership, lineage and access
Achieving and sustaining compliance

To avoid such circumstances and get more value out of data, organizations need to harmonize their approach to data management and data governance, using a platform of established tools that work in tandem while also enabling collaboration across the enterprise.

Data management drives the design, deployment and operation of systems that deliver operational data assets for analytics purposes.

Data governance delivers these data assets within a business context, tracking their physical existence and lineage, and maximizing their security, quality and value.

Although these two disciplines approach data from different perspectives (IT-driven and business-oriented), they depend on each other. And this co-dependency helps an organization make the most of its data.

The P-M-G Hub

Together, data management and data governance form a critical hub for data preparation, modeling and data governance. How?

It starts with a real-time, accurate picture of the data landscape, including “data at rest” in databases, data warehouses and data lakes and “data in motion” as it is integrated with and used by key applications. That landscape also must be controlled to facilitate collaboration and limit risk.

But knowing what data you have and where it lives is complicated, so you need to create and sustain an enterprise-wide view of and easy access to underlying metadata. That’s a tall order with numerous data types and data sources that were never designed to work together and data infrastructures that have been cobbled together over time with disparate technologies, poor documentation and little thought for downstream integration. So the applications and initiatives that depend on a solid data infrastructure may be compromised, and data analysis based on faulty insights.

However, these issues can be addressed with a strong data management strategy and technology to enable the data quality required by the business, which encompasses data cataloging (integration of data sets from various sources), mapping, versioning, business rules and glossaries maintenance and metadata management (associations and lineage).

Being able to pinpoint what data exists and where must be accompanied by an agreed-upon business understanding of what it all means in common terms that are adopted across the enterprise. Having that consistency is the only way to assure that insights generated by analyses are useful and actionable, regardless of business department or user exploring a question. Additionally, policies, processes and tools that define and control access to data by roles and across workflows are critical for security purposes.

These issues can be addressed with a comprehensive data governance strategy and technology to determine master data sets, discover the impact of potential glossary changes across the enterprise, audit and score adherence to rules, discover risks, and appropriately and cost-effectively apply security to data flows, as well as publish data to people/roles in ways that are meaningful to them.

Data Management and Data Governance: Play Together, Stay Together

When data management and data governance work in concert empowered by the right technology, they inform, guide and optimize each other. The result for an organization that takes such a harmonized approach is automated, real-time, high-quality data pipeline.

Then all stakeholders — data scientists, data stewards, ETL developers, enterprise architects, business analysts, compliance officers, CDOs and CEOs – can access the data they’re authorized to use and base strategic decisions on what is now a full inventory of reliable information.

The erwin EDGE creates an “enterprise data governance experience” through integrated data mapping, business process modeling, enterprise architecture modeling, data modeling and data governance. No other software platform on the market touches every aspect of the data management and data governance lifecycle to automate and accelerate the speed to actionable business insights.

Tags data modeling, enterprise architecture, GDPR, data management, data warehouse, data governance, business process modeling, data-driven business, metadata management, collaborative data governance, enterprise data governance experience, data lineage, data analysis, data lake, data mapping, data mangement, data sources, disparate data, data infrastructure

erwin Expert Blog

Solving the Enterprise Data Dilemma

Post author By Mariann McDonagh
Post date August 30, 2018
No Comments on Solving the Enterprise Data Dilemma

Due to the adoption of data-driven business, organizations across the board are facing their own enterprise data dilemmas.

This week erwin announced its acquisition of metadata management and data governance provider AnalytiX DS. The combined company touches every piece of the data management and governance lifecycle, enabling enterprises to fuel automated, high-quality data pipelines for faster speed to accurate, actionable insights.

Why Is This a Big Deal?

From digital transformation to AI, and everything in between, organizations are flooded with data. So, companies are investing heavily in initiatives to use all the data at their disposal, but they face some challenges. Chiefly, deriving meaningful insights from their data – and turning them into actions that improve the bottom line.

This enterprise data dilemma stems from three important but difficult questions to answer: What data do we have? Where is it? And how do we get value from it?

Large enterprises use thousands of unharvested, undocumented databases, applications, ETL processes and procedural code that make it difficult to gather business intelligence, conduct IT audits, and ensure regulatory compliance – not to mention accomplish other objectives around customer satisfaction, revenue growth and overall efficiency and decision-making.

The lack of visibility and control around “data at rest” combined with “data in motion”, as well as difficulties with legacy architectures, means these organizations spend more time finding the data they need rather than using it to produce meaningful business outcomes.

To remedy this, enterprises need smarter and faster data management and data governance capabilities, including the ability to efficiently catalog and document their systems, processes and the associated data without errors. In addition, business and IT must collaborate outside their traditional operational silos.

But this coveted state of data nirvana isn’t possible without the right approach and technology platform.

Enterprise Data: Making the Data Management-Data Governance Love Connection

Bringing together data management and data governance delivers greater efficiencies to technical users and better analytics to business users. It’s like two sides of the same coin:

Data management drives the design, deployment and operation of systems that deliver operational and analytical data assets.
Data governance delivers these data assets within a business context, tracks their physical existence and lineage, and maximizes their security, quality and value.

Although these disciplines approach data from different perspectives and are used to produce different outcomes, they have a lot in common. Both require a real-time, accurate picture of an organization’s data landscape, including data at rest in data warehouses and data lakes and data in motion as it is integrated with and used by key applications.

However, creating and maintaining this metadata landscape is challenging because this data in its various forms and from numerous sources was never designed to work in concert. Data infrastructures have been cobbled together over time with disparate technologies, poor documentation and little thought for downstream integration, so the applications and initiatives that depend on data infrastructure are often out-of-date and inaccurate, rendering faulty insights and analyses.

Organizations need to know what data they have and where it’s located, where it came from and how it got there, what it means in common business terms [or standardized business terms] and be able to transform it into useful information they can act on – all while controlling its access.

To support the total enterprise data management and governance lifecycle, they need an automated, real-time, high-quality data pipeline. Then every stakeholder – data scientist, ETL developer, enterprise architect, business analyst, compliance officer, CDO and CEO – can fuel the desired outcomes with reliable information on which to base strategic decisions.

Enterprise Data: Creating Your “EDGE”

At the end of the day, all industries are in the data business and all employees are data people. The success of an organization is not measured by how much data it has, but by how well it’s used.

Data governance enables organizations to use their data to fuel compliance, innovation and transformation initiatives with greater agility, efficiency and cost-effectiveness.

Organizations need to understand their data from different perspectives, identify how it flows through and impacts the business, aligns this business view with a technical view of the data management infrastructure, and synchronizes efforts across both disciplines for accuracy, agility and efficiency in building a data capability that impacts the business in a meaningful and sustainable fashion.

The persona-based erwin EDGE creates an “enterprise data governance experience” that facilitates collaboration between both IT and the business to discover, understand and unlock the value of data both at rest and in motion.

By bringing together enterprise architecture, business process, data mapping and data modeling, erwin’s approach to data governance enables organizations to get a handle on how they handle their data. With the broadest set of metadata connectors and automated code generation, data mapping and cataloging tools, the erwin EDGE Platform simplifies the total data management and data governance lifecycle.

This single, integrated solution makes it possible to gather business intelligence, conduct IT audits, ensure regulatory compliance and accomplish any other organizational objective by fueling an automated, high-quality and real-time data pipeline.

With the erwin EDGE, data management and data governance are unified and mutually supportive, with one hand aware and informed by the efforts of the other to:

Discover data: Identify and integrate metadata from various data management silos.
Harvest data: Automate the collection of metadata from various data management silos and consolidate it into a single source.
Structure data: Connect physical metadata to specific business terms and definitions and reusable design standards.
Analyze data: Understand how data relates to the business and what attributes it has.
Map data flows: Identify where to integrate data and track how it moves and transforms.
Govern data: Develop a governance model to manage standards and policies and set best practices.
Socialize data: Enable stakeholders to see data in one place and in the context of their roles.

An integrated solution with data preparation, modeling and governance helps businesses reach data governance maturity – which equals a role-based, collaborative data governance system that serves both IT and business users equally. Such maturity may not happen overnight, but it will ultimately deliver the accurate and actionable insights your organization needs to compete and win.

Your journey to data nirvana begins with a demo of the enhanced erwin Data Governance solution. Register now.

erwin Expert Blog

Five Pillars of Data Governance Readiness: Delivery Capability

Post author By Bunny Tharpe
Post date May 3, 2018
No Comments on Five Pillars of Data Governance Readiness: Delivery Capability

The five pillars of data governance readiness should be the starting point for implementing or revamping any DG initiative.

In a recent CSO Magazine article, “Why data governance should be corporate policy,” the author states: “Data is like water, and water is a fundamental resource for life, so data is an essential resource for the business. Data governance ensures this resource is protected and managed correctly enabling us to meet our customer’s expectations.”

Over the past few weeks, we’ve been exploring the five pillars of data governance (DG) readiness, and this week we turn our attention to the fifth and final pillar, delivery capability.

Together, the five pillars of data governance readiness work as a step-by-step guide to a successful DG implementation and ongoing initiative.

As a refresher, the first four pillars are:

The starting point is garnering initiative sponsorship from executives, before fostering support from the wider organization.

Organizations should then appoint a dedicated team to oversee and manage the initiative. Although DG is an organization-wide strategic initiative, it needs experience and leadership to guide it.

Once the above pillars are accounted for, the next step is to understand how data governance fits with the wider data management suite so that all components of a data strategy work together for maximum benefits.

And then enterprise data management methodology as a plan of action to assemble the necessary tools.

Once you’ve completed these steps, how do you go about picking the right solution for enterprise-wide data governance?

Five Pillars of Data Governance: Delivery Capability – What’s the Right Solution?

Many organizations don’t think about enterprise data governance technologies when they begin a data governance initiative. They believe that using some general-purpose tool suite like those from Microsoft can support their DG initiative. That’s simply not the case.

Selecting the proper data governance solution should be part of developing the data governance initiative’s technical requirements. However, the first thing to understand is that the “right” solution is subjective.

Data stewards work with metadata rather than data 80 percent of the time. As a result, successful and sustainable data governance initiatives are supported by a full-scale, enterprise-grade metadata management tool.

Additionally, many organizations haven’t implemented data quality products when they begin a DG initiative. Product selections, including those for data quality management, should be based on the organization’s business goals, its current state of data quality and enterprise data management, and best practices as promoted by the data quality management team.

If your organization doesn’t have an existing data quality management product, a data governance initiative can support the need for data quality and the eventual evaluation and selection of the proper data quality management product.

Enterprise data modeling is also important. A component of enterprise data architecture, it’s an enabling force in the performance of data management and successful data governance. Having the capability to manage data architecture and data modeling with the optimal products can have a positive effect on DG by providing the initiative architectural support for the policies, practices, standards and processes that data governance creates.

Finally, and perhaps most important, the lack of a formal data governance team/unit has been cited as a leading cause of DG failure. Having the capability to manage all data governance and data stewardship activities has a positive effect.

Shopping for Data Governance Technology

DG is part of a larger data puzzle. Although it’s a key enabler of data-driven business, it’s only effective in the context of the data management suite in which it belongs.

Therefore when shopping for a data governance solution, organizations should look for DG tools that unify critical data governance domains, leverage role-appropriate interfaces to bring together stakeholders and processes to support a culture committed to acknowledging data as the mission-critical asset that it is, and orchestrate the key mechanisms required to discover, fully understand, actively govern and effectively socialize and align data to the business.

Here’s an initial checklist of questions to ask in your evaluation of a DG solution. Does it support:

Relational, unstructured, on-premise and cloud data?
Business-friendly environment to build business glossaries with taxonomies of data standards?
Unified capabilities to integrate business glossaries, data dictionaries and reference data, data quality metrics, business rules and data usage policies?
Regulating data and managing data collaboration through assigned roles, business rules and responsibilities, and defined governance processes and workflows?
Viewing data dashboards, KPIs and more via configurable role-based interfaces?
Providing key integrations with enterprise architecture, business process modeling/management and data modeling?
A SaaS model for rapid deployment and low TCO?

To assess your data governance readiness, especially with the General Data Protection Regulation about to take effect, click here.

You also can try erwin DG for free. Click here to start your free trial.

Tags data modeling, GDPR, General Data Protection Regulation, data governance, data-driven business, metadata management, data quality, data governance readiness, DG readiness, five pillars of data governance, data governance should be corporate policy, DG implementation, data governance tools, comparing data governance, data governance solutions, data quality management, enterprise data architecture

erwin Expert Blog

Data Modeling in a Jargon-filled World – Managed Data Lakes

Post author By Kevin Schofield
Post date July 27, 2017
1 Comment on Data Modeling in a Jargon-filled World – Managed Data Lakes

More and more businesses are adopting managed data lakes.

Earlier in this blog series, we established that leading organizations are adopting a variety of approaches to manage data, including data that may be sourced from a wide range of NoSQL, NewSQL, RDBMS and unstructured sources.

In this post, we’ll discuss managed data lakes and their applications as a hybrid of less structured data and more traditionally structured relational data. We’ll also talk about whether there’s still a need for data modeling and metadata management.

The term Data Lake was first coined by James Dixon of Pentaho in a blog entry in which he said:

“If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”

Use of the term quickly took on a life of its own with often divergent meanings. So much so that four years later Mr. Dixon felt compelled to refute some criticisms by the analyst community by pointing out that they were objecting to things he actually never said about data lakes.

However, in my experience and despite Mr. Dixon’s objections, the notion that a data lake can contain data from more than one source is now widely accepted..

Similarly, while most early data lake implementations used Hadoop with many vendors pitching the idea that a data lake had to be implemented as a Hadoop data store, the notion that data lakes can be implemented on non-Hadoop platforms, such as Azure Blob storage or Amazon S3, has become increasingly widespread.

So a data lake – as the term is widely used in 2017 – is a detailed (non-aggregated) data store that can contain structured and/or non-structured data from more than one source implemented on some kind of inexpensive, massively scalable storage platform.

But what are “managed data lakes?”

To answer that question, let’s first touch on why many early data lake projects failed or significantly missed expectations. Criticisms were quick to arise, many of which were critiques of data lakes when they strayed from the original vision, as established earlier.

Vendors seized on data lakes as a marketing tool, and as often happens in our industry, they promised it could do almost anything. As long as you poured your data into the lake, people in the organization would somehow magically find exactly the data they needed just when they needed it. As is usually the case, it turned out that for most organizations, their reality was quite different. And for three important reasons:

Most large organizations’ analysts didn’t have the skillsets to wade through the rapidly accumulating pool of information in Hadoop or whichever new platforms had been chosen to implement their data lakes to locate the data they needed.
Not enough attention was paid to the need of providing metadata to help people find the data they needed.
Most interesting analytics are a result of integrating disparate data points to draw conclusions, and integration had not been an area of focus in most data lake implementations.

In the face of growing disenchantment with data lake implementations, some organizations and vendors pivoted to address these drawbacks. They did so by embracing what is most commonly called a managed data lake, though some prefer the label “curated data lake” or “modern data warehouse.”

The idea is to address the three criticisms mentioned above by developing an architectural approach that allows for the use of SQL, making data more accessible and providing more metadata about the data available in the data lake. It also takes on some of the challenging work of integration and transformation that earlier data lake implementations had hoped to kick down the road or avoid entirely.

The result in most implementations of a managed data lake is a hybrid that tries to blend the strengths of the original data lake concept with the strengths of traditional large-scale data warehousing (as opposed to the narrow data mart approach Mr. Dixon used as a foil when originally describing data lakes).

Incoming data, either structured or unstructured, can be easily and quickly loaded from many different sources (e.g., applications, IoT, third parties, etc.). The data can be accumulated with minimal processing at reasonable cost using a bulk storage platform such as Hadoop, Azure Blob storage or Amazon S3.

Then the data, which is widely used within the organization, can be integrated and made available through a SQL or SQL-like interface, such as those from Hive to Postgres to a tried-and-true commercial relational database such as SQL Server (or its cloud-based cousin Azure SQL Data Warehouse).

In this scenario, a handful of self-sufficient data scientists may wade (or swim or dive) in the surrounding data lake. However, most analysts in most organizations still spend most of their time using familiar SQL-capable tools to analyze data stored in the core of the managed data lake – an island in the lake if we really want to torture the analogy – which is typically implemented either using an RDBMS or a relational layer like Hive on top of the bulk-storage layer.

It’s important to note that these are not two discrete silos. Most major vendors have added capabilities to their database and BI offerings to enable analysis of both RDBMS-based and bulk-storage layer data through a familiar SQL interface.

This enables a much larger percentage of an organization’s analysts to access data both in the core and the less structured surrounding lake, using tools with which they’re already familiar.

As this hybrid managed data lake approach incorporates a relational core, robust data modeling capabilities are as important as ever. The same goes for data governance and a thorough focus on metadata to provide clear naming and definitions to assist in finding and linking with the most appropriate data.

This is true whether inside the structured relational core of the managed data lake or in the surrounding, more fluid data lake.

As you probably guessed from some of the links in this post, more and more managed data lakes are being implemented in the cloud. Please join us next time for the fifth installment in our series: Data Modeling in a Jargon-filled World – The Cloud.