Tag: data vault

Benefits of Data Vault Automation

Post author By Danny Sandwell
Post date September 26, 2019

The benefits of Data Vault automation range from the more abstract, like improving data integrity, to the tangible, such as clearly identifiable savings in cost and time.

So Seriously … You Should Automate Your Data Vault

Data Vault is a methodology for architecting and managing data warehouses in complex data environments where new data types and structures are constantly introduced.

Without Data Vault, data warehouses are difficult and time-consuming to change, causing latency issues and slowing time to value. In addition, the queries required to maintain historical integrity are complex to design and slow to run, causing performance issues and potentially incorrect results because the ability to understand relationships between historical snapshots of data is lacking.

In his blog, Dan Linstedt, the creator of Data Vault methodology, explains that Data Vaults “are extremely scalable, flexible architectures” enabling the business to grow and change without “the agony and pain of high costs, long implementation and test cycles, and long lists of impacts across the enterprise warehouse.”

With a Data Vault, new functional areas typically are added quickly and easily, with changes to existing architecture taking less than half the traditional time with much less impact on the downstream systems, he notes.

Astonishingly, nearly 20 years since the methodology’s creation, most Data Vault design, development and deployment phases are still handled manually. But why?

Traditional manual efforts to define the Data Vault population and create ETL code from scratch can take weeks or even months. The entire process is time-consuming, slows down the data pipeline and is often riddled with human errors.

On the flip side, automating the development and deployment of design changes, and of the resulting data movement code, lets companies accelerate development and deployment in a cost-effective manner.
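To make the idea of generated data movement code concrete, here is a minimal sketch of template-driven generation. It is not the output of erwin Data Vault Automation or any other product; the SQL pattern, the table and column names and the `generate_hub_load` helper are all assumptions made for illustration.

```python
from string import Template

# Hypothetical load pattern for a hub table; the SQL dialect and all
# table and column names are assumptions made for this illustration.
HUB_LOAD = Template(
    "INSERT INTO $hub ($key, load_dts, record_source)\n"
    "SELECT DISTINCT s.$key, CURRENT_TIMESTAMP, '$source'\n"
    "FROM $staging s\n"
    "WHERE NOT EXISTS (SELECT 1 FROM $hub h WHERE h.$key = s.$key);"
)


def generate_hub_load(hub: str, key: str, staging: str, source: str) -> str:
    """Render the load statement for one hub from its metadata."""
    return HUB_LOAD.substitute(hub=hub, key=key, staging=staging, source=source)


# Metadata for two hypothetical hubs. Adding a hub means adding a row here,
# not writing another statement by hand.
HUBS = [
    {"hub": "hub_trial", "key": "trial_id", "staging": "stg_trials", "source": "CTMS"},
    {"hub": "hub_site", "key": "site_id", "staging": "stg_sites", "source": "CTMS"},
]

statements = [generate_hub_load(**meta) for meta in HUBS]
```

Because every statement comes from the same pattern, a fix to the pattern reaches every hub at once, which is where much of the saving over hand-written code would come from.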

Benefits of Data Vault Automation – A Case Study …

Global Pharma Company Saves Considerable Time and Money with Data Vault Automation

Let’s take a look at a large global pharmaceutical company that switched to Data Vault automation with staggering results.

Like many pharmaceutical companies, it manages a massive data warehouse combining clinical trial, supply chain and other mission-critical data. The company had chosen a Data Vault schema for its flexibility in handling change but found creating the hub and satellite structures incredibly laborious.

They needed to accelerate development, as well as aggregate data from different systems for internal customers to access and share. Additionally, the company needed lineage and traceability for regulatory compliance efforts.

With this ability, they can identify data sources, transformations and usage to safeguard protected health information (PHI) for clinical trials.

After an initial proof of concept, they deployed erwin Data Vault Automation and generated more than 200 tables, jobs and processes with 10 to 12 scripts. The highly schematic structure of the models enabled large portions of the modeling process to be automated, dramatically accelerating Data Vault projects and optimizing data warehouse management.

erwin Data Vault Automation helped this pharma customer automate the complete lifecycle – accelerating development while increasing consistency, simplicity and flexibility – to save considerable time and money.

For this customer, the benefits of Data Vault automation were as follows:

• Saving an estimated 70% of the costs of manual development
• Generating 95% of the production code with “zero touch,” improving the time to business value and significantly reducing the costly rework associated with error-prone manual processes
• Increasing data integrity, including for new requirements and use cases, regardless of changes to the warehouse structure, because legacy source data doesn’t degrade
• Creating a sustainable approach to Data Vault deployment, ensuring the agile, adaptable and timely delivery of actionable insights to the business in a well-governed facility for regulatory compliance, including full transparency and ease of auditability

Homegrown Tools Never Provide True Data Vault Automation

Many organizations use some form of homegrown tool or standalone applications. However, they don’t integrate with other tools and components of the architecture, they’re expensive, and quite frankly, they make it difficult to derive any meaningful results.

erwin Data Vault Automation centralizes the specification and deployment of Data Vault architectures for better control and visibility of the software development lifecycle. erwin Data Catalog makes it easy to discover, organize, curate and govern data being sourced for and managed in the warehouse.

With this solution, users select data sets to be included in the warehouse and fully automate the loading of Data Vault structures and ETL operations.

erwin Data Vault Smart Connectors eliminate the need for a business analyst and ETL developers to repeat mundane tasks, so they can focus on choosing and using the desired data instead. This saves considerable development time and effort plus delivers a high level of standardization and reuse.

After the Data Vault processes have been automated, the warehouse is well documented with traceability from the marts back to the operational data to speed the investigation of issues and analyze the impact of changes.

Bottom line: if your Data Vault integration is not automated, you’re already behind.

If you’d like to get started with erwin Data Vault Automation or request a quote, you can email consulting@erwin.com.

Tags data vault, data vault integration, data vault architectures, data vault smart connectors, data vault schema, data vault design, data vault methodology, data vault automation, ETL code, data catalog, data modeler, data warehouse, data modeling

Data Intelligence Data Governance erwin Expert Blog

Demystifying Data Lineage: Tracking Your Data’s DNA

Post author By Danny Sandwell
Post date November 1, 2018

Getting the most out of your data requires getting a handle on data lineage. That’s knowing what data you have, where it is, and where it came from – plus understanding its quality and value to the organization.

But you can’t understand your data in a business context, much less track its lineage and physical existence or maximize its security, quality and value, if it’s scattered across different silos in numerous applications.

Data lineage provides a way of tracking data from its origin to destination across its lifespan and all the processes it’s involved in. It also plays a vital role in data governance. Beyond the simple ability to know where the data came from and whether or not it can be trusted, there’s an element of statutory reporting and compliance that often requires a knowledge of how that same data (known or unknown, governed or not) has changed over time.

A platform that provides insights like data lineage, impact analysis, full-history capture, and other data management features serves as a central hub from which everything can be learned and discovered about the data – whether a data lake, a data vault or a traditional data warehouse.
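Lineage and impact analysis are two directions of the same question, which a small graph sketch can show. The example below is illustrative only: the column names and the `build_index` and `reach` helpers are hypothetical and do not describe how any particular platform stores lineage.

```python
from collections import defaultdict

# Each edge reads "target is derived from source". All names are hypothetical.
EDGES = [
    ("stg.orders.amount", "crm.orders.amt"),
    ("dw.sat_order.amount", "stg.orders.amount"),
    ("mart.sales.revenue", "dw.sat_order.amount"),
    ("mart.finance.booked", "dw.sat_order.amount"),
]


def build_index(edges):
    """Index the edges in both directions for lineage and impact queries."""
    upstream, downstream = defaultdict(set), defaultdict(set)
    for target, source in edges:
        upstream[target].add(source)
        downstream[source].add(target)
    return upstream, downstream


def reach(start, index):
    """Return every node reachable from start by following the index."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in index.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


upstream, downstream = build_index(EDGES)
lineage = reach("mart.sales.revenue", upstream)  # where the data came from
impact = reach("crm.orders.amt", downstream)     # what a source change affects
```

Walking the edges upstream answers "where did this report column come from?", while walking them downstream answers "what breaks if this source column changes?".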

In a traditional data management organization, Excel spreadsheets are used to manage the incoming data design, what’s known as the “pre-ETL” mapping documentation, but this does not provide any sort of visibility or auditability. In fact, each unit of work represented in these “mapping documents” becomes an independent variable in the overall system development lifecycle, and is therefore nearly impossible to learn from, much less standardize.

The key to accuracy and integrity in any exercise is to eliminate the opportunity for human error – which does not mean eliminating humans from the process but incorporating the right tools to reduce the likelihood of error as the human beings apply their thought processes to the work.

Data Lineage: A Crucial First Step for Data Governance

Knowing what data you have, where it lives and where it came from is complicated. The lack of visibility and control around “data at rest” combined with “data in motion,” as well as difficulties with legacy architectures, means organizations spend more time finding the data they need than using it to produce meaningful business outcomes.

Organizations need to create and sustain an enterprise-wide view of and easy access to underlying metadata, but that’s a tall order with numerous data types and data sources that were never designed to work together and data infrastructures that have been cobbled together over time with disparate technologies, poor documentation and little thought for downstream integration. So the applications and initiatives that depend on a solid data infrastructure may be compromised, resulting in faulty analyses.

These issues can be addressed with a strong data management strategy underpinned by technology that enables the data quality the business requires, which encompasses data cataloging (integration of data sets from various sources), mapping, versioning, business rules and glossaries maintenance and metadata management (associations and lineage).

An automated, metadata-driven framework for cataloging data assets and their flows across the business provides an efficient, agile and dynamic way to generate data lineage from operational source systems (databases, data models, file-based systems, unstructured files and more) across the information management architecture; construct business glossaries; assess what data aligns with specific business rules and policies; and inform how that data is transformed, integrated and federated throughout business processes – complete with full documentation.

Centralized design, immediate lineage and impact analysis, and change-activity logging mean you will always have answers readily available, or just a few clicks away. Subsets of data can be identified and generated via predefined templates, with generic designs generated from standard mapping documents and pushed via ETL processes for faster processing via automation templates.

With automation, data quality is systemically assured and the data pipeline is seamlessly governed and operationalized to the benefit of all stakeholders. Without such automation, business transformation will be stymied. Companies, especially large ones with thousands of systems, files and processes, will be particularly challenged by a manual approach. And outsourcing these data management efforts to professional services firms only increases costs and schedule delays.

With erwin Mapping Manager, organizations can automate enterprise data mapping and code generation for faster time-to-value and greater accuracy when it comes to data movement projects, as well as synchronize “data in motion” with data management and governance efforts.

Map data elements to their sources within a single repository to determine data lineage, deploy data warehouses and other Big Data solutions, and harmonize data integration across platforms. The web-based solution reduces the need for specialized, technical resources with knowledge of ETL and database procedural code, while making it easy for business analysts, data architects, ETL developers, testers and project managers to collaborate for faster decision-making.


Top 10 Reasons to Automate Data Mapping and Data Preparation

Post author By Mariann McDonagh
Post date October 11, 2018

Data preparation is notorious for being the most time-consuming area of data management. It’s also expensive.

“Surveys show the vast majority of time is spent on this repetitive task, with some estimates showing it takes up as much as 80% of a data professional’s time,” according to Information Week. And a Trifacta study notes that overreliance on IT resources for data preparation costs organizations billions.

The power of collecting your data can come in a variety of forms, but most often in IT shops around the world, it comes in a spreadsheet, or rather a collection of spreadsheets often numbering in the hundreds or thousands.

Most organizations, especially those competing in the digital economy, don’t have enough time or money for data management using manual processes. And outsourcing is also expensive, with inevitable delays because these vendors are dependent on manual processes too.

Taking the Time and Pain Out of Data Preparation: 10 Reasons to Automate Data Preparation/Data Mapping

Governance and Infrastructure

Data governance and a strong IT infrastructure are critical in the valuation, creation, storage, use, archival and deletion of data. Beyond the simple ability to know where the data came from and whether or not it can be trusted, there is an element of statutory reporting and compliance that often requires a knowledge of how that same data (known or unknown, governed or not) has changed over time.

A design platform that allows for insights like data lineage, impact analysis, full history capture, and other data management features can provide a central hub from which everything can be learned and discovered about the data – whether a data lake, a data vault, or a traditional warehouse.

Eliminating Human Error

In the traditional data management organization, Excel spreadsheets are used to manage the incoming data design, or what is known as the “pre-ETL” mapping documentation, which does not lend itself to any sort of visibility or auditability. In fact, each unit of work represented in these “mapping documents” becomes an independent variable in the overall system development lifecycle, and is therefore nearly impossible to learn from, much less standardize.

The key to creating accuracy and integrity in any exercise is to eliminate the opportunity for human error – which does not mean eliminating humans from the process but incorporating the right tools to reduce the likelihood of error as the human beings apply their thought processes to the work.

Completeness

The ability to scan and import from a broad range of sources and formats, as well as automated change tracking, means that you will always be able to import your data from wherever it lives and track all of the changes to that data over time.

Adaptability

Centralized design, immediate lineage and impact analysis, and change activity logging mean that you will always have the answer readily available, or a few clicks away. Subsets of data can be identified and generated via predefined templates, with generic designs generated from standard mapping documents and pushed via ETL processes for faster processing via automation templates.

Accuracy

Out-of-the-box capabilities to map your data from source to report make reconciliation and validation a snap, with auditability and traceability built in. Build a full array of validation rules that can be cross-checked with the design mappings in a centralized repository.
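As a rough sketch of what cross-checking validation rules against design mappings could look like, the following compares the two sets and reports gaps in either direction. The mapping and rule structures, and the `cross_check` function, are assumptions for illustration rather than any product's repository format.

```python
# Hypothetical source-to-report mappings and validation rules.
MAPPINGS = {
    "report.revenue": "dw.sales.amount",
    "report.customer": "dw.customer.name",
}
RULES = [
    {"column": "report.revenue", "check": "not_null"},
    {"column": "report.region", "check": "not_null"},
]


def cross_check(mappings, rules):
    """Return rules without a mapping and mapped columns without a rule."""
    mapped = set(mappings)
    ruled = {rule["column"] for rule in rules}
    orphan_rules = sorted(ruled - mapped)  # rules pointing at unmapped columns
    unvalidated = sorted(mapped - ruled)   # mapped columns nobody validates
    return orphan_rules, unvalidated


orphan_rules, unvalidated = cross_check(MAPPINGS, RULES)
```

Here `report.region` has a rule but no mapping, and `report.customer` is mapped but never validated; both are gaps that are hard to spot when rules and mappings live in separate spreadsheets.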

Timeliness

The ability to be agile and reactive is important – being good at being reactive doesn’t sound like a quality that deserves a pat on the back, but in the case of regulatory requirements, it is paramount.

Comprehensiveness

With access to all of the underlying metadata, source-to-report design mappings, and source and target repositories, you have the power to create reports within your reporting layer that have a traceable origin and can be easily explained to IT, business, and regulatory stakeholders alike.

Clarity

The requirements inform the design, the design platform puts them into action, and the reporting structures are fed the right data to create the right information at the right time via nearly any reporting platform, whether mainstream commercial or homegrown.

Frequency

Adaptation is the key to meeting any frequency interval. Centralized designs and automated ETL patterns that feed your database schemas and reporting structures will allow cyclical changes to be made and implemented in half the time of conventional means. Getting beyond the spreadsheet, enabling pattern-based ETL, and schema population are ways to ensure you will be ready whenever the need arises to show an audit trail of the change process and clearly articulate who did what and when through the system development lifecycle.

Business-Friendly

A user interface designed to be business-friendly means there’s no need to be a data integration specialist to review the common practices outlined and “passively enforced” throughout the tool. Once a process is defined, rules implemented, and templates established, there is little opportunity for error or deviation from the overall process. A diverse set of role-based security options means that everyone can collaborate, learn and audit while maintaining the integrity of the underlying process components.

Faster, More Accurate Analysis with Fewer People

What if you could get more accurate data preparation 50% faster and double your analysis with fewer people?

erwin Mapping Manager (MM) is a patented solution that automates data mapping throughout the enterprise data integration lifecycle, providing data visibility, lineage and governance – freeing up that 80% of a data professional’s time to put that data to work.

With erwin MM, data integration engineers can design and reverse-engineer the movement of data implemented as ETL/ELT operations and stored procedures, building mappings between source and target data assets and designing the transformation logic between them. These designs then can be exported to most ETL and data asset technologies for implementation.

erwin MM is 100% metadata-driven and used to define and drive standards across enterprise integration projects, enable data and process audits, improve data quality, streamline downstream work flows, increase productivity (especially over geographically dispersed teams) and give project teams, IT leadership and management visibility into the ‘real’ status of integration and ETL migration projects.

If an automated data preparation/mapping solution sounds good to you, please check out erwin MM here.


Data Education Month: Data-Focused Organizations Continue Their March

Post author By Bunny Tharpe
Post date March 28, 2017

In the modern world, data education is immensely important.

Data has become a fundamental part of how businesses operate. It’s also essential to consumers in going about their day-to-day lives.

And while organizations and consumers alike go about their business, data constantly ticks in the background, enabling the systems and processes that keep the world functioning.

Considering this, and with March marking Data Education Month, now seems the perfect time to highlight the importance of understanding data’s potential, its drawbacks and the most efficient ways to ensure its effective management.

In 2013, the total amount of data in the world was believed to have reached 4.4 zettabytes. For context, 1 zettabyte is equivalent to 1 trillion gigabytes, or about 152 million years of UHD 8K video.

By 2020, analysts predict the world’s data will reach 44 zettabytes. The sudden acceleration is truly staggering, and it’s businesses driving it.

Start-ups that find new ways to exploit data can revolutionize markets almost overnight. And as the frequency with which this happens increases, more and more established businesses are also putting resources behind digital innovation.

By now, businesses should be more than aware of just how important a good data management strategy is. If you’ve yet to make a data strategy a central focus of the way your business operates, then chances are, you’re being left behind – and the gap is widening quickly.

So in honor of Data Education Month, we’ve collated some of our top educational data posts, and a few others from around the Web.

Read, comment, share and celebrate #DataEducationMonth with us.

Data Education: Data Management

Managing Any Data, Anywhere with Any²

The acceleration in the amount of data is staggering, and can be overwhelming for businesses. You should apply the Any² approach to cope.

GDPR Guide: Preparing for the Changes

Businesses need to prepare for changes to General Data Protection Regulation (GDPR) legislation, and our GDPR guide is a great place to start.

Using EA, BP and DM to Build the Data Foundation Platform

Instead of utilizing purpose-built data management tools, businesses in the early stages of a data strategy often leverage pre-existing, makeshift software. However, the rate at which modern businesses create and store data means these methods can be quickly outgrown.

Data Education: Data Modeling

The Data Vault Method for Modeling the Data Warehouse

How the data vault method benefits businesses by improving implementation times, and enabling data warehouse automation.

Data Modeling – What the Experts Think

Three data modeling experts share their advice, opinions and best practices for data modeling and data management strategies.

Data Education: Enterprise Architecture

Data-Driven Enterprise Architecture for Better Business Outcomes

A business outcome approach to enterprise architecture can reduce times to market, improve agility, and make the value of EA more apparent.

What’s Behind a Successful Enterprise Architecture Strategy?

Best practices to adopt to increase the likelihood of enterprise architecture’s success.

Data Education: Business Process

Basics of Business Process Modeling

Business process modeling helps to standardize your processes and the ways in which people communicate, as well as to improve knowledge sharing.

Where Do I Start with Business Process Modeling?

FAQ blog providing insight from top consultants into key issues impacting the business process and enterprise architecture industries.

Tags data management, Business outcomes, data education month, data vault, Any2, GDPR, business process, enterprise architecture, data modeling


Data Vault Modeling & the Data Warehouse

Post author By Bunny Tharpe
Post date March 14, 2017

The data vault method for modeling the data warehouse was born of necessity. Data warehouse projects classically have to contend with long implementation times. This means that business requirements are more likely to change in the course of the project, jeopardizing the achievement of target implementation times and costs for the project.

To improve implementation times, Dan Linstedt introduced the Data Vault method for modeling the core warehouse. The key design principle involves separating the business key, context, and relationships in distinct tables as hub, satellite, and link.
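The separation into hub, link and satellite can be sketched with a single flat source row. This is an illustration under stated assumptions, not Linstedt's reference implementation: the hashed surrogate keys follow a convention common in later Data Vault practice, and the table names, column names and the `hash_key` and `split_order` helpers are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*parts: str) -> str:
    """Derive a deterministic surrogate key from one or more business keys."""
    joined = "|".join(p.strip().upper() for p in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def split_order(row: dict, source: str, loaded_at: datetime) -> dict:
    """Split one flat source row into hub, link and satellite rows."""
    meta = {"load_dts": loaded_at, "record_source": source}
    order_hk = hash_key(row["order_no"])
    customer_hk = hash_key(row["customer_no"])
    return {
        # Hubs hold only business keys.
        "hub_order": {"order_hk": order_hk, "order_no": row["order_no"], **meta},
        "hub_customer": {
            "customer_hk": customer_hk, "customer_no": row["customer_no"], **meta,
        },
        # The link holds the relationship between the two hubs.
        "link_order_customer": {
            "link_hk": hash_key(row["order_no"], row["customer_no"]),
            "order_hk": order_hk, "customer_hk": customer_hk, **meta,
        },
        # The satellite holds the descriptive context of the order.
        "sat_order": {
            "order_hk": order_hk, "status": row["status"],
            "amount": row["amount"], **meta,
        },
    }


rows = split_order(
    {"order_no": "A-100", "customer_no": "C-7", "status": "open", "amount": 250.0},
    source="ERP",
    loaded_at=datetime(2017, 3, 14, tzinfo=timezone.utc),
)
```

A new descriptive attribute on the order would only touch `sat_order` (or add a second satellite), leaving hubs and links untouched, which is the extensibility the method is aiming for.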

Data Vault modeling is currently the established standard for modeling the core data warehouse because of the many benefits it offers. These include the following:

Data Warehouse Pros & Cons

Data Warehouse Benefits

• Easy extensibility enables an agile project approach
• The models created are highly scalable
• The loading processes can be optimally parallelized because there are few synchronization points
• The models are easy to audit

But alongside the many benefits, Data Vault projects also present a number of challenges. These include, but are not limited to, the following:

Data Warehouse Drawbacks

• A vast increase in the number of data objects (tables, columns) as a result of separating the information types and enriching them with meta information for loading
• This gives rise to greater modeling effort comprising numerous unsophisticated mechanical tasks

How can these challenges be mastered using a standard data modeling tool?

The highly schematic structure of the models offers ideal prerequisites for generating models. This allows sizable parts of the modeling process to be automated, enabling Data Vault projects to be accelerated dramatically.

Potential for Automating Data Vault

Which specific parts of the model can be automated?

The standard architecture of a data warehouse includes the following layers:

• Source system: Operational system, such as ERP or CRM systems
• Staging area: This is where the data is delivered from the operational systems. The structure of the data model generally corresponds to the source system, with enhancements for documenting loading.
• Core warehouse: The data from various systems is integrated here. This layer is modeled in accordance with Data Vault and is subdivided into the raw vault and business vault areas. This involves implementing all business rules in the business vault so that only very simple transformations are used in the raw vault.
• Data marts: The structure of the data marts is based on the analysis requirements and is modeled as a star schema.

Both the staging area and the raw vault are very well suited for automation, as clearly defined derivation rules can be established from the preceding layer.
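A derivation rule of this kind can be very small. The sketch below derives a staging model from a source model by copying each table's structure and appending load-documentation columns; the model representation, the column names and the `derive_staging` function are assumptions for illustration only.

```python
from copy import deepcopy

# Hypothetical load-documentation columns appended to every staging table.
META_COLUMNS = [
    {"name": "load_dts", "type": "TIMESTAMP"},
    {"name": "record_source", "type": "VARCHAR(50)"},
]


def derive_staging(source_model: dict, prefix: str = "stg_") -> dict:
    """Derive a staging model: same structure as the source plus meta columns."""
    staging = {}
    for table, columns in source_model.items():
        # Copy so that later edits to staging never alter the source model.
        staging[prefix + table] = deepcopy(columns) + deepcopy(META_COLUMNS)
    return staging


source_model = {
    "customer": [
        {"name": "customer_no", "type": "VARCHAR(10)"},
        {"name": "name", "type": "VARCHAR(100)"},
    ],
}
staging_model = derive_staging(source_model)
```

Because the rule is purely mechanical, rerunning it after a source change regenerates the staging layer without manual modeling.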

Should automation be implemented using a standard modeling tool or using a specialized data warehouse automation tool?

Automation potential can generally be leveraged using special automation tools.

What are the arguments in favor of using a standard tool such as the erwin Data Modeler?

Using a standard modeling tool offers many benefits:

• The erwin Data Modeler generally already includes models (for example, source system), which can continue to be used
• The modeling functions are highly sophisticated – for example, for comparing models and for standardization within models
• A wide range of databases are supported as standard
• A large number of interfaces are available for importing models from other tools
• Often the tool has already been used to model source systems or other warehouses
• The model range can be used to model the entire enterprise architecture, not only the data warehouse (erwin Web Portal)
• Business glossaries enable (existing) semantic information to be integrated

So far so good. But can the erwin Data Modeler generate models?

A special add-in for the erwin Data Modeler has been developed specifically to meet this requirement: MODGEN. This enables the potential for automation in erwin to be exploited to the full.

It integrates seamlessly into the erwin user interface and, in terms of operation, is heavily based on comparing models (complete compare).

MODGEN functionalities

The following specific functionalities are implemented in MODGEN:

• Generation of staging and raw vault models based on the model of the preceding layer
• Generation is controlled by enriching the particular preceding model with meta-information, which is stored in UDPs
• Individual objects can be excluded from the generation process permanently or interactively
• Specifications for meta-columns can be integrated very easily using templates
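The functionalities above can be pictured with a small, tool-independent sketch. It does not use MODGEN's or erwin's actual API: the property names (`dv_role`, `exclude`), the model structure and the `generate_raw_vault` function are invented here to show how annotations on the preceding model could steer generation.

```python
# Source model annotated with hypothetical properties that steer generation.
# The property names ("dv_role", "exclude") are invented for this sketch and
# are not MODGEN's actual UDP names.
SOURCE_TABLE = {
    "name": "customer",
    "columns": [
        {"name": "customer_no", "udp": {"dv_role": "business_key"}},
        {"name": "name", "udp": {"dv_role": "attribute"}},
        {"name": "email", "udp": {"dv_role": "attribute"}},
        {"name": "tmp_flag", "udp": {"dv_role": "attribute", "exclude": True}},
    ],
}
META_TEMPLATE = ["load_dts", "record_source"]  # meta-column template


def generate_raw_vault(table: dict, meta_template: list) -> dict:
    """Generate hub and satellite column lists from an annotated source table."""
    kept = [c for c in table["columns"] if not c["udp"].get("exclude", False)]
    keys = [c["name"] for c in kept if c["udp"]["dv_role"] == "business_key"]
    attrs = [c["name"] for c in kept if c["udp"]["dv_role"] == "attribute"]
    hub_key = f"{table['name']}_hk"
    return {
        f"hub_{table['name']}": [hub_key, *keys, *meta_template],
        f"sat_{table['name']}": [hub_key, *attrs, *meta_template],
    }


raw_vault = generate_raw_vault(SOURCE_TABLE, META_TEMPLATE)
```

The excluded `tmp_flag` column never reaches the target, and changing the meta-column template changes every generated table in one place.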

To support a modeling process that can be repeated multiple times, during which iterative models are created or enhanced, it is essential that generation be round-trip capable.

To achieve this, the generation always performs a comparison between the source and target models and indicates any differences. These can be selected by the user and copied during generation.
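A compare-then-copy step of this kind might look like the following sketch. It is a simplified stand-in for a model comparison, not erwin's Complete Compare; models are reduced to table-to-column-list dictionaries, and `compare_models` and `apply_selected` are hypothetical helpers.

```python
def compare_models(generated: dict, existing: dict) -> list:
    """List column-level differences between generated and existing models."""
    differences = []
    for table in sorted(set(generated) | set(existing)):
        new_cols = generated.get(table, [])
        old_cols = existing.get(table, [])
        differences += [("add", table, c) for c in new_cols if c not in old_cols]
        differences += [("drop", table, c) for c in old_cols if c not in new_cols]
    return differences


def apply_selected(existing: dict, differences: list, selected: set) -> dict:
    """Copy only the differences the user selected into a new target model."""
    result = {t: list(cols) for t, cols in existing.items()}
    for index, (action, table, column) in enumerate(differences):
        if index not in selected:
            continue
        if action == "add":
            result.setdefault(table, []).append(column)
        else:
            result[table].remove(column)
    return result


generated = {"stg_customer": ["customer_no", "name", "email", "load_dts"]}
existing = {"stg_customer": ["customer_no", "name", "load_dts", "manual_note"]}

differences = compare_models(generated, existing)
# The user accepts the new "email" column but keeps the hand-added column.
merged = apply_selected(existing, differences, selected={0})
```

Keeping the selection explicit is what makes the process repeatable: manual additions to the target survive regeneration unless the user chooses to drop them.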

The generation not only takes all the tables and columns into consideration as a matter of course (horizontal modeling), it also creates vertical model information.

This means the relationship of every generated target column to its source column as data source is documented. Source-to-target mappings can therefore be generated very easily using the model.
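As an illustration of how recorded column relationships turn into a mapping document, the sketch below records each target column's source during generation and renders the result as CSV. The function names, the "direct copy" rule label and the CSV layout are assumptions, not the format any tool produces.

```python
import csv
import io


def generate_with_lineage(source_table: str, columns: list, target_table: str) -> list:
    """Generate target columns and record each one's source column."""
    mappings = []
    for column in columns:
        mappings.append({
            "source": f"{source_table}.{column}",
            "target": f"{target_table}.{column}",
            "rule": "direct copy",
        })
    return mappings


def to_mapping_document(mappings: list) -> str:
    """Render the recorded mappings as a CSV source-to-target document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["source", "target", "rule"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(mappings)
    return buffer.getvalue()


mappings = generate_with_lineage("customer", ["customer_no", "name"], "stg_customer")
document = to_mapping_document(mappings)
```

Since the mappings are a by-product of generation, the document cannot drift out of sync with the model the way a separately maintained spreadsheet can.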

Integrating the source and target model into a web portal automatically makes the full impact and lineage analysis functionality available.

If you are interested in finding out more, or if you would like to experience MODGEN live, please contact our partner heureka.

Author details: Stefan Kausch, heureka e-Business GmbH
Stefan Kausch is the CEO and founder of heureka e-Business GmbH, a company focused on IT consultancy and software development.

Stefan has more than 15 years’ experience as a consultant, trainer, and educator and has developed and delivered data modeling processes and data governance initiatives for many different companies.

He has successfully executed many projects for customers, primarily developing application systems, data warehouse automation solutions and ETL processes. Stefan Kausch has in-depth knowledge of application development based on data models.

Contact:
Stefan Kausch
heureka e-Business GmbH
Untere Burghalde 69
71229 Leonberg

Tel.: 0049 7152 939310
Email: heureka@heureka.com
Web: www.heureka.com

Tags data warehouse automation, data vault, data mart, data warehouse
