Categories
Data Intelligence erwin Expert Blog Data Governance

Demystifying Data Lineage: Tracking Your Data’s DNA

Getting the most out of your data requires getting a handle on data lineage. That’s knowing what data you have, where it is, and where it came from – plus understanding its quality and value to the organization.

But you can’t understand your data in a business context much less track data lineage, its physical existence and maximize its security, quality and value if it’s scattered across different silos in numerous applications.

Data lineage provides a way of tracking data from its origin to destination across its lifespan and all the processes it’s involved in. It also plays a vital role in data governance. Beyond the simple ability to know where the data came from and whether or not it can be trusted, there’s an element of statutory reporting and compliance that often requires a knowledge of how that same data (known or unknown, governed or not) has changed over time.

A platform that provides insights like data lineage, impact analysis, full-history capture, and other data management features serves as a central hub from which everything can be learned and discovered about the data – whether a data lake, a data vault or a traditional data warehouse.

In a traditional data management organization, Excel spreadsheets are used to manage the incoming data design, what’s known as the “pre-ETL” mapping documentation, but this does not provide any sort of visibility or auditability. In fact, each unit of work represented in these ‘mapping documents’ becomes an independent variable in the overall system development lifecycle, and therefore nearly impossible to learn from much less standardize.

The key to accuracy and integrity in any exercise is to eliminate the opportunity for human error – which does not mean eliminating humans from the process but incorporating the right tools to reduce the likelihood of error as the human beings apply their thought processes to the work.

Data Lineage

Data Lineage: A Crucial First Step for Data Governance

Knowing what data you have and where it lives and where it came from is complicated. The lack of visibility and control around “data at rest” combined with “data in motion,” as well as difficulties with legacy architectures, means organizations spend more time finding the data they need rather than using it to produce meaningful business outcomes.

Organizations need to create and sustain an enterprise-wide view of and easy access to underlying metadata, but that’s a tall order with numerous data types and data sources that were never designed to work together and data infrastructures that have been cobbled together over time with disparate technologies, poor documentation and little thought for downstream integration. So the applications and initiatives that depend on a solid data infrastructure may be compromised, resulting in faulty analyses.

These issues can be addressed with a strong data management strategy underpinned by technology that enables the data quality the business requires, which encompasses data cataloging (integration of data sets from various sources), mapping, versioning, business rules and glossaries maintenance and metadata management (associations and lineage).

An automated, metadata-driven framework for cataloging data assets and their flows across the business provides an efficient, agile and dynamic way to generate data lineage from operational source systems (databases, data models, file-based systems, unstructured files and more) across the information management architecture; construct business glossaries; assess what data aligns with specific business rules and policies; and inform how that data is transformed, integrated and federated throughout business processes – complete with full documentation.

Centralized design, immediate lineage and impact analysis, and change-activity logging means you will always have answers readily available, or just a few clicks away. Subsets of data can be identified and generated via predefined templates, generic designs generated from standard mapping documents, and pushed via ETL process for faster processing via automation templates.

With automation, data quality is systemically assured and the data pipeline is seamlessly governed and operationalized to the benefit of all stakeholders. Without such automation, business transformation will be stymied. Companies, especially large ones with thousands of systems, files and processes, will be particularly challenged by a manual approach. And outsourcing these data management efforts to professional services firms only increases costs and schedule delays.

With erwin Mapping Manager, organizations can automate enterprise data mapping and code generation for faster time-to-value and greater accuracy when it comes to data movement projects, as well as synchronize “data in motion” with data management and governance efforts.

Map data elements to their sources within a single repository to determine data lineage, deploy data warehouses and other Big Data solutions, and harmonize data integration across platforms. The web-based solution reduces the need for specialized, technical resources with knowledge of ETL and database procedural code, while making it easy for business analysts, data architects, ETL developers, testers and project managers to collaborate for faster decision-making.

Data Lineage

Categories
erwin Expert Blog

Compliance First: How to Protect Sensitive Data

The ability to more efficiently govern, discover and protect sensitive data is something that all prospering data-driven organizations are constantly striving for.

It’s been almost four months since the European Union’s General Data Protection Regulation (GDPR) took effect. While no fines have been issued yet, the Information Commissioner’s Office has received upwards of 500 calls per week since the May 25 effective date.

However, the fine-free streak may be ending soon with British Airways (BA) as the first large company to pay a GDPR penalty because of a data breach. The hack at BA in August and early September lasted for more than two weeks, with intruders getting away with account numbers and personal information of customers making reservations on the carrier’s website and mobile app. If regulators conclude that BA failed to take measures to prevent the incident— a significant fine may follow.

Additionally, complaints against Google in the EU have started. For example, internet browser provider Brave claims that Google and other advertising companies expose user data during a process called “bid request.” A data breach occurs because a bid request fails to protect sensitive data against unauthorized access, which is unlawful under the GDPR.

Per Brave’s announcement, bid request data can include the following personal data:

  • What you are reading or watching
  • Your location
  • Description of your device
  • Unique tracking IDs or a “cookie match,” which allows advertising technology companies to try to identify you the next time you are seen, so that a long-term profile can be built or consolidated with offline data about you
  • Your IP address,depending on the version of “real-time bidding” system
  • Data broker segment ID, if available, which could denote things like your income bracket, age and gender, habits, social media influence, ethnicity, sexual orientation, religion, political leaning, etc., depending on the version of bidding system

Obviously, GDPR isn’t the only regulation that organizations need to comply with. From HIPAA in healthcare to FINRA, PII and BCBS in financial services to the upcoming California Consumer Privacy Act (CCPA) taking effect January 1, 2020, regulatory compliance is part of running – and staying in business.

The common denominator in compliance across all industry sectors is the ability to protect sensitive data. But if organizations are struggling to understand what data they have and where it’s located, how do they protect it? Where do they begin?

Protect sensitive data

Discover and Protect Sensitive Data

Data is a critical asset used to operate, manage and grow a business. While sometimes at rest in databases, data lakes and data warehouses; a large percentage is federated and integrated across the enterprise, introducing governance, manageability and risk issues that must be managed.

Knowing where sensitive data is located and properly governing it with policy rules, impact analysis and lineage views is critical for risk management, data audits and regulatory compliance.

However, when key data isn’t discovered, harvested, cataloged, defined and standardized as part of integration processes, audits may be flawed and therefore putting your organization at risk.

Sensitive data – at rest or in motion – that exists in various forms across multiple systems must be automatically tagged, its lineage automatically documented, and its flows depicted so that it is easily found and its usage across workflows easily traced.

Thankfully, tools are available to help automate the scanning, detection and tagging of sensitive data by:

  • Monitoring and controlling sensitive data: Better visibility and control across the enterprise to identify data security threats and reduce associated risks
  • Enriching business data elements for sensitive data discovery: Comprehensive mechanism to define business data element for PII, PHI and PCI across database systems, cloud and Big Data stores to easily identify sensitive data based on a set of algorithms and data patterns
  • Providing metadata and value-based analysis: Discovery and classification of sensitive data based on metadata and data value patterns and algorithms. Organizations can define business data elements and rules to identify and locate sensitive data including PII, PHI, PCI and other sensitive information.


A Regulatory Rationale for Integrating Data Management and Data Governance

Data management and data governance, together, play a vital role in compliance. It’s easier to protect sensitive data when you know where it’s stored, what it is, and how it needs to be governed.

Truly understanding an organization’s data, including the data’s value and quality, requires a harmonized approach embedded in business processes and enterprise architecture. Such an integrated enterprise data governance experience helps organizations understand what data they have, where it is, where it came from, its value, its quality and how it’s used and accessed by people and applications.

But how is all this possible? Again, it comes back to the right technology for IT and business collaboration that will enable you to:

  • Discover data: Identify and interrogate metadata from various data management silos
  • Harvest data: Automate the collection of metadata from various data management silos and consolidate it into a single source
  • Structure data: Connect physical metadata to specific business terms and definitions and reusable design standards
  • Analyze data: Understand how data relates to the business and what attributes it has
  • Map data flows: Identify where to integrate data and track how it moves and transforms
  • Govern data: Develop a governance model to manage standards and policies and set best practices
  • Socialize data: Enable all stakeholders to see data in one place in their own context