information management Archives

Demystifying Data Lineage: Tracking Your Data’s DNA

Data Lineage: A Crucial First Step for Data Governance

Knowing what data you have and where it lives and where it came from is complicated. The lack of visibility and control around “data at rest” combined with “data in motion,” as well as difficulties with legacy architectures, means organizations spend more time finding the data they need rather than using it to produce meaningful business outcomes.

Organizations need to create and sustain an enterprise-wide view of and easy access to underlying metadata, but that’s a tall order with numerous data types and data sources that were never designed to work together and data infrastructures that have been cobbled together over time with disparate technologies, poor documentation and little thought for downstream integration. So the applications and initiatives that depend on a solid data infrastructure may be compromised, resulting in faulty analyses.

These issues can be addressed with a strong data management strategy underpinned by technology that enables the data quality the business requires, which encompasses data cataloging (integration of data sets from various sources), mapping, versioning, business rules and glossaries maintenance and metadata management (associations and lineage).

An automated, metadata-driven framework for cataloging data assets and their flows across the business provides an efficient, agile and dynamic way to generate data lineage from operational source systems (databases, data models, file-based systems, unstructured files and more) across the information management architecture; construct business glossaries; assess what data aligns with specific business rules and policies; and inform how that data is transformed, integrated and federated throughout business processes – complete with full documentation.

Centralized design, immediate lineage and impact analysis, and change-activity logging means you will always have answers readily available, or just a few clicks away. Subsets of data can be identified and generated via predefined templates, generic designs generated from standard mapping documents, and pushed via ETL process for faster processing via automation templates.

With automation, data quality is systemically assured and the data pipeline is seamlessly governed and operationalized to the benefit of all stakeholders. Without such automation, business transformation will be stymied. Companies, especially large ones with thousands of systems, files and processes, will be particularly challenged by a manual approach. And outsourcing these data management efforts to professional services firms only increases costs and schedule delays.

With erwin Mapping Manager, organizations can automate enterprise data mapping and code generation for faster time-to-value and greater accuracy when it comes to data movement projects, as well as synchronize “data in motion” with data management and governance efforts.

Map data elements to their sources within a single repository to determine data lineage, deploy data warehouses and other Big Data solutions, and harmonize data integration across platforms. The web-based solution reduces the need for specialized, technical resources with knowledge of ETL and database procedural code, while making it easy for business analysts, data architects, ETL developers, testers and project managers to collaborate for faster decision-making.

Compliance First: How to Protect Sensitive Data

The ability to more efficiently govern, discover and protect sensitive data is something that all prospering data-driven organizations are constantly striving for.

It’s been almost four months since the European Union’s General Data Protection Regulation (GDPR) took effect. While no fines have been issued yet, the Information Commissioner’s Office has received upwards of 500 calls per week since the May 25 effective date.

However, the fine-free streak may be ending soon with British Airways (BA) as the first large company to pay a GDPR penalty because of a data breach. The hack at BA in August and early September lasted for more than two weeks, with intruders getting away with account numbers and personal information of customers making reservations on the carrier’s website and mobile app. If regulators conclude that BA failed to take measures to prevent the incident— a significant fine may follow.

Additionally, complaints against Google in the EU have started. For example, internet browser provider Brave claims that Google and other advertising companies expose user data during a process called “bid request.” A data breach occurs because a bid request fails to protect sensitive data against unauthorized access, which is unlawful under the GDPR.

Per Brave’s announcement, bid request data can include the following personal data:

What you are reading or watching
Your location
Description of your device
Unique tracking IDs or a “cookie match,” which allows advertising technology companies to try to identify you the next time you are seen, so that a long-term profile can be built or consolidated with offline data about you
Your IP address,depending on the version of “real-time bidding” system
Data broker segment ID, if available, which could denote things like your income bracket, age and gender, habits, social media influence, ethnicity, sexual orientation, religion, political leaning, etc., depending on the version of bidding system

Obviously, GDPR isn’t the only regulation that organizations need to comply with. From HIPAA in healthcare to FINRA, PII and BCBS in financial services to the upcoming California Consumer Privacy Act (CCPA) taking effect January 1, 2020, regulatory compliance is part of running – and staying in business.

The common denominator in compliance across all industry sectors is the ability to protect sensitive data. But if organizations are struggling to understand what data they have and where it’s located, how do they protect it? Where do they begin?

Discover and Protect Sensitive Data

Data is a critical asset used to operate, manage and grow a business. While sometimes at rest in databases, data lakes and data warehouses; a large percentage is federated and integrated across the enterprise, introducing governance, manageability and risk issues that must be managed.

Knowing where sensitive data is located and properly governing it with policy rules, impact analysis and lineage views is critical for risk management, data audits and regulatory compliance.

However, when key data isn’t discovered, harvested, cataloged, defined and standardized as part of integration processes, audits may be flawed and therefore putting your organization at risk.

Sensitive data – at rest or in motion – that exists in various forms across multiple systems must be automatically tagged, its lineage automatically documented, and its flows depicted so that it is easily found and its usage across workflows easily traced.

Thankfully, tools are available to help automate the scanning, detection and tagging of sensitive data by:

Monitoring and controlling sensitive data: Better visibility and control across the enterprise to identify data security threats and reduce associated risks
Enriching business data elements for sensitive data discovery: Comprehensive mechanism to define business data element for PII, PHI and PCI across database systems, cloud and Big Data stores to easily identify sensitive data based on a set of algorithms and data patterns
Providing metadata and value-based analysis: Discovery and classification of sensitive data based on metadata and data value patterns and algorithms. Organizations can define business data elements and rules to identify and locate sensitive data including PII, PHI, PCI and other sensitive information.

A Regulatory Rationale for Integrating Data Management and Data Governance

Data management and data governance, together, play a vital role in compliance. It’s easier to protect sensitive data when you know where it’s stored, what it is, and how it needs to be governed.

Truly understanding an organization’s data, including the data’s value and quality, requires a harmonized approach embedded in business processes and enterprise architecture. Such an integrated enterprise data governance experience helps organizations understand what data they have, where it is, where it came from, its value, its quality and how it’s used and accessed by people and applications.

But how is all this possible? Again, it comes back to the right technology for IT and business collaboration that will enable you to:

Discover data: Identify and interrogate metadata from various data management silos
Harvest data: Automate the collection of metadata from various data management silos and consolidate it into a single source
Structure data: Connect physical metadata to specific business terms and definitions and reusable design standards
Analyze data: Understand how data relates to the business and what attributes it has
Map data flows: Identify where to integrate data and track how it moves and transforms
Govern data: Develop a governance model to manage standards and policies and set best practices
Socialize data: Enable all stakeholders to see data in one place in their own context