Hello from my home office! I hope you and your family are staying safe, practicing social distancing, and of course, washing your hands.
These are indeed strange days. During this coronavirus emergency, we are all being deluged by data from politicians, government agencies, news outlets, social media and websites, including valid facts but also opinions and rumors.
Happily for us data geeks, the general public is being told how important our efforts and those of data scientists are to analyzing, mapping and ultimately shutting down this pandemic.
Yay, data geeks!
Unfortunately though, not all of the incoming information is of equal value, ethically sourced, rigorously prepared or even good.
As we work to protect the health and safety of those around us, we need to understand both the nuances of the information we receive and the motivations of its sources in order to make good decisions.
On a very personal level, separating the good information from the bad can be a matter of life and death. On a business level, decisions based on bad external data can cause business failures.
In business, data is the food that feeds the body of the enterprise. Better data makes the body stronger and provides a foundation for the use of analytics and data science tools to reduce errors in decision-making. Ultimately, it gives our businesses the strength to deliver better products and services to our customers.
How then, as a business, can we ensure that the data we consume is of good quality?
Distancing from Third-Party Data
Just as we are practicing social distancing in our personal lives, so too we must practice data distancing in our professional lives.
In regard to third-party data, we should ask ourselves: How was the data created? What formulas were used? Does the definition (description, classification, allowable range of values, etc.) of incoming, individual data elements match our internal definitions of those data elements?
If we reflect on the coronavirus example, we can ask: How do individual countries report their data? Do individual countries use the same testing protocols? Are infections universally defined the same way (based on widely administered tests or only hospital admissions)? Are asymptomatic infections reported? Are all countries using the same methods and formulas to collect and calculate infections, recoveries and deaths?
In our businesses, it is vital that we work to develop a deeper understanding of the sources, methods and quality of incoming third-party data. This deeper understanding will help us make better decisions about the risks and rewards of using that external data.
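To make the definition-matching question concrete, here is a minimal Python sketch of comparing an external data dictionary against an internal one. Everything in it is an assumption for illustration: the `ElementDefinition` structure, the `compare_dictionaries` helper, and the sample element names are hypothetical and do not come from any particular tool or data provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementDefinition:
    """One data dictionary entry (hypothetical structure for illustration)."""
    name: str
    data_type: str
    unit: Optional[str] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def compare_dictionaries(internal, external):
    """List differences between two dictionaries keyed by element name."""
    differences = []
    for name, theirs in sorted(external.items()):
        ours = internal.get(name)
        if ours is None:
            differences.append(f"{name}: no matching internal definition")
            continue
        # Compare each attribute of the definition, not just the name.
        for field in ("data_type", "unit", "min_value", "max_value"):
            a, b = getattr(ours, field), getattr(theirs, field)
            if a != b:
                differences.append(
                    f"{name}: {field} differs (internal={a!r}, external={b!r})"
                )
    return differences


# Hypothetical sample: the same element name, counted in different ways.
internal = {
    "confirmed_cases": ElementDefinition("confirmed_cases", "integer", "people", 0),
}
external = {
    "confirmed_cases": ElementDefinition(
        "confirmed_cases", "integer", "hospital admissions", 0
    ),
    "recoveries": ElementDefinition("recoveries", "integer", "people", 0),
}

report = compare_dictionaries(internal, external)
for line in report:
    print(line)
```

A report like this does not settle which definition is right. It simply surfaces the mismatches so the risks of using the external data can be discussed before it reaches a decision-maker.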
Data Governance Methods for Data Distancing
We’ve received lots of instructions lately about how to wash our hands to protect ourselves from coronavirus. Perhaps we thought we already knew how to wash our hands, but nonetheless, a refresher course has been worthwhile.
Similarly, perhaps we think we know how to protect our business data, but maybe a refresher would be useful here as well?
Here are a few steps you can take to protect your business:
- Establish comprehensive third-party data sharing guidelines (for both inbound and outbound data). These guidelines should include communicating with third parties about how they make changes to collection and calculation methods.
- Rationalize external data dictionaries against your internal data dictionaries, understand where differences occur, and decide how you will overcome those differences.
- Ingest third-party data into a quarantined area where it can be profiled and measured for quality, completeness, and correctness, and where necessary, cleansed.
- Periodically review all data ingestion or data-sharing policies, processes and procedures to ensure they remain aligned to business needs and goals.
- Establish data-sharing training programs so all data stakeholders understand associated security considerations, contextual meaning, and when and when not to share and/or ingest third-party data.
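As a rough illustration of the quarantine step above, the following Python sketch profiles a batch of incoming records for completeness and valid ranges before anything is released to downstream systems. It is only a sketch under stated assumptions: the `profile_batch` function, the field names, and the range limits are hypothetical, and it is not how erwin DI or any specific product implements profiling.

```python
def profile_batch(rows, required_fields, valid_ranges):
    """Profile quarantined records for completeness and range validity.

    rows: list of dicts, one per incoming record
    required_fields: field names that must be present and non-empty
    valid_ranges: {field: (low, high)} inclusive bounds for numeric fields
    """
    total = len(rows)
    missing = {field: 0 for field in required_fields}
    out_of_range = {field: 0 for field in valid_ranges}
    clean_rows = []

    for row in rows:
        ok = True
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1
                ok = False
        for field, (low, high) in valid_ranges.items():
            value = row.get(field)
            if value in (None, ""):
                continue  # absent values are handled by the completeness check
            if not (low <= value <= high):
                out_of_range[field] += 1
                ok = False
        if ok:
            clean_rows.append(row)

    # Share of records with a usable value, per required field.
    completeness = {
        field: (1 - missing[field] / total) if total else 0.0
        for field in required_fields
    }
    return {
        "total": total,
        "completeness": completeness,
        "out_of_range": out_of_range,
        "clean_rows": clean_rows,
    }


# Hypothetical quarantined batch from a third party.
quarantined = [
    {"region": "A", "new_cases": 120},
    {"region": "", "new_cases": 45},    # missing region
    {"region": "C", "new_cases": -3},   # impossible negative count
    {"region": "D", "new_cases": None}, # missing count
]

result = profile_batch(
    quarantined,
    required_fields=["region", "new_cases"],
    valid_ranges={"new_cases": (0, 1_000_000)},
)
print(result["completeness"], result["out_of_range"], len(result["clean_rows"]))
```

Only the records in `clean_rows` would move on from quarantine in this sketch. The rejected records stay behind to be cleansed or raised with the data provider.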
erwin Data Intelligence for Data Governance and Distancing
With solutions like those in the erwin Data Intelligence Suite (erwin DI), organizations can auto-document their metadata; classify their data with respect to privacy, contractual and regulatory requirements; attach data-sharing and management policies; and implement an appropriate level of data security.
If you believe the management of your third-party data interfaces could benefit from a review or tune-up, feel free to reach out to me and my colleagues here at erwin.
We’d be happy to provide a demo of how to use erwin DI for data distancing.