
Data Governance 2.0: Biggest Data Shakeups to Watch in 2018

This year we’ll see some huge changes in how we collect, store and use data, with Data Governance 2.0 at the epicenter. For many organizations, these changes will be reactive, as they have to adapt to new regulations. Others will use regulatory change as a catalyst to be proactive with their data. Ideally, you’ll want to be in the latter category.

Data-driven businesses and their relevant industries are experiencing unprecedented rates of change.

Not only has the amount of data exploded in recent years, we’re now seeing the amount of insights data provides increase too. In essence, we’re finding smaller units of data more useful, but also collecting more than ever before.

At present, data opportunities are seemingly boundless, and we’ve barely begun to scratch the surface. So here are some of the biggest data shakeups to expect in 2018.


GDPR

The General Data Protection Regulation (GDPR) has organizations scrambling. Penalties for non-compliance take effect on May 25, and the fines are hefty – up to €20 million or 4 percent of a company’s global annual turnover, whichever is greater.

Although it’s a European mandate, the fact is that all organizations trading with Europe, not just those based within the continent, must comply. Because of this, we’re seeing a global effort to introduce new policies, procedures and systems to prepare on a scale we haven’t seen since Y2K.

It’s easy to view mandated change of this nature as a burden. But the change is well overdue – both from a regulatory and commercial point of view.

In terms of regulation, a globalized approach had to be introduced. Data doesn’t adhere to borders in the same way as physical materials, and conflicting standards within different states, countries and continents have made sufficient regulation difficult.

In terms of business, many organizations have stifled their own digital transformation efforts to become data-driven by neglecting to properly govern the data that would enable them. GDPR requires a collaborative approach to data governance (DG), which, when done right, will add value as well as achieve compliance.

Rise of Data Governance 2.0

Data Governance 1.0 has failed to gain a foothold because of its siloed, uncollaborative nature. It lacks focus on business outcomes, so business leaders have struggled to see the value in it. As a result, IT has been left responsible for cataloging data elements to support search and discovery, yet IT teams rarely understand the data’s context because they are removed from the operational side of the business. This means data is often incomplete and of poor quality, making effective data-driven business impossible.

Company-wide responsibility for data governance, encouraged by the new regulatory standards, stands to fundamentally change the way businesses view data governance. Data Governance 2.0 and its collaborative approach will become the new normal, meaning those with the most to gain from data and its insights will be directly involved in its governance.

This means more buy-in from C-level executives, line managers, etc. It means greater accountability, as well as improved discoverability and traceability. Most of all, it means better data quality that leads to faster, better decisions made with more confidence.

Escalated Digital Transformation

Digital transformation and its prominence won’t diminish this year. In fact, thanks to Data Governance 2.0, digital transformation is poised to accelerate – not slow down.

Organizations that commit to data governance beyond just compliance will reap the rewards. With a stronger data governance foundation, organizations undergoing digital transformation will enjoy a number of significant benefits, including better decision making, greater operational efficiency, improved data understanding and lineage, greater data quality, and increased revenue.

Data-driven exemplars, such as Amazon, Airbnb and Uber, have enjoyed these benefits, using them to disrupt and then dominate their respective industries. But you don’t have to be Amazon-sized to achieve them. De-siloing DG and treating it as a strategic initiative is the first step to data-driven success.

Data as Valuable Asset

In 2017, The Economist declared data more valuable than oil. Yet despite this assessment, many businesses neglect to treat their data as a prized asset. For context, the Industrial Revolution was powered by machinery that had to be well maintained to function properly, as downtime would result in loss. Such machinery adds value to a business, so it is inherently valuable.

Fast forward to 2018 with data at center stage. Because data is the value driver, the data itself is valuable. Just because it doesn’t have a physical presence doesn’t mean it is any less important than physical assets. Businesses need to change how they perceive their data, and this is the year that shift is likely to happen.

DG-Enabled AI and IoT

Artificial Intelligence (AI) and the Internet of Things (IoT) aren’t new concepts. However, they have yet to be fully realized, with businesses still competing to carve out a slice of these markets.

As the two continue to expand, they will hypercharge the already accelerating volume of data – specifically unstructured data – to almost unfathomable levels. The three Vs of data tend to escalate in unison: as the volume increases, so does the velocity at which data must be processed. The variety of data – mostly unstructured in these cases – also increases, so to manage it, businesses will need to put effective data governance in place.

Alongside strong data governance practices, more and more businesses will turn to NoSQL databases to manage diverse data types.

For more best practices in business and IT alignment, and successfully implementing Data Governance 2.0, click here.



Digital Trust: Enterprise Architecture and the Farm Analogy

With the General Data Protection Regulation (GDPR) taking effect soon, organizations can use it as a catalyst in developing digital trust.

Data breaches are increasing in scope and frequency, creating PR nightmares for the organizations affected. The more breaches occur, the more news coverage they generate, and the longer they stay on consumers’ minds.

The Equifax breach and the subsequent fall in its stock price were well documented and should serve as a warning to businesses about how they manage their data. Large or small, organizations have lessons to learn when it comes to building and maintaining digital trust, especially with GDPR looming ever closer.

Previously, we discussed the importance of fostering a relationship of trust between business and consumer. Here, we focus more specifically on data keepers and the public.


Digital Trust and The Farm Analogy

Any approach to mitigating the risks associated with data management needs to consider the ‘three Vs’: variety, velocity and volume.

In describing best practices for handling data, let’s imagine data as an asset on a farm. The typical farm’s wide span makes constant surveillance impossible, similar in principle to data security.

With a farm, you can’t just put a fence around the perimeter and then leave it alone. The same is true of data: you need a security approach that makes dealing with volume and variety manageable.

On a farm, that means separating crops and different types of animals. For data, segregation serves to stop those without permissions from accessing sensitive information.

And as with a farm and its seeds, livestock and other assets, data doesn’t just come into the farm. You also must manage what goes out.

A farm has several gates allowing people, animals and equipment to pass through, pending approval. With data, gates need to make sure only the intended information filters out and that it is secure when doing so. Failure to correctly manage data transfer will leave your business in breach of GDPR and liable for a hefty fine.

Furthermore, when looking at the gates in which data enters and streams out of an organization, we must also consider the third ‘V’ – velocity, the amount of data an organization’s systems can process at any given time.

Of course, the velocity of data an organization can handle is most often tied to how efficiently a business operates. Effectively dealing with high velocities of data requires faster analysis and shorter times to market.

However, it’s arguably a matter of security too. Although not breaches in themselves, DDoS attacks are one vulnerability associated with data velocity.

DDoS attacks are designed to put the aforementioned data gates under pressure, ramping up the amount of data that passes through them at any one time. Organizations with the infrastructure to deal with such an attack, especially infrastructure capable of scaling to demand, will suffer less avoidable downtime.

Enterprise Architecture and Harvesting the Farm

Making sure you can access, understand and use your data for strategic benefit – including fostering digital trust – comes down to effective data management and governance. And enterprise architecture is a great starting point because it provides a holistic view of an organization’s capabilities, applications and systems including how they all connect.

Enterprise architecture at the core of any data-driven business will serve to identify what parts of the farm need extra protections – those fences and gates mentioned earlier.

It also makes GDPR compliance and overall data governance easier, as the first step for both is knowing where all your data is.

For more data management best practices, click here. And you can subscribe to our blog posts here.



Multi-tenancy vs. Single-tenancy: Have We Reached the Multi-Tenant Tipping Point?

The multi-tenancy vs. single-tenancy hosting debate has raged for years. Businesses’ differing demands have led to a stalemate, with certain industries more likely to lean one way than the other.

But with advancements in cloud computing and storage infrastructure, the stalemate could be at the beginning of its end.

To understand why multi-tenancy hosting is gaining traction over single-tenancy, it’s important to understand the fundamental differences.

Multi-Tenancy vs. Single-Tenancy

Gartner defines multi-tenancy as: “A reference to the mode of operation of software where multiple independent instances of one or multiple applications operate in a shared environment. The instances (tenants) are logically isolated, but physically integrated.”

The setup is comparable to that of a bank. The bank houses the assets of all customers in one place, but each customer’s assets are stored separately and securely from one another. Yet every bank customer still uses the same services, systems and processes to access the assets that belong to him/her.

The single-tenancy counterpart removes the shared infrastructure element described above. It operates on a one customer (tenant) per instance basis.
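
To make that contrast concrete, here is a minimal sketch of the logical isolation behind multi-tenancy, echoing the bank analogy above. It uses an in-memory SQLite table purely for illustration; the table and column names are hypothetical, and real multi-tenant platforms layer much more (separate schemas, row-level security, encryption) on top of the same principle.

```python
import sqlite3

# One shared "accounts" table serves every tenant (physically integrated),
# but every query is scoped by tenant_id (logically isolated).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        tenant_id  TEXT NOT NULL,
        account_id TEXT NOT NULL,
        balance    REAL NOT NULL,
        PRIMARY KEY (tenant_id, account_id)
    )
""")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [("acme", "a-1", 100.0), ("globex", "g-1", 250.0)],
)

def accounts_for(tenant_id: str):
    # Reads are always filtered by tenant_id, so no tenant sees another's rows.
    return conn.execute(
        "SELECT account_id, balance FROM accounts WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchall()

print(accounts_for("acme"))    # [('a-1', 100.0)]
print(accounts_for("globex"))  # [('g-1', 250.0)]
```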

The trouble with the single-tenancy approach is that each tenant’s dedicated instance must be maintained separately by the host. And of course, this comes with costs – time as well as money – and customers have to foot the bill.

Additionally, the single-tenancy model limits each tenant to the capacity of its own dedicated infrastructure. Businesses with thorough Big Data strategies – and their numbers are increasing – need to be able to deal with a wide variety of data sources. The data is often high in volume, and must be processed at increasingly high velocities (more on the three Vs of Big Data here).

Such businesses need greater ‘elasticity’ to operate efficiently, with ‘elasticity’ referring to the ability to scale resources up and down as required.

Along with cost savings and greater elasticity, multi-tenancy is also primed to make things easier for the tenant from the ground up. The host upgrades systems on the back end, with updates instantly available to tenants. Maintenance is handled on the host side as well, and only one codebase needs to be maintained and delivered, greatly increasing the speed at which updates can be rolled out.

Given these considerations, it’s hard to fathom why the debate over multi-tenancy vs. single-tenancy has raged for so long.

Diminishing Multi-Tenancy Concerns

The advantages of cost savings, scalability and the ability to focus on improving the business, rather than upkeep, would seem to pique the interest of any business leader.

But the situation is more nuanced than that. Although all businesses would love to take advantage of multi-tenancy’s obvious advantages, shared infrastructure remains a point of contention for some.

Fears about data breaches on the host’s side are valid, and they are accompanied by concerns about externally dictated downtime.

But these fears are now increasingly alleviated by sound reassurances. Multi-tenancy hosting initially spun out of single-tenancy hosting, and the fact that it wasn’t purpose-built left gaps.

However, we’re now witnessing a generation of purpose-built, multi-tenancy approaches that address the aforementioned fears.

Server offloading means maintenance can happen without tenant downtime and widespread service disruption.

Internal policies and improvements in the way data is managed and siloed on a tenant-by-tenant basis serve to squash security concerns.

Of course, shared infrastructure will still be a point of contention in some industries, but we’re approaching a tipping point as evidenced by the success of such multi-tenancy hosts as Salesforce.

Through solid multi-tenancy strategy, Salesforce has dominated the CRM market, outstripping the growth of its contemporaries. Analysts expect further growth this year to match the uptick in cloud adoption.

What are your thoughts on multi-tenancy vs. single-tenancy hosting?



Data Modeling in a Jargon-filled World – Internet of Things (IoT)

In the first post of this blog series, we focused on jargon related to the “volume” aspect of Big Data and its impact on data modeling and data-driven organizations. In this post, we’ll focus on “velocity,” the second of Big Data’s “three Vs.”

In particular, we’re going to explore the Internet of Things (IoT), the constellation of web-connected devices, vehicles, buildings and related sensors and software. It’s a great time for this discussion too, as IoT devices are proliferating at a dizzying pace in both number and variety.

Though IoT devices typically generate small “chunks” of data, they often do so at a rapid pace, hence the term “velocity.” Some of these devices generate data from multiple sensors for each time increment. For example, we recently worked with a utility that embedded sensors in each transformer in its electric network and then generated readings every 4 seconds for voltage, oil pressure and ambient temperature, among others.

While the transformer example is just one of many, we can quickly see two key issues that arise when IoT devices generate data at high velocity. First, organizations need to be able to process this data at high speed. Second, organizations need a strategy to manage and integrate this never-ending data stream. Even small chunks of data will accumulate into large volumes if they arrive fast enough, which is why it’s so important for businesses to have a strong data management platform.

It’s worth noting that the idea of managing readings from network-connected devices is not new. In industries like utilities, petroleum and manufacturing, organizations have used SCADA systems for years, both to receive data from instrumented devices to help control processes and to provide graphical representations and some limited reporting.

More recently, many utilities have introduced smart meters in their electricity, gas and/or water networks to make the collection of meter data easier and more efficient for a utility company, as well as to make the information more readily available to customers and other stakeholders.

For example, you may have seen an energy usage dashboard provided by your local electric utility, allowing customers to view graphs depicting their electricity consumption by month, day or hour, enabling each customer to make informed decisions about overall energy use.

Seems simple and useful, but have you stopped to think about the volume of data underlying this feature? Even if your utility only presents information on an hourly basis, if you consider that it’s helpful to see trends over time and you assume that a utility with 1.5 million customers decides to keep these individual hourly readings for 13 months for each customer, then we’re already talking about over 14 billion individual readings for this simple example (1.5 million customers x 13 months x over 30 days/month x 24 hours/day).
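
If you want to check that back-of-the-envelope figure, the arithmetic is simply the product of the assumed quantities:

```python
# Back-of-the-envelope check of the figure above (inputs assumed from the text).
customers = 1_500_000
months = 13
days_per_month = 30   # the text says "over 30 days/month"; 30 is the floor
hours_per_day = 24

readings = customers * months * days_per_month * hours_per_day
print(f"{readings:,}")  # 14,040,000,000 -- over 14 billion hourly readings
```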

Now consider the earlier example I mentioned of each transformer in an electrical grid with sensors generating multiple readings every 4 seconds. You can get a sense of the cumulative volume impact of even very small chunks of data arriving at high speed.

With experts estimating the IoT will consist of almost 50 billion devices by 2020, businesses across every industry must prepare to deal with IoT data.

But I have good news: IoT data is generally very simple and easy to model. Each connected device typically sends one or more data streams, each consisting of a reading type, a value and the time at which the reading occurred. Historically, large volumes of simple sensor data like this were best stored in time-series databases like the very popular PI System from OSIsoft.
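
Here is a minimal sketch of what one element of such a stream might look like; the field names are generic assumptions, not the PI System’s actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class SensorReading:
    device_id: str      # e.g. a specific transformer in the grid
    reading_type: str   # e.g. "voltage", "oil_pressure", "ambient_temp"
    value: float
    timestamp: datetime

reading = SensorReading(
    device_id="transformer-042",
    reading_type="voltage",
    value=13.8,
    timestamp=datetime(2018, 1, 15, 9, 30, 4, tzinfo=timezone.utc),
)
```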

While this continues to be true for many applications, alternative architectures, such as storing the raw sensor readings in a data lake, are also being successfully implemented, though organizations need to carefully consider the pros and cons of home-grown infrastructure versus time-tested, industrial-grade solutions like the PI System.

Regardless of how raw IoT data is stored once captured, the real value of IoT for most organizations is only realized when IoT data is “contextualized,” meaning it is modeled in the context of the broader organization.

The value of modeled data eclipses that of “edge analytics” – where a value is inspected by software while in flight from the sensor, typically to see whether it falls within an expected range, and is either acted upon if required or simply allowed to pass through – or of simple reporting like that in the energy usage dashboard example.
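
For illustration, an in-flight range check of that kind might look something like the snippet below, reusing the SensorReading sketch from earlier. The ranges and the alerting behavior are assumptions, not a reference implementation.

```python
# Hypothetical edge-side filter: flag out-of-range values, pass everything through.
EXPECTED_RANGES = {                       # assumed bounds per reading type
    "voltage": (13.2, 14.4),
    "oil_pressure": (5.0, 9.0),
}

def inspect(reading: SensorReading) -> SensorReading:
    low, high = EXPECTED_RANGES.get(
        reading.reading_type, (float("-inf"), float("inf"))
    )
    if not low <= reading.value <= high:
        # Act on the anomaly if required (here we simply log it).
        print(f"Out of range: {reading.reading_type} on {reading.device_id} = {reading.value}")
    return reading                        # the reading passes through either way
```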

It is straightforward to represent a reading of a particular type from a particular sensor or device in a data model or process model. It starts to get interesting when we take it to the next step and incorporate entities into the data model to represent both expected ranges for readings under various conditions and how the devices relate to one another.
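
A sketch of that next step might add entities for expected ranges under named conditions and for how devices feed one another; all of the names below are illustrative rather than a reference model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExpectedRange:
    reading_type: str
    condition: str                 # e.g. "normal_load", "summer_peak"
    low: float
    high: float

@dataclass
class Transformer:
    device_id: str
    expected_ranges: List[ExpectedRange] = field(default_factory=list)
    feeds: List["Transformer"] = field(default_factory=list)  # downstream devices

# With relationships modeled, an out-of-range reading on one transformer can be
# traced to the devices it feeds, supporting decisions about alternate paths.
```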

If the utility in the transformer example has modeled that IoT data well, it might be able to prevent a developing problem with a transformer and also possibly identify alternate electricity paths to isolate the problem before it has an impact on network stability and customer service.

Hopefully this overview of IoT in the utility industry helps you see how your organization can incorporate high-velocity IoT data to become more data-driven and therefore more successful in achieving larger corporate objectives.

Subscribe and join us next time for Data Modeling in a Jargon-filled World – NoSQL/NewSQL.



Data Modeling in a Jargon-filled World – Big Data & MPP

By now, you’ve likely heard a lot about Big Data. You may have even heard about “the three Vs” of Big Data. As originally defined by Gartner, Big Data is “high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision-making, insight discovery and process optimization.”