Tag: three vs of big data

erwin Expert Blog

Every Company Requires Data Governance and Here’s Why

Post author By Bunny Tharpe
Post date August 10, 2017
1 Comment on Every Company Requires Data Governance and Here’s Why

With GDPR regulations imminent, businesses need to ensure they have a handle on data governance.

Tags data maintenance and lifecycle, master data management, volume velocity variety, data value, data quality, three vs of big data, data governance, Data protection, GDPR

erwin Expert Blog

Data Modeling is Changing – Time to Make NoSQL Technology a Priority

Post author By Bunny Tharpe
Post date August 3, 2017
No Comments on Data Modeling is Changing – Time to Make NoSQL Technology a Priority

As the amount of data enterprises are tasked with managing increases, the benefits of NoSQL technology are becoming more apparent.

erwin Expert Blog

Multi-tenancy vs. Single-tenancy: Have We Reached the Multi-Tenant Tipping Point?

Post author By Bunny Tharpe
Post date July 6, 2017
No Comments on Multi-tenancy vs. Single-tenancy: Have We Reached the Multi-Tenant Tipping Point?

Multi-tenancy vs. single-tenancy cloud hosting

The multi-tenancy vs. single-tenancy hosting debate has raged for years. Businesses’ differing demands have led to a stalemate, with certain industries more likely to lean one way than the other.

But with advancements in cloud computing and storage infrastructure, the stalemate could be at the beginning of its end.

To understand why multi-tenancy hosting is gaining traction over single-tenancy, it’s important to understand the fundamental differences.

Multi-Tenancy vs. Single-Tenancy

Gartner defines multi-tenancy as: “A reference to the mode of operation of software where multiple independent instances of one or multiple applications operate in a shared environment. The instances (tenants) are logically isolated, but physically integrated.”

The setup is comparable to that of a bank. The bank houses the assets of all customers in one place, but each customer’s assets are stored separately and securely from one another. Yet every bank customer still uses the same services, systems and processes to access the assets that belong to him/her.

The single-tenancy counterpart removes the shared infrastructure element described above. It operates on a one customer (tenant) per instance basis.

The trouble with the single-tenancy approach is that those servers are maintained separately by the host. And of course, this comes with costs – time as well as money – and customers have to foot the bill.

Additionally, the single-tenancy model involves tenants drawing from the power of a single infrastructure. Businesses with thorough Big Data strategies (of which numbers are increasing), need to be able to deal with a wide variety of data sources. The data is often high in volume, and must be processed at increasingly high velocities (more on the Three Vs of Big Data here).

Such businesses need greater ‘elasticity’ to operate efficiently, with ‘elasticity’ referring to the ability to scale resources up and down as required.

Along with cost savings and greater elasticity, multi-tenancy is also primed to make things easier for the tenant from the ground up. The host upgrades systems on the back-end, with updates instantly available to tenants. Maintenance is handled on the host side as well, and only one set of code is needed for delivering, greatly increasing the speed at which new updates can be made.

Given these considerations, it’s hard to fathom why the debate over multi-tenancy vs. single-tenancy has waged for so long.

Diminishing Multi-Tenancy Concerns

The advantages of cost savings, scalability and the ability to focus on improving the business, rather than up-keep, would seem to pique the interest of any business leader.

But the situation is more nuanced than that. Although all businesses would love to take advantage of multi-tenancy’s obvious advantages, shared infrastructure remains a point of contention for some.

Fears about host data breaches are valid and flanked by externally dictated downtime.

But these fears are now increasingly alleviated by sound reassurances. Multi-tenancy hosting initially spun out of single-tenancy hosting, and the fact it wasn’t built for purpose left gaps.

However, we’re now witnessing a generation of purpose-built, multi-tenancy approaches that address the aforementioned fears.

Server offloading means maintenance can happen without tenant downtime and widespread service disruption.

Internal policies and improvements in the way data is managed and siloed on a tenant-by-tenant basis serve to squash security concerns.

Of course, shared infrastructure will still be a point of contention in some industries, but we’re approaching a tipping point as evidenced by the success of such multi-tenancy hosts as Salesforce.

Through solid multi-tenancy strategy, Salesforce has dominated the CRM market, outstripping the growth of its contemporaries. Analysts expect further growth this year to match the uptick in cloud adoption.

What are your thoughts on multi-tenancy vs. single tenancy hosting?

erwin Expert Blog

Data Modeling in a Jargon-filled World – Internet of Things (IoT)

Post author By Kevin Schofield
Post date June 29, 2017
No Comments on Data Modeling in a Jargon-filled World – Internet of Things (IoT)

In the first post of this blog series, we focused on jargon related to the “volume” aspect of Big Data and its impact on data modeling and data-driven organizations. In this post, we’ll focus on “velocity,” the second of Big Data’s “three Vs.”

In particular, we’re going to explore the Internet of Things (IoT), the constellation of web-connected devices, vehicles, buildings and related sensors and software. It’s a great time for this discussion too, as IoT devices are proliferating at a dizzying pace in both number and variety.

Though IoT devices typically generate small “chunks” of data, they often do so at a rapid pace, hence the term “velocity.” Some of these devices generate data from multiple sensors for each time increment. For example, we recently worked with a utility that embedded sensors in each transformer in its electric network and then generated readings every 4 seconds for voltage, oil pressure and ambient temperature, among others.

While the transformer example is just one of many, we can quickly see two key issues that arise when IoT devices are generating data at high velocity. First, organizations need to be able to process this data at high speed. Second, organizations need a strategy to manage and integrate this never-ending data stream. Even small chunks of data will accumulate into large volumes if they arrive fast enough, which is why it’s so important for businesses to have a strong data management platform.

It’s worth noting that the idea of managing readings from network-connected devices is not new. In industries like utilities, petroleum and manufacturing, organizations have used SCADA systems for years, both to receive data from instrumented devices to help control processes and to provide graphical representations and some limited reporting.

More recently, many utilities have introduced smart meters in their electricity, gas and/or water networks to make the collection of meter data easier and more efficient for a utility company, as well as to make the information more readily available to customers and other stakeholders.

For example, you may have seen an energy usage dashboard provided by your local electric utility, allowing customers to view graphs depicting their electricity consumption by month, day or hour, enabling each customer to make informed decisions about overall energy use.

Seems simple and useful, but have you stopped to think about the volume of data underlying this feature? Even if your utility only presents information on an hourly basis, if you consider that it’s helpful to see trends over time and you assume that a utility with 1.5 million customers decides to keep these individual hourly readings for 13 months for each customer, then we’re already talking about over 14 billion individual readings for this simple example (1.5 million customers x 13 months x over 30 days/month x 24 hours/day).

Now consider the earlier example I mentioned of each transformer in an electrical grid with sensors generating multiple readings every 4 seconds. You can get a sense of the cumulative volume impact of even very small chunks of data arriving at high speed.

With experts estimating the IoT will consist of almost 50 billion devices by 2020, businesses across every industry must prepare to deal with IoT data.

But I have good news because IoT data is generally very simple and easy to model. Each connected device typically sends one or more data streams with each having a value for the type of reading and the time at which it occurred. Historically, large volumes of simple sensor data like this were best stored in time-series databases like the very popular PI System from OSIsoft.

While this continues to be true for many applications, alternative architectures, such as storing the raw sensor readings in a data lake, are also being successfully implemented. Though organizations need to carefully consider the pros and cons of home-grown infrastructure versus time-tested industrial-grade solutions like the PI System.

Regardless of how raw IoT data is stored once captured, the real value of IoT for most organizations is only realized when IoT data is “contextualized,” meaning it is modeled in the context of the broader organization.

The value of modeled data eclipses that of “edge analytics” (where the value is inspected by a software program while inflight from the sensor, typically to see if it falls within an expected range, and either acted upon if required or allowed simply to pass through) or simple reporting like that in the energy usage dashboard example.

It is straightforward to represent a reading of a particular type from a particular sensor or device in a data model or process model. It starts to get interesting when we take it to the next step and incorporate entities into the data model to represent expected ranges – both for readings under various conditions and representations of how the devices relate to one another.

If the utility in the transformer example has modeled that IoT data well, it might be able to prevent a developing problem with a transformer and also possibly identify alternate electricity paths to isolate the problem before it has an impact on network stability and customer service.

Hopefully this overview of IoT in the utility industry helps you see how your organization can incorporate high-velocity IoT data to become more data-driven and therefore more successful in achieving larger corporate objectives.

Subscribe and join us next time for Data Modeling in a Jargon-filled World – NoSQL/NewSQL.