
SQL, NoSQL or NewSQL: Evaluating Your Database Options

A common question in the modern data management space involves database technology: SQL, NoSQL or NewSQL?

But there isn’t a one-size-fits-all answer. What’s “right” must be evaluated on a case-by-case basis and is dependent on data maturity.

For example, a large bookstore chain with a big-data initiative would be stifled by a SQL database. The advantages that could be gained from analyzing social media data (for popular books, consumer buying habits) couldn’t be realized effectively through sequential analysis. There’s too much data involved in this approach, with too many threads to follow.

However, an independent bookstore isn’t necessarily bound to a big-data approach because it may not have a mature data strategy. It might not have ventured beyond digitizing customer records, and a SQL database is sufficient for that work.

Having said that, the “SQL, NoSQL or NewSQL” question is gaining prominence because businesses are becoming increasingly data-driven.

A Progress study found that 85% of enterprise decision-makers feel they have only two years to make significant inroads into digital transformation before they suffer financially and/or fall behind their competitors.

Considering that statistic, what better time than now to evaluate your database technology? The “SQL, NoSQL or NewSQL” question is especially important if you intend to become more data-driven.

SQL, NoSQL or NewSQL: Advantages and Disadvantages

SQL

SQL databases are tried and tested, proven to work on disks using interfaces with which businesses are already familiar.

As the longest-standing type of database, plenty of SQL options are available. This competitive market means you’ll likely find what you’re looking for at affordable prices.

Additionally, businesses in the earlier stages of data maturity are more likely to have a SQL database at work already, meaning no new investments need to be made.

However, in the modern digital business context, SQL databases weren’t made to support the three Vs of data. The volume is too high, the variety of sources is too vast, and the velocity (the speed at which the data must be processed) is too great to be analyzed in sequence.

Furthermore, the foundational, legacy IT world they were purpose-built to serve has evolved. Now, corporate IT departments must be agile, and their databases must be agile and scalable to match.

NoSQL

Despite its name, “NoSQL” doesn’t mean the complete absence of the SQL database approach. Rather, it works as more of a hybrid. The term is a contraction of “not only SQL.”

So, in addition to the continuity that staying with the SQL approach offers, NoSQL databases retain many of the benefits of their SQL counterparts.

The key difference is that NoSQL databases were developed with modern IT in mind. They are scalable, agile and purpose-built to deal with disparate, high-volume data.

Hence, data is typically more readily available, and it can be changed, stored or extended with new data more easily.

For example, MongoDB, one of the key players in the NoSQL world, uses JavaScript Object Notation (JSON). As the company explains, “A JSON database returns query results that can be easily parsed, with little or no transformation.” The open, human- and machine-readable standard facilitates data interchange and can store records, “just as tables and rows store records in a relational database.”

Generally, NoSQL databases are better equipped to deal with other non-relational data too. As well as JSON, NoSQL supports log messages, XML and unstructured documents. This support avoids the lethargic “schema-on-write” approach, opting for “schema-on-read” instead.
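To make the contrast concrete, here’s a minimal schema-on-read sketch using MongoDB’s Python driver, PyMongo. The connection string, collection and field names are hypothetical; the point is that documents of different shapes coexist, and structure is interpreted at query time.

```python
# A minimal schema-on-read sketch using PyMongo (pip install pymongo).
# Connection string, collection and field names are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
books = client["bookstore"]["books"]

# No schema is declared up front: documents in the same collection
# can carry different fields, with no migration step.
books.insert_one({"title": "Moby-Dick", "author": "Herman Melville"})
books.insert_one({"title": "Dracula", "author": "Bram Stoker",
                  "social_mentions": 4821})  # a field the first doc lacks

# Structure is interpreted at read time ("schema-on-read"): we can
# query on a field that only some documents have.
for doc in books.find({"social_mentions": {"$gt": 1000}}):
    print(doc["title"], doc["social_mentions"])
```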

NewSQL

NewSQL refers to databases based on the relational (SQL) database model and the SQL query language. In an attempt to solve some of the problems of SQL, vendors such as VoltDB take a best-of-both-worlds approach, marrying the familiarity of SQL with the scalability and agile enablement of NoSQL.

However, as with most seemingly win-win opportunities, NewSQL isn’t without its caveats. These vary from vendor to vendor, but in essence, you have to sacrifice either familiarity or scalability.

If you’d like to speak with someone at erwin about SQL, NoSQL or NewSQL in more detail, click here.

For more industry advice, subscribe to the erwin Expert Blog.


Data Modeling is Changing – Time to Make NoSQL Technology a Priority

As the amount of data enterprises are tasked with managing increases, the benefits of NoSQL technology are becoming more apparent. 


Data Modeling in a Jargon-filled World – NoSQL/NewSQL

In the first two posts of this series, we focused on the “volume” and “velocity” of Big Data, respectively.  In this post, we’ll cover “variety,” the third of Big Data’s “three Vs.” In particular, I plan to discuss NoSQL and NewSQL databases and their implications for data modeling.

As the volume and velocity of data available to organizations continues to rapidly increase, developers have chafed under the performance shackles of traditional relational databases and SQL.

An astonishing array of database solutions has arisen during the past decade to provide developers with higher-performance options for various aspects of managing their application data. These have been collectively labeled NoSQL databases.

Originally NoSQL meant that “no SQL” was required to interface with the database. In many cases, developers viewed this as a positive characteristic.

However, SQL is very useful for some tasks, with many organizations having rich SQL skillsets. Consequently, as more organizations demanded SQL as an option to complement some of the new NoSQL databases, the term NoSQL evolved to mean “not only SQL.” This way, SQL capabilities can be leveraged alongside other non-traditional characteristics.

Among the most popular of these new NoSQL options are document databases like MongoDB. MongoDB offers the flexibility to vary fields from document to document and change structure over time. Document databases typically store data in JSON-like documents, making it easy to map to objects in application code.
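That document-to-object mapping is often close to one-to-one. Below is a hedged sketch in Python; the document shape and field names are hypothetical, and a real MongoDB document would also carry a generated _id field.

```python
# A sketch of how a JSON-like document maps onto an application object.
# The shape and field names are hypothetical.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Book:
    title: str
    author: str
    tags: list[str] = field(default_factory=list)
    social_mentions: Optional[int] = None  # newer field; older documents lack it

# A document fetched from a document database is already a dict,
# so it unpacks straight into the object.
doc = {"title": "Dracula", "author": "Bram Stoker",
       "tags": ["horror", "classic"], "social_mentions": 4821}
book = Book(**doc)
print(book.title, book.social_mentions)
```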

As the scale of NoSQL deployments in some organizations has rapidly grown, it has become increasingly important to have access to enterprise-grade tools to support modeling and management of NoSQL databases and to incorporate such databases into the broader enterprise data modeling and governance fold.

While document databases, key-value databases, graph databases and other types of NoSQL databases have added valuable options for developers to address various challenges posed by the “three Vs,” they did so largely by compromising consistency in favor of availability and speed, instead offering “eventual consistency.” Consequently, most NoSQL stores lack true ACID transactions, though there are exceptions, such as Aerospike and MarkLogic.

But some organizations are unwilling or unable to forgo consistency and transactional requirements, giving rise to a new class of modern relational database management systems (RDBMS) that aim to guarantee ACIDity while also providing the same level of scalability and performance offered by NoSQL databases.

NewSQL databases are typically designed to operate using a shared nothing architecture. VoltDB is one prominent example of this emerging class of ACID-compliant NewSQL RDBMS. The logical design for NewSQL database schemas is similar to traditional RDBMS schema design, and thus, they are well supported by popular enterprise-grade data modeling tools such as erwin DM.
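As an illustration of that similarity, the sketch below declares a hypothetical table with ordinary SQL DDL; the PARTITION statement follows VoltDB-style syntax, though the exact partitioning mechanism varies by vendor.

```python
# A sketch of NewSQL schema design: plain SQL DDL, so existing modeling
# skills and tools carry over. The table is hypothetical; the PARTITION
# statement is VoltDB-style syntax and varies by vendor.
SCHEMA_DDL = """
CREATE TABLE transformer_reading (
    transformer_id BIGINT    NOT NULL,
    reading_time   TIMESTAMP NOT NULL,
    voltage        FLOAT     NOT NULL,
    oil_pressure   FLOAT     NOT NULL,
    PRIMARY KEY (transformer_id, reading_time)
);
-- In a shared-nothing cluster, partitioning on transformer_id spreads
-- rows (and their transactions) across independent nodes.
PARTITION TABLE transformer_reading ON COLUMN transformer_id;
"""
```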

Whatever mixture of databases your organization chooses to deploy for your OLTP requirements on premises and in the cloud – RDBMS, NoSQL and/or NewSQL – it’s as important as ever for data-driven organizations to be able to model their data and incorporate it into an overall architecture.

When it comes to organizations’ analytics requirements, including data that may be sourced from a wide range of NoSQL, NewSQL RDBMS and unstructured sources, leading organizations are adopting a variety of approaches, including a hybrid approach that many refer to as Managed Data Lakes.

Please join us next time for the fourth installment in our series: Data Modeling in a Jargon-filled World – Managed Data Lakes.


Multi-tenancy vs. Single-tenancy: Have We Reached the Multi-Tenant Tipping Point?

The multi-tenancy vs. single-tenancy hosting debate has raged for years. Businesses’ differing demands have led to a stalemate, with certain industries more likely to lean one way than the other.

But with advancements in cloud computing and storage infrastructure, the stalemate could be at the beginning of its end.

To understand why multi-tenancy hosting is gaining traction over single-tenancy, it’s important to understand the fundamental differences.

Multi-Tenancy vs. Single-Tenancy

Gartner defines multi-tenancy as: “A reference to the mode of operation of software where multiple independent instances of one or multiple applications operate in a shared environment. The instances (tenants) are logically isolated, but physically integrated.”

The setup is comparable to that of a bank. The bank houses the assets of all customers in one place, but each customer’s assets are stored separately and securely from one another. Yet every customer still uses the same services, systems and processes to access the assets that belong to them.

The single-tenancy counterpart removes the shared infrastructure element described above. It operates on a one customer (tenant) per instance basis.

The trouble with the single-tenancy approach is that those servers are maintained separately by the host. And of course, this comes with costs – time as well as money – and customers have to foot the bill.

Additionally, the single-tenancy model limits each tenant to the power of its own, single infrastructure. Businesses with thorough Big Data strategies (and their numbers are increasing) need to be able to deal with a wide variety of data sources. The data is often high in volume and must be processed at increasingly high velocities (more on the three Vs of Big Data here).

Such businesses need greater ‘elasticity’ to operate efficiently, with ‘elasticity’ referring to the ability to scale resources up and down as required.

Along with cost savings and greater elasticity, multi-tenancy is also primed to make things easier for the tenant from the ground up. The host upgrades systems on the back end, with updates instantly available to tenants. Maintenance is handled on the host side as well, and only one codebase is needed for delivery, greatly increasing the speed at which new updates can be made.

Given these considerations, it’s hard to fathom why the debate over multi-tenancy vs. single-tenancy has raged for so long.

Diminishing Multi-Tenancy Concerns

The advantages of cost savings, scalability and the ability to focus on improving the business, rather than on upkeep, would seem to pique the interest of any business leader.

But the situation is more nuanced than that. Although all businesses would love to take advantage of multi-tenancy’s obvious advantages, shared infrastructure remains a point of contention for some.

Fears about data breaches on the host’s side are valid, and they’re flanked by concerns about externally dictated downtime.

But these fears are now increasingly alleviated by sound reassurances. Multi-tenancy hosting initially spun out of single-tenancy hosting, and the fact that it wasn’t purpose-built left gaps.

However, we’re now witnessing a generation of purpose-built, multi-tenancy approaches that address the aforementioned fears.

Server offloading means maintenance can happen without tenant downtime and widespread service disruption.

Internal policies and improvements in the way data is managed and siloed on a tenant-by-tenant basis serve to squash security concerns.

Of course, shared infrastructure will still be a point of contention in some industries, but we’re approaching a tipping point as evidenced by the success of such multi-tenancy hosts as Salesforce.

Through solid multi-tenancy strategy, Salesforce has dominated the CRM market, outstripping the growth of its contemporaries. Analysts expect further growth this year to match the uptick in cloud adoption.

What are your thoughts on multi-tenancy vs. single-tenancy hosting?


Data Modeling in a Jargon-filled World – Internet of Things (IoT)

In the first post of this blog series, we focused on jargon related to the “volume” aspect of Big Data and its impact on data modeling and data-driven organizations. In this post, we’ll focus on “velocity,” the second of Big Data’s “three Vs.”

In particular, we’re going to explore the Internet of Things (IoT), the constellation of web-connected devices, vehicles, buildings and related sensors and software. It’s a great time for this discussion too, as IoT devices are proliferating at a dizzying pace in both number and variety.

Though IoT devices typically generate small “chunks” of data, they often do so at a rapid pace, hence the term “velocity.” Some of these devices generate data from multiple sensors for each time increment. For example, we recently worked with a utility that embedded sensors in each transformer in its electric network and then generated readings every 4 seconds for voltage, oil pressure and ambient temperature, among others.

While the transformer example is just one of many, we can quickly see two key issues that arise when IoT devices are generating data at high velocity. First, organizations need to be able to process this data at high speed.  Second, organizations need a strategy to manage and integrate this never-ending data stream. Even small chunks of data will accumulate into large volumes if they arrive fast enough, which is why it’s so important for businesses to have a strong data management platform.

It’s worth noting that the idea of managing readings from network-connected devices is not new. In industries like utilities, petroleum and manufacturing, organizations have used SCADA (supervisory control and data acquisition) systems for years, both to receive data from instrumented devices to help control processes and to provide graphical representations and some limited reporting.

More recently, many utilities have introduced smart meters in their electricity, gas and/or water networks to make the collection of meter data easier and more efficient for a utility company, as well as to make the information more readily available to customers and other stakeholders.

For example, you may have seen an energy usage dashboard provided by your local electric utility, allowing customers to view graphs depicting their electricity consumption by month, day or hour, enabling each customer to make informed decisions about overall energy use.

Seems simple and useful, but have you stopped to think about the volume of data underlying this feature? Even if your utility only presents information on an hourly basis, if you consider that it’s helpful to see trends over time and you assume that a utility with 1.5 million customers decides to keep these individual hourly readings for 13 months for each customer, then we’re already talking about over 14 billion individual readings for this simple example (1.5 million customers x 13 months x over 30 days/month x 24 hours/day).
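That back-of-the-envelope figure is easy to check:

```python
# Back-of-the-envelope check of the hourly-readings example above.
customers = 1_500_000
months = 13
days_per_month = 30  # "over 30 days" on average, so this undercounts slightly
hours_per_day = 24

readings = customers * months * days_per_month * hours_per_day
print(f"{readings:,}")  # 14,040,000,000 -- over 14 billion readings
```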

Now consider the earlier example I mentioned of each transformer in an electrical grid with sensors generating multiple readings every 4 seconds. You can get a sense of the cumulative volume impact of even very small chunks of data arriving at high speed.

With experts estimating the IoT will consist of almost 50 billion devices by 2020, businesses across every industry must prepare to deal with IoT data.

But I have good news: IoT data is generally very simple and easy to model. Each connected device typically sends one or more data streams, with each reading comprising a value, the type of reading and the time at which it occurred. Historically, large volumes of simple sensor data like this were best stored in time-series databases like the very popular PI System from OSIsoft.
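As a minimal sketch of that simplicity (the field names are hypothetical, not from any particular product), a single reading needs only a handful of attributes:

```python
# A sketch of how simple a single IoT reading is to model.
# Field names are hypothetical.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class SensorReading:
    device_id: str       # which transformer, meter, etc.
    reading_type: str    # e.g. "voltage" or "oil_pressure"
    timestamp: datetime  # when the reading occurred
    value: float         # the measured value

reading = SensorReading(
    device_id="transformer-0042",
    reading_type="voltage",
    timestamp=datetime.now(timezone.utc),
    value=13800.0,
)
print(reading)
```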

While time-series databases continue to be a good fit for many applications, alternative architectures, such as storing the raw sensor readings in a data lake, are also being successfully implemented, though organizations need to carefully consider the pros and cons of home-grown infrastructure versus time-tested, industrial-grade solutions like the PI System.

Regardless of how raw IoT data is stored once captured, the real value of IoT for most organizations is only realized when IoT data is “contextualized,” meaning it is modeled in the context of the broader organization.

The value of modeled data eclipses that of “edge analytics” (where each reading is inspected by a software program while in flight from the sensor, typically to see whether it falls within an expected range, and is either acted upon if required or simply allowed to pass through) or simple reporting like that in the energy usage dashboard example.

It is straightforward to represent a reading of a particular type from a particular sensor or device in a data model or process model. It starts to get interesting when we take it to the next step and incorporate entities into the data model to represent expected ranges for readings under various conditions, as well as representations of how the devices relate to one another.
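As a hedged sketch of that next step, an expected range can be modeled as an entity in its own right and checked against incoming readings; the names and thresholds below are hypothetical.

```python
# A sketch of an "expected range" entity that gives raw readings
# context. Names and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class ExpectedRange:
    reading_type: str
    low: float
    high: float

    def is_normal(self, value: float) -> bool:
        return self.low <= value <= self.high

# e.g. an acceptable voltage band for a transformer under normal load
voltage_band = ExpectedRange("voltage", low=13200.0, high=14400.0)
print(voltage_band.is_normal(13800.0))   # True
print(voltage_band.is_normal(15100.0))   # False -> worth investigating
```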

If the utility in the transformer example has modeled that IoT data well, it might be able to prevent a developing problem with a transformer and also possibly identify alternate electricity paths to isolate the problem before it has an impact on network stability and customer service.

Hopefully this overview of IoT in the utility industry helps you see how your organization can incorporate high-velocity IoT data to become more data-driven and therefore more successful in achieving larger corporate objectives.

Subscribe and join us next time for Data Modeling in a Jargon-filled World – NoSQL/NewSQL.


Data Modeling in a Jargon-filled World – Big Data & MPP

By now, you’ve likely heard a lot about Big Data. You may have even heard about “the three Vs” of Big Data. As Gartner originally defined it, Big Data is “high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision-making, insight discovery and process optimization.”


Why the NoSQL Database is a Necessary Step

The NoSQL database is gaining huge traction, and for good reason.

Traditionally, most organizations have leveraged relational databases to manage their data. Relational databases ensure the referential integrity, constraints, normalization and structured access for data across disparate tools, which is why they’re so widely used.
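Referential integrity, for example, means the database itself rejects data that points at records that don’t exist. Here’s a minimal, runnable illustration using Python’s built-in sqlite3 module; the tables are hypothetical.

```python
# A minimal demonstration of referential integrity in a relational
# database, using Python's built-in sqlite3. Tables are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
con.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE purchase (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    amount REAL NOT NULL)""")

con.execute("INSERT INTO customer VALUES (1, 'Ada')")
con.execute("INSERT INTO purchase VALUES (1, 1, 19.99)")  # valid reference

try:
    # A purchase pointing at a customer that doesn't exist is rejected.
    con.execute("INSERT INTO purchase VALUES (2, 99, 5.00)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # rejected: FOREIGN KEY constraint failed
```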

But as with any technology, evolving trends and requirements eventually push the limits of capability and suitability for emerging business use cases.

New data sources, characterized by increased volume, variety and velocity, have exposed limitations in the strict relational approach to managing data. These characteristics require a more flexible approach to the storage and provisioning of data assets, one that can support these new forms of data with the agility and scalability they demand.

Technology – specifically data – has changed the way organizations operate. Lower development costs are allowing start-ups and smaller businesses to grow far more quickly. In turn, this leads to less stable markets and more frequent disruptions.

As more and more organizations look to cut their own slice of the data pie, businesses are more focused on in-house development than ever.

This is where relational data modeling becomes somewhat of a stumbling block.

Rise of the NoSQL Database

More and more, application developers are turning to the NoSQL database.

The NoSQL database is a more flexible approach that enables increased agility in development teams. Data models can be evolved on the fly to account for changing application requirements.

This enables businesses to adopt an agile approach to releasing new iterations of code. NoSQL databases are scalable and object-oriented, and they can also handle large volumes of structured, semi-structured and unstructured data.
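As a hedged sketch of that on-the-fly evolution, here’s how an existing MongoDB collection might be backfilled with a brand-new field using PyMongo; the collection, field and default value are hypothetical.

```python
# A sketch of evolving a document model on the fly with PyMongo:
# adding a new field without a schema migration. Collection, field
# and default value are hypothetical.
from pymongo import MongoClient

books = MongoClient("mongodb://localhost:27017")["bookstore"]["books"]

# New application requirement: every book document needs a "format".
# Backfill documents that lack it; new writes simply include the field.
books.update_many(
    {"format": {"$exists": False}},    # only documents missing the field
    {"$set": {"format": "paperback"}}  # a stand-in default
)
```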

Due to the growing deployment of NoSQL and the fact that our customers need the same tools to manage them as their relational databases, erwin is excited to announce the availability of a beta program for our new erwin DM for NoSQL product.

With our new erwin DM NoSQL option, we’re the only provider to help you model, govern and manage your unstructured cloud data just like any other traditional database in your business.

  • Building new cloud-based apps running on MongoDB?
  • Migrating from a relational database to MongoDB or the reverse?
  • Want to ensure that all your data is governed by a logical enterprise model, no matter where it’s located?

Then erwin DM NoSQL is the right solution for you. Click here to apply for our erwin DM NoSQL/MongoDB beta program now.

And look for more info here on the power and potential of NoSQL databases in the coming weeks.


The Rise of NoSQL and NoSQL Data Modeling

With NoSQL data modeling gaining traction, data governance isn’t the only data shakeup organizations are currently facing.