
Four Use Cases Proving the Benefits of Metadata-Driven Automation

Organizations cannot hope to make the most of a data-driven strategy without at least some degree of metadata-driven automation.

The volume and variety of data have snowballed, and so has its velocity. As a result, traditional – and mostly manual – processes associated with data management and data governance have broken down. They are time-consuming and prone to human error, making compliance, innovation and transformation initiatives more complicated, which is less than ideal in the information age.

So it’s safe to say that organizations can’t reap the rewards of their data without automation.

Data scientists and other data professionals can spend up to 80 percent of their time bogged down trying to understand source data or addressing errors and inconsistencies.

That’s time that would be far better spent on actual data analysis.

By implementing metadata-driven automation, organizations across industries can unleash the talents of their highly skilled, well-paid data pros to focus on finding the goods: actionable insights that will fuel the business.


Metadata-Driven Automation in the BFSI Industry

The banking, financial services and insurance (BFSI) industry typically deals with higher data velocity and tighter regulations than most, and the resulting bureaucracy is rife with data management bottlenecks.

These bottlenecks are only made worse when organizations attempt to get by with systems and tools that are not purpose-built.

For example, manually managing data mappings for the enterprise data warehouse via MS Excel spreadsheets had become cumbersome and unsustainable for one BFSI company.

After embracing metadata-driven automation and custom code automation templates, it saved hundreds of thousands of dollars in code generation and development costs and achieved more work in less time with fewer resources. ROI on the automation solutions was realized within the first year.
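To make the idea of "custom code automation templates" concrete, here is a minimal, hypothetical sketch (not erwin's actual templates; every table, column and transformation rule below is invented): the source-to-target mappings that used to live in a spreadsheet become machine-readable metadata, and a small template renders the load code from them.

```python
# Hypothetical mapping metadata: all table, column and rule names are invented.
mappings = [
    {"source": "stg_customers.cust_id", "target": "dw_customer.customer_id",   "rule": "{src}"},
    {"source": "stg_customers.cust_nm", "target": "dw_customer.customer_name", "rule": "TRIM({src})"},
    {"source": "stg_customers.brth_dt", "target": "dw_customer.birth_date",    "rule": "CAST({src} AS DATE)"},
]

def generate_insert(mappings, target_table="dw_customer"):
    """Render one INSERT ... SELECT statement from the mapping metadata."""
    cols = [m["target"].split(".")[1] for m in mappings]
    exprs = [m["rule"].format(src=m["source"]) for m in mappings]
    source_table = mappings[0]["source"].split(".")[0]
    return (
        f"INSERT INTO {target_table} ({', '.join(cols)})\n"
        f"SELECT {', '.join(exprs)}\n"
        f"FROM {source_table};"
    )

print(generate_insert(mappings))
```

When a mapping changes, the code is simply regenerated rather than hand-edited – which is where the time and cost savings come from.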

Metadata-Driven Automation in the Pharmaceutical Industry

Despite its shortcomings, the Excel spreadsheet method for managing data mappings is common within many industries.

But with the amount of data organizations need to process in today’s business climate, this manual approach makes change management and determining end-to-end lineage a significant and time-consuming challenge.

One global pharmaceutical giant headquartered in the United States experienced such issues until it adopted metadata-driven automation. Then the pharma company was able to scan in all source and target system metadata and maintain it within a single repository. Users now view end-to-end data lineage from the source layer to the reporting layer within seconds.
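As a rough illustration of what end-to-end lineage from a central metadata repository looks like (a toy sketch, not the vendor's implementation; all layer and column names are invented), lineage can be held as a graph of column-level dependencies and walked on demand:

```python
# Toy lineage graph: each key maps a downstream column to its upstream sources.
# Every layer and column name here is made up for illustration.
lineage = {
    "report.revenue_by_region":  ["dw.fact_sales.revenue", "dw.dim_region.region_name"],
    "dw.fact_sales.revenue":     ["stg.orders.amount"],
    "dw.dim_region.region_name": ["stg.customers.region_cd"],
    "stg.orders.amount":         ["src.erp.order_lines.amt"],
    "stg.customers.region_cd":   ["src.crm.accounts.region"],
}

def trace(column, depth=0):
    """Print the upstream path from a reporting column back to the source systems."""
    print("  " * depth + column)
    for upstream in lineage.get(column, []):
        trace(upstream, depth + 1)

trace("report.revenue_by_region")
```

Because the dependencies are already captured as metadata, answering "where did this number come from?" becomes a graph lookup rather than an exercise in spreadsheet archaeology.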

On the whole, the implementation resulted in extraordinary time savings and a total cost reduction of 60 percent.

Metadata-Driven Automation in the Insurance Industry

Insurance is another industry that has to cope with high data velocity and stringent data regulations. Plus many organizations in this sector find that they’ve outgrown their systems.

For example, an insurance company using a CDMA product to centralize data mappings is probably missing certain critical features, such as versioning, impact analysis and lineage, which adds to costs, time to market and errors.

By adopting metadata-driven automation, organizations can standardize the pre-ETL data mapping process and better manage data integration through the change and release process. As a result, both internal data mapping and cross-functional teams gain easy, fast, web-based access to data mappings and valuable information like impact analysis and lineage.

Here is the story of a business that adopted such an approach, achieving operational excellence, an 80 percent reduction in delivery time and ROI within 12 months.

Metadata-Driven Automation for a Non-Profit

Another common issue cited by organizations using manual data mapping is ballooning complexity and subsequent confusion.

Any organization expanding its data-driven focus without sufficiently mature data management initiatives will experience this at some point.

One of the world’s largest humanitarian organizations, with millions of members and volunteers operating all over the world, was confronted with this exact issue.

It recognized the need for a solution to standardize the pre-ETL data mapping process to make data integration more efficient and cost-effective.

With metadata-driven automation, the organization would be able to scan and store metadata and data dictionaries in a central repository, as well as manage the business definitions and data dictionary for legacy systems contributing data to the enterprise data warehouse.

By adopting such an approach, the organization realized time savings across all IT development and cross-functional testing teams. Additionally, it could more easily manage mappings, code sets, reference data and data validation rules.

Again, ROI was achieved within a year.

A Universal Solution for Metadata-Driven Automation

Metadata-driven automation is a capability any organization can benefit from – regardless of industry, as demonstrated by the various real-world use cases chronicled here.

The erwin Automation Framework is a key component of the erwin EDGE platform for comprehensive data management and data governance.

With it, data professionals realize these industry-agnostic benefits:

  • Centralized and standardized code management with all automation templates stored in a governed repository
  • Better quality code and minimized rework
  • Business-driven data movement and transformation specifications
  • Superior data movement job designs based on best practices
  • Greater agility and faster time-to-value in data preparation, deployment and governance
  • Cross-platform support of scripting languages and data movement technologies

Learn more about metadata-driven automation as it relates to data preparation and enterprise data mapping.

Join one of our weekly erwin Mapping Manager demos.



SQL, NoSQL or NewSQL: Evaluating Your Database Options

A common question in the modern data management space involves database technology: SQL, NoSQL or NewSQL?

But there isn’t a one-size-fits-all answer. What’s “right” must be evaluated on a case-by-case basis and is dependent on data maturity.

For example, a large bookstore chain with a big-data initiative would be stifled by a SQL database. The advantages that could be gained from analyzing social media data (for popular books, consumer buying habits) couldn’t be realized effectively through sequential analysis. There’s too much data involved in this approach, with too many threads to follow.

However, an independent bookstore isn’t necessarily bound to a big-data approach because it may not have a mature data strategy. It might not have ventured beyond digitizing customer records, and a SQL database is sufficient for that work.

Having said that, the “SQL, NoSQL or NewSQL” question is gaining prominence because businesses are becoming increasingly data-driven.

In fact, a Progress study found that 85 percent of enterprise decision-makers feel they have only two years to make significant inroads into digital transformation before they suffer financially and/or fall behind their competitors.

Considering that statistic, what better time than now to evaluate your database technology? The “SQL, NoSQL or NewSQL” question is especially important if you intend to become more data-driven.

SQL, NoSQL or NewSQL: Advantages and Disadvantages

SQL

SQL databases are tried and tested, proven to work on disks using interfaces with which businesses are already familiar.

As the longest-standing type of database, plenty of SQL options are available. This competitive market means you’ll likely find what you’re looking for at affordable prices.

Additionally, businesses in the earlier stages of data maturity are more likely to have a SQL database at work already, meaning no new investments need to be made.

However, in the modern digital business context, SQL databases weren’t made to support the three Vs of data. The volume is too high, the variety of sources is too vast, and the velocity (the speed at which the data must be processed) is too great for it all to be analyzed in sequence.

Furthermore, the foundational, legacy IT world they were purpose-built to serve has evolved. Now, corporate IT departments must be agile, and their databases must be agile and scalable to match.

NoSQL

Despite its name, “NoSQL” doesn’t mean the complete absence of the SQL database approach. Rather, it works as more of a hybrid. The term is a contraction of “not only SQL.”

So, in addition to the advantage of continuity that staying with SQL offers, NoSQL enjoys many of the benefits of SQL databases.

The key difference is that NoSQL databases were developed with modern IT in mind. They are scalable, agile and purpose-built to deal with disparate, high-volume data.

Hence, data is typically more readily available, and changing it, storing it or inserting new data is handled more easily.

For example, MongoDB, one of the key players in the NoSQL world, uses JavaScript Object Notation (JSON). As the company explains, “A JSON database returns query results that can be easily parsed, with little or no transformation.” The open, human- and machine-readable standard facilitates data interchange and can store records, “just as tables and rows store records in a relational database.”

Generally, NoSQL databases are also better equipped to deal with other forms of non-relational data. In addition to JSON, NoSQL supports log messages, XML and unstructured documents. This support avoids the lethargic “schema-on-write” approach in favor of “schema-on-read.”
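Here is a minimal sketch of what schema-on-read looks like in practice, assuming a local MongoDB instance and the pymongo driver (the database, collection and field names are invented for illustration):

```python
from pymongo import MongoClient

# Assumes a MongoDB server on localhost; database and collection names are illustrative.
events = MongoClient("mongodb://localhost:27017")["demo"]["events"]

# No table definition up front: documents with different shapes share one collection.
events.insert_many([
    {"type": "log",   "level": "ERROR", "message": "payment timeout"},
    {"type": "order", "sku": "BK-1042", "qty": 3, "channel": "web"},
    {"type": "order", "sku": "BK-0007", "qty": 1},          # no "channel" field at all
])

# Structure is applied at read time: query only the documents that have the fields you need.
for doc in events.find({"type": "order", "channel": {"$exists": True}}):
    print(doc["sku"], doc["qty"], doc["channel"])
```

Nothing is declared up front; structure is imposed only when the data is queried.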

NewSQL

NewSQL refers to databases based on the relational (SQL) database model and the SQL query language. In an attempt to solve some of the problems of SQL, vendors such as VoltDB take a best-of-both-worlds approach, marrying the familiarity of SQL with the scalability and agile enablement of NoSQL.

However, as with most seemingly win-win opportunities, NewSQL isn’t without its caveats. These vary from vendor to vendor, but in essence, you usually have to sacrifice either some familiarity or some scalability.

If you’d like to speak with someone at erwin about SQL, NoSQL or NewSQL in more detail, click here.

For more industry advice, subscribe to the erwin Expert Blog.



Data Modeling is Changing – Time to Make NoSQL Technology a Priority

As the amount of data enterprises are tasked with managing increases, the benefits of NoSQL technology are becoming more apparent. 


Data Modeling in a Jargon-filled World – NoSQL/NewSQL

In the first two posts of this series, we focused on the “volume” and “velocity” of Big Data, respectively.  In this post, we’ll cover “variety,” the third of Big Data’s “three Vs.” In particular, I plan to discuss NoSQL and NewSQL databases and their implications for data modeling.

As the volume and velocity of data available to organizations continues to rapidly increase, developers have chafed under the performance shackles of traditional relational databases and SQL.

An astonishing array of database solutions has arisen during the past decade to provide developers with higher-performance options for various aspects of managing their application data. These have been collectively labeled NoSQL databases.

Originally NoSQL meant that “no SQL” was required to interface with the database. In many cases, developers viewed this as a positive characteristic.

However, SQL is very useful for some tasks, with many organizations having rich SQL skillsets. Consequently, as more organizations demanded SQL as an option to complement some of the new NoSQL databases, the term NoSQL evolved to mean “not only SQL.” This way, SQL capabilities can be leveraged alongside other non-traditional characteristics.

Among the most popular of these new NoSQL options are document databases like MongoDB. MongoDB offers the flexibility to vary fields from document to document and change structure over time. Document databases typically store data in JSON-like documents, making it easy to map to objects in application code.
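As a minimal illustration of why JSON-like documents map so naturally to objects in application code (no real database involved here; the document and class are invented for the example):

```python
from dataclasses import dataclass, field

# A hypothetical "book" document, shaped the way a document database would return it.
doc = {
    "title": "Designing Data Pipelines",
    "authors": ["A. Rivera", "B. Chen"],
    "formats": {"paperback": 39.99, "ebook": 19.99},  # nested structure, no join required
}

@dataclass
class Book:
    title: str
    authors: list = field(default_factory=list)
    formats: dict = field(default_factory=dict)

# The document's fields become the object's attributes with no mapping layer in between.
book = Book(**doc)
print(book.title, "-", book.formats["ebook"])
```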

As the scale of NoSQL deployments in some organizations has rapidly grown, it has become increasingly important to have access to enterprise-grade tools to support modeling and management of NoSQL databases and to incorporate such databases into the broader enterprise data modeling and governance fold.

While document databases, key-value databases, graph databases and other types of NoSQL databases have added valuable options for developers to address various challenges posed by the “three Vs,” they did so largely by compromising consistency in favor of availability and speed, instead offering “eventual consistency.” Consequently, most NoSQL stores lack true ACID transactions, though there are exceptions, such as Aerospike and MarkLogic.

But some organizations are unwilling or unable to forgo consistency and transactional requirements, giving rise to a new class of modern relational database management systems (RDBMS) that aim to guarantee ACIDity while also providing the same level of scalability and performance offered by NoSQL databases.

NewSQL databases are typically designed to operate using a shared nothing architecture. VoltDB is one prominent example of this emerging class of ACID-compliant NewSQL RDBMS. The logical design for NewSQL database schemas is similar to traditional RDBMS schema design, and thus, they are well supported by popular enterprise-grade data modeling tools such as erwin DM.
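To make that point concrete, here is a small sketch: the logical schema for a NewSQL table is the same DDL you would write for any relational database (created in SQLite below purely to show it runs), while the NewSQL-specific part is largely a declaration of how to partition the table across nodes. The table is invented, and the VoltDB-style partitioning statement in the comment is quoted from memory, so check the vendor documentation before relying on it.

```python
import sqlite3

# The logical design is plain relational DDL; SQLite is used here only to demonstrate
# that it is ordinary schema design. In a NewSQL engine such as VoltDB you would
# additionally declare a partitioning column, e.g. (syntax from memory, verify in docs):
#   PARTITION TABLE meter_reading ON COLUMN device_id;
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE meter_reading (
        device_id   INTEGER NOT NULL,
        reading_ts  TEXT    NOT NULL,
        kwh         REAL    NOT NULL,
        PRIMARY KEY (device_id, reading_ts)
    )
""")
conn.execute("INSERT INTO meter_reading VALUES (42, '2018-01-01T00:00:00', 1.25)")
print(conn.execute("SELECT COUNT(*) FROM meter_reading").fetchone()[0])
```

Because the logical model is unchanged, existing relational modeling skills and tools carry over directly; only the physical distribution of the data is new.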

Whatever mixture of databases your organization chooses to deploy for your OLTP requirements on premises and in the cloud – RDBMS, NoSQL and/or NewSQL – it’s as important as ever for data-driven organizations to be able to model their data and incorporate it into an overall architecture.

When it comes to organizations’ analytics requirements, including data that may be sourced from a wide range of NoSQL, NewSQL RDBMS and unstructured sources, leading organizations are adopting a variety of approaches, including a hybrid approach that many refer to as Managed Data Lakes.

Please join us next time for the fourth installment in our series: Data Modeling in a Jargon-filled World – Managed Data Lakes.



Multi-tenancy vs. Single-tenancy: Have We Reached the Multi-Tenant Tipping Point?

The multi-tenancy vs. single-tenancy hosting debate has raged for years. Businesses’ differing demands have led to a stalemate, with certain industries more likely to lean one way than the other.

But with advancements in cloud computing and storage infrastructure, the stalemate could be at the beginning of its end.

To understand why multi-tenancy hosting is gaining traction over single-tenancy, it’s important to understand the fundamental differences.

Multi-Tenancy vs. Single-Tenancy

Gartner defines multi-tenancy as: “A reference to the mode of operation of software where multiple independent instances of one or multiple applications operate in a shared environment. The instances (tenants) are logically isolated, but physically integrated.”

The setup is comparable to that of a bank. The bank houses the assets of all customers in one place, but each customer’s assets are stored separately and securely from everyone else’s. Yet every bank customer uses the same services, systems and processes to access the assets that belong to them.
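A toy sketch of that “logically isolated, physically integrated” idea (using an in-memory SQLite database; the schema, tenants and figures are all invented): every tenant’s rows live in the same shared table, and the application scopes every query by tenant.

```python
import sqlite3

# One shared table for all tenants; isolation is enforced logically, not physically.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoices (tenant_id TEXT, invoice_no TEXT, amount REAL)")
db.executemany("INSERT INTO invoices VALUES (?, ?, ?)", [
    ("acme",   "INV-001", 120.00),
    ("acme",   "INV-002",  80.50),
    ("globex", "INV-001", 999.99),
])

def invoices_for(tenant: str):
    # The tenant filter is the vault door: shared infrastructure, per-tenant visibility.
    return db.execute(
        "SELECT invoice_no, amount FROM invoices WHERE tenant_id = ?", (tenant,)
    ).fetchall()

print(invoices_for("acme"))   # only acme's rows come back
```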

The single-tenancy counterpart removes the shared infrastructure element described above. It operates on a one customer (tenant) per instance basis.

The trouble with the single-tenancy approach is that each customer’s instance must be maintained separately by the host. And of course, this comes with costs – time as well as money – and customers have to foot the bill.

Additionally, the single-tenancy model leaves each tenant drawing on the power of a single, dedicated infrastructure. Businesses with thorough Big Data strategies (and their numbers are increasing) need to be able to deal with a wide variety of data sources. The data is often high in volume and must be processed at increasingly high velocities (more on the three Vs of Big Data here).

Such businesses need greater ‘elasticity’ to operate efficiently, with ‘elasticity’ referring to the ability to scale resources up and down as required.

Along with cost savings and greater elasticity, multi-tenancy is also primed to make things easier for the tenant from the ground up. The host upgrades systems on the back end, with updates instantly available to tenants. Maintenance is handled on the host side as well, and only one codebase is needed for delivery, greatly increasing the speed at which new updates can be made.

Given these considerations, it’s hard to fathom why the debate over multi-tenancy vs. single-tenancy has raged for so long.

Diminishing Multi-Tenancy Concerns

The advantages of cost savings, scalability and the ability to focus on improving the business, rather than up-keep, would seem to pique the interest of any business leader.

But the situation is more nuanced than that. Although all businesses would love to take advantage of multi-tenancy’s obvious advantages, shared infrastructure remains a point of contention for some.

Fears about data breaches on the host’s side are valid, and they’re compounded by concerns about externally dictated downtime.

But these fears are now increasingly alleviated by sound reassurances. Multi-tenancy hosting initially spun out of single-tenancy hosting, and the fact that it wasn’t purpose-built left gaps.

However, we’re now witnessing a generation of purpose-built, multi-tenancy approaches that address the aforementioned fears.

Server offloading means maintenance can happen without tenant downtime and widespread service disruption.

Internal policies and improvements in the way data is managed and siloed on a tenant-by-tenant basis serve to squash security concerns.

Of course, shared infrastructure will still be a point of contention in some industries, but we’re approaching a tipping point as evidenced by the success of such multi-tenancy hosts as Salesforce.

Through solid multi-tenancy strategy, Salesforce has dominated the CRM market, outstripping the growth of its contemporaries. Analysts expect further growth this year to match the uptick in cloud adoption.

What are your thoughts on multi-tenancy vs. single-tenancy hosting?



Data Modeling in a Jargon-filled World – Internet of Things (IoT)

In the first post of this blog series, we focused on jargon related to the “volume” aspect of Big Data and its impact on data modeling and data-driven organizations. In this post, we’ll focus on “velocity,” the second of Big Data’s “three Vs.”

In particular, we’re going to explore the Internet of Things (IoT), the constellation of web-connected devices, vehicles, buildings and related sensors and software. It’s a great time for this discussion too, as IoT devices are proliferating at a dizzying pace in both number and variety.

Though IoT devices typically generate small “chunks” of data, they often do so at a rapid pace, hence the term “velocity.” Some of these devices generate data from multiple sensors for each time increment. For example, we recently worked with a utility that embedded sensors in each transformer in its electric network and then generated readings every 4 seconds for voltage, oil pressure and ambient temperature, among others.

While the transformer example is just one of many, we can quickly see two key issues that arise when IoT devices are generating data at high velocity. First, organizations need to be able to process this data at high speed.  Second, organizations need a strategy to manage and integrate this never-ending data stream. Even small chunks of data will accumulate into large volumes if they arrive fast enough, which is why it’s so important for businesses to have a strong data management platform.

It’s worth noting that the idea of managing readings from network-connected devices is not new. In industries like utilities, petroleum and manufacturing, organizations have used SCADA systems for years, both to receive data from instrumented devices to help control processes and to provide graphical representations and some limited reporting.

More recently, many utilities have introduced smart meters in their electricity, gas and/or water networks to make the collection of meter data easier and more efficient for a utility company, as well as to make the information more readily available to customers and other stakeholders.

For example, you may have seen an energy usage dashboard provided by your local electric utility, allowing customers to view graphs depicting their electricity consumption by month, day or hour, enabling each customer to make informed decisions about overall energy use.

Seems simple and useful, but have you stopped to think about the volume of data underlying this feature? Even if your utility only presents information on an hourly basis, if you consider that it’s helpful to see trends over time and you assume that a utility with 1.5 million customers decides to keep these individual hourly readings for 13 months for each customer, then we’re already talking about over 14 billion individual readings for this simple example (1.5 million customers x 13 months x over 30 days/month x 24 hours/day).
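The arithmetic behind that figure is easy to check with a quick back-of-the-envelope calculation using the numbers from the example:

```python
customers = 1_500_000
months = 13
days_per_month = 30      # the post says "over 30 days/month"; 30 keeps the estimate conservative
hours_per_day = 24

readings = customers * months * days_per_month * hours_per_day
print(f"{readings:,}")   # 14,040,000,000 hourly readings, i.e. over 14 billion
```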

Now consider the earlier example I mentioned of each transformer in an electrical grid with sensors generating multiple readings every 4 seconds. You can get a sense of the cumulative volume impact of even very small chunks of data arriving at high speed.

With experts estimating the IoT will consist of almost 50 billion devices by 2020, businesses across every industry must prepare to deal with IoT data.

But I have good news because IoT data is generally very simple and easy to model. Each connected device typically sends one or more data streams with each having a value for the type of reading and the time at which it occurred. Historically, large volumes of simple sensor data like this were best stored in time-series databases like the very popular PI System from OSIsoft.

While this continues to be true for many applications, alternative architectures, such as storing the raw sensor readings in a data lake, are also being implemented successfully, though organizations need to carefully consider the pros and cons of home-grown infrastructure versus time-tested, industrial-grade solutions like the PI System.

Regardless of how raw IoT data is stored once captured, the real value of IoT for most organizations is only realized when IoT data is “contextualized,” meaning it is modeled in the context of the broader organization.

The value of modeled data eclipses that of “edge analytics” (where each reading is inspected by a software program while in flight from the sensor, typically to see whether it falls within an expected range, and is either acted upon if required or simply allowed to pass through) or simple reporting like the energy usage dashboard example.

It is straightforward to represent a reading of a particular type from a particular sensor or device in a data model or process model. It starts to get interesting when we take it to the next step and incorporate entities into the data model that represent expected ranges for readings under various conditions, as well as how the devices relate to one another.
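Here is a toy sketch of what that contextualization can look like: expected operating ranges are modeled per reading type, and device relationships (which feeder a transformer sits on) are modeled alongside them. All device names, ranges and readings below are invented.

```python
# Expected operating ranges modeled alongside the devices themselves (values invented).
expected_range = {
    "voltage_v":      (113.0, 127.0),
    "oil_pressure":   (5.0, 9.0),
    "ambient_temp_c": (-30.0, 45.0),
}

# Which feeder each transformer sits on, so an alert can point to alternate paths.
feeder_of = {"TX-1041": "FDR-7", "TX-1042": "FDR-7"}

def check(reading):
    """Flag a reading that falls outside its modeled expected range."""
    lo, hi = expected_range[reading["type"]]
    if not lo <= reading["value"] <= hi:
        print(f"{reading['device']} on {feeder_of[reading['device']]}: "
              f"{reading['type']}={reading['value']} outside {lo}-{hi}")

check({"device": "TX-1041", "type": "oil_pressure", "value": 11.2,
       "ts": "2018-03-02T10:15:04"})
```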

If the utility in the transformer example has modeled that IoT data well, it might be able to prevent a developing problem with a transformer and also possibly identify alternate electricity paths to isolate the problem before it has an impact on network stability and customer service.

Hopefully this overview of IoT in the utility industry helps you see how your organization can incorporate high-velocity IoT data to become more data-driven and therefore more successful in achieving larger corporate objectives.

Subscribe and join us next time for Data Modeling in a Jargon-filled World – NoSQL/NewSQL.



Data Modeling in a Jargon-filled World – Big Data & MPP

By now, you’ve likely heard a lot about Big Data. You may have even heard about “the three Vs” of Big Data. As originally defined by Gartner, Big Data is “high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision-making, insight discovery and process optimization.”


Why the NoSQL Database is a Necessary Step

The NoSQL database is gaining huge traction, and for good reason.

Traditionally, most organizations have leveraged relational databases to manage their data. Relational databases ensure referential integrity, constraints, normalization and structured access to data across disparate tools, which is why they’re so widely used.

But as with any technology, evolving trends and requirements eventually push the limits of capability and suitability for emerging business use cases.

New data sources, characterized by increased volume, variety and velocity, have exposed the limitations of the strict relational approach to managing data. These characteristics demand a more flexible approach to storing and provisioning data assets, one that can support these new forms of data with the agility and scalability they require.

Technology – specifically data – has changed the way organizations operate. Lower development costs are allowing start-ups and smaller businesses to grow far more quickly. In turn, this leads to less stable markets and more frequent disruptions.

As more and more organizations look to cut their own slice of the data pie, businesses are more focused on in-house development than ever.

This is where relational data modeling becomes somewhat of a stumbling block.

Rise of the NoSQL Database

More and more, application developers are turning to the NoSQL database.

The NoSQL database is a more flexible approach that enables increased agility in development teams. Data models can be evolved on the fly to account for changing application requirements.

This enables businesses to adopt an agile approach to releasing new iterations and code. NoSQL databases are scalable and object-oriented, and they can also handle large volumes of structured, semi-structured and unstructured data.

Due to the growing deployment of NoSQL and the fact that our customers need the same tools to manage NoSQL databases as they use for their relational databases, erwin is excited to announce the availability of a beta program for our new erwin DM for NoSQL product.

With our new erwin DM NoSQL option, we’re the only provider to help you model, govern and manage your unstructured cloud data just like any other traditional database in your business.

  • Building new cloud-based apps running on MongoDB?
  • Migrating from a relational database to MongoDB or the reverse?
  • Want to ensure that all your data is governed by a logical enterprise model, no matter where it’s located?

Then erwin DM NoSQL is the right solution for you. Click here to apply for our erwin DM NoSQL/MongoDB beta program now.

And look for more info here on the power and potential of  NoSQL databases in the coming weeks.



The Rise of NoSQL and NoSQL Data Modeling

With NoSQL data modeling gaining traction, data governance isn’t the only data shakeup organizations are currently facing.