Integrating SQL and NoSQL into Data Modeling for Greater Business Value: The Latest Release of erwin Data Modeler

Due to the prevalence of internal and external market disruptors, many organizations are aligning their digital transformation and cloud migration efforts with other strategic requirements (e.g., compliance with the General Data Protection Regulation).

Accelerating the retrieval and analysis of data, so much of it unstructured, is vital to becoming a data-driven business that can effectively respond in real time to customers, partners, suppliers and other parties, and profit from these efforts. But even though speed is critical, businesses must take the time to model and document new applications for compliance and transparency.

For decades, data modeling has been the optimal way to design and deploy new relational databases with high-quality data sources and support application development. It facilitates communication between the business and system developers so stakeholders can understand the structure and meaning of enterprise data within a given context. Today, it provides even greater value because critical data exists in both structured and unstructured formats and lives both on premises and in the cloud.

Comparing SQL and NoSQL

While it may not be the most exciting matchup, there’s much to be said when comparing SQL vs. NoSQL databases. SQL databases rely on schemas and pre-defined tables, while NoSQL databases take the opposite approach: instead of schemas and tables, they store data in formats that depend on the kind of NoSQL database being used, such as documents, key-value pairs, wide columns or graphs.
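To make that contrast concrete, here is a minimal sketch (in Python, with hypothetical table, field and value names) of the same customer record as a pre-defined relational table versus a schema-flexible document:

```python
# Hypothetical example: one customer record in a relational table vs. a document store.

# SQL: the schema is declared up front; every row must fit these columns.
CREATE_CUSTOMERS = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    email       VARCHAR(255)
);
"""

# NoSQL (document style): the record carries its own structure,
# and fields can vary from one document to the next.
customer_document = {
    "_id": 1,
    "name": "Acme Corp",
    "email": "info@acme.example",
    "tags": ["wholesale", "priority"],  # no ALTER TABLE needed to add this field
}
```

The relational version needs a schema change before a new attribute can be stored; the document version simply carries the extra field.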

While the SQL and NoSQL worlds can complement each other in today’s data ecosystem, most enterprises need to focus on building expertise and processes for the latter format.

After all, they’ve already had decades of practice designing and managing SQL databases that emphasize storage efficiency and referential integrity rather than fast data access, which is so important to building cloud applications that deliver real-time value to staff, customers and other parties. Query-optimized modeling is the new watchword when it comes to supporting today’s fast-delivery, iterative and real-time applications.

DBMS products based on rigid schema requirements impede our ability to fully realize business opportunities that can expand the depth and breadth of relevant data streams for conversion into actionable information. New, business-transforming use cases often involve variable data feeds, real-time or near-time processing and analytics requirements, and the scale to process large volumes of data.

NoSQL databases, such as Couchbase and MongoDB, are purpose-built to handle the variety, velocity and volume of these new data use cases. Schema-less or dynamic schema capabilities, combined with increased processing speed and built-in scalability, make NoSQL the ideal platform.
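As a rough illustration of what “schema-less or dynamic schema” means in practice, the sketch below uses MongoDB via pymongo; it assumes a locally running MongoDB instance, and the database, collection and field names are invented for the example.

```python
from pymongo import MongoClient  # assumes pymongo is installed and MongoDB runs locally

client = MongoClient("mongodb://localhost:27017")
collection = client["demo_db"]["sensor_readings"]  # hypothetical database and collection

# Dynamic schema: documents in the same collection can carry different fields,
# so a new data feed can be ingested without a schema migration.
collection.insert_one({"device": "thermostat-1", "temp_c": 21.5})
collection.insert_one({"device": "camera-7", "motion": True, "frame_rate": 30})
```

Both documents land in the same collection even though they share only one field, which is exactly the kind of variety relational tables struggle to absorb without schema changes.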

Making the Move to NoSQL

Now the hard part. Once we’ve agreed to make the move to NoSQL, the next step is to identify the architectural and technological implications facing the folks tasked with building and maintaining these new mission-critical data sources and the applications they feed.

As the data modeling industry leader, erwin has identified a critical success factor for the majority of organizations adopting a NoSQL platform such as Couchbase, Cassandra or MongoDB. Successfully leveraging this solution requires a significant paradigm shift in how we design NoSQL data structures and deploy the databases that manage them.

But as with most technology requirements, we need to shield the business from the complexity and risk associated with this new approach. The business cares little for the technical distinctions of the underlying data management “black box.”

Business data is business data, with the main concerns being its veracity and value. Accountability, transparency, quality and reusability are required, regardless. Data needs to be trusted, so decisions can be made with confidence, based on facts. We need to embrace this paradigm shift, while ensuring it fits seamlessly into our existing data management practices as well as interactions with our partners within the business. Therefore, the challenge of adopting NoSQL in an organization is two-fold: 1) mastering and managing this new technology and 2) integrating it into an expansive and complex infrastructure.

The Newest Release of erwin Data Modeler

There’s a reason erwin Data Modeler is the No. 1 data modeling solution in the world.

And the newest release delivers all-in-one SQL and NoSQL data modeling, guided denormalization and model-driven engineering support for Couchbase, Cassandra, MongoDB, JSON and AVRO. NoSQL users get all of the great capabilities inherent in erwin Data Modeler. It also provides Data Vault modeling, enhanced productivity, and simplified administration of the data modeling repository.

Now you can rely on one solution for all your enterprise data modeling needs, working across DBMS platforms, using modern modeling techniques for faster data value, and centrally governing all data definition, data modeling and database design initiatives.

erwin data models reduce complexity, making it easier to design, deploy and understand data sources to meet business needs. erwin Data Modeler also automates and standardizes model design tasks, including complex queries, to improve business alignment, ensure data integrity and simplify integration.

In addition to the above, the newest release of erwin Data Modeler by Quest also provides:

  • Updated support and certifications for the latest versions of Oracle, MS SQL Server, MS Azure SQL and MS Azure SQL Synapse
  • JDBC-connectivity options for Oracle, MS SQL Server, MS Azure SQL, Snowflake, Couchbase, Cassandra and MongoDB
  • Enhanced administration capabilities to simplify and accelerate data model access, collaboration, governance and reuse
  • New automation, connectivity, UI and workflow optimization to enhance data modeler productivity by reducing onerous manual tasks

erwin Data Modeler is a proven technology for improving the quality and agility of an organization’s overall data capability – and that includes data governance and data intelligence.

Click here for your free trial of erwin Data Modeler.

Data Modeling Best Practices for Data-Driven Organizations

As data-driven business becomes increasingly prominent, an understanding of data modeling and data modeling best practices is crucial. This post outlines just that, along with other key questions related to data modeling, such as “SQL vs. NoSQL.”

What is Data Modeling?

Data modeling is a process that enables organizations to discover, design, visualize, standardize and deploy high-quality data assets through an intuitive, graphical interface.

Data models provide visualization, create additional metadata and standardize data design across the enterprise.

As the value of data and the way it is used by organizations has changed over the years, so too has data modeling.

In the modern context, data modeling is a function of data governance.

While data modeling has always been the best way to understand complex data sources and automate design standards, modern data modeling goes well beyond these domains to accelerate and ensure the overall success of data governance in any organization.

As well as keeping the business in compliance with data regulations, data governance – and data modeling – also drive innovation.

Companies that want to advance artificial intelligence (AI) initiatives, for instance, won’t get very far without quality data and well-defined data models.

With the right approach, data modeling promotes greater cohesion and success in organizations’ data strategies.

But what is the right data modeling approach?

Data Modeling Best Practices

The right approach to data modeling is one in which organizations can make the right data available at the right time to the right people. Otherwise, data-driven initiatives can stall.

Thanks to organizations like Amazon, Netflix and Uber, businesses have changed how they leverage their data and are transforming their business models to innovate – or risk becoming obsolete.

According to a 2018 survey by Tech Pro Research, 70 percent of respondents said their companies either have a digital transformation strategy in place or are working on one. And 60 percent of companies that have undertaken digital transformation have created new business models.

But data-driven business success doesn’t happen by accident. Organizations that adopt that strategy without the necessary processes, platforms and solutions quickly realize that data creates a lot of noise but not necessarily the right insights.

This phenomenon is perhaps best articulated through the lens of the “three Vs” of data: volume, variety and velocity.

Any2 Data Modeling and Navigating Data Chaos

The three Vs describe the volume (amount), variety (type) and velocity (speed at which it must be processed) of data.

Data’s value grows with context, and such context is found within data. That means there’s an incentive to generate and store higher volumes of data.

Typically, an increase in the volume of data leads to more data sources and types. And higher volumes and varieties of data become increasingly difficult to manage in a way that provides insight.

Without due diligence, the above factors can lead to a chaotic environment for data-driven organizations.

Therefore, the data modeling best practice is one that allows users to view any data from anywhere – a data governance and management best practice we dub “any-squared” (Any2).

Organizations that adopt the Any2 approach can expect greater consistency, clarity and artifact reuse across large-scale data integration, master data management, metadata management, Big Data and business intelligence/analytics initiatives.

SQL or NoSQL? The Advantages of NoSQL Data Modeling

For the most part, databases use Structured Query Language (SQL) for maintaining and manipulating data. This structured approach and its proficiency in handling complex queries have led to its widespread use.

But despite the advantages of such structure, its inherently sequential nature (“this,” then “this”) means it can be hard to operate holistically and deal with large amounts of data at once.

Additionally, as alluded to earlier, the nature of modern, data-driven business and the three Vs means organizations are dealing with increasing amounts of unstructured data.

As such, in a modern business context, the three Vs have become somewhat of an Achilles’ heel for SQL databases.

The sheer rate at which businesses collect and store data – as well as the various types of data stored – mean organizations have to adapt and adopt databases that can be maintained with greater agility.

That’s where NoSQL comes in.

Benefits of NoSQL

Despite what many might assume, adopting a NoSQL database doesn’t mean abandoning SQL databases altogether. In fact, NoSQL is actually a contraction of “not only SQL.”

The NoSQL approach builds on the traditional SQL approach, bringing old (but still relevant) ideas in line with modern needs.

NoSQL databases are scalable, promote greater agility, and handle changes to data and the storing of new data more easily.

They’re also better equipped to deal with non-relational data: NoSQL supports JavaScript Object Notation (JSON), log messages, XML and unstructured documents.
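As a small, hypothetical illustration of that flexibility, the same store can hold a JSON-style record, a raw log message and an XML payload side by side, leaving it to each consumer to interpret the payload when the data is read:

```python
# Hypothetical mixed-format "events" collection: structure is interpreted on read.
events = [
    {"type": "order", "payload": {"order_id": 42, "total": 99.95}},              # JSON-style record
    {"type": "log",   "payload": "2024-01-15T10:02:11Z WARN checkout latency"},  # raw log message
    {"type": "xml",   "payload": "<invoice><id>42</id><total>99.95</total></invoice>"},
]

# Each consumer decides how to parse each payload type.
for event in events:
    print(event["type"], type(event["payload"]).__name__)
```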

Data Modeling Is Different for Every Organization

It perhaps goes without saying, but different organizations have different needs.

For some, the legacy approach to databases meets the needs of their current data strategy and maturity level.

For others, the greater flexibility offered by NoSQL databases makes NoSQL databases, and by extension NoSQL data modeling, a necessity.

Some organizations may require an approach to data modeling that promotes collaboration.

Bringing data to the business and making it easy to access and understand increases the value of data assets, providing a return-on-investment and a return-on-opportunity. But neither would be possible without data modeling providing the backbone for metadata management and proper data governance.

Whatever the data modeling need, erwin can help you address it.

erwin DM is available in several versions, including erwin DM NoSQL, with additional options to improve the quality and agility of data capabilities.

And we just announced a new version of erwin DM, with a modern and customizable modeling environment, support for Amazon Redshift, updated support for the latest DB2 releases, time-saving modeling task automation, and more.

New to erwin DM? You can try the new erwin Data Modeler for yourself for free!

Choosing the Right Data Modeling Tool

The need for an effective data modeling tool is more significant than ever.

For decades, data modeling has provided the optimal way to design and deploy new relational databases with high-quality data sources and support application development. But it provides even greater value for modern enterprises where critical data exists in both structured and unstructured formats and lives both on premises and in the cloud.

In today’s hyper-competitive, data-driven business landscape, organizations are awash with data and the applications, databases and schema required to manage it.

For example, an organization may have 300 applications, with 50 different databases and a different schema for each. Additional challenges, such as increasing regulatory pressures – from the General Data Protection Regulation (GDPR) to the Health Insurance Portability and Accountability Act (HIPAA) – and growing stores of unstructured data also underscore the increasing importance of a data modeling tool.

Data modeling, quite simply, describes the process of discovering, analyzing, representing and communicating data requirements in a precise form called the data model. There’s an expression: measure twice, cut once. Data modeling is the upfront “measuring tool” that helps organizations reduce time and avoid guesswork in a low-cost environment.

From a business-outcome perspective, a data modeling tool is used to help organizations:

  • Effectively manage and govern massive volumes of data
  • Consolidate and build applications with hybrid architectures, including traditional, Big Data, cloud and on premises
  • Support expanding regulatory requirements, such as GDPR and the California Consumer Privacy Act (CCPA)
  • Simplify collaboration across key roles and improve information alignment
  • Improve business processes for operational efficiency and compliance
  • Empower employees with self-service access for enterprise data capability, fluency and accountability

Evaluating a Data Modeling Tool – Key Features

Organizations seeking to invest in a new data modeling tool should consider these four key features.

  1. Ability to visualize business and technical database structures through an integrated, graphical model.

Due to the number of database platforms available, it’s important that an organization’s data modeling tool supports a sufficient array of platforms for that organization’s needs. The chosen data modeling tool should be able to read the technical formats of each of these platforms and translate them into highly graphical models rich in metadata. Schemas can then be deployed from the models in an automated fashion and iteratively updated so that new development can take place via model-driven design.

  2. Empowering end-user BI/analytics through data source discovery, analysis and integration.

A data modeling tool should give business users confidence in the information they use to make decisions. Such confidence comes from the ability to provide a common, contextual, easily accessible source of data element definitions to ensure they are able to draw upon the correct data; understand what it represents, including where it comes from; and know how it’s connected to other entities.

A data modeling tool can also be used to pull in data sources via self-service BI and analytics dashboards. The data modeling tool should also have the ability to integrate its models into whatever format is required for downstream consumption.

  3. The ability to store business definitions and data-centric business rules in the model along with technical database schemas, procedures and other information.

With business definitions and rules on board, technical implementations can be better aligned with the needs of the organization. Using an advanced design layer architecture, model “layers” can be created with one or more models focused on the business requirements that then can be linked to one or more database implementations. Design-layer metadata can also be connected from conceptual through logical to physical data models.

  4. Rationalize platform inconsistencies and deliver a single source of truth for all enterprise business data.

Many organizations struggle to break down data silos and unify data into a single source of truth, due in large part to varying data sources and difficulty managing unstructured data. Being able to model any data from anywhere accounts for this with on-demand modeling for non-relational databases that offer speed, horizontal scalability and other real-time application advantages.

With NoSQL support, model structures from non-relational databases, such as Couchbase and MongoDB, can be created automatically. Existing Couchbase and MongoDB data sources can be easily discovered, understood and documented through modeling and visualization. Existing entity-relationship diagrams and SQL databases can be migrated to Couchbase and MongoDB too, with relational schemas transformed into query-optimized NoSQL constructs.
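The general idea behind such a transformation (shown here as a simplified, hypothetical sketch rather than erwin’s actual algorithm) is to denormalize related rows into a single document, so the application’s most common question is answered with one read instead of a join:

```python
# Hypothetical rows pulled from a normalized relational schema.
customer_row = {"customer_id": 7, "name": "Jane Doe"}
order_rows = [
    {"order_id": 101, "customer_id": 7, "total": 25.00},
    {"order_id": 102, "customer_id": 7, "total": 75.50},
]

# Query-optimized document: orders are embedded in the customer, so the
# "show a customer with their orders" query becomes a single document read.
customer_doc = {
    "_id": customer_row["customer_id"],
    "name": customer_row["name"],
    "orders": [
        {"order_id": o["order_id"], "total": o["total"]}
        for o in order_rows
        if o["customer_id"] == customer_row["customer_id"]
    ],
}
```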

Other considerations include the ability to:

  • Compare models and databases.
  • Increase enterprise collaboration.
  • Perform impact analysis.
  • Enable business and IT infrastructure interoperability.

When it comes to data modeling, no one knows it better. For more than 30 years, erwin Data Modeler has been the market leader. It is built on the vision and experience of data modelers worldwide and is the de facto standard in data model integration.

You can learn more about driving business value and underpinning governance with erwin DM in this free white paper.

A New Wave in Application Development

Application development is new again.

The ever-changing business landscape – fueled by digital transformation initiatives indiscriminate of industry – demands businesses deliver innovative customer- and partner-facing solutions, not just tactical apps to support internal functions.

Therefore, application developers are playing an increasingly important role in achieving business goals. The financial services sector is a notable example, with companies like JPMorgan Chase spending millions on emerging fintech like online and mobile tools for opening accounts and completing transactions, real-time stock portfolio values, and electronic trading and cash management services.

But businesses are finding that creating market-differentiating applications to improve the customer experience, and subsequently customer satisfaction, requires some significant adjustments. For example, using non-relational database technologies, building another level of development expertise, and driving optimal data performance will be on their agendas.

Of course, all of this must be done with a focus on data governance – backed by data modeling – as the guiding principle for accurate, real-time analytics and business intelligence (BI).

Evolving Application Development Requirements

The development organization must identify which systems, processes and even jobs must evolve to meet demand. The factors it will consider include agile development, skills transformation and faster querying.

Rapid delivery is the rule, with products released in usable increments in sprints as part of ongoing, iterative development. Developers can move from conceptual models for defining high-level requirements to creating low-level physical data models to be incorporated directly into the application logic. This route facilitates dynamic change support to drive speedy baselining, fast-track sprint development cycles and quick application scaling. Logical modeling then follows.

Agile application development usually goes hand in hand with using NoSQL databases, so developers can take advantage of more pliable data models. This technology offers more dynamic and flexible schema design than relational databases, supports whatever data types and query options an application requires, and delivers the processing efficiency, scalability and performance that Big Data and new-age apps’ real-time requirements demand. However, NoSQL skills aren’t widespread, so specific tools for modeling unstructured data in NoSQL databases can help staff used to RDBMSs ramp up.

Finally, the shift to agile development and NoSQL technology as part of more complex data architectures is driving another change. Storage-optimized models are moving to the sidelines because a new format is available to support real-time app development: one that understands what’s being asked of the data and enables schemas to be structured to support application data access requirements for speedy responses to complex queries.

The NoSQL Paradigm

erwin DM NoSQL takes into account all the requirements for the new application development era. In addition to its modeling tools, the solution includes patent-pending Query-Optimized Modeling™ that replaces storage-optimized modeling, giving users guidance to build schemas for optimal performance for NoSQL applications.

erwin DM NoSQL also embraces an “any-squared” approach to data management, so “any data” from “anywhere” can be visualized for greater understanding. And the solution now supports the Couchbase Data Platform in addition to MongoDB. Used in conjunction with erwin DG, businesses also can be assured that agility, speed and flexibility will not take precedence over the equally important need to stringently manage data.

With all this in place, enterprises will be positioned to deliver unique, real-time and responsive apps to enhance the customer experience and support new digital-transformation opportunities. At the same time, they’ll be able to preserve and extend the work they’ve already done in terms of maintaining well-governed data assets.

For more information about how to realize value from app development in the age of digital transformation with the help of data modeling and data governance, you can download our new e-book: Application Development Is New Again.

Data Governance 2.0: Biggest Data Shakeups to Watch in 2018

This year we’ll see some huge changes in how we collect, store and use data, with Data Governance 2.0 at the epicenter. For many organizations, these changes will be reactive, as they have to adapt to new regulations. Others will use regulatory change as a catalyst to be proactive with their data. Ideally, you’ll want to be in the latter category.

Data-driven businesses and their relevant industries are experiencing unprecedented rates of change.

Not only has the amount of data exploded in recent years, but the insights data provides have multiplied too. In essence, we’re finding smaller units of data more useful, while also collecting more than ever before.

At present, data opportunities are seemingly boundless, and we’ve barely begun to scratch the surface. So here are some of the biggest data shakeups to expect in 2018.

GDPR

The General Data Protection Regulation (GDPR) has organizations scrambling. Penalties for non-compliance go into immediate effect on May 25, with hefty fines – up to €20 million or 4 percent of the company’s global annual turnover, whichever is greater.

Although it’s a European mandate, the fact is that all organizations trading with Europe, not just those based within the continent, must comply. Because of this, we’re seeing a global effort to introduce new policies, procedures and systems to prepare on a scale we haven’t seen since Y2K.

It’s easy to view mandated change of this nature as a burden. But the change is well overdue – both from a regulatory and commercial point of view.

In terms of regulation, a globalized approach had to be introduced. Data doesn’t adhere to borders in the same way as physical materials, and conflicting standards within different states, countries and continents have made sufficient regulation difficult.

In terms of business, many organizations have stifled their digital transformation efforts to become data-driven, neglecting to properly govern the data that would enable it. GDPR requires a collaborative approach to data governance (DG), and when done right, will add value as well as achieve compliance.

Rise of Data Governance 2.0

Data Governance 1.0 has failed to gain a foothold because of its siloed, uncollaborative nature. It lacks focus on business outcomes, so business leaders have struggled to see the value in it. As a result, IT has been responsible for cataloging data elements to support search and discovery, yet IT rarely understands the data’s context because it is removed from the operational side of the business. This means data is often incomplete and of poor quality, making effective data-driven business impossible.

Company-wide responsibility for data governance, encouraged by the new standards of regulation, stands to fundamentally change the way businesses view data governance. Data Governance 2.0 and its collaborative approach will become the new normal, meaning those with the most to gain from data and its insights will be directly involved in its governance.

This means more buy-in from C-level executives, line managers, etc. It means greater accountability, as well as improved discoverability and traceability. Most of all, it means better data quality that leads to faster, better decisions made with more confidence.

Escalated Digital Transformation

Digital transformation and its prominence won’t diminish this year. In fact, thanks to Data Governance 2.0, digital transformation is poised to accelerate – not slow down.

Organizations that commit to data governance beyond just compliance will reap the rewards. With a stronger data governance foundation, organizations undergoing digital transformation will enjoy a number of significant benefits, including better decision making, greater operational efficiency, improved data understanding and lineage, greater data quality, and increased revenue.

Data-driven exemplars, such as Amazon, Airbnb and Uber, have enjoyed these benefits, using them to disrupt and then dominate their respective industries. But you don’t have to be Amazon-sized to achieve them. De-siloing DG and treating it as a strategic initiative is the first step to data-driven success.

Data as Valuable Asset

Data became more valuable than oil in 2017. Yet despite this assessment, many businesses neglect to treat their data as a prized asset. For context, the Industrial Revolution was powered by machinery that had to be well-maintained to function properly, as downtime would result in loss. Such machinery adds value to a business, so it is inherently valuable.

Fast forward to 2018 with data at center stage. Because data is the value driver, the data itself is valuable. Just because it doesn’t have a physical presence doesn’t mean it is any less important than physical assets. So businesses will need to change how they perceive their data, and this is the year in which this thinking is likely to change.

DG-Enabled AI and IoT

Artificial Intelligence (AI) and the Internet of Things (IoT) aren’t new concepts. However, they’re yet to be fully realized with businesses still competing to carve a slice out of these markets.

As the two continue to expand, they will hypercharge the already accelerating volume of data – specifically unstructured data – to almost unfathomable levels. The three Vs of data tend to escalate in unison: as the volume increases, so does the velocity, the speed at which data must be processed. The variety of data – mostly unstructured in these cases – also increases, so to manage it, businesses will need to put effective data governance in place.

Alongside strong data governance practices, more and more businesses will turn to NoSQL databases to manage diverse data types.

For more best practices in business and IT alignment, and successfully implementing Data Governance 2.0, click here.

SQL, NoSQL or NewSQL: Evaluating Your Database Options

A common question in the modern data management space involves database technology: SQL, NoSQL or NewSQL?

But there isn’t a one-size-fits-all answer. What’s “right” must be evaluated on a case-by-case basis and is dependent on data maturity.

For example, a large bookstore chain with a big-data initiative would be stifled by a SQL database. The advantages that could be gained from analyzing social media data (for popular books, consumer buying habits) couldn’t be realized effectively through sequential analysis. There’s too much data involved in this approach, with too many threads to follow.

However, an independent bookstore isn’t necessarily bound to a big-data approach because it may not have a mature data strategy. It might not have ventured beyond digitizing customer records, and a SQL database is sufficient for that work.

Having said that, the “SQL, NoSQL or NewSQL” question is gaining prominence because businesses are becoming increasingly data-driven.

In 2019, an IDC study found that 85 percent of enterprise decision-makers believed they had a time frame of just two years to make significant inroads into digital transformation or they would fall behind their competitors and suffer financially – a finding echoed almost exactly by a Progress study of enterprise decision-makers.

Considering these statistics, what better time than now to evaluate your database technology? The “SQL, NoSQL or NewSQL” question is especially important if you intend to become more data-driven.

SQL, NoSQL or NewSQL: Advantages and Disadvantages

SQL

SQL databases are tried and tested, proven to work on disks using interfaces with which businesses are already familiar.

As the longest-standing type of database, plenty of SQL options are available. This competitive market means you’ll likely find what you’re looking for at affordable prices.

Additionally, businesses in the earlier stages of data maturity are more likely to have a SQL database at work already, meaning no new investments need to be made.

However, in the modern digital business context, SQL databases weren’t made to support the three Vs of data. The volume is too high, the variety of sources is too vast, and the velocity (the speed at which the data must be processed) is too great for it all to be analyzed in sequence.

Furthermore, the foundational, legacy IT world they were purpose-built to serve has evolved. Now, corporate IT departments must be agile, and their databases must be agile and scalable to match.

NoSQL

Despite its name, “NoSQL” doesn’t mean the complete absence of the SQL database approach. Rather, it works as more of a hybrid. The term is a contraction of “not only SQL.”

So, in addition to the advantage of continuity that staying with SQL offers, NoSQL enjoys many of the benefits of SQL databases.

The key difference is that NoSQL databases were developed with modern IT in mind. They are scalable, agile and purpose-built to deal with disparate, high-volume data.

Hence, data is typically more readily available, and changing data, storing it and inserting new data are all handled more easily.

For example, MongoDB, one of the key players in the NoSQL world, uses JavaScript Object Notation (JSON). As the company explains, “A JSON database returns query results that can be easily parsed, with little or no transformation.” The open, human- and machine-readable standard facilitates data interchange and can store records, “just as tables and rows store records in a relational database.”

Generally, NoSQL databases are better equipped to deal with other non-relational data too. As well as JSON, NoSQL supports log messages, XML and unstructured documents. This support avoids the lethargy of “schema-on-write,” opting for “schema-on-read” instead.
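A minimal sketch of the schema-on-read idea: raw records are stored exactly as they arrive, and structure is imposed only when a particular question is asked, so records with extra or missing fields never block ingestion. The field names below are invented for the example.

```python
import json

# Raw records stored as-is; no schema is enforced when the data is written.
raw_records = [
    '{"user": "ann", "clicks": 3}',
    '{"user": "bob", "clicks": 7, "referrer": "email"}',  # extra field is fine
]

# Schema-on-read: structure is applied at query time, for this query only.
def clicks_by_user(raw):
    parsed = (json.loads(line) for line in raw)
    return {rec["user"]: rec["clicks"] for rec in parsed}

print(clicks_by_user(raw_records))  # {'ann': 3, 'bob': 7}
```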

NewSQL

NewSQL refers to databases based on the relational (SQL) database and SQL query language. In an attempt to solve some of the problems of SQL, the likes of VoltDB and others take a best-of-both-worlds approach, marrying the familiarity of SQL with the scalability and agile enablement of NoSQL.

However, as with most seemingly win-win opportunities, NewSQL isn’t without its caveats. These vary from vendor to vendor, but in essence, you have to sacrifice either some familiarity or some scalability.

If you’d like to speak with someone at erwin about SQL, NoSQL or NewSQL in more detail, click here.

For more industry advice, subscribe to the erwin Expert Blog.

Data Modeling in a Jargon-filled World – In-memory Databases

With the volume and velocity of data increasing, in-memory databases provide a way to keep processing times low.

Traditionally, databases have stored their data on mechanical storage media such as hard disks. While this has contributed to durability, it has also constrained attainable query speeds. Database and software designers have long recognized this limitation and sought ways to harness the faster speeds of in-memory processing.

The traditional approach to database design – and analytics solutions to access them – includes in-memory caching, which retains a subset of recently accessed data in memory for fast access. While caching often worked well for online transaction processing (OLTP), it was not optimal for analytics and business intelligence. In these cases, the most frequently accessed information – rather than the most recently accessed information – is typically of most interest.
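The “recently accessed” caching pattern described above is essentially a least-recently-used (LRU) cache sitting in front of slower storage. Here is a minimal, generic sketch of that pattern; the keys and the loader function are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Keep the most recently accessed values in memory; evict the oldest."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, load_from_disk):
        if key in self._data:
            self._data.move_to_end(key)     # mark as most recently used
            return self._data[key]
        value = load_from_disk(key)         # fall back to the slower store
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value

# Usage with a hypothetical loader:
cache = LRUCache(capacity=2)
row = cache.get("order:101", load_from_disk=lambda k: {"id": k, "status": "shipped"})
```

Notice that eviction is driven purely by recency, which is why this approach suits OLTP better than analytics, where the most frequently accessed data matters more.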

That said, loading an entire data warehouse or even a large data mart into memory has been challenging until recent years.

There are a few key factors making in-memory databases and analytics offerings relevant for more and more use cases. One has been the shift to 64-bit operating systems, which makes much more addressable memory available. And as one might assume, the availability of increasingly large and affordable memory solutions has also played a part.

Database and software developers have begun to take advantage of in-memory databases in a myriad of ways. These include the many key-value stores such as Amazon DynamoDB, which provide very low latency for IoT and a host of other use cases.
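For illustration, a key-value access pattern against DynamoDB might look like the sketch below. It assumes the boto3 library, configured AWS credentials and a hypothetical table named “devices” with a “device_id” partition key.

```python
import boto3  # assumes AWS credentials and region are already configured

# Hypothetical table with a partition key named "device_id".
table = boto3.resource("dynamodb").Table("devices")

# Key-value access: write and read a single item by its key, which is what
# keeps latency low for IoT-style workloads.
table.put_item(Item={"device_id": "sensor-42", "last_temp_c": 21})
item = table.get_item(Key={"device_id": "sensor-42"}).get("Item")
print(item)
```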

Businesses are also taking advantage of in-memory technology in systems ranging from distributed in-memory NoSQL databases, such as Aerospike, to in-memory NewSQL databases, such as VoltDB. However, for the remainder of this post, we’ll touch in more detail on several solutions with which you might be more familiar.

Some database vendors have chosen to build hybrid solutions that incorporate in-memory technologies. They aim to bridge in-memory with solutions based on tried-and-true, disk-based RDBMS technologies. Such vendors include Microsoft with its incorporation of xVelocity into SQL Server, Analysis Services and PowerPivot, and Teradata with its Intelligent Memory.

Other vendors, like IBM with its dashDB database, have chosen to deploy in-memory technology in the cloud, while capitalizing on previously developed or acquired technologies (in-database analytics from Netezza in the case of dashDB).

However, probably the most high-profile application of in-memory technology has been SAP’s significant bet on its HANA in-memory database, which first shipped in late 2010. SAP has since made it available in the cloud through its SAP HANA Cloud Platform and on Microsoft Azure, and it has released a comprehensive application suite called S/4HANA.

Like most of the analytics-focused in-memory databases and analytics tools, HANA stores data in a column-oriented, in-memory database. The primary rationale for taking a column-oriented approach to storing data in memory is that in analytic use cases, where data is queried but not updated, it allows for often very impressive compression of data values in each column. This means much less memory is used, resulting in even higher throughput and less need for expensive memory.
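A toy example of why column-oriented storage compresses so well: values within a single column tend to repeat, so even a simple technique such as run-length encoding (sketched below with made-up data) shrinks the column dramatically. Real column stores layer dictionary encoding and other techniques on top of this across millions of rows.

```python
from itertools import groupby

# One column of an analytic table, e.g. the "country" value for each row.
country_column = ["US", "US", "US", "DE", "DE", "US", "US", "US", "US"]

# Run-length encoding: store each run of repeated values once, with a count.
encoded = [(value, sum(1 for _ in run)) for value, run in groupby(country_column)]
print(encoded)  # [('US', 3), ('DE', 2), ('US', 4)]
```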

So what approach should a data architect adopt? Are Microsoft, Teradata and other “traditional” RDBMS vendors correct with their hybrid approach?

As memory gets cheaper by the day, and the value of rapid insights increases by the minute, should we host the whole data warehouse or data mart in-memory, as vendors such as SAP and IBM suggest?

It depends on the specific use case, data volumes, business requirements, budget, etc. One thing that is not in dispute is that all the major vendors recognize that in-memory technology adds value to their solutions. And that extends beyond the database vendors to analytics tool stalwarts like Tableau and newer arrivals like Yellowfin.

It is incumbent upon enterprise architects to learn about the relative merits of the different approaches championed by the various vendors and to select the best fit for their specific situation. This is, admittedly, not easy given the pace of adoption of in-memory databases and the variety of approaches being taken.

But there’s a silver lining to the creative disruption caused by the increasing adoption of in-memory technologies. Because of the sheer speed the various solutions offer, many organizations are finding that the need to pre-aggregate data to achieve certain performance targets for specific analytics workloads is disappearing. The same goes for the need to de-normalize database designs to achieve specific analytics performance targets.

Instead, organizations are finding that it’s more important to create comprehensive atomic data models that are flexible and independent of any assumed analytics workload.

Perhaps surprisingly to some, third normal form (3NF) is once again not an unreasonable standard of data modeling for modelers who plan to deploy to a pure in-memory or in-memory-augmented platform.
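As a reminder of what that looks like, here is a small, hypothetical 3NF sketch: each fact lives in exactly one table, keyed by what it depends on, with no duplicated or pre-aggregated columns. Totals are left for the in-memory engine to compute at query time.

```python
# Hypothetical third-normal-form schema, expressed as portable DDL.
SCHEMA_3NF = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL
);

CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  DATE    NOT NULL
);

CREATE TABLE order_line (
    order_id    INTEGER NOT NULL REFERENCES sales_order(order_id),
    line_no     INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (order_id, line_no)
);
"""
```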

Organizations can forgo the time-consuming effort to model and transform data to support specific analytics workloads, which are likely to change over time anyway. They can also stop worrying about de-normalizing and tuning an RDBMS for those same fickle and variable analytics workloads, and instead focus on creating a logical data model of the business that reflects business information requirements and relationships in a flexible, detailed format that doesn’t assume specific aggregations and transformations.

The blinding speed of in-memory technologies provides the aggregations, joins and other transformations on the fly, without the onerous performance penalties we have historically experienced with very large data volumes on disk-only solutions. As a long-time data modeler, I like the sound of that. And so far in my experience with many of the solutions mentioned in this post, the business people like the blinding speed and flexibility of these new in-memory technologies!

Please join us next time for the final installment of our series, Data Modeling in a Jargon-filled World – The Logical Data Warehouse. We’ll discuss an approach to data warehousing that uses some of the technologies and approaches we’ve discussed in the previous six installments while embracing “any data, anywhere.”

Data Modeling in a Jargon-filled World – The Cloud

There’s no escaping data’s role in the cloud, and so it’s crucial that we analyze the cloud’s impact on data modeling. 

Data Modeling is Changing – Time to Make NoSQL Technology a Priority

As the amount of data enterprises are tasked with managing increases, the benefits of NoSQL technology are becoming more apparent. 

Data Modeling in a Jargon-filled World – Managed Data Lakes

More and more businesses are adopting managed data lakes.

Earlier in this blog series, we established that leading organizations are adopting a variety of approaches to manage data, including data that may be sourced from a wide range of NoSQL, NewSQL, RDBMS and unstructured sources.

In this post, we’ll discuss managed data lakes and their applications as a hybrid of less structured data and more traditionally structured relational data. We’ll also talk about whether there’s still a need for data modeling and metadata management.

The term Data Lake was first coined by James Dixon of Pentaho in a blog entry in which he said:

“If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”

Use of the term quickly took on a life of its own with often divergent meanings. So much so that four years later Mr. Dixon felt compelled to refute some criticisms by the analyst community by pointing out that they were objecting to things he actually never said about data lakes.

However, in my experience and despite Mr. Dixon’s objections, the notion that a data lake can contain data from more than one source is now widely accepted.

Similarly, while most early data lake implementations used Hadoop with many vendors pitching the idea that a data lake had to be implemented as a Hadoop data store, the notion that data lakes can be implemented on non-Hadoop platforms, such as Azure Blob storage or Amazon S3, has become increasingly widespread.

So a data lake – as the term is widely used in 2017 – is a detailed (non-aggregated) data store that can contain structured and/or non-structured data from more than one source implemented on some kind of inexpensive, massively scalable storage platform.

But what are “managed data lakes?”

To answer that question, let’s first touch on why many early data lake projects failed or significantly missed expectations. Criticisms were quick to arise, many of which were critiques of data lakes when they strayed from the original vision, as established earlier.

Vendors seized on data lakes as a marketing tool, and as often happens in our industry, they promised it could do almost anything. As long as you poured your data into the lake, people in the organization would somehow magically find exactly the data they needed just when they needed it. As is usually the case, it turned out that for most organizations, their reality was quite different. And for three important reasons:

  1. Most large organizations’ analysts didn’t have the skill sets to wade through the rapidly accumulating pool of information in Hadoop, or whichever new platform had been chosen to implement their data lakes, to locate the data they needed.
  2. Not enough attention was paid to the need to provide metadata to help people find the data they needed.
  3. Most interesting analytics are a result of integrating disparate data points to draw conclusions, and integration had not been an area of focus in most data lake implementations.

In the face of growing disenchantment with data lake implementations, some organizations and vendors pivoted to address these drawbacks. They did so by embracing what is most commonly called a managed data lake, though some prefer the label “curated data lake” or “modern data warehouse.”

The idea is to address the three criticisms mentioned above by developing an architectural approach that allows for the use of SQL, making data more accessible and providing more metadata about the data available in the data lake. It also takes on some of the challenging work of integration and transformation that earlier data lake implementations had hoped to kick down the road or avoid entirely.

The result in most implementations of a managed data lake is a hybrid that tries to blend the strengths of the original data lake concept with the strengths of traditional large-scale data warehousing (as opposed to the narrow data mart approach Mr. Dixon used as a foil when originally describing data lakes).

Incoming data, either structured or unstructured, can be easily and quickly loaded from many different sources (e.g., applications, IoT, third parties, etc.). The data can be accumulated with minimal processing at reasonable cost using a bulk storage platform such as Hadoop, Azure Blob storage or Amazon S3.

Then the data that is widely used within the organization can be integrated and made available through a SQL or SQL-like interface, ranging from Hive to Postgres to a tried-and-true commercial relational database such as SQL Server (or its cloud-based cousin, Azure SQL Data Warehouse).
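As a rough sketch of that pattern, the example below uses PySpark with a hypothetical bulk-storage path: raw JSON events are read from inexpensive storage and then exposed to analysts through a familiar SQL interface.

```python
from pyspark.sql import SparkSession  # assumes PySpark is available

spark = SparkSession.builder.appName("managed-lake-sketch").getOrCreate()

# Raw events landed in cheap bulk storage (the path is hypothetical).
events = spark.read.json("s3a://example-lake/raw/events/")

# Expose the curated data through a SQL interface most analysts already know.
events.createOrReplaceTempView("events")
daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS event_count
    FROM events
    GROUP BY event_date
""")
daily_counts.show()
```

The event_date column is assumed to exist in the raw events; in a real managed data lake, this curated layer would also carry the metadata and definitions discussed below.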

In this scenario, a handful of self-sufficient data scientists may wade (or swim or dive) in the surrounding data lake. However, most analysts in most organizations still spend most of their time using familiar SQL-capable tools to analyze data stored in the core of the managed data lake – an island in the lake if we really want to torture the analogy – which is typically implemented either using an RDBMS or a relational layer like Hive on top of the bulk-storage layer.

It’s important to note that these are not two discrete silos. Most major vendors have added capabilities to their database and BI offerings to enable analysis of both RDBMS-based and bulk-storage layer data through a familiar SQL interface.

This enables a much larger percentage of an organization’s analysts to access data both in the core and the less structured surrounding lake, using tools with which they’re already familiar.

As this hybrid managed data lake approach incorporates a relational core, robust data modeling capabilities are as important as ever. The same goes for data governance and a thorough focus on metadata to provide clear naming and definitions to assist in finding and linking with the most appropriate data.

This is true whether inside the structured relational core of the managed data lake or in the surrounding, more fluid data lake.

As you probably guessed from some of the links in this post, more and more managed data lakes are being implemented in the cloud. Please join us next time for the fifth installment in our series: Data Modeling in a Jargon-filled World – The Cloud.