Companies are moving analytics to the cloud at an aggressive rate. And why wouldn’t they? With data volumes constantly growing in both size and variety—and the proliferation of both open source and proprietary tools deployable in the cloud for just pennies or dollars per hour—the cloud is a great place to analyze data.
A recent study showed that more than 50 percent of organizations surveyed have already moved some operational data to the cloud. That makes a lot of sense, especially as that type of data is increasingly merged with Internet of Things (IoT), social media, and web data, because you can analyze the data where it lies.
We see more and more customers moving away from Hadoop as their data lake (on-premises and in the cloud), and moving to object storage like Amazon S3 or Azure Blob as their new default data lake. As companies see increasing data gravity toward these object-storage-based data lakes in the cloud, we see them wanting to analyze the data straight from these sources. And, they don’t really want to move that data around via wide area networks (WANs), either. This gives rise to analyzing that data in one of two ways: all-cloud or hybrid.
When data in a public cloud can be analyzed by itself—without needing to be merged with other data that might “live” somewhere else, such as on-premises or in a different cloud—then all the tools needed to analyze that data can be deployed in the cloud platform of choice, usually AWS, Azure, or even Google. But we often see a need for hybrid analytics, where some data sits in one or more public clouds and some remains on-premises. These types of deployments require more thought about the overall analytics architecture, and they require additional tools for querying data across WANs.
These trends hold for much of the production or operational data within organizations. We also see data scientists and many lines of business using the public cloud for shadow IT analytics. That’s right, our old friend (or foe, depending on your perspective) shadow IT is a reality for analytics in the cloud. It’s an age-old story: data scientists and line-of-business folks go to IT asking for access to tools and compute resources to run analytic experiments, and IT usually responds, “Get in line, and fill out this request in triplicate,” or “Sorry, there’s no way we can give you any compute space on our operational database systems, and besides, we don’t even have half the tools you requested.”
What is an analytics adventurer supposed to do? Well, increasingly they are going to the cloud. And again, why not? The public cloud is really the perfect place to quickly spin up the compute needed, deploy the tools desired to do the analytics, and play away. Whether it takes a day, a week, or a month to find actionable results (or not), just spin it down when you are done and “throw the compute away.” Again, isn’t this what the cloud was originally built for? It’s perfect for development, experimentation, and sandboxing!
Where are the gaps?
Are all analytics ready for the cloud? The short answer is “maybe.” It largely depends on each company’s situation when it comes to data and what it is trying to accomplish. The first main factor is data gravity and volume. The first workloads we saw moving to the cloud were smaller, under 20 TB of data. Over the last year, however, we have seen larger and larger data volumes move to the cloud for analytics. But there is still a gap for companies whose main data gravity is on-premises, coming from a large enterprise resource planning (ERP) system—perhaps a mainframe or similar systems. Unless a company is moving the systems generating that data gravity into the cloud, it is in effect choosing to keep its analytics mostly on-premises. Many companies are not currently interested in moving dozens—or even hundreds—of terabytes of data into the cloud, or in the cost and effort required to do it.
At the end of the day, we see smaller analytic workloads, or analytics on data that already lives in the cloud, moving rapidly to the cloud. Otherwise, there’s a lot of interest in hybrid approaches using query tools that can reach across disparate geographic locations, such as on-premises data centers and multiple clouds, to analyze the data where it lies and then bring the separate results back together.
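The hybrid pattern described above (aggregate where the data lies, then combine only the small partial results) can be sketched in miniature. This is a minimal Python illustration, not any vendor’s actual federated query engine: two in-memory SQLite databases stand in for data stores in separate locations, and the table name, regions, and figures are all hypothetical.

```python
import sqlite3

# Two in-memory databases stand in for data stores in separate
# locations (e.g., on-premises and a public cloud).
onprem = sqlite3.connect(":memory:")
cloud = sqlite3.connect(":memory:")

onprem.execute("CREATE TABLE sales (region TEXT, amount REAL)")
onprem.executemany("INSERT INTO sales VALUES (?, ?)",
                   [("east", 100.0), ("west", 250.0)])

cloud.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cloud.executemany("INSERT INTO sales VALUES (?, ?)",
                  [("east", 75.0), ("south", 40.0)])

def partial_totals(conn):
    # Each site aggregates locally, so only small summaries --
    # not raw rows -- would cross the WAN.
    return dict(conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"))

# Merge the per-site summaries into one combined result.
combined = {}
for site in (onprem, cloud):
    for region, total in partial_totals(site).items():
        combined[region] = combined.get(region, 0.0) + total

print(sorted(combined.items()))
# [('east', 175.0), ('south', 40.0), ('west', 250.0)]
```

The design point is that each location ships back a few aggregated rows rather than raw data, which is what makes querying across a WAN practical.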
The middle ground
After helping countless customers evaluate whether to move to the cloud for analytics, and helping dozens of enterprises move some or all of their analytics to the cloud, my advice is to be thoughtful about moving analytics to the cloud.
There are two extremes of thought when it comes to moving analytics to the cloud: 1) “The cloud is magic and everything should move there immediately,” or 2) “The cloud isn’t secure, we have no control, and we are never moving our data into the cloud.” I would advise that neither is correct, and the reality is somewhere in between. We see that the most successful companies are very thoughtful about their entire analytics ecosystem, and about truly evaluating what makes sense to put into the cloud—and what might not be ready for the cloud.
In the next year or so, as cloud providers continue to add infrastructure that better handles the volume and complexity of data, and add speed to the interconnection between their various services—and as WAN pricing continues to come down—we’ll see more and more data and analytics moving to the cloud.
One last word of advice: when choosing cloud providers for analytics, try to ensure that you architect in a way that doesn’t lock you into one platform or another. No one likes vendor lock-in, do they? You never know where the next great innovation for analytics in the cloud will come from, and being prepared and able to move between cloud vendors when it makes sense is paramount to taking advantage of the best the cloud has to offer.
I encourage you to follow the conversation at #CloudExperts or #BuiltForTheCloud, and reach out to your Teradata account executive to learn more.