Cloud is complicated. But that's not surprising: aligning the software mechanics needed to run complex cloud computing instances for modern applications, amid the torrent of real-time data in today's always-on world, was always going to be complicated.
To build the foundations of a Cloud Service Provider (CSP) 'hyperscaler' service capable of delivering the cloud's promise of flexibility and control, Amazon Web Services, Inc. (AWS) has understandably had to create complex layers of services, which it delivers from its global network of datacenters.
But as AWS CEO Adam Selipsky has said on several occasions, the move to the cloud is 'just getting started – most of it is still to come', and we have a lot of so-called digital transformation (which you can take to mean cloud migration and data-driven digitization), integration and toolset simplification still to do.
That theme of integration and simplification might just characterize where AWS is at right now.
Making services easier
“By far the majority of innovations [we are bringing to market in 2022 and onward] are driven by listening and responding to you, the customers,” said Selipsky, speaking at this year’s AWS re:Invent 2022, staged in Las Vegas. He used his time on stage (around 34 minutes in) to note that AWS has been, “Working for a few years now to build integrations between our services to make it easier to do analytics and Machine Learning without having to do all the ETL muck.”
There’s not much padding at an AWS keynote; the execs tend to get straight into the guts of the systems they are developing as they talk about real-world software issues.
When CEO Selipsky talks about ETL muck, he is referring to the process of Extract, Transform & Load i.e. the technical chores associated with extracting typically raw, unstructured data from source systems, moving it into a staging environment where it is transformed for use, and then loading it into a destination store (often a data warehouse, a database repository that has been provisioned for specific analytics tasks) where it is driven towards its analytics jobs, to serve a business need.
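The pattern being described here can be sketched in a few lines. The following is a minimal, illustrative Python sketch of extract, transform and load, with in-memory lists standing in for real source systems and the warehouse (all names and data are hypothetical):

```python
# Minimal sketch of the ETL pattern: extract raw records,
# transform them into a clean shape, load into a destination.
# The data and "warehouse" are illustrative in-memory stand-ins.

def extract(source):
    """Pull raw, possibly messy records from a source system."""
    return list(source)

def transform(records):
    """Normalize each record: trim names, coerce amounts to float."""
    cleaned = []
    for rec in records:
        cleaned.append({
            "customer": rec["customer"].strip().title(),
            "amount": float(rec["amount"]),
        })
    return cleaned

def load(warehouse, records):
    """Append transformed rows to the destination table."""
    warehouse.extend(records)
    return len(records)

raw = [{"customer": "  alice ", "amount": "19.99"},
       {"customer": "BOB", "amount": "5"}]
warehouse = []
load(warehouse, transform(extract(raw)))
print(warehouse[0])  # {'customer': 'Alice', 'amount': 19.99}
```

Trivial in miniature, but every step here is a place where real pipelines accumulate code, infrastructure and failure modes, which is precisely the 'muck' a zero-ETL integration aims to remove.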
As an illustration of ETL muck cleaning, AWS used this year’s convention to announce new integrations that make it easier for customers to connect and analyze data across data stores without having to move data between services. In this case, the new connection is between the Amazon Aurora database and Amazon Redshift, a data warehouse product and service designed to handle large-scale data processing.
Organizations can now also run Apache Spark (an analytics engine for large-scale data processing) applications on Amazon Redshift data using AWS analytics and machine learning (ML) services (such as Amazon EMR, AWS Glue, and Amazon SageMaker).
A zero-ETL future
All of this amounts to what CEO Selipsky called a zero-ETL future on AWS… but why do organizations need to store, manage and analyze data in more than one location in the first place?
Answering exactly this point at this year’s AWS re:Invent was Swami Sivasubramanian, vice president of databases, analytics and machine learning.
“The vastness and complexity of data that customers manage today means they cannot analyze and explore it with a single technology or even a small set of tools. Many of our customers rely on multiple AWS database and analytics services to extract value from their data, so ensuring they have access to the right tool for the job is important to their success,” said Sivasubramanian.
According to Sivasubramanian, by eliminating ETL and other ‘data movement’ tasks, an organization is freed up to focus on analyzing data and driving the business. “[We know that] real-world data systems are often sprawling and complex, with diverse data dispersed across multiple services and on-premises systems. Many organizations are sitting on a treasure trove of data and want to maximize the value they get out of it,” notes the AWS team.
Infor is an established software industry firm in its own right, providing industry-specific Enterprise Resource Planning (ERP) and wider software services. Jim Plourde, senior vice president for cloud services at Infor, explains that the company uses AWS to run a new managed data warehouse service for its customers’ industry cloud data.
“We are excited for Amazon Aurora to support zero-ETL integration with Amazon Redshift, which will reduce our operational burden by making transactional data from Amazon Aurora available in Amazon Redshift in near real-time,” said Plourde. “Now, we can benefit from the performance of Amazon Aurora as our relational database management system, while easily leveraging the analytics and ML capabilities in Amazon Redshift for our new managed data warehouse service.”
Why is connecting data hard?
While there is clearly a lot happening here in terms of the drive to coalesce and connect data and cloud computing services, shouldn’t we stop and ask why the physical (okay, virtualized and cloud engineering-based, but still requiring a cloud engineer at a keyboard and mouse) act of connecting information is so tough in the first place?
To explain the reality of the scenario, AWS says that customers often want to analyze Amazon Redshift data directly from different cloud services. This requires them to go through the complex, time-consuming process of finding, testing, and certifying a third-party connector to help read and write the data between their environment and Amazon Redshift.
Even after they have found a connector, customers must manage intermediate data-staging locations, such as Amazon S3, to read and write data from and to Amazon Redshift. All of these challenges increase operational complexity and make it difficult for customers to use a technology like Apache Spark to its full extent.
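That staging pattern typically means issuing a Redshift UNLOAD statement to copy query results into an S3 bucket, which an external engine such as Apache Spark then reads back. As a hedged sketch, a small Python helper that composes such a statement might look like this (the bucket path and IAM role ARN are hypothetical placeholders):

```python
def build_unload(query, s3_path, iam_role):
    """Compose a Redshift UNLOAD statement that stages query
    results in S3 as Parquet files, for a separate engine to
    read back. Paths and role ARN are illustrative only."""
    escaped = query.replace("'", "''")  # escape quotes inside the SQL literal
    return (
        f"UNLOAD ('{escaped}') "
        f"TO '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS PARQUET"
    )

stmt = build_unload(
    "SELECT id, total FROM sales",
    "s3://example-staging-bucket/sales/",         # hypothetical bucket
    "arn:aws:iam::123456789012:role/RedshiftS3",  # hypothetical role
)
print(stmt)
```

Every such statement implies a bucket to provision, an IAM role to scope, and staged files to clean up afterwards; the new Redshift integration for Apache Spark is pitched at removing exactly this layer.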
This momentum to provide pre-programmed integrations (often via the use of ML) is also illustrated in other AWS products.
The newly announced AWS Supply Chain is an application designed to help businesses increase supply chain visibility and make faster, more informed decisions that mitigate risks. It automatically combines and analyzes data across multiple supply chain systems so businesses can observe their operations in real-time, find trends more quickly, and generate more accurate demand forecasts that ensure adequate inventory to meet customer expectations.
While it can be tough to form all the connective tissue to bond AWS to an ERP-based supply chain management system such as SAP HANA (others are also available), the rationale for creating AWS Supply Chain centers heavily on offering those connections pre-baked.
Undifferentiated heavy lifting
“Customers tell us that the undifferentiated heavy lifting required in connecting data between different supply chain solutions has inhibited their ability to quickly see and respond to potential supply chain disruptions,” said Diego Pantoja-Navajas, vice president of AWS Supply Chain. “AWS Supply Chain aggregates this data and provides visual, interactive dashboards that provide the insights and recommendations customers need to take actions toward more resilient supply chains.”
Also ranking high in this year’s ‘we make cloud easier’ play is the new AWS Clean Rooms analytics service. This helps companies to securely analyze and collaborate on their combined datasets without sharing or revealing underlying data – a use case that is prevalent in the advertising and media industry, as well as in life sciences and financial services.
In advertising, brands and media publishers want to get sight of datasets stored across different ‘channels’ (for example, across a user base’s activity spanning television choices and music preferences, but also their shopping habits and home furnishing choices) to improve the relevance of their campaigns and better engage with consumers.
All the while, these companies also want to protect sensitive consumer information. To achieve this today, one company usually has to hand a copy of its user-level data to partners and then rely on contractual agreements to prevent misuse. So-called data clean rooms can help solve this challenge by allowing multiple parties to combine and analyze their data in a protected environment, where participants are unable to see each other’s raw data. But, AWS reminds us, clean rooms are tough to build, requiring complex privacy controls, specialized tools to protect each participant’s data, and months of development time spent customizing analytics tools.
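To give a flavor of the underlying idea (and only a flavor: production services such as AWS Clean Rooms rely on far stronger, audited privacy controls), here is a toy Python sketch in which two parties compute the size of their audience overlap by exchanging only salted hashes of identifiers, never the raw values:

```python
import hashlib

def blind(identifiers, shared_salt):
    """Hash each identifier with a salt agreed between the
    parties, so raw values are never exchanged. A toy
    illustration only; real clean rooms go much further."""
    return {hashlib.sha256((shared_salt + i).encode()).hexdigest()
            for i in identifiers}

SALT = "demo-salt"  # agreed out of band; illustrative value
brand = blind({"alice@example.com", "bob@example.com"}, SALT)
publisher = blind({"bob@example.com", "carol@example.com"}, SALT)

# Each side shares only hashes; the overlap size is computable
# without either party seeing the other's raw identifiers.
print(len(brand & publisher))  # 1
```

Even this toy version hints at why the real thing is hard: agreeing salts, preventing dictionary attacks on the hashes, and constraining what queries partners may run are exactly the 'complex privacy controls' AWS describes.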
According to Dilip Kumar, vice president of AWS Applications, “AWS Clean Rooms helps customers and their partners to better analyze and collaborate on their data on AWS. With the launch of AWS Clean Rooms, we are making it easier, simpler, and more secure for multiple companies to share and analyze combined datasets to generate new insights that they could not do on their own.”
Cloud’s future: more solid data
We have attempted to highlight the efforts AWS is making to make cloud services simpler to use, but there is a key facilitating factor that underpins this effort and also explains the mechanics of what is going on… one that may even validate the organization’s technology proposition at a platform and tools level.
We need to move data around less.
What businesses need from cloud computing (other than its innate ability to offer flexible services and lower Capital Expenditure (CapEx) outlay) is the power to work on their data without having to transport it around between different clouds, different databases and different repositories, different integrations to third-party applications, different data pipelines and different compute engines.
In the early days of cloud, we spent a lot of time worrying about the cost of getting on-premises, terrestrially-based data ‘up to the cloud’. Now that we are firmly entering the era of cloud-native, where information is born in and of the cloud, we need to be able to work on it without clogging up the virtualized information superhighways we can now travel down.
There’s no vaccine for cloud congestion, but there is a cure. It’s winter, go and get your booster shots.