
Simplify Big Data Analytics With AirMettle

100x faster SQL queries without warehouses. The software-defined approach runs on commodity hardware both on-premises and in the cloud.



Big data analytics continues to grow in importance, but working with massive datasets poses challenges. Moving petabytes of data repeatedly for analysis strains networks and budgets. Even then, irrelevant data can overwhelm analytics platforms, generating more expense without additional insight. 


We met with AirMettle during the 53rd IT Press Tour. The company offers a new approach, integrating analytics into the data lake itself. Its software runs on commodity hardware, delivering faster insights without the overhead of traditional data warehouses. Key features help developers simplify big data analytics:


Accelerated SQL Queries 

AirMettle accelerates SQL directly on data lake storage, eliminating unnecessary data transfers. Built-in parallelism delivers 100x faster SELECT performance than native S3 analytics. This supports ad hoc analysis that data warehouses struggle with.
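Because AirMettle speaks the S3 API, an in-place query can be issued in the style of S3 Select, where the storage layer filters rows before anything crosses the network. The sketch below is illustrative only: the bucket, key, column names, and endpoint URL are hypothetical placeholders, and the network call itself is shown commented out.

```python
def build_select_request(bucket: str, key: str, sql: str) -> dict:
    """Assemble S3 Select-style parameters for querying a CSV object in place."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Expression": sql,
        "ExpressionType": "SQL",
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }

# Hypothetical bucket and key; only the filtered rows would leave storage.
request = build_select_request(
    "logs",
    "2024/01/events.csv",
    "SELECT s.user_id, s.status FROM s3object s WHERE s.status = '500'",
)

# With boto3 pointed at an S3-compatible endpoint (hypothetical URL):
# client = boto3.client("s3", endpoint_url="https://storage.example.internal")
# for event in client.select_object_content(**request)["Payload"]:
#     if "Records" in event:
#         print(event["Records"]["Payload"].decode())
```

The point of the pattern is that the predicate travels to the data, not the other way around, which is where the claimed speedup over pull-then-filter pipelines comes from.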


Focus on Relevant Data

Rather than analyzing entire objects, AirMettle summarizes and extracts relevant subsets before they leave storage. This massively reduces the volume developers must manage for a given query, making it feasible to use more historical data.


Software-Defined Flexibility

As software-defined storage, AirMettle runs on any x86 server with SSDs. It integrates easily into existing infrastructure on-premises and in the cloud, replacing only the high-performance storage tier. This makes it accessible for teams looking to optimize big data analysis.

Handle Diverse Data Types  

AirMettle handles diverse data types, from video to complex scientific data, automatically detecting optimal ways to structure each for fast in-place processing. Support for open APIs like S3 and Arrow makes analytical results readily consumable.


Collaboration to Deployment

The platform spans the development lifecycle with shared workspaces, version control, code review, and CI/CD integration. Role-based access control and security features help manage access, while an extensible design supports customized workflows.


2024 Launch

AirMettle will launch in mid-2024 after four years of development driven by $4 million in investor funding. Early customers like Los Alamos National Laboratory validate both commercial and research applications. Upcoming reference deployments in verticals like finance, security, entertainment, and climate science showcase use cases.


How It Works

For engineering teams struggling to extract value from rapidly growing data stores, AirMettle promises welcome simplicity. Moving analytics into the data lake itself aims to deliver deeper insights without the overhead enterprises suffer today. Early results suggest the platform can accelerate SQL queries up to 100x using only commodity infrastructure.


What's Different

AirMettle stands apart from other vendors in the computational storage space, like Coho Data, in its pure software approach. As software-defined storage, it isn't tied to proprietary hardware and promises easier adoption and infrastructure integration. This aligns more closely with analytics platforms from public cloud providers, but AirMettle differs by running directly within on-premises storage infrastructure. 


It also goes beyond cloud analytics by supporting more flexible and granular in-place processing rather than SQL-only query acceleration. AirMettle's ability to handle unstructured data in native formats helps address gaps left by analytical warehouses reliant on rigid schemas. Its ambitious performance claims of up to 100x faster analysis separate it from existing analytics options, which struggle to keep pace with rapidly growing data volumes.


The early customer wins indicate that AirMettle’s integrated architecture could disrupt the separation of storage and insights common today.


Customer Feedback

"Our scientific large-scale simulations can generate hundreds of petabytes of highly dimensional floating-point data. However, the data associated with a scientific feature of interest can be smaller than the written data. Hence, a key challenge is quickly and efficiently finding what’s relevant in this sea of data. To optimize this process, we’ve been drawn towards computational storage — processing data in-place and near storage — to eliminate unnecessary data movement while maintaining parallelism and adequate data protection." — Gary Grider, High-Performance Computing Division Leader, Los Alamos National Lab.


"We are a leading SIEM company, spending tens of millions of dollars annually on our data warehouse for security analytics. Our costs keep rising as attacks get more sophisticated and customers demand more proof that their data is protected. By moving analytical processing directly into our data lake with AirMettle, we expect to save over $10 million annually on our largest application while enabling more advanced analytics by leveraging all of our log data." — Anonymous, Public SIEM Company.


"As our therapeutic simulations generate exponentially more data, AirMettle will allow us to extract insights at speeds previously impossible. By parallelizing computation where the data already resides, we can now mine years of archives for clues that could lead to healthcare breakthroughs." — Chief Data Officer, Biotechnology Startup.

If the combination of improved performance, cost savings, and easy access to diverse data in its native formats resonates, AirMettle may open new possibilities for organizations to navigate big data. Engineers searching for analytical freedom from warehouses and proprietary lock-in should watch their mid-2024 launch.
