Hammerspace simplifies distributed computing with Hyperscale NAS and Global Data Environment enhancements for performance, S3 support, and data orchestration.
Hammerspace, a software-defined, multi-cloud data control plane provider, recently announced significant enhancements to its Hyperscale NAS and Global Data Environment offerings to simplify data management for distributed computing. The new capabilities include performance optimizations, an S3 interface, and high-performance erasure coding.
For developers, engineers, and architects struggling to efficiently manage and move data across silos to power AI/ML training and other computing workloads, Hammerspace provides a compelling solution. As Molly Presley, SVP of Marketing at Hammerspace, explained during the 54th IT Press Tour, their goal is to "radically improve how data is used" by shifting from "data at rest isolated in storage" to "data in motion across a global data environment."
Hyperscale NAS: Performance and Simplicity
One of the critical challenges Hammerspace addresses is providing the performance needed for AI/ML model training and other GPU-intensive workloads while maintaining the simplicity of standard NAS interfaces. Presley noted that "Models require standard NFS data interface," but "existing NAS and object storage were not designed for large compute performance."
Hammerspace's Hyperscale NAS solves this by combining the performance of HPC-class parallel file systems with the simplicity and enterprise features of scale-out NAS. "Hyperscale NAS architectures provide performance and efficiency that a massive web or hyperscale organization [needs], but even at small scale, the efficiencies are the same," said Presley.
This enables linear scalability as the system grows. "With parallel file systems, this is true of Lustre, probably OneFS, GPFS for sure, that you can scale very linearly, so you get the benefits of being able to connect to the enterprise with the simplicity of NAS but with all the benefits of an HPC file system."
Real-world deployments have shown the power of Hyperscale NAS. One web-scale customer is using it to feed 34,000 GPUs for AI training, with plans to scale to 1 million GPUs. The system delivers an aggregate throughput of 12.5 TB/sec, rising to 100 TB/sec — all using standards-based, plug-and-play infrastructure.
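A quick back-of-envelope calculation puts those figures in perspective: dividing the cited aggregate throughput across the cited GPU count gives the sustained read bandwidth available per GPU (illustrative arithmetic only, not a Hammerspace-published metric):

```python
# Per-GPU bandwidth implied by the deployment figures above:
# 34,000 GPUs sharing 12.5 TB/sec of aggregate throughput.

AGGREGATE_TBPS = 12.5      # aggregate throughput, TB/sec
GPU_COUNT = 34_000

per_gpu_gbps = AGGREGATE_TBPS * 1_000 / GPU_COUNT   # GB/sec per GPU
print(f"~{per_gpu_gbps:.2f} GB/sec per GPU")        # ~0.37 GB/sec per GPU
```

Roughly 370 MB/sec of sustained bandwidth per GPU is in the range needed to keep training pipelines fed without stalling on I/O.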
Global Data Environment Unifies Siloed Data
The other major problem area Hammerspace targets is data silos that make it difficult to share data across sites and clouds efficiently. Their Global Data Environment creates a unified namespace across all storage, enabling transparent data movement and access.
"Hammerspace virtualizes the underlying storage infrastructure. All authorized users and applications can access the same data locally from anywhere," explained Presley. This eliminates the need for complex data migrations and copying.
On the backend, Hammerspace supports any storage type, from NAS to object storage to cloud. The front end provides standard access protocols like NFS, SMB, and now S3 as well. This allows applications to access data without modification while gaining the benefits of the global data environment.
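The core idea is that one logical namespace is reachable through multiple protocols, so the same object can be opened as a file over NFS/SMB or fetched as an S3 object. The sketch below illustrates that mapping in miniature; the share name, paths, and bucket/key convention are assumptions for illustration, not Hammerspace's actual implementation:

```python
# One namespace, many protocols: the same logical object is addressable
# as a POSIX file path (NFS/SMB) or as an S3 bucket/key pair.
# The mapping convention here is a hypothetical example.

from pathlib import PurePosixPath

def path_to_s3(share_root: str, file_path: str) -> tuple[str, str]:
    """Map a POSIX path under a share to an (bucket, key) pair."""
    rel = PurePosixPath(file_path).relative_to(share_root)
    bucket = PurePosixPath(share_root).name   # share name doubles as bucket
    return bucket, str(rel)

bucket, key = path_to_s3("/mnt/projects", "/mnt/projects/train/batch-01.parquet")
print(bucket, key)   # projects train/batch-01.parquet
```

Because both protocol front ends resolve to the same underlying object, an NFS-mounted training job and an S3-native application can read identical data without a copy step.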
"[Customers] bias for a specific need. But then they add more workloads over time and [end up] getting all of their data into a single global data environment," said Presley. "Now all those S3 applications can interact with us without any limitations on the data patterns that they'll have."
Built-in high-performance erasure coding on Hammerspace-provided storage nodes offers efficient data protection without sacrificing performance. "The speed was [the] peak that caught our attention, but there is a lot about the resiliency," noted Presley. "Having the ability to suffer a lot of failures and self-heal and continue to provide performance and not lose performance through the process is what made this so attractive to us."
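To make the resiliency claim concrete: erasure coding spreads data plus computed parity across nodes so that lost shards can be rebuilt from the survivors. The toy sketch below uses a single XOR parity shard (tolerating one failure) to show the principle; production systems like Hammerspace's use stronger codes such as Reed-Solomon that tolerate multiple simultaneous failures:

```python
# Toy erasure-coding illustration: data shards plus one XOR parity shard.
# Losing any single shard, we can rebuild it from the survivors.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(shards: list[bytes]) -> bytes:
    """Compute a parity shard over equal-length data shards."""
    parity = shards[0]
    for shard in shards[1:]:
        parity = xor_bytes(parity, shard)
    return parity

def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing shard from survivors plus parity."""
    missing = parity
    for shard in surviving:
        missing = xor_bytes(missing, shard)
    return missing

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = encode(data)
rebuilt = recover([data[0], data[2]], parity)   # shard 1 was "lost"
print(rebuilt)   # b'BBBB'
```

The key property Presley highlights is that this rebuild can happen in the background while the system continues serving reads from the surviving shards, so client performance does not collapse during recovery.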
Simplifying Data Management Across the Lifecycle
For engineers and architects, this all adds up to vastly simplified data lifecycle management. Data generated on any storage system or cloud can be easily ingested into the global data environment and made available wherever it is needed without migration or copying. Automated orchestration and placement policies ensure data is always in the right place at the right time.
This streamlines common workflows like using AI to process data from edge sites or IoT devices and enables remote users to get real-time access to large datasets. Data is automatically placed close to GPU resources when needed for training, tiered to cost-effective storage when dormant, and protected via erasure coding.
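Policy-driven placement of this kind can be pictured as a simple rules engine that routes each dataset to a tier based on how it is being used. The sketch below is a minimal illustration of that pattern; the tier names, thresholds, and attributes are hypothetical, not Hammerspace's actual policy schema:

```python
# Minimal sketch of placement-policy logic: hot training data is staged on
# fast storage near GPUs, dormant data is tiered to cheap capacity storage.
# Tier names and the 90-day threshold are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class FileState:
    days_since_access: int
    needed_for_training: bool

def place(f: FileState) -> str:
    if f.needed_for_training:
        return "nvme-near-gpu"          # stage next to compute
    if f.days_since_access > 90:
        return "object-capacity-tier"   # dormant: cheap, durable storage
    return "scale-out-nas"              # default working tier

print(place(FileState(days_since_access=200, needed_for_training=False)))
# object-capacity-tier
print(place(FileState(days_since_access=1, needed_for_training=True)))
# nvme-near-gpu
```

In a global data environment, evaluating rules like these continuously — rather than as one-off migration projects — is what keeps data in the right place as workloads change.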
The impact of this model was summed up well by Presley: "When you imagine you have a lot of capacity in HPC in particular, they historically haven't done data backup because they just can't afford to have two copies of 200 petabytes or whatever it is. So they will have a single copy. And what we do is instead of if you want to make a file available on another node, we will move it to the other node. We don't keep copies of data. There's a placement of the data somewhere new, and then you still only have a single gold dataset."
Looking Forward
Hammerspace's Hyperscale NAS and Global Data Environment updates mark a significant step forward in simplifying distributed data environments. As GPU-accelerated computing and multi-cloud deployments become the norm, the ability to efficiently manage data across these landscapes is critical.
Developers can look forward to focusing on their applications and models while leaving the intricacies of distributed data management to Hammerspace's data orchestration. Infrastructure engineers and architects gain a powerful and flexible platform for unifying data silos and putting data where it needs to be for each workload — all without sacrificing performance or enterprise reliability. These capabilities will only become more essential as the size and distribution of datasets continue to grow.