Alluxio 2.5 focuses on POSIX and S3 access, improving performance and compatibility with the interfaces most commonly used in analytics and machine learning data pipelines.
Alluxio, the developer of open-source cloud data orchestration software, today announced the immediate availability of version 2.5 of its Data Orchestration Platform. The release features access via POSIX and S3 interfaces, enabling data platform teams to accelerate data pipelines for both business intelligence and model training with frameworks such as TensorFlow and PyTorch.
“For modern AI/ML data pipelines, the preferred application programming interface (API) for storage access is not HDFS,” said Haoyuan Li, Founder and CEO, Alluxio. “With this release, Alluxio significantly improves support for model training pipelines with an accelerated POSIX API for unified storage access, performance, and ease of management.”
“With Alluxio 2.5, we have made major strides in improving machine learning and AI support on Kubernetes. Enhancements to the FUSE interface for TensorFlow access have dramatically improved the model training experience,” said Yang Che, Sr. Staff Engineer, Alibaba Cloud.
"The Alluxio Data Orchestration System slashed query run times by half when running analytics jobs like Spark in Tencent Cloud, using our EMR platform to allow for greater I/O performance, and provides the ability to provision elastic compute with significantly reduced network resources," said Long Chen, Vice Director of R&D, Center of Bigdata Product, Tencent Cloud.
Alluxio 2.5 also improves compatibility with the S3 API. Together, S3, HDFS, and POSIX make up a majority of the APIs preferred by data-driven applications and data management tools. Administrators now have the flexibility to manage the Alluxio file system namespace through a standard object storage console, which makes it even simpler to integrate Alluxio into existing large-scale data pipelines.
New and updated storage connectors for Amazon Web Services, Azure Cloud, and Google Cloud Platform streamline onboarding with seamless authentication and better performance. Data lakes on all major cloud platforms can now easily integrate Alluxio to orchestrate data management. A new Quickstart guide using Data Orchestration Hub for single, hybrid, or multi-cloud data orchestration is also included, along with support for the Hub on Kubernetes.
Kubernetes is a popular deployment choice for Alluxio with both data analytics and machine learning pipelines across on-premises and cloud environments. Because containers in these dynamic environments are frequently killed or restarted, log collection is a challenge. Alluxio logs can now be aggregated on a centralized collection server in Kubernetes.
Alluxio 2.5 Community and Enterprise Editions feature new capabilities, including:
JNI-Based POSIX API
Alluxio 2.5 introduces a new Java Native Interface (JNI) based FUSE integration to support POSIX data access. The new integration improves performance by 3x to 5x for high-performance, high-concurrency workloads such as AI/ML training.
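As a rough illustration of what POSIX access means for a training job, the sketch below reads files through a mounted path using only standard file calls. The mount point `/mnt/alluxio-fuse` and the `.tfrecord` suffix are assumptions chosen for illustration, not values from this release. The code makes no Alluxio-specific calls, so it behaves the same against any directory.

```python
import os
from pathlib import Path


def list_data_files(mount_point, suffix=".tfrecord"):
    """Return sorted paths of files under mount_point that end with suffix."""
    matches = []
    for root, _dirs, files in os.walk(mount_point):
        for name in files:
            if name.endswith(suffix):
                matches.append(Path(root) / name)
    return sorted(matches)


def read_in_chunks(path, chunk_size=1024 * 1024):
    """Read a file sequentially in fixed-size chunks and return the byte count."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    # Hypothetical mount point; replace with wherever the FUSE mount lives.
    mount = "/mnt/alluxio-fuse"
    for data_file in list_data_files(mount):
        print(data_file, read_in_chunks(data_file))
```

Because the application sees an ordinary directory tree, existing data loaders that expect local paths can be pointed at the mount without code changes.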
S3 Northbound API
The new release improves S3 API access to achieve compatibility with S3 browsing software such as s3browser (https://s3browser.com/). The improved support lets administrators maintain and manage the Alluxio namespace for existing users through a standard object storage console.
ADLS Gen2 Connector
Alluxio 2.5 improves support for Azure cloud with the introduction of a connector for Azure Data Lake Storage Gen2 (ADLS Gen2). This allows users to benefit from the various optimizations provided by ADLS Gen2 when using Azure object storage with Alluxio.
Native GCS Connector
An updated Google Cloud Storage (GCS) connector uses the native Google-provided SDK, giving users the latest optimizations and features of the GCS SDK, such as JSON file-based login. This change reduces the onboarding time for Alluxio users on Google Cloud Platform (GCP).
STS Support for AWS S3 Connector
The S3 connector in Alluxio 2.5 supports Amazon’s Security Token Service (STS), so that only temporary, limited-access credentials are used to access S3. This allows users to leverage AWS’s role-based authentication model, whereby services temporarily assume a role with the appropriate permissions to access data and services in AWS. STS is AWS’s recommended authentication paradigm and offers benefits such as temporary credentials, cross-account bucket sharing, and fine-grained privilege control.
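The temporary nature of STS credentials can be sketched as follows. The four fields mirror what an STS AssumeRole response returns (access key ID, secret access key, session token, and expiration). The class name, the `needs_refresh` helper, and the five-minute safety margin are hypothetical, and this is not a description of how the Alluxio connector is implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TemporaryCredentials:
    """Illustrative holder for short-lived credentials (hypothetical type)."""

    access_key_id: str
    secret_access_key: str
    session_token: str
    expiration: datetime  # timezone-aware, in UTC

    def needs_refresh(self, now=None, margin=timedelta(minutes=5)):
        """Return True once the credentials are within `margin` of expiring."""
        if now is None:
            now = datetime.now(timezone.utc)
        return now >= self.expiration - margin
```

A client holding such credentials would check `needs_refresh()` before each request and assume the role again once it returns True, so no long-lived secret has to be stored alongside the data.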
Hybrid Cloud Quickstart with Alluxio Data Orchestration Hub
Alluxio is frequently used for multi-datacenter and hybrid cloud environments. Version 2.5 provides an even simpler way of getting started with the deployment and configuration of such environments. Data Orchestration Hub is now supported on Kubernetes to aid cluster configuration and connectivity across private data centers or public clouds. AWS users now also have access to a quickstart that uses Terraform to deploy an Alluxio cluster with Amazon EMR in minutes. Once an Alluxio cluster is deployed, using either the new Terraform quickstart or Helm on Kubernetes, the Hub is available to manage subsequent changes.
Alluxio 2.5 Community and Enterprise Editions are generally available for download here: https://www.alluxio.io/download/