Data Orchestration Advances as Data Grows

COVID-19 has accelerated data volume, velocity, and variety, driving demand for more responsive solutions to integrate and analyze that data.



When I spoke to Haoyuan Li, CEO and founder of Alluxio, shortly after the COVID-19 lockdown, he shared that the company's clients, particularly large e-commerce companies, were seeing rapidly accelerating demand and needed to scale immediately to meet levels of demand they could not predict.


Seven months later, Alluxio, the developer of open-source cloud data orchestration software, has updated its data orchestration platform to feature an expanded metadata service, a new management console for hybrid and multi-cloud deployments, and additional cloud-native deployment options.


These new features enable data platform teams to save on infrastructure and operational costs while managing data access across multiple environments.


Enterprises leverage Alluxio at enormous scale, both in data size and number of files. Data orchestration decouples compute from the location of data, optimizing which data resides where and for how long without the management overhead. Users are able to manage namespaces with billions of files without relying on third-party systems, greatly reducing the overall deployment footprint of the solution. The new management console makes it easy to use Alluxio to connect an analytics cluster, running engines such as Presto and Spark, to data sources across multiple clouds, a single cloud, or on-premises.
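To make the decoupling concrete, here is a minimal Python sketch of the general idea behind a unified namespace: an application addresses data through one logical path, and a mount table decides which underlying store actually serves it. This is an illustration only and not Alluxio's API; the MountTable class, its methods, and the host and bucket names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MountTable:
    """Toy map from logical path prefixes to storage URIs (hypothetical, not Alluxio's API)."""
    mounts: dict = field(default_factory=dict)

    def mount(self, logical_prefix: str, storage_uri: str) -> None:
        # Normalize trailing slashes so "/sales/" and "/sales" are the same mount point.
        self.mounts[logical_prefix.rstrip("/") or "/"] = storage_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        # Collect every mount point that covers the path, then pick the longest one.
        candidates = [
            p for p in self.mounts
            if p == "/" or logical_path == p or logical_path.startswith(p + "/")
        ]
        if not candidates:
            raise KeyError(f"no mount covers {logical_path}")
        best = max(candidates, key=len)
        suffix = logical_path if best == "/" else logical_path[len(best):]
        return self.mounts[best] + suffix


# Example: an on-premises HDFS cluster and a cloud bucket behind one namespace.
table = MountTable()
table.mount("/", "hdfs://onprem-namenode:8020/warehouse")  # assumed host name
table.mount("/sales", "s3://example-bucket/sales-data")    # assumed bucket name
print(table.resolve("/sales/2020/q3.parquet"))
print(table.resolve("/logs/app.log"))
```

In this sketch, a compute engine only ever sees the logical path, so data can move from the on-premises cluster to a cloud bucket by changing a mount entry rather than by rewriting jobs.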


“Organizations have adopted an infrastructure with compute engines and data sets spread across private data centers and public clouds for business agility and cost-effectiveness. Our customers have turned to Alluxio to bridge the gap between applications and the storage systems spread across regions and cloud providers,” said Li.


The Alluxio 2.4 Community and Enterprise Editions feature new capabilities, including:


Expanded metadata service - At the core of the Data Orchestration Platform is a metadata service, a scalable, distributed data service for management across multiple sources such as traditional Hadoop-based data lakes on-premises or modern cloud-based data lakes. The service is used to unify data lakes at enormous scale, both in data size and number of files, and Alluxio has expanded it to support billions of files while removing third-party system dependencies. Breaking away from dependencies on traditional Hadoop components, Alluxio has bolstered support for cloud-native and container-based deployments. Lifecycle management of Alluxio’s metadata service now also supports automatic backups without impacting the live system, further reducing platform management overhead.


New management console - The Data Orchestration Hub is a new web-based management console that makes it easy to connect analytics or machine learning clusters with multiple data sources to unify data lakes. The new service provides an easy-to-use, unified management view for configuration and monitoring.


Cloud-native deployment - Spawning analytics clusters in AWS and GCP is now easier. Using Terraform, Alluxio enables pre-configured clusters to be launched programmatically with a single command. Alluxio has been featured as a recommended partner for data lake modernization with Google Cloud, including the ability to launch an Alluxio-enabled cluster using the Dataproc component exchange console.


Sensitive data management - Alluxio now integrates with Vault for secure management of sensitive information used for data access in dynamic infrastructure across multiple clouds and on-premises environments. With the shift from on-premises infrastructure to multiple cloud providers, protecting data access tokens and credentials is more important than ever.

Simplified DevOps and system monitoring - Alluxio 2.4 adds several system enhancements to simplify and improve cluster management and maintenance. The system provides an aggregated cluster view of key performance metrics like I/O throughput and metadata request rates through the UI and programmatic monitoring endpoints. Internal monitoring for failures and system slowdowns has been added, further improving the operator view of the health and performance of the system.


Support for Java 11 - Java 11 is the latest long-term support (LTS) version of Java. Alluxio 2.4 provides compatibility with Java 11 while maintaining support for Java 8. Users looking to move their compute engines or Alluxio systems to Java 11 can now do so without compatibility concerns.


Here are some thoughts from a couple of Alluxio's big data clients:


"The Alluxio Data Orchestration System slashed query run times by half when running analytics jobs like Spark in Tencent Cloud, using our EMR platform to allow for greater I/O performance, and provides the ability to provision elastic compute with significantly reduced network resources," said Xiaoping Lei, Vice General Manager of Big Data, Tencent Cloud.

“We have worked closely with Alluxio and see the Alluxio Data Orchestration System as an integral component of our machine learning and AI platform on Kubernetes, with engines like TensorFlow, now available for both internal and external customers,” said Yang Che, Sr. Staff Engineer, Alibaba Cloud.


© 2020 by Tom Smith | ctsmithiii@gmail.com | @ctsmithiii