DBaaS improves the developer experience for analytics applications with instant access, intelligent operations, and built-in streaming with Confluent Cloud.
Apache Druid extends leadership as the database for analytics applications.
I had the opportunity to speak with Gian Merlino, co-founder and CTO of Imply and an original creator of Apache Druid, and David Wang, VP of Product Marketing at Imply, about their new cloud database service for modern analytics applications, as well as the expansion of the Apache Druid architecture.
Applications are becoming more analytics-driven. Value is shifting from business intelligence to revenue impact and revenue generation with data. Modern analytics applications are interactive at any scale, provide high concurrency at the best value, and deliver insights on both real-time and historical data.
Imply has just unveiled the first milestone in Project Shapeshift, a 12-month initiative designed to solve the most pressing issues developers face when building analytics applications. The milestone includes a cloud database service built from Apache Druid and the preview of a multi-stage query engine for Druid. These innovations reinforce Imply's commitment to delivering a developer-friendly database for analytics applications.
Developers are increasingly at the forefront of analytics innovation, driving an evolution in analytics beyond traditional BI and reporting to modern analytics applications. These applications, fueled by the digitization of businesses, are being built for real-time observability at scale for cloud products and services, next-gen operational visibility for security and IT, revenue-impacting insights and recommendations, and for extending analytics to external customers. Apache Druid has been the database of choice for analytics applications, trusted by developers at more than 1,000 companies, including Netflix, Confluent, and Salesforce.
“Today, we are at an inflection point with the adoption of Apache Druid, as every organization now needs to build modern analytics applications. This is why it’s now time to take Druid to the next level. Project Shapeshift is all about making things easier for developers, so they can drive the analytics evolution inside their companies,” said Fangjin Yang, CEO and co-founder of Imply.
While developers have turned to Apache Druid to power interactive data experiences on streaming and batch data with limitless scale, Imply saw the opportunity to simplify the end-to-end developer experience and extend the Druid architecture to power more analytics use cases for applications from a single database.
Real-Time Database as a Service, Built from Apache Druid
Building analytics applications involves operational work for software development and engineering teams across deployment, database operations, lifecycle management, and ecosystem integration. For databases, cloud database services have become the norm because they remove the burden of infrastructure, from cluster sizing to scaling, and shift the consumption model to pay-as-you-use.
Imply Polaris goes further: it is a cloud database service reimagined from the ground up to simplify the developer experience for analytics applications end-to-end. More than moving Apache Druid to the cloud, Polaris uses automation and intelligence to deliver the performance of Druid without requiring Druid expertise. It provides a complete, integrated experience that simplifies everything from streaming to visualization. Specifically:
Fully Managed Cloud Service: Developers can build modern analytics applications without needing to think about the underlying infrastructure. No sizing or planning is required to deploy and scale the database. Developers can start ingesting data and building applications in just a few minutes.
Database Optimization: Developers get all the performance of Druid they need without having to turn knobs. The service automates configuration and tuning parameters and includes built-in performance monitoring and personalized recommendations intended to keep the database optimized for every query in the application.
Single Development Experience: Developers get a seamless, integrated experience for building analytics applications, from streaming to visualization. A built-in push-based streaming service via Confluent Cloud and a visualization engine, integrated into a single UI, make it simple to connect to data sources and build rich, interactive applications.
“Polaris is built on the core tenets of Apache Druid—flexibility, efficiency, and resiliency—and packages them into a cloud service that deploys instantly, scales effortlessly, and doesn’t require any Druid expertise, enabling any developer to build modern analytics applications,” said Jad Naous, chief product officer, Imply.
Evolving the Druid Architecture
From its inception, Druid has enabled developers to build highly interactive and concurrent applications at scale, powered by a query engine built for always-on applications with sub-second performance at TB to PB+ scale. Increasingly, however, developers need data exports, reporting, and advanced alerting included with their applications, which has meant deploying and managing additional data processing systems.
Today, Imply introduces a private preview of a multi-stage query engine, a technical evolution for Druid that reinforces its leadership as the most capable database for analytics applications. The multi-stage query engine, working in conjunction with the core Druid query engine, will extend Druid beyond interactivity to support the following new use cases in a single database platform:
Druid for Reporting. The multi-stage query engine adds capabilities to query external tables, dynamically transform data in the database, and execute long-running, heavyweight queries. Together these enable reporting, while added cost-control capabilities keep reporting affordable.
Druid for Alerting. Building on Druid’s longstanding capability to combine live streaming data with on-disk historical data, the multi-stage query engine adds dynamic schema modification and high-precision queries at large scale to enable anomaly detection and alerting across streams of any size, for both real-time understanding and rapid response to operational, security, customer, and other events.
Simplified and More Capable Ingestion. Druid has always provided very high concurrency and very fast queries across large data sets, partially powered by indexing and segmentation during data ingestion. While ingesting data from streams and files could previously require tuning, the new multi-stage query engine will enable SQL-based ingestion from object stores, including HDFS, Amazon S3, Azure Blob, and Google Cloud Storage, with in-database transformation. This makes data ingestion simple without giving up any of Druid’s power to enable interactive conversations in modern data analytics applications.
“The multi-stage query engine represents the most significant evolution of Druid, an expansion of the architecture that makes it unparalleled in the industry. It brings both flexibility and ease to the developer experience. I’m excited that the entire open source community will be able to take full advantage of it,” said Gian Merlino, co-founder and CTO of Imply and Apache Druid PMC Chair.