Kylin + Artificial Intelligence

AI accelerates time to value with real-time data insights by reducing the work required to examine queries and create models.



I had the opportunity to speak with Li Kang, Vice President, North America, and George Demarest, Head of Marketing, at Kyligence, which offers an AI-powered intelligent data cloud for identifying, managing, and accelerating access to the most valuable enterprise data assets.


Kylin

Apache Kylin is an open-source distributed analytics engine designed to provide a SQL interface and multidimensional analysis on Hadoop and Alluxio, supporting extremely large datasets. Originally developed at eBay and contributed to the Apache Software Foundation, Kylin became a top-level Apache project in 2015.


Kylin was designed to provide OLAP (Online Analytical Processing) capability by renovating multidimensional cube and precalculation technology on Hadoop and Spark to achieve near-constant query speed regardless of data volume. Because aggregates are computed ahead of time, a query becomes a lookup rather than a scan, which reduces query latency from minutes to sub-second.
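
The precalculation idea can be illustrated with a toy example. The sketch below is not Kylin's implementation; it is a minimal in-memory model, with made-up rows and dimension names, that precomputes SUM(sales) for every combination of dimensions (each combination is a "cuboid") and then answers a grouped query with a dictionary lookup instead of a scan.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, year, sales)
ROWS = [
    ("US", "phone", 2020, 100),
    ("US", "laptop", 2020, 250),
    ("EU", "phone", 2020, 80),
    ("EU", "phone", 2021, 120),
    ("US", "phone", 2021, 150),
]
DIMENSIONS = ("region", "product", "year")


def build_cube(rows, dimensions):
    """Precompute SUM(sales) for every subset of dimensions (every cuboid)."""
    cube = {}
    positions = range(len(dimensions))
    for size in range(len(dimensions) + 1):
        for combo in combinations(positions, size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in combo)
                cuboid[key] += row[-1]  # last column is the sales measure
            cube[tuple(dimensions[i] for i in combo)] = dict(cuboid)
    return cube


def query(cube, group_by, key):
    """Answer a grouped lookup from a precomputed cuboid without scanning rows.

    group_by must list dimensions in the same order as DIMENSIONS.
    """
    return cube[tuple(group_by)][tuple(key)]


CUBE = build_cube(ROWS, DIMENSIONS)
print(query(CUBE, ["region"], ["US"]))                    # total sales for US
print(query(CUBE, ["product", "year"], ["phone", 2021]))  # phone sales in 2021
print(query(CUBE, [], []))                                # grand total
```

The build step touches every row once per cuboid, so the cost is paid up front; query time then depends on the size of the cuboid rather than on the number of source rows. With three dimensions there are 2^3 = 8 cuboids, which also shows why the number of dimensions has to be managed as it grows.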


Apache Kylin lets you query billions of rows at sub-second latency in three steps:

  1. Identify a Star/Snowflake schema on Hadoop

  2. Build Cube from the identified tables

  3. Query using ANSI SQL and get results in sub-second time via ODBC, JDBC, or RESTful API
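
As a rough sketch of the third step, the snippet below builds (but does not send) an HTTP request for Kylin's REST query endpoint, `POST /kylin/api/query`, using only the Python standard library. The host, port, credentials, project name, and table name are assumptions chosen for illustration and should be replaced with values from an actual deployment.

```python
import base64
import json
import urllib.request


def build_kylin_query(base_url, project, sql, user, password, limit=100):
    """Build (but do not send) a POST request for Kylin's /kylin/api/query endpoint."""
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # Kylin's REST API uses HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# All values below are placeholders (assumed, not taken from the article).
REQUEST = build_kylin_query(
    base_url="http://localhost:7070",
    project="learn_kylin",
    sql="SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
    user="ADMIN",
    password="KYLIN",
)
print(REQUEST.get_method(), REQUEST.full_url)
```

To run the query against a live server, the request object can be passed to `urllib.request.urlopen` and the JSON response body parsed with `json.load`. Building the request separately from sending it keeps the example testable without a running cluster.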


Kyligence follows the commercial open-source model: it takes Kylin and adds artificial intelligence (AI). Kyligence Cloud 4.5 is a self-tuning analytics acceleration platform that supports interactive data applications, dashboards, ad-hoc analytics, and real-time streaming data.


Smart-Tiered Storage

The integration of ClickHouse, the high-performance open-source OLAP database, significantly improves the performance of both ad-hoc and detailed queries. With cloud object storage/HDFS and ClickHouse serving as two available storage tiers, Kyligence covers queries of all kinds, controlled and optimized by the same AI-augmented engine. Business users can still rely on the Unified Semantic Layer with their preferred visualization tool, without needing to know the underlying computation and storage mechanisms.


Customer beta tests in financial services and retail have run against six billion records with more than 400 attributes, with attributes randomly selected in Excel pivot tables. Previously, queries would time out after 10 minutes. Now companies are getting answers in 15 to 20 seconds.


Real-time Response

Real-time response offers a low-latency, real-time query engine that provides maximum data freshness for businesses that want to monitor data and run hybrid analysis across historical and current data. Separate products are no longer required for batch and real-time analytics. Instead, customers can run a hybrid architecture that reduces the complexity of analytics pipelines, shortens development cycles, and lowers operation and maintenance costs for IT.


Sync Semantic Layer to Power BI

Creating a different data model for each BI tool leads to unnecessary complexity and to inconsistency across reports and groups. The Kyligence Unified Semantic Layer reduces confusion and human error by providing all users and tools with consistent definitions and business views of their most valuable data assets. The new platform extends the semantic layer to support Power BI, so organizations that use Power BI can now work from the same business definitions and views as other users.


Custom Data Source

Custom data source support is intended to provide timely insight from data sources added in the future. The platform currently supports S3, Microsoft Azure Data Lake Storage, Snowflake, and others. Additional data sources will be supported via the SDK.


Key Takeaway

These features matter because the number of data sources is growing at the same time that the amount of data is growing exponentially. The solution is of interest to the members of the data team responsible for generating value from data warehouses, data lakes, and analytics.