Posted to commits@kylin.apache.org by xx...@apache.org on 2022/01/19 08:12:20 UTC

[kylin] branch document updated: Add new blog: The future of Kylin

This is an automated email from the ASF dual-hosted git repository.

xxyu pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git


The following commit(s) were added to refs/heads/document by this push:
     new cc1f863  Add new blog: The future of Kylin
cc1f863 is described below

commit cc1f863ffc37984ed2b38d1b9110d803babf0db0
Author: yaqian.zhang <59...@qq.com>
AuthorDate: Mon Jan 10 10:30:53 2022 +0800

    Add new blog: The future of Kylin
---
 .../blog/2022-01-12-The-Future-Of-Kylin.cn.md      |  47 ++++++++++++++++++
 .../_posts/blog/2022-01-12-The-Future-Of-Kylin.md  |  53 +++++++++++++++++++++
 website/download/index.md                          |   2 +-
 website/images/blog/the_future_of_kylin.png        | Bin 0 -> 74272 bytes
 4 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/website/_posts/blog/2022-01-12-The-Future-Of-Kylin.cn.md b/website/_posts/blog/2022-01-12-The-Future-Of-Kylin.cn.md
new file mode 100644
index 0000000..1266ece
--- /dev/null
+++ b/website/_posts/blog/2022-01-12-The-Future-Of-Kylin.cn.md
@@ -0,0 +1,47 @@
+---
+layout: post-blog
+title: "Next-Generation Kylin: A More Powerful and Easy-to-Use OLAP"
+date: 2022-01-12 11:00:00
+author: Yang Li
+categories: cn_blog
+---
+
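+## 01 Apache Kylin Today
+The latest release of Apache Kylin is currently 4.0.1. Apache Kylin 4.0 is a major version update following Kylin 3.x (HBase storage). Kylin 4 replaces HBase storage with Parquet, a true columnar format, which improves file-scanning performance. Kylin 4 also reimplements the build engine and query engine on Spark, which makes it possible to separate compute from storage and fits the cloud-native trend better.
+Kylin 4.0 overhauled the build and query engines, made deployment without Hadoop possible, and solved the first problems of moving to the cloud. Beyond that, drawing on feedback from community users and on trends in OLAP technology, the Kylin community has found that today's Kylin still has weaknesses: for example, the business semantic layer needs strengthening, and changes to precomputed models are not flexible enough. Based on these shortcomings, the follow-up work can be summarized in the following areas: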
+
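+- Multi-dimensional query capability that is friendly to non-technical users. The multi-dimensional model is what distinguishes Kylin from general OLAP engines. Its defining feature is that a model built on dimensions and measures is friendlier to non-technical users and closer to the goal of "everyone is a data analyst". Multi-dimensional query capability that non-technical users can actually use should be the new focus of Kylin's technology.
+- Native Engine. The Kylin engine still has plenty of room for improvement in vectorized acceleration and instruction-level optimization. The Spark community, which Kylin depends on, also has strong demand for a native engine. By an optimistic estimate, a native engine could improve Kylin's current performance by at least 3x, which makes it worth the investment.
+- More cloud-native capabilities. Kylin 4.0 has only taken the first step onto the cloud, delivering features such as rapid cloud deployment and dynamic resource scaling; many cloud-native capabilities remain to be developed.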
+
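+## 02 The Positioning of Apache Kylin: A Multi-Dimensional Database
+At its core, Kylin is a multi-dimensional database, a special kind of OLAP engine. Although Kylin has had relational database capabilities since its inception and is often compared with other relational OLAP engines, what truly sets Kylin apart is its multi-dimensional model and multi-dimensional database capability. Considering the nature of Kylin and its broad future business uses (not only technical uses), we will explicitly position Kylin as a multi-dimensional database. We also hope that, through the multi-dimensional model and precomputation, Apache Kylin can let ordinary people understand and afford big data, and ultimately achieve data democratization.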
+
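+### The Semantic Layer
+The key difference between a multi-dimensional database and a relational database lies in business expressiveness. SQL is highly expressive and a basic skill for data analysts, but if the goal is "everyone is an analyst", SQL and relational databases are still too hard for non-technical users. From their point of view, the data lake and the data warehouse are like a dark room: they know there is a lot of data inside, but because they do not understand database theory or SQL, they cannot see, understand, or use it.
+How can the data lake (and the data warehouse) be made "crystal clear" to non-technical users? This calls for a data model that is friendlier to them: the multi-dimensional data model. If the relational model describes the technical form of data, the multi-dimensional model describes its business form. In a multi-dimensional database, measures correspond to business metrics that everyone understands, and dimensions are the angles from which those metrics are compared and observed. Comparing a KPI with last month's value, or comparing performance across peer business units, are concepts every non-technical user understands. Mapping the relational model onto a multi-dimensional model in essence adds business semantics on top of technical data, forming a business semantic layer that helps non-technical users understand, explore, and use data.
+To strengthen Kylin's semantic-layer capability as a multi-dimensional database, support for multi-dimensional query languages such as MDX and DAX is a key item on the Kylin roadmap. MDX can translate the data models in Kylin into business-friendly language, give the data business value, and make it easy to connect BI tools such as Excel and Tableau for multi-dimensional analysis.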
+
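+### Precomputation and Flexible Models
+Continuing to lower the cost of a single query through precomputation, so that ordinary people can afford big data, is also Kylin's unchanging mission. If the multi-dimensional model solves the problem of non-technical users understanding data, precomputation solves the problem of ordinary people affording it; both are prerequisites for data democratization. By computing once and reusing the result many times, the cost of data is shared among many users, producing an economy of scale in which more users means lower cost per user. Precomputation is Kylin's traditional strength, but changing a precomputed model has lacked flexibility. To make model changes more flexible and to open up more room for optimization, the Kylin community plans to introduce a brand-new metadata structure in a future Kylin, making precomputation more flexible and able to cope with table schemas or business requirements that may change at any time.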
+
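+### Summary
+In summary, we will make clear that Kylin's technical positioning is a multi-dimensional database that, through the multi-dimensional model and precomputation, lets ordinary people understand and afford big data and ultimately realizes the vision of data democratization. At the same time, for users who use Kylin today as a SQL acceleration layer, Kylin will keep a complete SQL interface, so that precomputation can be used by both the relational model and the multi-dimensional model.
+The figure below shows the directions Kylin will focus on in the future; new and modified parts are roughly marked in blue and orange.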
+
+![](/images/blog/the_future_of_kylin.png)
+
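+## 03 Apache Kylin Upgrade Plan
+Based on Kylin's positioning as a multi-dimensional database, and taking into account the capabilities that still need strengthening as well as long-awaited features such as schema change, we plan to introduce a new DataModel metadata structure in a future Kylin. Cube metadata will no longer be exposed to users, and the metadata dependency will be simplified to Model -> Table.
+Because metadata is the foundation and contract for the community's subsequent collaborative development, designing and developing the new metadata structure will be the focus of the Kylin community's work now and over the next few months. The metadata design and discussion documents will be published within a month, and everyone is welcome to join the discussion. Barring surprises, the new metadata structure will be available in 2022, so stay tuned.
+Besides the metadata upgrade, the build and query engines that accompany it, semantic-layer capability (MDX), better integration with BI tools, and the native engine are all priorities the Kylin community has been actively pushing forward. We welcome more like-minded contributors to join in and build the community together.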
+
+**Further Reading**
+- https://en.wikipedia.org/wiki/Data_model
+- https://en.wikipedia.org/wiki/Semantic_layer
+- https://en.wikipedia.org/wiki/Multidimensional_analysis
+- https://en.wikipedia.org/wiki/MultiDimensional_eXpressions
+- https://en.wikipedia.org/wiki/XML_for_Analysis
+- https://en.wikipedia.org/wiki/SIMD
+- https://en.wikipedia.org/wiki/Cloud_native_computing
+- https://blogs.gartner.com/carlie-idoine/2018/05/13/citizen-data-scientists-and-why-they-matter/
diff --git a/website/_posts/blog/2022-01-12-The-Future-Of-Kylin.md b/website/_posts/blog/2022-01-12-The-Future-Of-Kylin.md
new file mode 100644
index 0000000..9512055
--- /dev/null
+++ b/website/_posts/blog/2022-01-12-The-Future-Of-Kylin.md
@@ -0,0 +1,53 @@
+---
+layout: post-blog
+title: "The future of Apache Kylin: More powerful and easy-to-use OLAP"
+date: 2022-01-12 11:00:00
+author: Yang Li
+categories: blog
+---
+
+## 01 Apache Kylin Today
+
+Currently, the latest release of Apache Kylin is 4.0.1. Apache Kylin 4.0 is a major version update after Kylin 3.x (HBase storage). Kylin 4.0 replaces HBase with Parquet as the storage engine to improve file-scanning performance. At the same time, Kylin 4.0 reimplements the build engine and query engine on Spark, which makes it possible to separate compute from storage and adapts better to the cloud-native trend.
+
+Kylin 4.0 comprehensively updated the build and query engines and introduced a deployment mode without a Hadoop dependency, which reduces deployment complexity. In addition, based on feedback from Kylin users and trends in OLAP technology, the Kylin community found that today's Apache Kylin still has some weaknesses: the business semantic layer needs to be strengthened, and modifying a model or cube is not flexible. With these, we thinki [...]
+
+- Multi-dimensional query capability that is friendly to non-technical users. The multi-dimensional model is what distinguishes Kylin from general OLAP engines. A model built on dimensions and measures is friendlier to non-technical users and closer to the goal of "everyone is a data analyst". Multi-dimensional query capability that non-technical users can use should be the new focus of Kylin's technology.
+- Native Engine. Kylin's query engine still has much room for improvement in vectorized acceleration and CPU instruction-level optimization. The Spark community, which Kylin relies on, also has strong demand for a native engine. By an optimistic estimate, a native engine could improve Kylin's performance by at least three times, which makes it worth the investment.
+- More cloud-native capabilities. Kylin 4.0 has only completed the initial move to the cloud, delivering rapid deployment and dynamic resource scaling on the cloud; many cloud-native capabilities remain to be developed.
+
+These points are explained in more detail below.
+
+## 02 Kylin as a Multi-Dimensional Database
+At its core, Kylin is a multi-dimensional database, which is a special kind of OLAP engine. Although Kylin has had relational database capabilities since its inception and is often compared with other relational OLAP engines, what really sets Kylin apart is its multi-dimensional model and multi-dimensional database capability. Considering the essence of Kylin and its wide range of future business uses (not only technical uses), we will clearly position Kylin as a multi-dimensional d [...]
+
+### The Semantic Layer
+The key difference between a multi-dimensional database and a relational database is business expressiveness. Although SQL is highly expressive and is a basic skill for data analysts, SQL and relational databases are still too difficult for non-technical users if we aim at "everyone is a data analyst". From the perspective of non-technical users, the data lake and data warehouse are like a dark room. They know that there is a lot of data, but they can't see clearly, unde [...]
+How can the data lake (and data warehouse) be made clear to non-technical users? This requires a data model that is friendlier to them: the multi-dimensional data model. While the relational model describes the technical form of data, the multi-dimensional model describes the business form of data. In a multi-dimensional database, measures correspond to business indicators that everyone understands, and dimensions are the perspectives for comparing and observing th [...]
+To strengthen Kylin's capability as the semantic layer of a multi-dimensional database, support for multi-dimensional query languages such as MDX and DAX is a key item on the Kylin roadmap. MDX can translate the data model in Kylin into business-friendly language, give data business value, and make it easier to do multi-dimensional analysis on Kylin with BI tools such as Excel and Tableau.
+
+### Precomputation and Model Flexibility
+Kylin's unchanging mission is to keep reducing the cost of a single query through precomputation so that ordinary people can afford big data. If the multi-dimensional model lets non-technical users understand data, precomputation lets ordinary people afford it. Both are necessary conditions for data democratization. By computing once and reusing the result many times, the data cost can be shared by multiple users to achi [...]
+
+### Summary
+To sum up, we will make it clear that Kylin's technical positioning is a multi-dimensional database. Through the multi-dimensional model and precomputation, ordinary people can understand and afford big data, which ultimately realizes the vision of data democratization. Meanwhile, for users who use Kylin today as a SQL acceleration layer, Kylin will continue to maintain a complete SQL interface to ensure that precomputation can be used by both the relational model and multi- [...]
+The figure below shows the directions Kylin will focus on in the future. The newly added and modified parts are roughly marked in blue and orange.
+
+![](/images/blog/the_future_of_kylin.png)
+
+## 03 The Future Plan
+
+Based on Kylin's positioning as a multi-dimensional database, the capabilities that still need to be strengthened, and long-awaited features such as schema change, we plan to introduce a new DataModel metadata format into Kylin: Cube will no longer be exposed to users, and the metadata dependency will be simplified to 'Model -> Table'.
+Because metadata is the basis and contract for subsequent collaborative development of Kylin, designing and developing the new metadata format will be the focus of the Kylin community's work now and over the next few months. The metadata design and discussion proposal will be released later, and you are welcome to join the discussion. If all goes well, the new metadata format will be available this year.
+Besides the metadata format upgrade, the build and query engines that support it, semantic-layer capability (MDX), better integration with BI tools, and the native engine are also key work that the Kylin community has been actively promoting. More like-minded users and developers are welcome to take part and advance the Kylin community together.
+
+**Further Reading**
+- https://en.wikipedia.org/wiki/Data_model
+- https://en.wikipedia.org/wiki/Semantic_layer
+- https://en.wikipedia.org/wiki/Multidimensional_analysis
+- https://en.wikipedia.org/wiki/MultiDimensional_eXpressions
+- https://en.wikipedia.org/wiki/XML_for_Analysis
+- https://en.wikipedia.org/wiki/SIMD
+- https://en.wikipedia.org/wiki/Cloud_native_computing
+- https://blogs.gartner.com/carlie-idoine/2018/05/13/citizen-data-scientists-and-why-they-matter/
+
diff --git a/website/download/index.md b/website/download/index.md
index d0bf522..8d32eaf 100644
--- a/website/download/index.md
+++ b/website/download/index.md
@@ -8,7 +8,7 @@ You can verify the download by following these [procedures](https://www.apache.o
 
 #### v4.0.1
 - This is a bug-fix release after Kylin 4.0.0, with 8 new features/improvements and 6 bug fixes. Check the release notes.
-- [Release notes](/docs/release_notes.html), [installation guide](https://cwiki.apache.org/confluence/display/KYLIN/Installation+Guide) and [upgrade guide](https://cwiki.apache.org/confluence/display/KYLIN/How+to+upgrade)
+- [Release notes](/docs/release_notes.html), [installation guide](https://cwiki.apache.org/confluence/display/KYLIN/Installation+Guide) and [upgrade guide](/docs/howto/howto_upgrade.html)
 - Source download: [apache-kylin-4.0.1-source-release.zip](https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-4.0.1/apache-kylin-4.0.1-source-release.zip) \[[asc](https://www.apache.org/dist/kylin/apache-kylin-4.0.1/apache-kylin-4.0.1-source-release.zip.asc)\] \[[sha256](https://www.apache.org/dist/kylin/apache-kylin-4.0.1/apache-kylin-4.0.1-source-release.zip.sha256)\]
 - Binary for the download (check this to see which binary you should choose [Hadoop Matrix supported](https://cwiki.apache.org/confluence/display/KYLIN/Support+Hadoop+Version+Matrix+of+Kylin+4)) :
   - for Apache Spark 2.4.7 [apache-kylin-4.0.1-bin-spark2.tar.gz](https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-4.0.1/apache-kylin-4.0.1-bin-spark2.tar.gz) \[[asc](https://www.apache.org/dist/kylin/apache-kylin-4.0.1/apache-kylin-4.0.1-bin-spark2.tar.gz.asc)\] \[[sha256](https://www.apache.org/dist/kylin/apache-kylin-4.0.1/apache-kylin-4.0.1-bin-spark2.tar.gz.sha256)\] 
diff --git a/website/images/blog/the_future_of_kylin.png b/website/images/blog/the_future_of_kylin.png
new file mode 100644
index 0000000..2e29d61
Binary files /dev/null and b/website/images/blog/the_future_of_kylin.png differ