Posted to commits@kylin.apache.org by nj...@apache.org on 2019/01/21 03:24:23 UTC

[kylin] branch document updated: Add blog for v2.6.0 release

This is an automated email from the ASF dual-hosted git repository.

nju_yaho pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git


The following commit(s) were added to refs/heads/document by this push:
     new dc5d2b5  Add blog for v2.6.0 release
dc5d2b5 is described below

commit dc5d2b50f1f8a56b7a87196f2746ce5888d8af2d
Author: kyotoYaho <nj...@apache.org>
AuthorDate: Mon Jan 21 11:23:56 2019 +0800

    Add blog for v2.6.0 release
---
 .../_posts/blog/2019-01-18-release-v2.6.0.cn.md    | 76 ++++++++++++++++++++++
 website/_posts/blog/2019-01-18-release-v2.6.0.md   | 74 +++++++++++++++++++++
 2 files changed, 150 insertions(+)

diff --git a/website/_posts/blog/2019-01-18-release-v2.6.0.cn.md b/website/_posts/blog/2019-01-18-release-v2.6.0.cn.md
new file mode 100644
index 0000000..2c3f812
--- /dev/null
+++ b/website/_posts/blog/2019-01-18-release-v2.6.0.cn.md
@@ -0,0 +1,76 @@
+---
+layout: post-blog
+title:  Apache Kylin v2.6.0 Officially Released
+date:   2019-01-18 20:00:00
+author: Yanghong Zhong
+categories: blog
+---
+
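+The Apache Kylin community is pleased to announce the official release of Apache Kylin 2.6.0.
+
+Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) capabilities on extremely large datasets.
+
+This is a new feature release following 2.5.0. It introduces many valuable improvements; for the complete list of changes, see the [release notes](https://kylin.apache.org/docs/release_notes.html). Some of the major improvements are described here: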
+
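+### SDK for JDBC data sources
+Kylin already supports connecting to multiple data sources through JDBC, including Amazon Redshift and SQL Server.
+To make it easier for developers to handle the differences between SQL dialects and to develop new JDBC-based data sources, Kylin provides a corresponding SDK and a unified API entry point for:
+* Synchronizing metadata and data
+* Building cubes
+* Pushing the query down to the data source when no matching cube can be found to answer it
+
+See KYLIN-3552 for more.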
+
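+### Memcached as Kylin's distributed cache
+In the past, Kylin's caching of query results was not very efficient, mainly for the following two reasons.
+First, when Kylin's metadata changed, Kylin proactively and blindly deleted a large number of still-valid cache entries, so the cache was refreshed frequently and its utilization was low.
+Second, because only a local cache was used, Kylin servers could not share their caches with each other, which lowered the query cache hit rate.
+A local cache has a further drawback: its size is limited, and it cannot scale horizontally the way a distributed cache can. This limits the amount of query results that can be cached.
+
+To address these shortcomings, we changed the cache invalidation mechanism. Instead of proactively cleaning up the cache, we adopted the following scheme:
+1. Before putting a query result into the cache, compute a digital signature from the current metadata and store it in the cache together with the query result
+2. After getting a query result from the cache, compute a digital signature from the current metadata and compare the two signatures. If they match, the cache entry is valid; otherwise, the entry is invalid and is deleted
+
+We also introduced Memcached as Kylin's distributed cache. This lets Kylin servers share the query result cache, and because Memcached servers are independent of each other, the cache is easy to scale horizontally, which helps cache more data.
+The related development tasks are KYLIN-2895, KYLIN-2894, KYLIN-2896, KYLIN-2897, KYLIN-2898, KYLIN-2899.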
+
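+### ForkJoinPool simplifies the threading model of fast cubing
+In the past, fast cubing in Kylin used a set of self-defined threads, such as split threads, task threads and the main thread, to build cubes concurrently.
+In this threading model, the relationships between threads were very complex, and exception handling was error-prone.
+
+Now we have introduced ForkJoinPool. The split logic is handled in the main thread, the cuboid-building tasks and their sub-tasks are executed in the fork join pool, and the cuboid results can be collected asynchronously and output earlier to the downstream merge operation. See KYLIN-2932 for more.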
+
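+### Improved HLLCounter performance
+We improved HLLCounter in two respects: how an HLLCounter is constructed and how the harmonic mean is computed.
+1. For HLLCounter construction, we no longer use merge; instead, we directly copy the registers from the other HLLCounter
+2. For computing the harmonic mean in HLLCSnapshot, we made the following three improvements:
+* Cache all values of 1/2^r
+* Use integer addition instead of floating-point addition
+* Remove conditional branches, e.g. there is no need to check whether registers[i] is 0
+
+See KYLIN-3656 for more.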
+
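+### Improved Cube Planner algorithm
+In the past, phase two of the cube planner could add cuboids that were not precomputed only by way of mandatory cuboids. A cuboid could become mandatory in two ways:
+it was set manually, or the number of rows rolled up at query time was large enough. Deciding whether a cuboid is mandatory by checking whether the query-time rollup row count is large enough has two major flaws:
+* The algorithm for estimating the rollup row count is not very good
+* It is hard to set a static threshold for the decision
+
+Now we no longer look at the problem in terms of rollup row count. Everything is viewed in terms of cuboid row count, which unifies it with the cost-based cube planner algorithm.
+To that end, we use the rollup ratio to improve the row count estimate of cuboids that are not prebuilt, and then let the cost-based cube planner algorithm decide which unbuilt cuboids should be built and which should be discarded.
+With this improvement, there is no need to set a static threshold to recommend mandatory cuboids; mandatory cuboids can only be set manually and are no longer recommended. See KYLIN-3540 for more.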
+
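+__Download__
+
+To download the Apache Kylin v2.6.0 source code or binary package, visit the [download page](http://kylin.apache.org/download).
+
+__Upgrade__
+
+See the [upgrade guide](/docs/howto/howto_upgrade.html).
+
+__Feedback__
+
+If you encounter problems or have questions, please send an email to the Apache Kylin dev or user mailing list: dev@kylin.apache.org, user@kylin.apache.org. Before sending, please make sure you have subscribed to the mailing list by sending an email to dev-subscribe@kylin.apache.org or user-subscribe@kylin.apache.org.
+
+
+_Many thanks to everyone who has contributed to Apache Kylin!_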
\ No newline at end of file
diff --git a/website/_posts/blog/2019-01-18-release-v2.6.0.md b/website/_posts/blog/2019-01-18-release-v2.6.0.md
new file mode 100644
index 0000000..d01292b
--- /dev/null
+++ b/website/_posts/blog/2019-01-18-release-v2.6.0.md
@@ -0,0 +1,74 @@
+---
+layout: post-blog
+title:  Apache Kylin v2.6.0 Release Announcement
+date:   2019-01-18 20:00:00
+author: Yanghong Zhong
+categories: blog
+---
+
+The Apache Kylin community is pleased to announce the release of Apache Kylin v2.6.0.
+
+Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on big data, supporting extremely large datasets.
+
+This is a major release after 2.5.0, including many enhancements. All of the changes can be found in the [release notes](https://kylin.apache.org/docs/release_notes.html). The major ones are highlighted here:
+
+### SDK for JDBC sources
+Apache Kylin already supports several data sources through JDBC, such as Amazon Redshift and SQL Server.
+To help developers handle SQL dialect differences and easily implement a new data source through JDBC, Kylin provides a new data source SDK with APIs for:
+* Synchronizing metadata and data from a JDBC source
+* Building cubes from a JDBC source
+* Pushing queries down to the JDBC source engine when no cube matches
+
+Check KYLIN-3552 for more.
+
+### Memcached as distributed cache
+In the past, query caches were not used efficiently in Kylin for two reasons: an aggressive cache expiration strategy and the use of a local cache.
+Because of the aggressive cache expiration strategy, useful caches were often cleaned up unnecessarily.
+Because query caches were stored on local servers, they could not be shared between servers.
+And because of the size limitation of the local cache, not all useful query results could be cached.
+
+To deal with these shortcomings, we changed the query cache expiration strategy to signature checking and introduced memcached as Kylin's distributed cache, so that Kylin servers can share cached results.
+It is also easy to add memcached servers to scale out the distributed cache. With enough memcached servers, we can cache as much as possible.
+We also introduced a segment-level query cache, which not only speeds up queries but also reduces the RPCs to HBase.
+The related tasks are KYLIN-2895, KYLIN-2894, KYLIN-2896, KYLIN-2897, KYLIN-2898, KYLIN-2899.
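+
+With signature checking, a signature computed from the current metadata is stored together with each query result, and it is recomputed and compared when the result is read back. The sketch below only illustrates that idea and is not Kylin's actual code: the class `SignedQueryCache`, the in-memory map standing in for memcached, and the SHA-256 hash over a metadata description string are all assumptions made for the example.
+
+```java
+import java.nio.charset.StandardCharsets;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
+import java.util.Base64;
+import java.util.Map;
+import java.util.Optional;
+import java.util.concurrent.ConcurrentHashMap;
+
+// Hypothetical sketch: a cache entry stores the query result together with a
+// signature of the metadata that was current when the entry was written.
+class SignedQueryCache {
+    private static final class Entry {
+        final String result;
+        final String signature;
+
+        Entry(String result, String signature) {
+            this.result = result;
+            this.signature = signature;
+        }
+    }
+
+    // Stand-in for the distributed cache; a real deployment would use memcached.
+    private final Map<String, Entry> store = new ConcurrentHashMap<>();
+
+    // Assumed signature: SHA-256 over a string describing the current metadata.
+    static String sign(String metadataState) {
+        try {
+            MessageDigest md = MessageDigest.getInstance("SHA-256");
+            byte[] digest = md.digest(metadataState.getBytes(StandardCharsets.UTF_8));
+            return Base64.getEncoder().encodeToString(digest);
+        } catch (NoSuchAlgorithmException e) {
+            throw new IllegalStateException(e); // SHA-256 is available on every JVM
+        }
+    }
+
+    void put(String sql, String result, String metadataState) {
+        store.put(sql, new Entry(result, sign(metadataState)));
+    }
+
+    // Returns the cached result only if the stored signature still matches the
+    // signature of the current metadata; otherwise the stale entry is removed.
+    Optional<String> get(String sql, String metadataState) {
+        Entry entry = store.get(sql);
+        if (entry == null) {
+            return Optional.empty();
+        }
+        if (!entry.signature.equals(sign(metadataState))) {
+            store.remove(sql, entry);
+            return Optional.empty();
+        }
+        return Optional.of(entry.result);
+    }
+}
+```
+
+Nothing is evicted when metadata changes; an entry is dropped lazily, only when a read finds that its signature no longer matches.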
+
+### ForkJoinPool for fast cubing
+In the past, fast cubing used split threads, task threads and a main thread to build cubes, with complex join and error handling logic.
+
+The new implementation leverages the ForkJoinPool from the JDK. The event split logic is handled in
+the main thread. Cuboid tasks and sub-tasks are handled in the fork join pool, and cube results are collected
+asynchronously and can be written to output earlier. Check KYLIN-2932 for more.
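+
+The task and sub-task structure can be sketched with the JDK's `ForkJoinPool` and `RecursiveAction`. This is a simplified illustration rather than Kylin's implementation: the class `CuboidTask`, the parent-to-children map describing the cuboid tree, and the string results are hypothetical stand-ins.
+
+```java
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import java.util.concurrent.ForkJoinPool;
+import java.util.concurrent.RecursiveAction;
+
+// Hypothetical sketch: each task builds one cuboid, publishes the result at
+// once, then runs one sub-task per child cuboid in the same pool.
+class CuboidTask extends RecursiveAction {
+    private final String cuboid;
+    private final Map<String, List<String>> children; // parent -> child cuboids
+    private final Queue<String> output;               // results for downstream
+
+    CuboidTask(String cuboid, Map<String, List<String>> children, Queue<String> output) {
+        this.cuboid = cuboid;
+        this.children = children;
+        this.output = output;
+    }
+
+    @Override
+    protected void compute() {
+        // Placeholder for the real aggregation work on this cuboid.
+        output.add("built:" + cuboid);
+
+        List<CuboidTask> subTasks = new ArrayList<>();
+        for (String child : children.getOrDefault(cuboid, Collections.emptyList())) {
+            subTasks.add(new CuboidTask(child, children, output));
+        }
+        // Runs the sub-tasks in the pool and waits for them; an exception in
+        // any sub-task propagates to the caller of invoke().
+        invokeAll(subTasks);
+    }
+
+    // Called from the main thread: starts the root task and waits for the tree.
+    static List<String> buildAll(String root, Map<String, List<String>> children) {
+        Queue<String> output = new ConcurrentLinkedQueue<>();
+        ForkJoinPool pool = new ForkJoinPool();
+        try {
+            pool.invoke(new CuboidTask(root, children, output));
+        } finally {
+            pool.shutdown();
+        }
+        return new ArrayList<>(output);
+    }
+}
+```
+
+Because every task adds its result to the shared queue before its children run, a consumer can start draining the queue while the rest of the tree is still being built.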
+
+### Improve HLLCounter performance
+In the past, the ways to create an HLLCounter and to compute the harmonic mean were not efficient.
+
+The new implementation improves HLLCounter creation by copying registers from another HLLCounter instead of merging. To compute the harmonic mean in the HLLCSnapshot, it makes the following enhancements:
+* using a table to cache all 1/2^r values instead of computing them on the fly
+* replacing floating-point addition with integer addition in the bigger loop
+* removing branches, e.g. there is no need to check whether registers[i] is zero, although this is a minor improvement
+
+Check KYLIN-3656 for more.
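+
+One possible way to combine the three ideas is sketched below. It is an illustration under stated assumptions, not the code in HLLCSnapshot: the class `HarmonicSum`, the bound `MAX_RANK` and the scaling scheme are hypothetical. Each 1/2^r is stored as the exact integer 2^(MAX_RANK - r), so the loop only adds integers and scales back once at the end.
+
+```java
+// Hypothetical sketch of the three ideas; not Kylin's HLLCSnapshot code.
+class HarmonicSum {
+    // Assumed upper bound on a register value for this sketch.
+    static final int MAX_RANK = 40;
+
+    // Cached table: SCALED_INV_POW2[r] = 2^(MAX_RANK - r), i.e. 1/2^r scaled
+    // by 2^MAX_RANK so that every entry is an exact integer.
+    static final long[] SCALED_INV_POW2 = new long[MAX_RANK + 1];
+    static {
+        for (int r = 0; r <= MAX_RANK; r++) {
+            SCALED_INV_POW2[r] = 1L << (MAX_RANK - r);
+        }
+    }
+
+    // Returns the sum over i of 1/2^registers[i], assuming every register is
+    // in [0, MAX_RANK]. The loop does one table lookup and one integer
+    // addition per register, with no branch for zero registers (1/2^0 = 1
+    // comes from the same table). The long sum cannot overflow for fewer
+    // than 2^23 registers.
+    static double sumOfInversePowers(byte[] registers) {
+        long scaledSum = 0;
+        for (byte register : registers) {
+            scaledSum += SCALED_INV_POW2[register];
+        }
+        // A single floating-point division at the end undoes the scaling.
+        return (double) scaledSum / (double) (1L << MAX_RANK);
+    }
+
+    // Straightforward floating-point version, for comparison.
+    static double naiveSum(byte[] registers) {
+        double sum = 0.0;
+        for (byte register : registers) {
+            sum += 1.0 / (double) (1L << register);
+        }
+        return sum;
+    }
+}
+```
+
+The harmonic mean used by the estimator is the number of registers divided by this sum, so speeding up the sum speeds up every estimate.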
+
+### Improve Cuboid Recommendation Algorithm
+In the past, to add cuboids which are not prebuilt, the cube planner turned to mandatory cuboids, which were selected if their rollup row count was above some threshold.
+There are two shortcomings:
+* The way to estimate the rollup row count is not good
+* It's hard to determine the threshold of rollup row count for recommending mandatory cuboids
+
+The new implementation improves the way to estimate the row count of un-prebuilt cuboids by using the rollup ratio rather than the exact rollup row count.
+With better estimated row counts for un-prebuilt cuboids, the cost-based cube planner algorithm decides which cuboids to build.
+With this improvement, the threshold for recommending mandatory cuboids is no longer needed; mandatory cuboids can only be set manually and are no longer recommended. Check KYLIN-3540 for more.
+
+__Download__
+
+To download Apache Kylin v2.6.0 source code or binary package, visit the [download](http://kylin.apache.org/download) page.
+
+__Upgrade__
+ 
+Follow the [upgrade guide](/docs/howto/howto_upgrade.html).
+
+__Feedback__
+
+If you have any issues or questions, please send mail to the Apache Kylin dev or user mailing list: dev@kylin.apache.org, user@kylin.apache.org. Before sending, please make sure you have subscribed to the mailing list by sending an email to dev-subscribe@kylin.apache.org or user-subscribe@kylin.apache.org.
+
+_Great thanks to everyone who contributed!_