Posted to dev@kylin.apache.org by Na Zhai <na...@kyligence.io> on 2019/03/18 03:54:57 UTC
Re: [Discussion] Enable shrunken dictionary by default

+1



Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10



________________________________
From: Billy Liu <bi...@apache.org>
Sent: Monday, March 18, 2019 11:50:49 AM
To: dev
Cc: Xiaoxiang Yu
Subject: Re: [Discussion] Enable shrunken dictionary by default

22 hours to 5 minutes, incredible progress.
+1

With warm regards

Billy Liu

On Mon, Mar 18, 2019 at 2:59 AM, ShaoFeng Shi <sh...@apache.org> wrote:
>
> +1.
>
> Thanks to Xiaoxiang for raising this. Kylin has some advanced but hidden
> features; as they become stable, we should enable them by default
> to benefit all users.
>
> Please also start a similar discussion if you wish to enable other good
> features by default.
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofengshi@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscribe@kylin.apache.org
> Join Kylin dev mail group: dev-subscribe@kylin.apache.org
>
>
>
>
> On Mon, Mar 18, 2019 at 10:39 AM, Zhong, Yanghong <ya...@ebay.com.invalid> wrote:
>
> > +1.
> >
> > Best regards,
> > Yanghong Zhong
> >
> > On 2019/3/18, 10:27 AM, "Xiaoxiang Yu" <xi...@kyligence.io> wrote:
> >
> >     Dear all,
> >     I suggest enabling "kylin.dictionary.shrunken-from-global-enabled" by
> > default (it is currently disabled by default), because I found that
> > enabling it speeds up the cube build process when a cube has a count
> > distinct (bitmap) measure on a high-cardinality column. This feature was
> > contributed in KYLIN-3491.
> >
> >     When a count distinct (bitmap) measure is used on a high-cardinality
> > column (which requires a global dictionary), the Build Base Cuboid step
> > needs frequent cache swaps, so it cannot finish within a reasonable time.
> > KYLIN-3491 adds a new step before the Build Base Cuboid step that builds a
> > separate dictionary for each InputSplit. The mapper of the Build Base
> > Cuboid step then only has to fetch a smaller dictionary for its own split
> > (without unused values) instead of the larger global dictionary. This
> > reduces cache swapping and makes the Build Base Cuboid step run as quickly
> > as possible.
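The idea above can be illustrated with a small, hypothetical Java sketch. It is not Kylin's implementation: the class `ShrunkenDictSketch`, the method `shrink`, and the use of a plain `Map<String, Integer>` as the "global dictionary" are all illustrative assumptions. It also assumes that the shrunken dictionary keeps each value's global ID (suggested by the property name "shrunken-from-global"), so that bitmaps built from it stay consistent with the global dictionary.

```java
import java.util.*;

/**
 * Hypothetical illustration of a per-split "shrunken" dictionary:
 * keep only the global-dictionary entries whose values occur in one input split.
 * Names and data structures are illustrative assumptions, not Kylin's API.
 */
public class ShrunkenDictSketch {

    // Build a per-split dictionary that maps each distinct value seen in the split
    // to the same ID it has in the global dictionary (assumption: IDs are preserved).
    static Map<String, Integer> shrink(Map<String, Integer> globalDict, Iterable<String> splitValues) {
        Map<String, Integer> shrunken = new HashMap<>();
        for (String v : splitValues) {
            Integer id = globalDict.get(v);
            if (id == null) {
                // The global dictionary is assumed to be built beforehand over all values.
                throw new IllegalStateException("value missing from global dictionary: " + v);
            }
            shrunken.putIfAbsent(v, id); // duplicates in the split map to one entry
        }
        return shrunken;
    }

    public static void main(String[] args) {
        // Stand-in for a large global dictionary: 100,000 values with IDs 0..99,999.
        Map<String, Integer> global = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            global.put("user" + i, i);
        }
        // One input split only touches a handful of distinct values.
        List<String> split = Arrays.asList("user42", "user7", "user42", "user99999");
        Map<String, Integer> shrunken = shrink(global, split);
        System.out.println(shrunken.size());            // prints 3
        System.out.println(shrunken.get("user99999"));  // prints 99999
    }
}
```

In this sketch, a mapper that loads only `shrunken` holds 3 entries instead of 100,000, which is the effect the proposal relies on to avoid cache swapping; how Kylin actually stores and ships the per-split dictionaries is not shown here.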
> >
> >     In my test environment, the Hadoop cluster is a CDH cluster with 56
> > vcores and 110 GB of memory. I created a model with a fact table
> > (153,326,740 rows) and three dimension tables, with three count distinct
> > (bitmap) measures; the largest cardinality of a single column is
> > 55,200,325. With the shrunken dictionary disabled, the Build Base Cuboid
> > step did not complete within 22 hours. With it enabled, the build
> > completed in a reasonable time (the new dictionary extraction step took
> > 5 minutes and the Build Base Cuboid step took 5 minutes).
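As a rough check of the figures above: because the run with the shrunken dictionary disabled never finished, 22 hours is only a lower bound, so comparing it with the 5 + 5 minutes of the enabled run gives a lower bound on the speedup of this part of the build:

```latex
\text{speedup} \;\geq\; \frac{22\,\text{h}}{5\,\text{min} + 5\,\text{min}}
  = \frac{1320\,\text{min}}{10\,\text{min}} = 132
```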
> >
> >
> > https://user-images.githubusercontent.com/14030549/54363305-ad25e200-46a5-11e9-8bc7-fe2c385c0278.png
> >
> >     If you want to know more, please check
> > https://issues.apache.org/jira/browse/KYLIN-3491.
> > If you have any suggestions, please let me know.
> >
> >     ----------------
> >     Best wishes,
> >     Xiaoxiang Yu
> >
> >
> >
> >