Posted to user@kylin.apache.org by Billy Liu <bi...@apache.org> on 2017/04/01 01:42:29 UTC

Re: Build Dimension Dictionary timeout For Count-Distinct columns

Please check https://issues.apache.org/jira/browse/KYLIN-1178
This issue is expected to be fixed in Kylin 2.0.

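On 2017-03-30 at 2:45 PM, 曾耀武 <ze...@immomo.com> wrote:

> Hi Shaofeng,
>
> I am using Kylin 1.6 and have run into a troubling problem while testing
> count-distinct:
>
> With more than 400 million users, Kylin needs to build a dictionary on the
> user ID column when I compute UV. The official claim is that a global
> dictionary can handle a cardinality of up to 2 billion,
>
> but step 4, building the dictionary, appears to run as a local job and is
> extremely resource-hungry: CPU and memory are almost fully used, and the
> Kylin web page becomes inaccessible. The server uses the recommended
> configuration, as follows:
>
> KYLIN_JVM_SETTINGS="-Xms16g -Xmx16g -XX:MaxPermSize=512m -XX:NewSize=3g
> -XX:MaxNewSize=3g -XX:SurvivorRatio=4 -XX:+CMSClassUnloadingEnabled
> -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC
> -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=70
> -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError"
>
> However, UV calculation is a frequent computation in our business.
>
> Could you suggest any good optimizations for this step, or anything in the
> system configuration that could be raised?
>
> regards
>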

Re: RE: Build Dimension Dictionary timeout For Count-Distinct columns

Posted by ShaoFeng Shi <sh...@apache.org>.
I guess 16 GB of memory is not big enough; try gradually allocating more
memory to the job node.

Donald's suggestion is correct: use approximate count distinct if your
business scenario can accept it; that is much cheaper.
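
To illustrate why the approximate approach is cheaper: a HyperLogLog-style
sketch keeps a fixed number of small registers no matter how many distinct
IDs it sees, so no dictionary over all user IDs is needed. Below is a minimal
Python sketch of the general technique, not Kylin's implementation; the class
name, the precision parameter `p`, and the SHA-1 based hashing are assumptions
made for illustration only.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative only, not Kylin's code)."""

    def __init__(self, p=12):
        # p is the precision: the sketch uses 2**p registers (p >= 7 assumed,
        # so that the bias-correction constant below is valid).
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m  # fixed memory, independent of cardinality

    def add(self, value):
        # 64-bit hash of the value (SHA-1 truncated; an illustrative choice).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # rank = position of the leftmost 1-bit within the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # bias correction for m >= 128
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            # small-range correction (linear counting)
            estimate = m * math.log(m / zeros)
        return int(round(estimate))


# Usage: 100,000 distinct IDs, each added twice.
hll = HyperLogLog(p=12)
for user_id in range(100_000):
    hll.add(user_id)
    hll.add(user_id)  # duplicates never change the registers
estimate = hll.count()
print(estimate)
```

With 2**12 registers the typical relative error is about 1.04/sqrt(4096),
roughly 1.6%, and the memory used stays the same whether there are a hundred
thousand or hundreds of millions of distinct IDs. The trade-off is that the
result is an estimate, which is why this only fits if the business can accept
an approximate UV.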

@kaisen, @dayue, do you have more comments here?



2017-04-01 11:32 GMT+08:00 Donald,Zheng(vip.com) <do...@vipshop.com>:

> Are you using precise count distinct to calculate the UV metric?
>
> If an approximate result is acceptable, you should switch to approximate
> count distinct to avoid building the ‘Global Dictionary’.

-- 
Best regards,

Shaofeng Shi 史少锋

RE: Build Dimension Dictionary timeout For Count-Distinct columns

Posted by "Donald,Zheng(vip.com)" <do...@vipshop.com>.
Are you using precise count distinct to calculate the UV metric?
If an approximate result is acceptable, you should switch to approximate count distinct to avoid building the ‘Global Dictionary’.
