Posted to dev@kylin.apache.org by "bhbdata2013@gmail.com" <bh...@gmail.com> on 2017/07/06 02:26:00 UTC

Help with building high-cardinality dimensions

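Hi, we have run into a problem building cubes with high-cardinality dimensions, for example userid, which may exceed 100 million values. The build fails very easily. Are there any cases of Kylin building successfully with dimensions like this?
Our cluster is fairly large, but the resources Kylin requests are not well sized, so data skew occurs easily and a single reduce task fails.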



bhbdata2013@gmail.com

Re: Help with building high-cardinality dimensions

Posted by ShaoFeng Shi <sh...@apache.org>.
First, dictionary encoding does not fit an ultra-high-cardinality (UHC)
dimension. You need to change to the "integer" or "fixed_length" encoding
method. If "userid" is an integer/long number, "integer" is the best match.
The reason is that a dictionary needs to load all values into memory, which
will fill up the Java heap when the cardinality is high.
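
To illustrate the difference, here is a minimal Python sketch. It is not
Kylin's implementation; the function names build_dictionary and
encode_integer are hypothetical, and the byte layout is only an assumption
chosen for illustration. It shows that a dictionary has to hold every
distinct value, while a fixed-width integer encoding needs no lookup table.

```python
def build_dictionary(values):
    """Dictionary encoding (illustrative): every distinct value gets an ID.

    The mapping itself must be kept in memory, so its size grows with the
    cardinality of the column.
    """
    return {value: idx for idx, value in enumerate(sorted(set(values)))}


def encode_integer(value, width=8):
    """Fixed-width integer encoding (illustrative): no lookup table needed.

    The width of 8 bytes is an assumption for this sketch.
    """
    return value.to_bytes(width, byteorder="big", signed=False)


# Dictionary size tracks the number of distinct values.
dictionary = build_dictionary(["u3", "u1", "u3", "u2"])

# Integer encoding is stateless: each value is converted on its own.
encoded = encode_integer(123456789)
decoded = int.from_bytes(encoded, byteorder="big", signed=False)
```

With 100 million distinct userid values, the dictionary would need 100
million entries, while the integer encoding still uses a constant number of
bytes per value and no shared state.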

Besides, if you have multiple UHC dimensions in one cube, you'd better
customize the aggregation groups so that the UHC dimensions are not
combined with each other in the same group.
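
The effect of separating UHC dimensions can be sketched in Python as
follows. This is a simplified model, not Kylin's cuboid scheduler: it
assumes each aggregation group produces every non-empty combination of its
dimensions, and it ignores the base cuboid and the mandatory, hierarchy,
and joint rules. The dimension names deviceid, city, and day are
hypothetical.

```python
from itertools import combinations


def cuboids(dims):
    """All non-empty dimension combinations within one aggregation group."""
    return {
        frozenset(combo)
        for size in range(1, len(dims) + 1)
        for combo in combinations(dims, size)
    }


def cuboids_for_groups(groups):
    """Union of the combinations produced by each aggregation group."""
    result = set()
    for group in groups:
        result |= cuboids(group)
    return result


def with_both_uhc(cuboid_set, uhc=("userid", "deviceid")):
    """Combinations that contain both UHC dimensions at once."""
    return [c for c in cuboid_set if set(uhc) <= c]


# One group holding both UHC dimensions together.
single = cuboids_for_groups([["userid", "deviceid", "city", "day"]])

# Two groups, each holding only one UHC dimension.
split = cuboids_for_groups([
    ["userid", "city", "day"],
    ["deviceid", "city", "day"],
])

print(len(single), len(with_both_uhc(single)))  # 15 4
print(len(split), len(with_both_uhc(split)))    # 11 0
```

In this model, splitting the groups removes every combination that contains
both UHC dimensions, which are the ones expected to be the largest.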


On July 6, 2017 at 10:26 AM, bhbdata2013@gmail.com <bh...@gmail.com> wrote:

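> Hi, we have run into a problem building cubes with high-cardinality dimensions, for example userid, which may exceed 100 million values. The build fails very easily.
> Are there any cases of Kylin building successfully with dimensions like this?
> Our cluster is fairly large, but the resources Kylin requests are not well sized, so data skew occurs easily and a single reduce task fails.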
>
>
>
> bhbdata2013@gmail.com
>



-- 
Best regards,

Shaofeng Shi 史少锋