Posted to user@kylin.apache.org by 热爱大发挥 <38...@qq.com> on 2016/01/31 08:02:35 UTC

cube build question

My fact table has about 50 million rows. During the second step of the cube build (Extract Fact Table Distinct Columns), why is the number of reducers always 1? I checked the source code, and it is indeed hard-coded to 1. This puts an excessive load on a single node, and the job exits due to insufficient memory.
How can I solve this? Is it possible to customize the MapReduce parameters for each step?

Re: cube build question

Posted by ShaoFeng Shi <sh...@gmail.com>.
The screenshot doesn't show up.

So far Kylin doesn't support customizing MR configurations for each step. As a workaround, you can try giving the reducer more memory in conf/kylin_job_conf.xml.
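For example, the reducer memory can be raised there with the standard Hadoop 2.x MR properties; the values below are only illustrative and should be tuned to your cluster:

    <!-- in conf/kylin_job_conf.xml; illustrative values only -->
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx3276m</value>
    </property>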

Besides, you need to consider whether that ultra-high-cardinality column is meaningful as a dimension. If not, remove it; if yes, and even adding memory to the reducer still doesn't work, you can set "dictionary" to No for that column in the "Advanced" tab and set a max length for it. Kylin will then not dictionary-encode that column, but copy its value directly into the rowkey.
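For reference, in the cube descriptor JSON this UI setting roughly maps onto the rowkey column definition; the exact field names vary across Kylin versions, and the column name below is hypothetical:

    "rowkey_columns": [
        {
            "column": "USER_ID",
            "dictionary": "false",
            "length": 20
        }
    ]

With "dictionary" disabled and a fixed "length" set, the raw value is written into the rowkey instead of a dictionary-encoded id.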



Re: cube build question

Posted by 热爱大发挥 <38...@qq.com>.
Thank you very much for taking your valuable time!





Re: cube build question

Posted by ShaoFeng Shi <sh...@gmail.com>.
For most cases, one reducer is okay for merging the distinct values from all dimension columns of the fact table; but if there are multiple ultra-high-cardinality columns, using multiple reducers would give better concurrency. Actually, this is the task I'm working on today, as part of the work for another feature; it will be rolled out in a release after 2.0.
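To illustrate the idea, a custom Hadoop partitioner could spread the columns across reducers. This is a minimal sketch, not Kylin's actual implementation; the "<columnIndex>:<value>" key format and the class name are assumptions:

    // Hypothetical sketch: route each dimension column's distinct values
    // to its own reducer, so one ultra-high-cardinality column cannot
    // overload a single node. Assumes mapper keys look like "<columnIndex>:<value>".
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class ColumnAwarePartitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text key, Text value, int numReduceTasks) {
            int columnIndex = Integer.parseInt(key.toString().split(":", 2)[0]);
            // Keep all values of one column together while letting
            // different columns land on different reducers.
            return columnIndex % numReduceTasks;
        }
    }

The driver would register it with job.setPartitionerClass(ColumnAwarePartitioner.class) and set job.setNumReduceTasks(...) to at most the number of dimension columns.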
By the way, please try using English to reach a wider audience.
Sent from Outlook Mobile



