Posted to dev@kylin.apache.org by greg gu <gu...@hotmail.com> on 2016/02/02 20:31:37 UTC

only one reducer in job

When I processed the cube, I found there was only one reducer, which caused the job to run for a very long time.
I found https://issues.apache.org/jira/browse/KYLIN-1066, which says the issue is fixed.

Is there a way to change the number of reducers?
 
Thanks,

Re: only one reducer in job

Posted by greg gu <gu...@hotmail.com>.
Thanks,

It looks like that job only ran once: when I rebuilt the cube, that job didn't run, and the cube processing is fast now.

Sent from my iPhone

> On Feb 2, 2016, at 6:02 PM, ShaoFeng Shi <sh...@apache.org> wrote:
> 
> KYLIN-1066 <https://issues.apache.org/jira/browse/KYLIN-1066> is unrelated
> to your issue; it was an intermediate issue during v2.0 development, as you
> can see from its "Affects Version" and "Fix Version", which are both "v2.0".
> 
> The "Kylin Hive Column Cardinality Job" uses 1 reducer to merge the
> HyperLogLog (HLL) counters from the mappers to make a rough estimate of each
> column's cardinality. Because each mapper outputs a list of HLL objects
> rather than the full set of distinct values, the data size is small (1KB *
> # columns), so using 1 reducer to merge all the output should be more
> efficient.
> 
> Besides, this job is not a step in cube building and is not visible in the
> UI so far. Are you sure it is the slow step that you observed?
> 
> 
> 2016-02-03 8:11 GMT+08:00 greg gu <gu...@hotmail.com>:
> 
>> By the way, the job step that uses 1 reducer is "Kylin Hive Column
>> Cardinality Job". Is this expected?
>> 
>>> From: gugreg@hotmail.com
>>> To: dev@kylin.apache.org
>>> Subject: only one reducer in job
>>> Date: Tue, 2 Feb 2016 11:31:37 -0800
>>> 
>>> When I processed the cube, I found there was only one reducer, which
>>> caused the job to run for a very long time.
>>> I found https://issues.apache.org/jira/browse/KYLIN-1066, which says the
>>> issue is fixed.
>>>
>>> Is there a way to change the number of reducers?
>>> 
>>> Thanks,
> 
> 
> 
> -- 
> Best regards,
> 
> Shaofeng Shi

Re: only one reducer in job

Posted by ShaoFeng Shi <sh...@apache.org>.
KYLIN-1066 <https://issues.apache.org/jira/browse/KYLIN-1066> is unrelated
to your issue; it was an intermediate issue during v2.0 development, as you
can see from its "Affects Version" and "Fix Version", which are both "v2.0".

The "Kylin Hive Column Cardinality Job" uses 1 reducer to merge the
HyperLogLog (HLL) counters from the mappers to make a rough estimate of each
column's cardinality. Because each mapper outputs a list of HLL objects
rather than the full set of distinct values, the data size is small (1KB *
# columns), so using 1 reducer to merge all the output should be more
efficient.
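To make the merge step concrete, here is a minimal Python sketch of the idea. It is not Kylin's implementation: Kylin's counter class and its precision are not shown in this thread. It assumes a textbook HyperLogLog with precision p = 10, i.e. 1024 one-byte registers, chosen only because that roughly matches the "1KB" per column mentioned above. Each hypothetical `mapper` builds one counter per column for its split, and the single hypothetical `reducer` merges them by taking the register-wise maximum. This gives the same counter as feeding every row into one counter, so only the small register arrays, never the distinct values themselves, have to travel to the reducer.

```python
import hashlib
import math

# Assumption: precision p = 10 -> 2**10 = 1024 one-byte registers (~1KB per column).
P = 10
M = 1 << P


def _hash64(value):
    """64-bit hash of a value, taken from the first 8 bytes of SHA-1."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def new_counter():
    return bytearray(M)


def add(counter, value):
    h = _hash64(value)
    idx = h >> (64 - P)                       # top P bits select a register
    rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1   # 1-based position of the first 1-bit
    if rank > counter[idx]:
        counter[idx] = rank


def merge(a, b):
    """Register-wise max; equals the counter of the union of both inputs."""
    return bytearray(max(x, y) for x, y in zip(a, b))


def estimate(counter):
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in counter)
    zeros = counter.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)        # small-range (linear counting) correction
    return raw


def mapper(rows, num_columns):
    """Hypothetical mapper: one HLL counter per column for its input split."""
    counters = [new_counter() for _ in range(num_columns)]
    for row in rows:
        for col, value in enumerate(row):
            add(counters[col], value)
    return counters


def reducer(mapper_outputs):
    """Hypothetical single reducer: merge every mapper's counters, then estimate."""
    merged = [bytearray(c) for c in mapper_outputs[0]]
    for counters in mapper_outputs[1:]:
        merged = [merge(m, c) for m, c in zip(merged, counters)]
    return [estimate(c) for c in merged]


if __name__ == "__main__":
    # Four input splits of a two-column table: column 0 has 20000 distinct
    # values, column 1 has 7.
    splits = [[(i, i % 7) for i in range(k * 5000, (k + 1) * 5000)] for k in range(4)]
    outputs = [mapper(split, 2) for split in splits]
    print([round(e) for e in reducer(outputs)])
```

With these assumed sizes, 100 mappers over a 50-column table would send about 100 * 50 * 1KB, roughly 5 MB, to the single reducer. That is why one reducer is usually enough for this step.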

Besides, this job is not a step in cube building and is not visible in the
UI so far. Are you sure it is the slow step that you observed?


2016-02-03 8:11 GMT+08:00 greg gu <gu...@hotmail.com>:

> By the way, the job step that uses 1 reducer is "Kylin Hive Column
> Cardinality Job". Is this expected?
>
> > From: gugreg@hotmail.com
> > To: dev@kylin.apache.org
> > Subject: only one reducer in job
> > Date: Tue, 2 Feb 2016 11:31:37 -0800
> >
> > When I processed the cube, I found there was only one reducer, which
> > caused the job to run for a very long time.
> > I found https://issues.apache.org/jira/browse/KYLIN-1066, which says the
> > issue is fixed.
> >
> > Is there a way to change the number of reducers?
> >
> > Thanks,



-- 
Best regards,

Shaofeng Shi

Re: only one reducer in job

Posted by yu feng <ol...@gmail.com>.
Hi, as far as I know, KYLIN-1066
<https://issues.apache.org/jira/browse/KYLIN-1066> is not about what you
are describing. The "Kylin Hive Column Cardinality Job" is submitted after
you load a table; its function is to calculate the cardinality of every
column using HyperLogLog, so it has to use one reducer (something like
removing duplicate values). To increase execution speed, you can look at
other parts of this job.
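The "something like removing duplicate values" point can be seen with exact sets instead of HyperLogLog. Below is a simplified Python illustration; the split contents and function names are made up for the example. It shows that distinct counts computed separately in each mapper cannot simply be added, because the same value can occur in several splits, so some final step has to see all the partial results together to deduplicate them. It only shows why a final merge is needed, not how Kylin schedules that merge.

```python
def per_mapper_distinct(splits):
    """Distinct count computed separately inside each (hypothetical) mapper."""
    return [len(set(split)) for split in splits]


def global_distinct(splits):
    """Distinct count after a single merge step that sees every split."""
    seen = set()
    for split in splits:
        seen.update(split)   # duplicates across splits collapse here
    return len(seen)


if __name__ == "__main__":
    splits = [["a", "b", "c"], ["b", "c", "d"], ["a", "d", "e"]]
    print(sum(per_mapper_distinct(splits)))  # 9: values seen by several mappers are counted repeatedly
    print(global_distinct(splits))           # 5
```

A HyperLogLog counter plays the role of `seen` here, but in a fixed, small amount of memory, which is what keeps the single merge step cheap.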

2016-02-03 8:11 GMT+08:00 greg gu <gu...@hotmail.com>:

> By the way, the job step that uses 1 reducer is "Kylin Hive Column
> Cardinality Job". Is this expected?
>
> > From: gugreg@hotmail.com
> > To: dev@kylin.apache.org
> > Subject: only one reducer in job
> > Date: Tue, 2 Feb 2016 11:31:37 -0800
> >
> > When I processed the cube, I found there was only one reducer, which
> > caused the job to run for a very long time.
> > I found https://issues.apache.org/jira/browse/KYLIN-1066, which says the
> > issue is fixed.
> >
> > Is there a way to change the number of reducers?
> >
> > Thanks,

RE: only one reducer in job

Posted by greg gu <gu...@hotmail.com>.
By the way, the job step that uses 1 reducer is "Kylin Hive Column Cardinality Job". Is this expected?
 
> From: gugreg@hotmail.com
> To: dev@kylin.apache.org
> Subject: only one reducer in job
> Date: Tue, 2 Feb 2016 11:31:37 -0800
> 
> When I processed the cube, I found there was only one reducer, which caused the job to run for a very long time.
> I found https://issues.apache.org/jira/browse/KYLIN-1066, which says the issue is fixed.
>
> Is there a way to change the number of reducers?
>  
> Thanks,