Posted to dev@kylin.apache.org by "Zhong, Yanghong" <nj...@126.com> on 2016/03/26 00:43:37 UTC

Can we choose layered cubing or in-memory cubing manually?

For some cube building jobs, Kylin chooses in-memory cubing. However, this choice does not suit some users because of its large memory cost. Customers may be able to tolerate a long cube building time, but not a large memory cost, which may cause the cube building MR job to fail. Therefore, it may be better to provide a parameter that controls whether the strategy is decided automatically or set manually.

Best regards,
Yanghong Zhong
yangzhong@ebay.com

Re: Can we choose layered cubing or in-memory cubing manually?

Posted by Li Yang <li...@apache.org>.
Currently the choice of inmem/layer is a global parameter,
"kylin.cube.algorithm". Its value can be "auto", "layer", or "inmem". The
default is "auto".

For "auto", the choice is made at runtime by looking at each cube's
stats. Another global parameter, "kylin.cube.algorithm.auto.threshold"
(8 by default), decides whether a cube is built by inmem or by layer:
inmem when the cube's "mapperOverlapRatio" is below the threshold, layer
when it is above. A high overlap ratio means mappers receive much
duplicated data and will produce similar output that requires further
aggregation on the reducer side. Such cases are bad for inmem, so layer is
preferred. Conversely, a low overlap ratio favors inmem, because little
further aggregation is needed.
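
The selection rule described above can be sketched roughly as follows. This
is only an illustration in Python, not Kylin's actual source: the function
name choose_algorithm is hypothetical, and the handling of a ratio exactly
equal to the threshold is an assumption, since the description only covers
"below" and "above".

```python
# Sketch only: mirrors the rule described in this thread, not Kylin's code.
VALID_ALGORITHMS = ("auto", "layer", "inmem")


def choose_algorithm(configured, mapper_overlap_ratio, threshold=8.0):
    """Return "layer" or "inmem" for one cube segment.

    configured           -- value of kylin.cube.algorithm
    mapper_overlap_ratio -- the cube's mapperOverlapRatio from its stats
    threshold            -- value of kylin.cube.algorithm.auto.threshold
    """
    if configured not in VALID_ALGORITHMS:
        raise ValueError(f"unknown kylin.cube.algorithm value: {configured!r}")
    if configured != "auto":
        # An explicit global setting bypasses the stats entirely.
        return configured
    # Assumption: a ratio exactly equal to the threshold goes to "layer".
    return "inmem" if mapper_overlap_ratio < threshold else "layer"


print(choose_algorithm("auto", 3.2))                 # low overlap
print(choose_algorithm("auto", 12.5))                # high overlap
print(choose_algorithm("auto", 3.2, threshold=2.0))  # lowered threshold
```

With the default threshold of 8, the first call picks inmem and the second
picks layer. The third call shows that lowering the threshold below the
cube's ratio pushes the same cube to layer.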

For troubleshooting, I suggest first finding the following lines in the
log to better understand the stats of the cube.

-- mapperOverlapRatio for <seg> is <mapperOverlapRatio> and threshold is
<threshold>
-- The cube algorithm for <seg> is <alg>

Then you can lower the threshold a bit, below the cube's
mapperOverlapRatio, to make it go to layer.
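
As a rough aid for that log search, the sketch below pulls the ratio,
threshold and chosen algorithm out of log text. It assumes each message
appears on a single line in exactly the wording quoted above. The function
name summarize, the segment name "demo_seg" and the sample values are
hypothetical, made up for illustration.

```python
import re

# Patterns follow the two log messages quoted in this thread.
RATIO_RE = re.compile(
    r"mapperOverlapRatio for (?P<seg>.+?) is (?P<ratio>\S+)"
    r" and threshold is (?P<threshold>\S+)"
)
ALG_RE = re.compile(r"The cube algorithm for (?P<seg>.+?) is (?P<alg>\w+)")


def summarize(log_lines):
    """Collect ratio, threshold and algorithm per segment from log lines."""
    stats = {}
    for line in log_lines:
        m = RATIO_RE.search(line)
        if m:
            entry = stats.setdefault(m["seg"], {})
            entry["ratio"] = float(m["ratio"])
            entry["threshold"] = float(m["threshold"])
            continue
        m = ALG_RE.search(line)
        if m:
            stats.setdefault(m["seg"], {})["alg"] = m["alg"]
    return stats


# Hypothetical sample lines; real log lines carry their own prefixes.
sample = [
    "INFO mapperOverlapRatio for demo_seg is 3.5 and threshold is 8.0",
    "INFO The cube algorithm for demo_seg is INMEM",
]
for seg, info in summarize(sample).items():
    print(seg, info)
```

If the reported ratio is, say, 3.5, then a threshold set below 3.5 would
send that cube to layer under the rule described earlier.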

Meanwhile we are working to let you specify inmem/layer on a cube, but that
will be in a new release.



On Sun, Mar 27, 2016 at 1:46 AM, Zhong, Yanghong <nj...@126.com>
wrote:

> There’s no error information. The failure reason is that the MR job time
> exceeds the timeout value, because some mappers retry their tasks. In this
> kind of situation, the total time of layered cubing may be more than that
> of in-memory cubing. However, the time of each layered job is less than the
> time of in-memory cubing, which makes it easier for each MR job to finish
> in the cluster.
>
> There’s one case like this:
>
> With "-Xmx3600m", 22 of 27 mappers failed;
> While with "-Xmx7040m", 5 of 27 mappers failed;
>
> Maybe just increasing the maximum JVM memory is not good.
>
> Overall, it may be better to provide the option for clients to manually
> choose which strategy to use.
>
> Best regards,
> Yanghong Zhong

Re: Can we choose layered cubing or in-memory cubing manually?

Posted by "Zhong, Yanghong" <nj...@126.com>.
There’s no error information. The failure reason is that the MR job time exceeds the timeout value, because some mappers retry their tasks. In this kind of situation, the total time of layered cubing may be more than that of in-memory cubing. However, the time of each layered job is less than the time of in-memory cubing, which makes it easier for each MR job to finish in the cluster.

There’s one case like this:

With "-Xmx3600m", 22 of 27 mappers failed;
While with "-Xmx7040m", 5 of 27 mappers failed;

Maybe just increasing the maximum JVM memory is not good.

Overall, it may be better to provide the option for clients to manually choose which strategy to use.

Best regards,
Yanghong Zhong

> On Mar 26, 2016, at 4:23 PM, ShaoFeng Shi <sh...@apache.org> wrote:
> 
> Hi Yanghong,
> 
> What's the detailed error in such a failed MR job?



Re: Can we choose layered cubing or in-memory cubing manually?

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi Yanghong,

What's the detailed error in such a failed MR job?




-- 
Best regards,

Shaofeng Shi