Posted to dev@kylin.apache.org by lk_hadoop <lk...@163.com> on 2019/04/09 01:21:59 UTC
can't pass step Build Cube In-Mem
Hi all,

I'm using kylin-2.6.1-cdh57 with a source row count of 500 million, and I can build the cube successfully. But when I use the cube planner, the OPTIMIZE CUBE job includes a step called "Build Cube In-Mem".

The relevant config in kylin_job_conf_inmem.xml is:
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>9216</value>
</property>

<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx8192m -XX:OnOutOfMemoryError='kill -9 %p'</value>
</property>

<property>
    <name>mapreduce.job.is-mem-hungry</name>
    <value>true</value>
</property>

<property>
    <name>mapreduce.job.split.metainfo.maxsize</name>
    <value>-1</value>
    <description>The maximum permissible size of the split metainfo file.
        The JobTracker won't attempt to read split metainfo files bigger than
        the configured value. No limits if set to -1.
    </description>
</property>

<property>
    <name>mapreduce.job.max.split.locations</name>
    <value>2000</value>
</property>

<property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>200</value>
</property>
Eventually the map tasks are killed by the OnOutOfMemoryError handler. But when I give the map tasks more memory, I get a different error: java.nio.BufferOverflowException.
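
When raising memory I scale both settings together and keep the JVM heap below the container size, to leave headroom for off-heap usage. A further bump (values purely illustrative) would look like:

<property>
    <name>mapreduce.map.memory.mb</name>
    <!-- container size: JVM heap (-Xmx10g) plus roughly 25% headroom -->
    <value>13312</value>
</property>

<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx10g -XX:OnOutOfMemoryError='kill -9 %p'</value>
</property>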
Why does Kylin run this job in-mem, and how can I avoid it?
2019-04-08
lk_hadoop
Re: Re: Re: can't pass step Build Cube In-Mem
Posted by lk_hadoop <lk...@163.com>.
Thank you very much! @Long Chao
2019-04-12
lk_hadoop
Re: Re: Re: can't pass step Build Cube In-Mem
Posted by Long Chao <ch...@gmail.com>.
Hi lk,
I have fixed this issue, and the code is now in Kylin's master branch.
If your situation is urgent, you can apply the commit [
https://github.com/apache/kylin/commit/ed266aa98d8524a344469b1e1ead8bfd462702d8]
and build a new binary package.

Btw, to keep the previous behavior (the optimize job uses the inmem
algorithm by default), I added a new configuration parameter,
*kylin.cube.algorithm.inmem-auto-optimize*, to remove the above limitation.
Set *kylin.cube.algorithm.inmem-auto-optimize* to *false* and the optimize
job will use the algorithm you configured (e.g. *kylin.cube.algorithm=layer*).
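
For example, a minimal kylin.properties snippet (assuming a binary built
from a revision that contains the commit above) would be:

# let OPTIMIZE CUBE jobs honor the configured algorithm instead of forcing inmem
kylin.cube.algorithm.inmem-auto-optimize=false
# the algorithm the optimize job will then use
kylin.cube.algorithm=layer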
On Thu, Apr 11, 2019 at 6:00 PM lk_hadoop <lk...@163.com> wrote:
> thank you ~ @Long Chao
>
> 2019-04-11
>
> lk_hadoop
>
>
>
> 发件人:Long Chao <ch...@gmail.com>
> 发送时间:2019-04-11 17:56
> 主题:Re: 答复: can't pass step Build Cube In-Mem
> 收件人:"dev"<de...@kylin.apache.org>
> 抄送:
>
> Hi lk,
> Optimize job will only build the newly generated cuboids in the
> recommended cuboid list, usually the amount of them is not too large.
> So, by default, we use inmem algorithm to build those new cuboids,
> but
> now the algorithm can't be overwritten by properties file.
>
> And I create a jira for this problem to make the algorithm
> configurable. https://issues.apache.org/jira/browse/KYLIN-3950
>
> On Thu, Apr 11, 2019 at 5:49 PM lk_hadoop <lk...@163.com> wrote:
>
> > I think that's not too much :
> >
> > Cuboid Distribution
> > Current Cuboid Distribution
> > [Cuboid Count: 49] [Row Count: 1117994636]
> >
> > Recommend Cuboid Distribution
> > [Cuboid Count: 168] [Row Count: 464893216]
> >
> >
> > 2019-04-11
> >
> > lk_hadoop
> >
> >
> >
> > 发件人:Na Zhai <na...@kyligence.io>
> > 发送时间:2019-04-11 17:42
> > 主题:答复: can't pass step Build Cube In-Mem
> > 收件人:"dev@kylin.apache.org"<de...@kylin.apache.org>
> > 抄送:
> >
> > Hi, lk_hadoop.
> >
> >
> >
> > Does Cube planner recommend too many cuboid? If so, it may cause OOM.
> >
> >
> >
> >
> >
> > 发送自 Windows 10 版邮件<https://go.microsoft.com/fwlink/?LinkId=550986>应用
> >
> >
> >
> > ________________________________
> > 发件人: lk_hadoop <lk...@163.com>
> > 发送时间: Tuesday, April 9, 2019 9:21:59 AM
> > 收件人: dev
> > 主题: can't pass step Build Cube In-Mem
> >
> > hi,all :
> > I'm using kylin-2.6.1-cdh57, and the source row count is 500
> million,I
> > can success build cube .
> > but when I use the cube planner , it has one step : Build Cube In-Mem
> > for job :OPTIMIZE CUBE
> > the config about the kylin_job_conf_inmem.xml is :
> >
> > <property>
> > <name>mapreduce.map.memory.mb</name>
> > <value>9216</value>
> > <description></description>
> > </property>
> >
> > <property>
> > <name>mapreduce.map.java.opts</name>
> > <value>-Xmx8192m -XX:OnOutOfMemoryError='kill -9 %p'</value>
> > <description></description>
> > </property>
> >
> > <property>
> > <name>mapreduce.job.is-mem-hungry</name>
> > <value>true</value>
> > </property>
> >
> > <property>
> > <name>mapreduce.job.split.metainfo.maxsize</name>
> > <value>-1</value>
> > <description>The maximum permissible size of the split metainfo
> > file.
> > The JobTracker won't attempt to read split metainfo files
> > bigger than
> > the configured value. No limits if set to -1.
> > </description>
> > </property>
> >
> > <property>
> > <name>mapreduce.job.max.split.locations</name>
> > <value>2000</value>
> > <description>No description</description>
> > </property>
> >
> > <property>
> > <name>mapreduce.task.io.sort.mb</name>
> > <value>200</value>
> > <description></description>
> > </property>
> >
> >
> > finally the map job will be killed for OnOutOfMemoryError , but
> when
> > I giev more mem for map job , I will get another error :
> > java.nio.BufferOverflowException
> >
> > why kylin will run the job inmem ? how can I avoid it ?
> >
> >
> >
> > 2019-04-08
> >
> >
> > lk_hadoop
Re: Re: Re: can't pass step Build Cube In-Mem
Posted by lk_hadoop <lk...@163.com>.
thank you ~ @Long Chao
2019-04-11
lk_hadoop
Re: Re: can't pass step Build Cube In-Mem
Posted by Long Chao <ch...@gmail.com>.
Hi lk,
The optimize job only builds the newly generated cuboids in the
recommended cuboid list, and usually there are not too many of them.
So, by default, we use the inmem algorithm to build those new cuboids,
but currently that algorithm can't be overridden via the properties file.

I have created a JIRA issue to make the algorithm
configurable: https://issues.apache.org/jira/browse/KYLIN-3950
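
For reference, for normal build jobs the algorithm is selected in
kylin.properties (the standard options are shown below); as described
above, the optimize job currently ignores this setting:

# cubing algorithm for build jobs: auto | layer | inmem
# "auto" lets Kylin pick layer vs inmem per job based on sampled statistics
kylin.cube.algorithm=auto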
On Thu, Apr 11, 2019 at 5:49 PM lk_hadoop <lk...@163.com> wrote:
> I think that's not too much :
>
> Cuboid Distribution
> Current Cuboid Distribution
> [Cuboid Count: 49] [Row Count: 1117994636]
>
> Recommend Cuboid Distribution
> [Cuboid Count: 168] [Row Count: 464893216]
>
>
> 2019-04-11
>
> lk_hadoop
>
>
>
> 发件人:Na Zhai <na...@kyligence.io>
> 发送时间:2019-04-11 17:42
> 主题:答复: can't pass step Build Cube In-Mem
> 收件人:"dev@kylin.apache.org"<de...@kylin.apache.org>
> 抄送:
>
> Hi, lk_hadoop.
>
>
>
> Does Cube planner recommend too many cuboid? If so, it may cause OOM.
>
>
>
>
>
> 发送自 Windows 10 版邮件<https://go.microsoft.com/fwlink/?LinkId=550986>应用
>
>
>
> ________________________________
> 发件人: lk_hadoop <lk...@163.com>
> 发送时间: Tuesday, April 9, 2019 9:21:59 AM
> 收件人: dev
> 主题: can't pass step Build Cube In-Mem
>
> hi,all :
> I'm using kylin-2.6.1-cdh57, and the source row count is 500 million,I
> can success build cube .
> but when I use the cube planner , it has one step : Build Cube In-Mem
> for job :OPTIMIZE CUBE
> the config about the kylin_job_conf_inmem.xml is :
>
> <property>
> <name>mapreduce.map.memory.mb</name>
> <value>9216</value>
> <description></description>
> </property>
>
> <property>
> <name>mapreduce.map.java.opts</name>
> <value>-Xmx8192m -XX:OnOutOfMemoryError='kill -9 %p'</value>
> <description></description>
> </property>
>
> <property>
> <name>mapreduce.job.is-mem-hungry</name>
> <value>true</value>
> </property>
>
> <property>
> <name>mapreduce.job.split.metainfo.maxsize</name>
> <value>-1</value>
> <description>The maximum permissible size of the split metainfo
> file.
> The JobTracker won't attempt to read split metainfo files
> bigger than
> the configured value. No limits if set to -1.
> </description>
> </property>
>
> <property>
> <name>mapreduce.job.max.split.locations</name>
> <value>2000</value>
> <description>No description</description>
> </property>
>
> <property>
> <name>mapreduce.task.io.sort.mb</name>
> <value>200</value>
> <description></description>
> </property>
>
>
> finally the map job will be killed for OnOutOfMemoryError , but when
> I giev more mem for map job , I will get another error :
> java.nio.BufferOverflowException
>
> why kylin will run the job inmem ? how can I avoid it ?
>
>
>
> 2019-04-08
>
>
> lk_hadoop
Re: Re: can't pass step Build Cube In-Mem
Posted by lk_hadoop <lk...@163.com>.
I don't think that's too many:

Cuboid Distribution

Current Cuboid Distribution
[Cuboid Count: 49] [Row Count: 1117994636]

Recommend Cuboid Distribution
[Cuboid Count: 168] [Row Count: 464893216]
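
That works out to roughly 1117994636 / 49 ≈ 22.8 million rows per existing
cuboid, versus 464893216 / 168 ≈ 2.8 million rows per recommended cuboid.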
2019-04-11
lk_hadoop
Re: can't pass step Build Cube In-Mem
Posted by Na Zhai <na...@kyligence.io>.
Hi, lk_hadoop.

Does the Cube planner recommend too many cuboids? If so, it may cause OOM.