Posted to dev@kylin.apache.org by Huang Hua <hu...@mininglamp.com> on 2015/08/12 08:25:57 UTC

Consider adding yarn queue option

Our Hadoop cluster has multiple YARN queues for running Hadoop jobs
(such as MR and Spark), each with a different resource capacity.

 

However, the current implementation of IntermediateHiveTableStep has no
option for users to specify the YARN queue, so it runs the "hive -e"
command in the *DEFAULT* queue. Unfortunately, the *DEFAULT* queue might
not have enough resources configured.

 

I think it would be great to allow users to specify the queue that Kylin
jobs run in, and as far as I know it can be accomplished easily:

1. In kylin.properties, specify the MR argument, for example
"kylin.job.cmd.extra.args=-D mapreduce.job.queuename=your_yarn_queue"

2. Modify KylinConfig to add a YARN queue option

3. Modify the createIntermediateHiveTableStep method of AbstractJobBuilder
to prepend "SET mapreduce.job.queuename=your_yarn_queue" to the script
passed to the "hive -e" command

Steps 2 and 3 need only a little coding.
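
A minimal sketch of what step 3 could look like, for illustration only. The
class HiveQueueCommand and its build method are hypothetical names and are
not part of Kylin; the real createIntermediateHiveTableStep method is not
shown in this thread, so the sketch only demonstrates putting the SET
statement in front of the Hive script.

```java
// Hypothetical helper, not part of the Kylin code base.
public final class HiveQueueCommand {

    private HiveQueueCommand() {}

    /**
     * Builds a "hive -e" command line that sets the YARN queue before
     * running the given HiveQL. If queue is null or blank, no SET
     * statement is added and Hive uses whatever queue it is configured for.
     */
    public static String build(String queue, String hql) {
        StringBuilder script = new StringBuilder();
        if (queue != null && !queue.trim().isEmpty()) {
            script.append("SET mapreduce.job.queuename=")
                  .append(queue.trim())
                  .append("; ");
        }
        script.append(hql);
        // Escapes backslashes and double quotes only, so the script can sit
        // inside a double-quoted shell argument. '$' and backticks are NOT
        // handled here.
        String escaped = script.toString()
                .replace("\\", "\\\\")
                .replace("\"", "\\\"");
        return "hive -e \"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // prints: hive -e "SET mapreduce.job.queuename=your_yarn_queue; SELECT 1;"
        System.out.println(build("your_yarn_queue", "SELECT 1;"));
    }
}
```

In a real change, the queue name would presumably come from the new
KylinConfig option described in step 2 rather than from a literal.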

 

I am not sure whether the above approach is the best way to do it, so I
would like to hear opinions from the Kylin community.

 


Thanks,

Hua


Re: Reply: Consider adding yarn queue option

Posted by Luke Han <lu...@gmail.com>.
Cool. This works very well in the eBay environment, but if your case has
different requirements, please raise it here again and let's discuss the
best solution.


Best Regards!
---------------------

Luke Han


Re: Reply: Consider adding yarn queue option

Posted by hongbin ma <ma...@apache.org>.
awesome, thanks for the feedback.



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Reply: Consider adding yarn queue option

Posted by Huang Hua <hu...@mininglamp.com>.
Thanks for the details. I will put the yarn queue setting in kylin_job_conf.xml and test if it works.

@Hongbin, yes, I think it works for me as we normally configure only one dedicated queue for all jobs of a Kylin instance.

Thanks,
Hua



Re: Consider adding yarn queue option

Posted by hongbin ma <ma...@apache.org>.
This feature is not very well documented, and it is a global solution (you
cannot specify different queues for different jobs within the same Kylin
instance).

@Hua, does it solve your problem?



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Consider adding yarn queue option

Posted by "Shi, Shaofeng" <sh...@ebay.com>.
Hi Hua,

We have the same requirement in eBay's internal deployment, and our
solution is to add the Hadoop property to
$KYLIN_HOME/conf/kylin_job_conf.xml, like:

<property>
  <name>mapreduce.job.queuename</name>
  <value>queue-name</value>
  <description>Job queue</description>
</property>

The properties in this XML file are applied when running Hive commands and
MR jobs.
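
To illustrate the idea of one XML file feeding both Hive and MR, here is a
rough sketch that reads Hadoop-style <property> entries and renders them as
"-D" arguments and as Hive SET statements. The class JobConfOverrides and
its methods are hypothetical; how Kylin itself loads and applies
kylin_job_conf.xml is not described in this thread, so this should not be
read as Kylin's implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the Kylin code base.
public final class JobConfOverrides {

    private JobConfOverrides() {}

    /** Reads name/value pairs from Hadoop-style configuration XML. */
    public static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                String value = childText(property, "value");
                // Entries missing a name or a value are skipped.
                if (name != null && value != null) {
                    props.put(name, value);
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0
                ? null
                : list.item(0).getTextContent().trim();
    }

    /** Renders the properties as "-D name=value" arguments for an MR job. */
    public static String toMapReduceArgs(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append("-D ").append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Renders the properties as SET statements for a Hive script. */
    public static String toHiveSetStatements(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append("SET ").append(e.getKey()).append('=')
              .append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<configuration><property>"
                + "<name>mapreduce.job.queuename</name>"
                + "<value>queue-name</value>"
                + "<description>Job queue</description>"
                + "</property></configuration>";
        Map<String, String> props = parse(xml);
        // prints: -D mapreduce.job.queuename=queue-name
        System.out.println(toMapReduceArgs(props));
        // prints: SET mapreduce.job.queuename=queue-name;
        System.out.println(toHiveSetStatements(props));
    }
}
```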


From 0.8, we separated the Hive-related settings into another file called
"kylin_hive_conf.xml", which is applied only when running Hive
commands:
https://github.com/apache/incubator-kylin/blob/0.8/conf/kylin_hive_conf.xml


Basically, we don't want to add such Hadoop configurations to
kylin.properties; kylin.properties is for Kylin-specific settings.

Just let me know if this answers your question.
 
