Posted to common-user@hadoop.apache.org by Sagar Mehta <sa...@gmail.com> on 2013/04/25 03:22:57 UTC

Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

Hi Guys,

We have a general purpose Hive cluster [about 200 nodes] which is used for
various jobs like

   - Production
   - Experimental/Research
   - Adhoc queries

We are using the fair-share scheduler to schedule them, and for this we have
three corresponding pools in the scheduler.

*Here is what we want.*

*A hive query submitted by a user with user-name A should go to one of the
pools above based on a pre-defined mapping. We are wondering where/how to
specify this mapping?*

*We can do this manually by adding -Dmapred.job.queue.name="X" on a
particular job run.*

This puts the job on the map-reduce queue named "X" and the following
configuration in the fair-share scheduler

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>mapred.job.queue.name</value>
  </property>

maps this to a pool named "X" in the fair-share scheduler.

However we [while wearing our Hadoop developer/admin hat] don't want the
user/analyst to specify that so as to enforce some cluster-use policy.

Based on his/her username we want to automatically select which Hadoop
queue, and subsequently which fair-share scheduler pool, his/her job should
go to. I'm pretty sure this is a common use case and am wondering how to do
this in Hadoop.

Any help/insights/pointers would be greatly appreciated.

Sagar
PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
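
A minimal sketch of the mapping being asked for, assuming it is done in a thin launcher in front of Hive rather than in the scheduler itself. The USER_TO_QUEUE table, the queue names and the helper names are hypothetical; only the mapred.job.queue.name property and Hive's --hiveconf flag come from Hadoop and Hive. The sketch only builds the command line. It does not enforce anything, since a user can still override the property inside a session, so queue ACLs would still be needed on the cluster side.

```python
# Hypothetical user-to-queue policy; user and queue names are illustrative only.
USER_TO_QUEUE = {
    "etl_prod": "production",
    "alice": "research",
}
DEFAULT_QUEUE = "adhoc"  # assumed fallback for users without an explicit entry


def queue_for(user):
    """Return the queue a user's jobs should go to, falling back to the default."""
    return USER_TO_QUEUE.get(user, DEFAULT_QUEUE)


def hive_command(user, extra_args=()):
    """Build a hive CLI invocation with the queue property already set."""
    queue_setting = "mapred.job.queue.name=" + queue_for(user)
    return ["hive", "--hiveconf", queue_setting] + list(extra_args)
```

For example, hive_command("alice", ["-f", "report.sql"]) yields the argument list for
hive --hiveconf mapred.job.queue.name=research -f report.sql, which a wrapper script could then execute on the user's behalf.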

Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

Posted by Sagar Mehta <sa...@gmail.com>.

Hi Vinod,

Yes, this is exactly what we are doing right now. It works, but it is manual
and exposes the policy.
I think the JIRA that Sandy pointed out -
https://issues.apache.org/jira/browse/MAPREDUCE-5132 is a good first step
in that direction.

Cheers,
Sagar

On Thu, Apr 25, 2013 at 1:44 PM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

> The 'standard' way to do this is to use queue ACLs to restrict a particular
> user to submitting jobs to a subset of queues, and then let the user
> decide which of those queues to submit a job to.
>
> Thanks,
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Apr 24, 2013, at 6:22 PM, Sagar Mehta wrote:
>
> Hi Guys,
>
> We have a general purpose Hive cluster [about 200 nodes] which is used for
> various jobs like
>
>    - Production
>    - Experimental/Research
>    - Adhoc queries
>
> We are using the fair-share scheduler to schedule them and for this we
> have corresponding 3 pools in the scheduler.
>
> *Here is what we want.*
>
> *A hive query submitted by a user with user-name A should go to one of
> the pools above based on a pre-defined mapping. We are wondering where/how
> to specify this mapping?*
>
> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
> particular job run.*
>
> This puts the job on the map-reduce queue named "X" and the following
> configuration in the fair-share scheduler
>
>   <property>
>     <name>mapred.fairscheduler.poolnameproperty</name>
>     <value>mapred.job.queue.name</value>
>   </property>
>
> maps this to a pool named "X" in the fair-share scheduler.
>
> However we [while wearing our Hadoop developer/admin hat] don't want the
> user/analyst to specify that so as to enforce some cluster-use policy.
>
> Based on his/her username we want to automatically select which hadoop
> queue and subsequently which fair-share scheduler pool, his/her job should
> go to. I'm pretty sure this is a common use-case and wondering how to do
> this in Hadoop.
>
> Any help/insights/pointers would be greatly appreciated.
>
> Sagar
> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
>
>
>
>
>

Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

The 'standard' way to do this is to use queue ACLs to restrict a particular user to submitting jobs to a subset of queues, and then let the user decide which of those queues to submit a job to.

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Apr 24, 2013, at 6:22 PM, Sagar Mehta wrote:

> Hi Guys,
> 
> We have a general purpose Hive cluster [about 200 nodes] which is used for various jobs like
> Production
> Experimental/Research
> Adhoc queries
> We are using the fair-share scheduler to schedule them and for this we have corresponding 3 pools in the scheduler.
> 
> Here is what we want.
> 
> A hive query submitted by a user with user-name A should go to one of the pools above based on a pre-defined mapping. We are wondering where/how to specify this mapping?
> 
> We can do this manually by adding -Dmapred.job.queue.name="X" on a particular job run.
> 
> This puts the job on the map-reduce queue named "X" and the following configuration in the fair-share scheduler
> 
>   <property>
>     <name>mapred.fairscheduler.poolnameproperty</name>
>     <value>mapred.job.queue.name</value>
>   </property>
> 
> maps this to a pool named "X" in the fair-share scheduler.
> 
> However we [while wearing our Hadoop developer/admin hat] don't want the user/analyst to specify that so as to enforce some cluster-use policy.
> 
> Based on his/her username we want to automatically select which hadoop queue and subsequently which fair-share scheduler pool, his/her job should go to. I'm pretty sure this is a common use-case and wondering how to do this in Hadoop. 
> 
> Any help/insights/pointers would be greatly appreciated.
> 
> Sagar
> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
> 
> 
>
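
The queue-ACL approach described in this reply can be illustrated with a small generator for mapred-queue-acls.xml entries. This is a sketch under assumptions: the QUEUE_USERS table and its names are hypothetical, and the property pattern mapred.queue.<queue-name>.acl-submit-job is the MRv1 naming as far as I know, so it should be checked against the documentation for the Hadoop version in use. ACL checks also have to be switched on (mapred.acls.enabled) and the queues declared (mapred.queue.names) in the JobTracker configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical queue membership; user and queue names are illustrative only.
QUEUE_USERS = {
    "production": ["etl_prod"],
    "research": ["alice", "bob"],
}


def queue_acls_xml(queue_users):
    """Render submit-job ACL properties in the <configuration> layout Hadoop uses."""
    root = ET.Element("configuration")
    for queue, users in sorted(queue_users.items()):
        prop = ET.SubElement(root, "property")
        # Assumed MRv1 property pattern: mapred.queue.<queue-name>.acl-submit-job
        ET.SubElement(prop, "name").text = "mapred.queue.%s.acl-submit-job" % queue
        # The ACL value starts with a comma-separated list of user names.
        ET.SubElement(prop, "value").text = ",".join(users)
    return ET.tostring(root, encoding="unicode")
```

Keeping the user lists in one table and generating the file from it avoids hand-editing ACLs per user, but it still leaves the choice of queue to the user, which is the limitation raised in the thread.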

Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

Posted by Sagar Mehta <sa...@gmail.com>.

Hi Nitin,

Thanks for your reply.

Yes, this is exactly what we are doing: asking the user to modify the
.hiverc and then using ACLs [white-lists], configured in
mapred-queue-acls.xml, to ensure people don't submit to the wrong queues [or
are not allowed to].

As I said in one of the other threads, besides being a manual approach, it
also exposes the policy where user A is asked to modify his/her .hiverc to
submit jobs to queue X and user B is asked to modify his/her .hiverc to
submit jobs to queue Y potentially with different scheduling properties. We
want this to be more or less transparent to the user.

We have a decent-sized cluster [200 nodes] with more than 30 different
users.

I think the JIRA that Sandy pointed out below is a good first step in that
direction.

Sagar

On Thu, Apr 25, 2013 at 3:04 AM, Nitin Pawar <ni...@gmail.com> wrote:

> The current capacity scheduler controls which users can submit jobs
> to which queue, along with other related features.
> You can read more at
> http://hadoop.apache.org/docs/stable/capacity_scheduler.html
>
> But on the Hive side, unless you set mapred.job.queue.name in the Hive
> CLI, jobs will be submitted to the default job queue.
>
> So basically what you would do is create the user, associate it with a
> queue in the scheduler, and ask the user to set that queue in their local
> .hiverc file.
>
> I am not sure if this can be part of Hive's metastore, because one user
> can be allowed to submit jobs to multiple queues. The best way to handle
> that is to set the property each time you open a session, or via the
> .hiverc file.
>
>
> On Thu, Apr 25, 2013 at 12:11 PM, Sandy Ryza <sa...@cloudera.com> wrote:
>
>> Hi Sagar,
>>
>> This capability currently does not exist in the fair scheduler (or other
>> schedulers, as far as I know), but a JIRA has been filed recently that
>> addresses a similar need.   Would
>> https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what
>> you're trying to do?  If not, would you mind filing a new JIRA for the
>> functionality you'd want?
>>
>> -Sandy
>>
>>
>> On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <sa...@gmail.com> wrote:
>>
>>> Hi Guys,
>>>
>>> We have a general purpose Hive cluster [about 200 nodes] which is used
>>> for various jobs like
>>>
>>>    - Production
>>>    - Experimental/Research
>>>    - Adhoc queries
>>>
>>> We are using the fair-share scheduler to schedule them and for this we
>>> have corresponding 3 pools in the scheduler.
>>>
>>> *Here is what we want.*
>>>
>>> *A hive query submitted by a user with user-name A should go to one of
>>> the pools above based on a pre-defined mapping. We are wondering where/how
>>> to specify this mapping?*
>>>
>>> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
>>> particular job run.*
>>>
>>> This puts the job on the map-reduce queue named "X" and the following
>>> configuration in the fair-share scheduler
>>>
>>>   <property>
>>>     <name>mapred.fairscheduler.poolnameproperty</name>
>>>     <value>mapred.job.queue.name</value>
>>>   </property>
>>>
>>> maps this to a pool named "X" in the fair-share scheduler.
>>>
>>> However we [while wearing our Hadoop developer/admin hat] don't want the
>>> user/analyst to specify that so as to enforce some cluster-use policy.
>>>
>>> Based on his/her username we want to automatically select which hadoop
>>> queue and subsequently which fair-share scheduler pool, his/her job should
>>> go to. I'm pretty sure this is a common use-case and wondering how to do
>>> this in Hadoop.
>>>
>>> Any help/insights/pointers would be greatly appreciated.
>>>
>>> Sagar
>>> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
>>>
>>>
>>>
>>>
>>
>
>
> --
> Nitin Pawar
>

Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

Posted by Nitin Pawar <ni...@gmail.com>.

The current capacity scheduler controls which users can submit jobs
to which queue, along with other related features.
You can read more at
http://hadoop.apache.org/docs/stable/capacity_scheduler.html

But on the Hive side, unless you set mapred.job.queue.name in the Hive CLI,
jobs will be submitted to the default job queue.

So basically what you would do is create the user, associate it with a
queue in the scheduler, and ask the user to set that queue in their local
.hiverc file.

I am not sure if this can be part of Hive's metastore, because one user can
be allowed to submit jobs to multiple queues. The best way to handle that
is to set the property each time you open a session, or via the .hiverc
file.
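
A minimal sketch of the .hiverc route, assuming the file is maintained by an admin script rather than edited by hand. The helper names are hypothetical; the only Hive-specific part is the SET statement for mapred.job.queue.name. The function takes the current file content as text and returns the new content, so reading and writing the actual file is left to the caller.

```python
def hiverc_line(queue):
    """Return the statement that pins a Hive session to the given queue."""
    return "SET mapred.job.queue.name=%s;" % queue


def ensure_queue_in_hiverc(existing_text, queue):
    """Return .hiverc content with exactly one queue SET line, replacing any older one."""
    kept = [
        line
        for line in existing_text.splitlines()
        # Drop any earlier queue assignment, whatever its letter case.
        if not line.strip().lower().startswith("set mapred.job.queue.name")
    ]
    kept.append(hiverc_line(queue))
    return "\n".join(kept) + "\n"
```

Because a user can edit this file or issue a different SET in the session, this only provides a default and relies on queue ACLs for actual enforcement.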


On Thu, Apr 25, 2013 at 12:11 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> Hi Sagar,
>
> This capability currently does not exist in the fair scheduler (or other
> schedulers, as far as I know), but a JIRA has been filed recently that
> addresses a similar need.   Would
> https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what you're
> trying to do?  If not, would you mind filing a new JIRA for the
> functionality you'd want?
>
> -Sandy
>
>
> On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <sa...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> We have a general purpose Hive cluster [about 200 nodes] which is used
>> for various jobs like
>>
>>    - Production
>>    - Experimental/Research
>>    - Adhoc queries
>>
>> We are using the fair-share scheduler to schedule them and for this we
>> have corresponding 3 pools in the scheduler.
>>
>> *Here is what we want.*
>>
>> *A hive query submitted by a user with user-name A should go to one of
>> the pools above based on a pre-defined mapping. We are wondering where/how
>> to specify this mapping?*
>>
>> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
>> particular job run.*
>>
>> This puts the job on the map-reduce queue named "X" and the following
>> configuration in the fair-share scheduler
>>
>>   <property>
>>     <name>mapred.fairscheduler.poolnameproperty</name>
>>     <value>mapred.job.queue.name</value>
>>   </property>
>>
>> maps this to a pool named "X" in the fair-share scheduler.
>>
>> However we [while wearing our Hadoop developer/admin hat] don't want the
>> user/analyst to specify that so as to enforce some cluster-use policy.
>>
>> Based on his/her username we want to automatically select which hadoop
>> queue and subsequently which fair-share scheduler pool, his/her job should
>> go to. I'm pretty sure this is a common use-case and wondering how to do
>> this in Hadoop.
>>
>> Any help/insights/pointers would be greatly appreciated.
>>
>> Sagar
>> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
>>
>>
>>
>>
>


-- 
Nitin Pawar

Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

Posted by Nitin Pawar <ni...@gmail.com>.

The current capacity scheduler lets you control which users can submit jobs
to which queue, along with other related features.
You can read more at
http://hadoop.apache.org/docs/stable/capacity_scheduler.html

But on the Hive side, unless you set mapred.job.queue.name in the Hive CLI,
jobs will be submitted to the default job queue.

So basically what you would want to do is create the user, associate it with a
queue in the scheduler, and ask the user to set that queue in their local hiverc
file.

I am not sure whether this can be part of Hive's metastore, because one user can
be allowed to submit jobs to multiple queues. The best way to handle
that is to set the property each time you open a session, or via the hiverc
file.
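
As a rough illustration of the hiverc route described above, here is a
minimal Python sketch that produces the queue setting for a user's hiverc
file. The user-to-queue table, the queue names, the fallback queue and the
function names are all made up for the example; the only pieces taken from
this thread are the mapred.job.queue.name property and the idea of setting it
in hiverc.

```python
# Hypothetical user -> queue table; the names are illustrative only.
USER_TO_QUEUE = {
    "etl": "production",
    "alice": "research",
}
DEFAULT_QUEUE = "adhoc"  # assumed fallback for users not in the table


def hiverc_line(user):
    """Return the hiverc statement that points `user` at their queue."""
    queue = USER_TO_QUEUE.get(user, DEFAULT_QUEUE)
    return "set mapred.job.queue.name=%s;" % queue


def write_hiverc(user, path):
    """Write a one-line hiverc file for `user` at `path`."""
    with open(path, "w") as f:
        f.write(hiverc_line(user) + "\n")
```

Note that a user can still issue another "set" later in the session, so this
is a convenience rather than enforcement; queue ACLs would still be needed to
keep people out of the wrong queue.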


On Thu, Apr 25, 2013 at 12:11 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> Hi Sagar,
>
> This capability currently does not exist in the fair scheduler (or other
> schedulers, as far as I know), but a JIRA has been filed recently that
> addresses a similar need.   Would
> https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what you're
> trying to do?  If not, would you mind filing a new JIRA for the
> functionality you'd want?
>
> -Sandy
>
>
> On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <sa...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> We have a general purpose Hive cluster [about 200 nodes] which is used
>> for various jobs like
>>
>>    - Production
>>    - Experimental/Research
>>    - Adhoc queries
>>
>> We are using the fair-share scheduler to schedule them and for this we
>> have corresponding 3 pools in the scheduler.
>>
>> *Here is what we want.*
>>
>> *A hive query submitted by a user with user-name A should go to one of
>> the pools above based on a pre-defined mapping. We are wondering where/how
>> to specify this mapping?*
>>
>> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
>> particular job run.*
>>
>> This puts the job on the map-reduce queue named "X" and the following
>> configuration in the fair-share scheduler
>>
>>   <property>
>>     <name>mapred.fairscheduler.poolnameproperty</name>
>>     <value>mapred.job.queue.name</value>
>>   </property>
>>
>> maps this to a pool named "X" in the fair-share scheduler.
>>
>> However we [while wearing our Hadoop developer/admin hat] don't want the
>> user/analyst to specify that so as to enforce some cluster-use policy.
>>
>> Based on his/her username we want to automatically select which hadoop
>> queue and subsequently which fair-share scheduler pool, his/her job should
>> go to. I'm pretty sure this is a common use-case and wondering how to do
>> this in Hadoop.
>>
>> Any help/insights/pointers would be greatly appreciated.
>>
>> Sagar
>> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
>>
>>
>>
>>
>


-- 
Nitin Pawar

Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

Posted by Sandy Ryza <sa...@cloudera.com>.

Sagar,

I'm glad to hear that it would help.  Unfortunately, we are no longer
adding features to CDH3, so you would have to upgrade to CDH4 or backport
it yourself to use it.

-Sandy


On Fri, Apr 26, 2013 at 10:27 AM, Sagar Mehta <sa...@gmail.com> wrote:

> Hi Sandy,
>
> Thanks for your prompt reply!!
>
> The jira that you pointed out would make it easy for us to do the
> automatic mapping and getting close towards enforcing a policy
> automatically. Any idea when it would be incorporated into cdh/hadoop
> releases and if it could be back-ported for cdh3u2 which we have currently
> running in production?
>
> Currently we are getting around this using the -Dmapred.job.queue.name="X"
> and the subsequent mapping of map-red job queue to Fair-share scheduler
> pool. We are using ACLs [more of a white-list] by
> configuring  mapred-queue-acls.xml to ensure people can only submit to the
> right queue.
>
> *Two limitations of this round-about approach are*
>
>    1. It is manual
>    2. It exposes the policy where user A is asked to submit jobs to queue
>    X and user B is asked to submit jobs to queue Y [with different scheduler
>    properties]. We want this to be completely transparent to the user of our
>    cluster.
>
> The jira above would be a great first step towards such automatic mapping!!
>
> Cheers,
> Sagar
>
>
> On Wed, Apr 24, 2013 at 11:41 PM, Sandy Ryza <sa...@cloudera.com> wrote:
>
>> Hi Sagar,
>>
>> This capability currently does not exist in the fair scheduler (or other
>> schedulers, as far as I know), but a JIRA has been filed recently that
>> addresses a similar need.   Would
>> https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what
>> you're trying to do?  If not, would you mind filing a new JIRA for the
>> functionality you'd want?
>>
>> -Sandy
>>
>>
>> On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <sa...@gmail.com> wrote:
>>
>>> Hi Guys,
>>>
>>> We have a general purpose Hive cluster [about 200 nodes] which is used
>>> for various jobs like
>>>
>>>    - Production
>>>    - Experimental/Research
>>>    - Adhoc queries
>>>
>>> We are using the fair-share scheduler to schedule them and for this we
>>> have corresponding 3 pools in the scheduler.
>>>
>>> *Here is what we want.*
>>>
>>> *A hive query submitted by a user with user-name A should go to one of
>>> the pools above based on a pre-defined mapping. We are wondering where/how
>>> to specify this mapping?*
>>>
>>> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
>>> particular job run.*
>>>
>>> This puts the job on the map-reduce queue named "X" and the following
>>> configuration in the fair-share scheduler
>>>
>>>   <property>
>>>     <name>mapred.fairscheduler.poolnameproperty</name>
>>>     <value>mapred.job.queue.name</value>
>>>   </property>
>>>
>>> maps this to a pool named "X" in the fair-share scheduler.
>>>
>>> However we [while wearing our Hadoop developer/admin hat] don't want the
>>> user/analyst to specify that so as to enforce some cluster-use policy.
>>>
>>> Based on his/her username we want to automatically select which hadoop
>>> queue and subsequently which fair-share scheduler pool, his/her job should
>>> go to. I'm pretty sure this is a common use-case and wondering how to do
>>> this in Hadoop.
>>>
>>> Any help/insights/pointers would be greatly appreciated.
>>>
>>> Sagar
>>> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
>>>
>>>
>>>
>>>
>>
>

Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

Posted by Sagar Mehta <sa...@gmail.com>.

Hi Sandy,

Thanks for your prompt reply!!

The jira that you pointed out would make it easy for us to do the automatic
mapping and get us closer to enforcing a policy automatically. Any
idea when it will be incorporated into cdh/hadoop releases, and whether it could
be back-ported to cdh3u2, which we currently run in production?

Currently we are getting around this using -Dmapred.job.queue.name="X"
and the subsequent mapping of the map-red job queue to a fair-share scheduler
pool. We are using ACLs [more of a white-list], configured in
mapred-queue-acls.xml, to ensure people can only submit to the
right queue.
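
To make the white-list idea concrete, here is a small Python sketch that
renders submit ACLs for mapred-queue-acls.xml from a queue-to-users table.
The table contents and the function name are hypothetical, and the property
name pattern mapred.queue.<queue>.acl-submit-job is the one used by the
Hadoop 0.20/1.x queue ACL file as far as I know; please check it against the
template shipped with your distribution before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical white-list: queue -> users allowed to submit to it.
QUEUE_USERS = {
    "production": ["etl"],
    "research": ["alice", "bob"],
}


def queue_acls_xml(queue_users):
    """Build mapred-queue-acls.xml content with one submit ACL per queue."""
    root = ET.Element("configuration")
    for queue in sorted(queue_users):
        prop = ET.SubElement(root, "property")
        name = "mapred.queue.%s.acl-submit-job" % queue
        ET.SubElement(prop, "name").text = name
        # The ACL value here is a comma-separated user list (no groups).
        ET.SubElement(prop, "value").text = ",".join(queue_users[queue])
    return ET.tostring(root, encoding="unicode")
```

Generating the file from a single table keeps the white-list and any
user-to-queue mapping from drifting apart, but it does not by itself make the
mapping automatic.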

*Two limitations of this round-about approach are*

   1. It is manual
   2. It exposes the policy where user A is asked to submit jobs to queue X
   and user B is asked to submit jobs to queue Y [with different scheduler
   properties]. We want this to be completely transparent to the user of our
   cluster.
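
Until something like the jira lands, one client-side stand-in for both
limitations is a thin wrapper that picks the queue from the username, so the
analyst never types it. The Python sketch below is only that: a sketch. The
mapping table, the fallback queue and the function name are hypothetical, it
is not what MAPREDUCE-5132 implements, and it assumes the Hive CLI accepts
-hiveconf key=value and -e "query".

```python
import getpass

# Hypothetical user -> queue policy, kept out of the analysts' sight.
USER_TO_QUEUE = {
    "etl": "production",
    "alice": "research",
}
DEFAULT_QUEUE = "adhoc"  # assumed fallback for unmapped users


def hive_command(query, user=None):
    """Build a Hive CLI argument list with the queue chosen from the user."""
    user = user or getpass.getuser()
    queue = USER_TO_QUEUE.get(user, DEFAULT_QUEUE)
    return [
        "hive",
        "-hiveconf", "mapred.job.queue.name=%s" % queue,
        "-e", query,
    ]
```

The returned list can be handed to subprocess.call. Since a determined user
can bypass the wrapper and call hive directly, the queue ACLs are still what
actually enforces the policy.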

The jira above would be a great first step towards such automatic mapping!!

Cheers,
Sagar

On Wed, Apr 24, 2013 at 11:41 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> Hi Sagar,
>
> This capability currently does not exist in the fair scheduler (or other
> schedulers, as far as I know), but a JIRA has been filed recently that
> addresses a similar need.   Would
> https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what you're
> trying to do?  If not, would you mind filing a new JIRA for the
> functionality you'd want?
>
> -Sandy
>
>
> On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <sa...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> We have a general purpose Hive cluster [about 200 nodes] which is used
>> for various jobs like
>>
>>    - Production
>>    - Experimental/Research
>>    - Adhoc queries
>>
>> We are using the fair-share scheduler to schedule them and for this we
>> have corresponding 3 pools in the scheduler.
>>
>> *Here is what we want.*
>>
>> *A hive query submitted by a user with user-name A should go to one of
>> the pools above based on a pre-defined mapping. We are wondering where/how
>> to specify this mapping?*
>>
>> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
>> particular job run.*
>>
>> This puts the job on the map-reduce queue named "X" and the following
>> configuration in the fair-share scheduler
>>
>>   <property>
>>     <name>mapred.fairscheduler.poolnameproperty</name>
>>     <value>mapred.job.queue.name</value>
>>   </property>
>>
>> maps this to a pool named "X" in the fair-share scheduler.
>>
>> However we [while wearing our Hadoop developer/admin hat] don't want the
>> user/analyst to specify that so as to enforce some cluster-use policy.
>>
>> Based on his/her username we want to automatically select which hadoop
>> queue and subsequently which fair-share scheduler pool, his/her job should
>> go to. I'm pretty sure this is a common use-case and wondering how to do
>> this in Hadoop.
>>
>> Any help/insights/pointers would be greatly appreciated.
>>
>> Sagar
>> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
>>
>>
>>
>>
>

Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

Posted by Sagar Mehta <sa...@gmail.com>.

Hi Sandy,

Thanks for your prompt reply!!

The JIRA that you pointed out would make it easy for us to do the automatic
mapping and get closer to enforcing a policy automatically. Any idea when it
will be incorporated into CDH/Hadoop releases, and whether it could be
back-ported to cdh3u2, which we currently run in production?

Currently we are getting around this by passing -Dmapred.job.queue.name="X"
and relying on the mapping from the map-reduce job queue to the fair-share
scheduler pool. We also use ACLs [more of a white-list], configured in
mapred-queue-acls.xml, to ensure people can only submit to the right queue.
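
A minimal sketch of how such a white-list could be generated from a single
policy table is shown below. It assumes the Hadoop 1.x-era property naming
mapred.queue.<queue-name>.acl-submit-job, and that ACLs are switched on
through mapred.acls.enabled in mapred-site.xml; both should be checked
against the CDH version in use. The queue names, user names and the
build_queue_acls helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy: queue name -> users allowed to submit (placeholder names).
QUEUE_SUBMITTERS = {
    "production": ["etl_prod"],
    "research": ["alice", "bob"],
}


def build_queue_acls(queue_submitters):
    """Build a <configuration> element with one acl-submit-job property per queue."""
    root = ET.Element("configuration")
    for queue, users in sorted(queue_submitters.items()):
        prop = ET.SubElement(root, "property")
        # Assumed property naming: mapred.queue.<queue-name>.acl-submit-job
        ET.SubElement(prop, "name").text = "mapred.queue.{}.acl-submit-job".format(queue)
        # The value here is a comma-separated list of user names.
        ET.SubElement(prop, "value").text = ",".join(users)
    return root


acl_xml = ET.tostring(build_queue_acls(QUEUE_SUBMITTERS), encoding="unicode")
print(acl_xml)
```

Keeping the user-to-queue policy in one table and generating the ACL file
from it at least avoids editing the XML by hand, although users still have to
name the queue themselves when they submit.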

*Two limitations of this roundabout approach are:*

   1. It is manual.
   2. It exposes the policy: user A is asked to submit jobs to queue X and
   user B is asked to submit jobs to queue Y [with different scheduler
   properties]. We want this to be completely transparent to the users of
   our cluster.

The jira above would be a great first step towards such automatic mapping!!

Cheers,
Sagar

On Wed, Apr 24, 2013 at 11:41 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> Hi Sagar,
>
> This capability currently does not exist in the fair scheduler (or other
> schedulers, as far as I know), but a JIRA has been filed recently that
> addresses a similar need.   Would
> https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what you're
> trying to do?  If not, would you mind filing a new JIRA for the
> functionality you'd want?
>
> -Sandy
>
>
> On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <sa...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> We have a general purpose Hive cluster [about 200 nodes] which is used
>> for various jobs like
>>
>>    - Production
>>    - Experimental/Research
>>    - Adhoc queries
>>
>> We are using the fair-share scheduler to schedule them and for this we
>> have corresponding 3 pools in the scheduler.
>>
>> *Here is what we want.*
>>
>> *A hive query submitted by a user with user-name A should go to one of
>> the pools above based on a pre-defined mapping. We are wondering where/how
>> to specify this mapping?*
>>
>> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
>> particular job run.*
>>
>> This puts the job on the map-reduce queue named "X" and the following
>> configuration in the fair-share scheduler
>>
>>   <property>
>>     <name>mapred.fairscheduler.poolnameproperty</name>
>>     <value>mapred.job.queue.name</value>
>>   </property>
>>
>> maps this to a pool named "X" in the fair-share scheduler.
>>
>> However we [while wearing our Hadoop developer/admin hat] don't want the
>> user/analyst to specify that so as to enforce some cluster-use policy.
>>
>> Based on his/her username we want to automatically select which hadoop
>> queue and subsequently which fair-share scheduler pool, his/her job should
>> go to. I'm pretty sure this is a common use-case and wondering how to do
>> this in Hadoop.
>>
>> Any help/insights/pointers would be greatly appreciated.
>>
>> Sagar
>> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
>>
>>
>>
>>
>

Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

Posted by Sandy Ryza <sa...@cloudera.com>.

Hi Sagar,

This capability currently does not exist in the fair scheduler (or other
schedulers, as far as I know), but a JIRA has been filed recently that
addresses a similar need.   Would
https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what you're
trying to do?  If not, would you mind filing a new JIRA for the
functionality you'd want?

-Sandy


On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <sa...@gmail.com> wrote:

> Hi Guys,
>
> We have a general purpose Hive cluster [about 200 nodes] which is used for
> various jobs like
>
>    - Production
>    - Experimental/Research
>    - Adhoc queries
>
> We are using the fair-share scheduler to schedule them and for this we
> have corresponding 3 pools in the scheduler.
>
> *Here is what we want.*
>
> *A hive query submitted by a user with user-name A should go to one of
> the pools above based on a pre-defined mapping. We are wondering where/how
> to specify this mapping?*
>
> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
> particular job run.*
>
> This puts the job on the map-reduce queue named "X" and the following
> configuration in the fair-share scheduler
>
>   <property>
>     <name>mapred.fairscheduler.poolnameproperty</name>
>     <value>mapred.job.queue.name</value>
>   </property>
>
> maps this to a pool named "X" in the fair-share scheduler.
>
> However we [while wearing our Hadoop developer/admin hat] don't want the
> user/analyst to specify that so as to enforce some cluster-use policy.
>
> Based on his/her username we want to automatically select which hadoop
> queue and subsequently which fair-share scheduler pool, his/her job should
> go to. I'm pretty sure this is a common use-case and wondering how to do
> this in Hadoop.
>
> Any help/insights/pointers would be greatly appreciated.
>
> Sagar
> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
>
>
>
>

Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

The 'standard' way to do this is to use queue ACLs to restrict a particular user to submitting jobs to a subset of queues, and then let the user decide which queue in that subset to submit a job to.

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Apr 24, 2013, at 6:22 PM, Sagar Mehta wrote:

> Hi Guys,
> 
> We have a general purpose Hive cluster [about 200 nodes] which is used for various jobs like
> Production
> Experimental/Research
> Adhoc queries
> We are using the fair-share scheduler to schedule them and for this we have corresponding 3 pools in the scheduler.
> 
> Here is what we want.
> 
> A hive query submitted by a user with user-name A should go to one of the pools above based on a pre-defined mapping. We are wondering where/how to specify this mapping?
> 
> We can do this manually by adding -Dmapred.job.queue.name="X" on a particular job run.
> 
> This puts the job on the map-reduce queue named "X" and the following configuration in the fair-share scheduler
> 
>   <property>
>     <name>mapred.fairscheduler.poolnameproperty</name>
>     <value>mapred.job.queue.name</value>
>   </property>
> 
> maps this to a pool named "X" in the fair-share scheduler.
> 
> However we [while wearing our Hadoop developer/admin hat] don't want the user/analyst to specify that so as to enforce some cluster-use policy.
> 
> Based on his/her username we want to automatically select which hadoop queue and subsequently which fair-share scheduler pool, his/her job should go to. I'm pretty sure this is a common use-case and wondering how to do this in Hadoop. 
> 
> Any help/insights/pointers would be greatly appreciated.
> 
> Sagar
> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
> 
> 
>
