Posted to user@pig.apache.org by jiang licht <li...@yahoo.com> on 2010/06/29 00:03:37 UTC

specify which pool in fair scheduler to submit a pig job? Re: pig job priority control

In Hadoop, I can configure the JobTracker to use the fair scheduler:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>

and I also want to define different pools in the XML file specified by "mapred.fairscheduler.allocation.file".
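Something like this is what I have in mind for the allocation file, going by the 0.20 fair scheduler docs (the pool name and limits below are placeholders, not a tested config):

```
<?xml version="1.0"?>
<allocations>
  <!-- "fastlane" is a made-up pool name; the limits are examples only -->
  <pool name="fastlane">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <weight>2.0</weight>
  </pool>
  <!-- per-user cap for users without an explicit entry -->
  <userMaxJobsDefault>5</userMaxJobsDefault>
</allocations>
```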

But how do I tell a job which pool to submit to initially (according to the fair scheduler documentation, a job can be moved to another pool at run time via the web UI)? I'm especially interested in this for submitting Pig jobs (what is the property name to use with the -D switch?).
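From the fair scheduler docs, my guess is that the pool is taken from whatever job property "mapred.fairscheduler.poolnameproperty" points at (it defaults to user.name). If the cluster config mapped that to, say, pool.name, would something like this work? ("fastlane" and the script name are just placeholders.)

```
pig -Dpool.name=fastlane myscript.pig
```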

Thanks,

--Michael

--- On Mon, 6/28/10, jiang licht <li...@yahoo.com> wrote:

From: jiang licht <li...@yahoo.com>
Subject: Re: pig job priority control
To: pig-user@hadoop.apache.org
Date: Monday, June 28, 2010, 4:23 PM

Thanks, Ashutosh. This helps! So I believe other job configuration settings can also be set this way to affect where and how a Pig job is executed on a cluster ...


--Michael

--- On Mon, 6/28/10, Ashutosh Chauhan <as...@gmail.com> wrote:

From: Ashutosh Chauhan <as...@gmail.com>
Subject: Re: pig job priority control
To: pig-user@hadoop.apache.org
Date: Monday, June 28, 2010, 3:46 PM

On trunk:
set mapred.job.queue.name myFastLaneQueue;

either on grunt or in script.
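In a script, that line just goes at the top, e.g. (the queue name and paths here are placeholders):

```
set mapred.job.queue.name myFastLaneQueue;
A = LOAD 'input' USING PigStorage() AS (line:chararray);
STORE A INTO 'output';
```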

In 0.7 or earlier, pass it through the -D switch on the command line when invoking Pig:
-Dmapred.job.queue.name=myFastLaneQueue

Hope it helps,
Ashutosh

On Mon, Jun 28, 2010 at 13:29, jiang licht <li...@yahoo.com> wrote:
> Thanks, Jeff. How do I submit a Pig job to a queue used by these schedulers? I think the more general question is how to specify custom job configuration for a Pig job (in addition to specifying which queue to submit it to).
>
> Thanks,
>
> Michael
>
> --- On Mon, 6/21/10, Jeff Zhang <zj...@gmail.com> wrote:
>
> From: Jeff Zhang <zj...@gmail.com>
> Subject: Re: pig job priority control
> To: pig-user@hadoop.apache.org
> Date: Monday, June 21, 2010, 8:14 PM
>
> You can also change Hadoop's task scheduler.
> Please refer to http://hadoop.apache.org/common/docs/r0.20.0/fair_scheduler.html
> and http://hadoop.apache.org/common/docs/r0.20.0/capacity_scheduler.html
>
>
> On Tue, Jun 22, 2010 at 12:57 AM, jiang licht <li...@yahoo.com> wrote:
>> What is the best way to manage multiple Pig jobs so that they all get a chance to run simultaneously? Without priority control, some jobs block others (a small job with, say, one mapper and one reducer has to wait its turn), which is bad. For example, a job with a large number of mappers (more than the maximum number of mapper slots in the cluster) will consume all resources, and jobs submitted later have to wait until it finishes before their mappers can be scheduled.
>>
>> Thanks!
>>
>>
>>
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
>
>
>