Posted to common-user@hadoop.apache.org by Pierre ANCELOT <pi...@gmail.com> on 2010/06/30 12:28:03 UTC

Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.

Hi,
Okay, so, if I set 20 by default, could I maybe limit the number of
concurrent maps per node instead?
job.setNumReduceTasks exists but I see no equivalent for maps, though I
think there was a setNumMapTasks before...
Was it removed? Why?
Any idea about how to achieve this?

Thank you.


On Wed, Jun 30, 2010 at 12:08 PM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:

> Hi Pierre,
>
> "mapred.tasktracker.map.tasks.maximum" is a cluster level configuration,
> cannot be set per job. It is loaded only while bringing up the TaskTracker.
>
> Thanks
> Amareshwari
>
> On 6/30/10 3:05 PM, "Pierre ANCELOT" <pi...@gmail.com> wrote:
>
> Hi everyone :)
> There's something I'm probably doing wrong but I can't seem to figure out
> what.
> I have two hadoop programs running one after the other.
> This is done because they don't have the same needs in terms of processor
> and memory, so by separating them I optimize each task better.
> The fact is, for the first job I need mapred.tasktracker.map.tasks.maximum
> set to 12 on every node.
> For the second job, I need it set to 20.
> So by default I set it to 12, and in the second job's code I set this:
>
>        Configuration hadoopConfiguration = new Configuration();
>        hadoopConfiguration.setInt("mapred.tasktracker.map.tasks.maximum", 20);
>
> But when running the job, instead of having the 20 tasks on each node as
> expected, I have 12....
> Any idea please?
>
> Thank you.
> Pierre.
>
>
> --
> http://www.neko-consulting.com
> Ego sum quis ego servo
> "Je suis ce que je protège"
> "I am what I protect"
>
>


-- 
http://www.neko-consulting.com
Ego sum quis ego servo
"Je suis ce que je protège"
"I am what I protect"

Re: Hadoop and SGE

Posted by "Hungsheng Tsao, Ph. D phone" <Hu...@oracle.com>.
Please check the docs for OGE 6.2u5 or 6.
There is integration of OGE and Hadoop.

------- Original message -------
> From: Dmitry Pushkarev <um...@stanford.edu>
> To: common-user@hadoop.apache.org
> Sent: 30.6.'10,  6:59
>
> Dear Hadoop users,
>
> I'm in the process of building a new cluster for our lab and I'm trying 
> to
> run SGE simultaneously with hadoop. The idea is that each node would
> function as a datanode at all times, but depending on the situation, a
> fraction of the nodes will run SGE instead of plain Hadoop. SGE jobs will
> not have access to HDFS or the local filesystem (except for /tmp) and will
> run off an external NAS; they aren't supposed to be IO bound.
>
> I'm trying to figure out the best way to set up this resource sharing. One
> way would be to shut down the tasktrackers on reserved nodes and add those
> nodes to the SGE pool. Another way is to run the tasktrackers as SGE jobs,
> with each tasktracker shutting down after some idle time.
>
> Has anyone tried something like this? I'd appreciate any advice.
>
> Thanks.
>
>


Hadoop and SGE

Posted by Dmitry Pushkarev <um...@stanford.edu>.
Dear Hadoop users,

I'm in the process of building a new cluster for our lab and I'm trying to
run SGE simultaneously with hadoop. The idea is that each node would function
as a datanode at all times, but depending on the situation, a fraction of the
nodes will run SGE instead of plain Hadoop. SGE jobs will not have access to
HDFS or the local filesystem (except for /tmp) and will run off an external
NAS; they aren't supposed to be IO bound.

I'm trying to figure out the best way to set up this resource sharing. One
way would be to shut down the tasktrackers on reserved nodes and add those
nodes to the SGE pool. Another way is to run the tasktrackers as SGE jobs,
with each tasktracker shutting down after some idle time.

Has anyone tried something like this? I'd appreciate any advice.

Thanks.



Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.

Posted by Ken Goodhope <ke...@gmail.com>.
What you want to do can be accomplished in the scheduler. Take a look
at the fair scheduler, specifically the user-extensible options. There
you will find the ability to add some extra logic for deciding, on a
per-job basis, whether a task can be launched. It could be as simple as
deciding that a particular job can't launch more than 12 tasks at a time.

Capacity scheduler might be able to do this too, but I'm not sure.
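
For readers of the archive: in the 0.20/0.21-era fair scheduler, the
user-extensible hook Ken refers to is the LoadManager class, selected with
the mapred.fairscheduler.loadmanager property. The sketch below is only an
illustration of the idea; it assumes the later LoadManager API that exposes
a per-job canLaunchTask() hook (older versions only offer the per-node
canAssignMap()/canAssignReduce() pair, so check your version's LoadManager
first), and the "phase1" job-name test and the 12-task cap are made-up
examples.

    import org.apache.hadoop.mapred.CapBasedLoadManager;
    import org.apache.hadoop.mapred.JobInProgress;
    import org.apache.hadoop.mapred.TaskTrackerStatus;
    import org.apache.hadoop.mapreduce.TaskType;

    /**
     * Hypothetical LoadManager: never start more than 12 concurrent map
     * tasks on a node while a "phase1" job is asking for slots.
     */
    public class PerJobCapLoadManager extends CapBasedLoadManager {

      private static final int PHASE1_MAP_CAP = 12; // example value

      @Override
      public boolean canLaunchTask(TaskTrackerStatus tracker,
                                   JobInProgress job, TaskType type) {
        if (type == TaskType.MAP
            && job.getProfile().getJobName().startsWith("phase1")
            && tracker.countMapTasks() >= PHASE1_MAP_CAP) {
          return false; // node already runs 12 maps (of any job); hold off
        }
        return super.canLaunchTask(tracker, job, type);
      }
    }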

On Wednesday, June 30, 2010, Pierre ANCELOT <pi...@gmail.com> wrote:
> ok, well, thanks...
> I truly hoped a solution would exist for this.
> Thanks.
>
> Pierre.
>
> On Wed, Jun 30, 2010 at 3:56 PM, Yu Li <ca...@gmail.com> wrote:
>
>> Hi Pierre,
>>
>> The "setNumReduceTasks" method is for setting the number of reduce tasks to
>> launch, it's equal to set the "mapred.reduce.tasks" parameter, while the
>> "mapred.tasktracker.reduce.tasks.maximum" parameter decides the number of
>> tasks running *concurrently* on one node.
>> And as Amareshwari mentioned, the
>> "mapred.tasktracker.map/reduce.tasks.maximum" is a cluster configuration
>> which could not be set per job. If you set
>> mapred.tasktracker.map.tasks.maximum to 20, and the overall number of map
>> tasks is larger than 20*<nodes number>, there would be 20 map tasks running
>> concurrently on a node. As I know, you probably need to restart the
>> tasktracker if you truely need to change the configuration.
>>
>> Best Regards,
>> Carp
>>
>> 2010/6/30 Pierre ANCELOT <pi...@gmail.com>
>>
>> > Sure, but not the number of tasks running concurrently on a node at the
>> > same
>> > time.
>> >
>> >
>> >
>> > On Wed, Jun 30, 2010 at 1:57 PM, Ted Yu <yu...@gmail.com> wrote:
>> >
>> > > The number of map tasks is determined by InputSplit.
>> > >
>> > > On Wednesday, June 30, 2010, Pierre ANCELOT <pi...@gmail.com>
>> wrote:
>> > > > Hi,
>> > > > Okay, so, if I set 20 by default, could I maybe limit the number
>> > > > of concurrent maps per node instead?
>> > > > job.setNumReduceTasks exists but I see no equivalent for maps,
>> > > > though I think there was a setNumMapTasks before...
>> > > > Was it removed? Why?
>> > > > Any idea about how to achieve this?
>> > > >
>> > > > Thank you.
>> > > >
>> > > >
>> > > > On Wed, Jun 30, 2010 at 12:08 PM, Amareshwari Sri Ramadasu <
>> > > > amarsri@yahoo-inc.com> wrote:
>> > > >
>> > > >> Hi Pierre,
>> > > >>
>> > > >> "mapred.tasktracker.map.tasks.maximum" is a cluster level
>> > configuration,
>> > > >> cannot be set per job. It is loaded only while bringing up the
>> > > TaskTracker.
>> > > >>
>> > > >> Thanks
>> > > >> Amareshwari
>> > > >>
>> > > >> On 6/30/10 3:05 PM, "Pierre ANCELOT" <pi...@gmail.com> wrote:
>> > > >>
>> > > >> Hi everyone :)
>> > > >> There's something I'm probably doing wrong but I can't seem to
>> > > >> figure out what.
>> > > >> I have two hadoop programs running one after the other.
>> > > >> This is done because they don't have the same needs in terms of
>> > > >> processor and memory, so by separating them I optimize each task
>> > > >> better.
>> > > >> The fact is, for the first job I need
>> > > >> mapred.tasktracker.map.tasks.maximum set to 12 on every node.
>> > > >> For the second job, I need it set to 20.
>> > > >> So by default I set it to 12, and in the second job's code I set this:
>> > > >>
>> > > >>        Configuration hadoopConfiguration = new Configuration();
>> > > >>        hadoopConfiguration.setInt("mapred.tasktracker.map.tasks.maximum", 20);
>> > > >>
>> > > >> But when running the job, instead of having the 20 tasks on each
>> > > >> node as expected, I have 12....
>> > > >> Any idea please?
>> > > >>
>> > > >> Thank you.
>> > > >> Pierre.
>> > > >>
>> > --
> http://www.neko-consulting.com
> Ego sum quis ego servo
> "Je suis ce que je protège"
> "I am what I protect"
>

Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.

Posted by Pierre ANCELOT <pi...@gmail.com>.
ok, well, thanks...
I truly hoped a solution would exist for this.
Thanks.

Pierre.

On Wed, Jun 30, 2010 at 3:56 PM, Yu Li <ca...@gmail.com> wrote:

> Hi Pierre,
>
> The "setNumReduceTasks" method is for setting the number of reduce tasks to
> launch, it's equal to set the "mapred.reduce.tasks" parameter, while the
> "mapred.tasktracker.reduce.tasks.maximum" parameter decides the number of
> tasks running *concurrently* on one node.
> And as Amareshwari mentioned, the
> "mapred.tasktracker.map/reduce.tasks.maximum" is a cluster configuration
> which could not be set per job. If you set
> mapred.tasktracker.map.tasks.maximum to 20, and the overall number of map
> tasks is larger than 20*<nodes number>, there would be 20 map tasks running
> concurrently on a node. As I know, you probably need to restart the
> tasktracker if you truely need to change the configuration.
>
> Best Regards,
> Carp
>
> 2010/6/30 Pierre ANCELOT <pi...@gmail.com>
>
> > Sure, but not the number of tasks running concurrently on a node at the
> > same
> > time.
> >
> >
> >
> > On Wed, Jun 30, 2010 at 1:57 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > The number of map tasks is determined by InputSplit.
> > >
> > > On Wednesday, June 30, 2010, Pierre ANCELOT <pi...@gmail.com>
> wrote:
> > > > Hi,
> > > > Okay, so, if I set 20 by default, could I maybe limit the number
> > > > of concurrent maps per node instead?
> > > > job.setNumReduceTasks exists but I see no equivalent for maps,
> > > > though I think there was a setNumMapTasks before...
> > > > Was it removed? Why?
> > > > Any idea about how to achieve this?
> > > >
> > > > Thank you.
> > > >
> > > >
> > > > On Wed, Jun 30, 2010 at 12:08 PM, Amareshwari Sri Ramadasu <
> > > > amarsri@yahoo-inc.com> wrote:
> > > >
> > > >> Hi Pierre,
> > > >>
> > > >> "mapred.tasktracker.map.tasks.maximum" is a cluster level
> > configuration,
> > > >> cannot be set per job. It is loaded only while bringing up the
> > > TaskTracker.
> > > >>
> > > >> Thanks
> > > >> Amareshwari
> > > >>
> > > >> On 6/30/10 3:05 PM, "Pierre ANCELOT" <pi...@gmail.com> wrote:
> > > >>
> > > >> Hi everyone :)
> > > >> There's something I'm probably doing wrong but I can't seem to
> > > >> figure out what.
> > > >> I have two hadoop programs running one after the other.
> > > >> This is done because they don't have the same needs in terms of
> > > >> processor and memory, so by separating them I optimize each task
> > > >> better.
> > > >> The fact is, for the first job I need
> > > >> mapred.tasktracker.map.tasks.maximum set to 12 on every node.
> > > >> For the second job, I need it set to 20.
> > > >> So by default I set it to 12, and in the second job's code I set this:
> > > >>
> > > >>        Configuration hadoopConfiguration = new Configuration();
> > > >>        hadoopConfiguration.setInt("mapred.tasktracker.map.tasks.maximum", 20);
> > > >>
> > > >> But when running the job, instead of having the 20 tasks on each
> > > >> node as expected, I have 12....
> > > >> Any idea please?
> > > >>
> > > >> Thank you.
> > > >> Pierre.
> > > >>
> > > >>
> > > >> --
> > > >> http://www.neko-consulting.com
> > > >> Ego sum quis ego servo
> > > >> "Je suis ce que je protège"
> > > >> "I am what I protect"
> > > >>
> > > >>
> > > >
> > > >
> > > > --
> > > > http://www.neko-consulting.com
> > > > Ego sum quis ego servo
> > > > "Je suis ce que je protège"
> > > > "I am what I protect"
> > > >
> > >
> >
> >
> >
> > --
> >  http://www.neko-consulting.com
> > Ego sum quis ego servo
> > "Je suis ce que je protège"
> > "I am what I protect"
> >
>



-- 
http://www.neko-consulting.com
Ego sum quis ego servo
"Je suis ce que je protège"
"I am what I protect"

Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.

Posted by Yu Li <ca...@gmail.com>.
Hi Pierre,

The "setNumReduceTasks" method is for setting the number of reduce tasks to
launch, it's equal to set the "mapred.reduce.tasks" parameter, while the
"mapred.tasktracker.reduce.tasks.maximum" parameter decides the number of
tasks running *concurrently* on one node.
And as Amareshwari mentioned, the
"mapred.tasktracker.map/reduce.tasks.maximum" is a cluster configuration
which could not be set per job. If you set
mapred.tasktracker.map.tasks.maximum to 20, and the overall number of map
tasks is larger than 20*<nodes number>, there would be 20 map tasks running
concurrently on a node. As I know, you probably need to restart the
tasktracker if you truely need to change the configuration.
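
To make the distinction concrete, here is a minimal sketch using the old
JobConf API from this era. The first two settings are honored per job; the
last one is read only by the TaskTracker daemon at startup, so setting it in
a job's configuration compiles and runs but changes nothing:

    import org.apache.hadoop.mapred.JobConf;

    public class ConfScopeExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf();

        // Per-job: total number of reduce tasks for this job.
        conf.setNumReduceTasks(40);             // same effect as the line below
        conf.setInt("mapred.reduce.tasks", 40); // equivalent spelling

        // Cluster-level: read once by each TaskTracker when it starts.
        // Setting it here has NO effect on running daemons -- which is
        // exactly why the second job still ran only 12 concurrent maps.
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 20);
      }
    }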

Best Regards,
Carp

2010/6/30 Pierre ANCELOT <pi...@gmail.com>

> Sure, but not the number of tasks running concurrently on a node at the
> same
> time.
>
>
>
> On Wed, Jun 30, 2010 at 1:57 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > The number of map tasks is determined by InputSplit.
> >
> > On Wednesday, June 30, 2010, Pierre ANCELOT <pi...@gmail.com> wrote:
> > > Hi,
> > > Okay, so, if I set 20 by default, could I maybe limit the number of
> > > concurrent maps per node instead?
> > > job.setNumReduceTasks exists but I see no equivalent for maps, though I
> > > think there was a setNumMapTasks before...
> > > Was it removed? Why?
> > > Any idea about how to achieve this?
> > >
> > > Thank you.
> > >
> > >
> > > On Wed, Jun 30, 2010 at 12:08 PM, Amareshwari Sri Ramadasu <
> > > amarsri@yahoo-inc.com> wrote:
> > >
> > >> Hi Pierre,
> > >>
> > >> "mapred.tasktracker.map.tasks.maximum" is a cluster level
> configuration,
> > >> cannot be set per job. It is loaded only while bringing up the
> > TaskTracker.
> > >>
> > >> Thanks
> > >> Amareshwari
> > >>
> > >> On 6/30/10 3:05 PM, "Pierre ANCELOT" <pi...@gmail.com> wrote:
> > >>
> > >> Hi everyone :)
> > >> There's something I'm probably doing wrong but I can't seem to figure
> > >> out what.
> > >> I have two hadoop programs running one after the other.
> > >> This is done because they don't have the same needs in terms of
> > >> processor and memory, so by separating them I optimize each task
> > >> better.
> > >> The fact is, for the first job I need
> > >> mapred.tasktracker.map.tasks.maximum set to 12 on every node.
> > >> For the second job, I need it set to 20.
> > >> So by default I set it to 12, and in the second job's code I set this:
> > >>
> > >>        Configuration hadoopConfiguration = new Configuration();
> > >>        hadoopConfiguration.setInt("mapred.tasktracker.map.tasks.maximum", 20);
> > >>
> > >> But when running the job, instead of having the 20 tasks on each node
> > >> as expected, I have 12....
> > >> Any idea please?
> > >>
> > >> Thank you.
> > >> Pierre.
> > >>
> > >>
> > >> --
> > >> http://www.neko-consulting.com
> > >> Ego sum quis ego servo
> > >> "Je suis ce que je protège"
> > >> "I am what I protect"
> > >>
> > >>
> > >
> > >
> > > --
> > > http://www.neko-consulting.com
> > > Ego sum quis ego servo
> > > "Je suis ce que je protège"
> > > "I am what I protect"
> > >
> >
>
>
>
> --
>  http://www.neko-consulting.com
> Ego sum quis ego servo
> "Je suis ce que je protège"
> "I am what I protect"
>

Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
CapacityScheduler has a feature called 'High RAM Jobs' wherein you can
specify, for a given job, that a single map/reduce task needs more than
one slot. Thus you could consume all the map/reduce slots on a given TT
for a single task of your job. This should suffice.
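
For the archives, a rough sketch of how a high-RAM job is declared in this
era, assuming the cluster has memory-based scheduling enabled with, say, a
2 GB map slot (the property names and numbers below are illustrative and
should be checked against your Hadoop version):

    import org.apache.hadoop.mapred.JobConf;

    public class HighRamJobExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf();

        // Cluster side (mapred-site.xml), shown only for context:
        //   mapred.cluster.map.memory.mb     = 2048  <- size of one map slot
        //   mapred.cluster.max.map.memory.mb = 8192  <- per-task upper bound

        // Job side: ask for 4 GB per map task. With 2 GB slots, the
        // CapacityScheduler charges each map task two slots, halving the
        // number of maps running concurrently on every TaskTracker.
        conf.setMemoryForMapTask(4096L);
        // equivalently: conf.setLong("mapred.job.map.memory.mb", 4096L);
      }
    }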

Arun

On Jun 30, 2010, at 5:09 AM, Pierre ANCELOT wrote:

> Sure, but not the number of tasks running concurrently on a node at  
> the same
> time.
>
>
>
> On Wed, Jun 30, 2010 at 1:57 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> The number of map tasks is determined by InputSplit.
>>
>> On Wednesday, June 30, 2010, Pierre ANCELOT <pi...@gmail.com>  
>> wrote:
>>> Hi,
>>> Okay, so, if I set 20 by default, could I maybe limit the number of
>>> concurrent maps per node instead?
>>> job.setNumReduceTasks exists but I see no equivalent for maps, though I
>>> think there was a setNumMapTasks before...
>>> Was it removed? Why?
>>> Any idea about how to achieve this?
>>>
>>> Thank you.
>>>
>>>
>>> On Wed, Jun 30, 2010 at 12:08 PM, Amareshwari Sri Ramadasu <
>>> amarsri@yahoo-inc.com> wrote:
>>>
>>>> Hi Pierre,
>>>>
>>>> "mapred.tasktracker.map.tasks.maximum" is a cluster level  
>>>> configuration,
>>>> cannot be set per job. It is loaded only while bringing up the
>> TaskTracker.
>>>>
>>>> Thanks
>>>> Amareshwari
>>>>
>>>> On 6/30/10 3:05 PM, "Pierre ANCELOT" <pi...@gmail.com> wrote:
>>>>
>>>> Hi everyone :)
>>>> There's something I'm probably doing wrong but I can't seem to figure
>>>> out what.
>>>> I have two hadoop programs running one after the other.
>>>> This is done because they don't have the same needs in terms of
>>>> processor and memory, so by separating them I optimize each task better.
>>>> The fact is, for the first job I need
>>>> mapred.tasktracker.map.tasks.maximum set to 12 on every node.
>>>> For the second job, I need it set to 20.
>>>> So by default I set it to 12, and in the second job's code I set this:
>>>>
>>>>       Configuration hadoopConfiguration = new Configuration();
>>>>       hadoopConfiguration.setInt("mapred.tasktracker.map.tasks.maximum", 20);
>>>>
>>>> But when running the job, instead of having the 20 tasks on each  
>>>> node as
>>>> expected, I have 12....
>>>> Any idea please?
>>>>
>>>> Thank you.
>>>> Pierre.
>>>>
>>>>
>>>> --
>>>> http://www.neko-consulting.com
>>>> Ego sum quis ego servo
>>>> "Je suis ce que je protège"
>>>> "I am what I protect"
>>>>
>>>>
>>>
>>>
>>> --
>>> http://www.neko-consulting.com
>>> Ego sum quis ego servo
>>> "Je suis ce que je protège"
>>> "I am what I protect"
>>>
>>
>
>
>
> -- 
> http://www.neko-consulting.com
> Ego sum quis ego servo
> "Je suis ce que je protège"
> "I am what I protect"


Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.

Posted by Pierre ANCELOT <pi...@gmail.com>.
Sure, but not the number of tasks running concurrently on a node at the same
time.



On Wed, Jun 30, 2010 at 1:57 PM, Ted Yu <yu...@gmail.com> wrote:

> The number of map tasks is determined by InputSplit.
>
> On Wednesday, June 30, 2010, Pierre ANCELOT <pi...@gmail.com> wrote:
> > Hi,
> > Okay, so, if I set 20 by default, could I maybe limit the number of
> > concurrent maps per node instead?
> > job.setNumReduceTasks exists but I see no equivalent for maps, though I
> > think there was a setNumMapTasks before...
> > Was it removed? Why?
> > Any idea about how to achieve this?
> >
> > Thank you.
> >
> >
> > On Wed, Jun 30, 2010 at 12:08 PM, Amareshwari Sri Ramadasu <
> > amarsri@yahoo-inc.com> wrote:
> >
> >> Hi Pierre,
> >>
> >> "mapred.tasktracker.map.tasks.maximum" is a cluster level configuration,
> >> cannot be set per job. It is loaded only while bringing up the
> TaskTracker.
> >>
> >> Thanks
> >> Amareshwari
> >>
> >> On 6/30/10 3:05 PM, "Pierre ANCELOT" <pi...@gmail.com> wrote:
> >>
> >> Hi everyone :)
> >> There's something I'm probably doing wrong but I can't seem to figure
> >> out what.
> >> I have two hadoop programs running one after the other.
> >> This is done because they don't have the same needs in terms of
> >> processor and memory, so by separating them I optimize each task better.
> >> The fact is, for the first job I need
> >> mapred.tasktracker.map.tasks.maximum set to 12 on every node.
> >> For the second job, I need it set to 20.
> >> So by default I set it to 12, and in the second job's code I set this:
> >>
> >>        Configuration hadoopConfiguration = new Configuration();
> >>        hadoopConfiguration.setInt("mapred.tasktracker.map.tasks.maximum", 20);
> >>
> >> But when running the job, instead of having the 20 tasks on each node as
> >> expected, I have 12....
> >> Any idea please?
> >>
> >> Thank you.
> >> Pierre.
> >>
> >>
> >> --
> >> http://www.neko-consulting.com
> >> Ego sum quis ego servo
> >> "Je suis ce que je protège"
> >> "I am what I protect"
> >>
> >>
> >
> >
> > --
> > http://www.neko-consulting.com
> > Ego sum quis ego servo
> > "Je suis ce que je protège"
> > "I am what I protect"
> >
>



-- 
http://www.neko-consulting.com
Ego sum quis ego servo
"Je suis ce que je protège"
"I am what I protect"

Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.

Posted by Ted Yu <yu...@gmail.com>.
The number of map tasks is determined by InputSplit.
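
To expand on that: the map count equals the number of input splits, so the
per-job knob is the split size rather than any task maximum. A small sketch
with the new-style (org.apache.hadoop.mapreduce) API of this era; the 512 MB
figure is just an example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeExample {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "fewer-maps");
        FileInputFormat.addInputPath(job, new Path("/data/in"));

        // Each split becomes exactly one map task, so larger splits mean
        // fewer maps and smaller splits mean more maps.
        long target = 512L * 1024 * 1024; // 512 MB
        FileInputFormat.setMinInputSplitSize(job, target);
        FileInputFormat.setMaxInputSplitSize(job, target);
      }
    }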

On Wednesday, June 30, 2010, Pierre ANCELOT <pi...@gmail.com> wrote:
> Hi,
> Okay, so, if I set 20 by default, could I maybe limit the number of
> concurrent maps per node instead?
> job.setNumReduceTasks exists but I see no equivalent for maps, though I
> think there was a setNumMapTasks before...
> Was it removed? Why?
> Any idea about how to achieve this?
>
> Thank you.
>
>
> On Wed, Jun 30, 2010 at 12:08 PM, Amareshwari Sri Ramadasu <
> amarsri@yahoo-inc.com> wrote:
>
>> Hi Pierre,
>>
>> "mapred.tasktracker.map.tasks.maximum" is a cluster level configuration,
>> cannot be set per job. It is loaded only while bringing up the TaskTracker.
>>
>> Thanks
>> Amareshwari
>>
>> On 6/30/10 3:05 PM, "Pierre ANCELOT" <pi...@gmail.com> wrote:
>>
>> Hi everyone :)
>> There's something I'm probably doing wrong but I can't seem to figure out
>> what.
>> I have two hadoop programs running one after the other.
>> This is done because they don't have the same needs in terms of processor
>> and memory, so by separating them I optimize each task better.
>> The fact is, for the first job I need mapred.tasktracker.map.tasks.maximum
>> set to 12 on every node.
>> For the second job, I need it set to 20.
>> So by default I set it to 12, and in the second job's code I set this:
>>
>>        Configuration hadoopConfiguration = new Configuration();
>>        hadoopConfiguration.setInt("mapred.tasktracker.map.tasks.maximum", 20);
>>
>> But when running the job, instead of having the 20 tasks on each node as
>> expected, I have 12....
>> Any idea please?
>>
>> Thank you.
>> Pierre.
>>
>>
>> --
>> http://www.neko-consulting.com
>> Ego sum quis ego servo
>> "Je suis ce que je protège"
>> "I am what I protect"
>>
>>
>
>
> --
> http://www.neko-consulting.com
> Ego sum quis ego servo
> "Je suis ce que je protège"
> "I am what I protect"
>