Posted to user@hadoop.apache.org by Jan Lukavský <ja...@firma.seznam.cz> on 2012/08/23 11:25:01 UTC

Running map tasks after all reduces have finished

Hi all,

we are seeing strange behaviour of the JobTracker in the following scenario:
  - the job finishes its map phase and starts the reduce phase
  - after all reducers have finished their shuffle phase, we lose a
tasktracker that is not running any reducer, so all remaining reducers are
still running in the reduce phase
  - the map tasks that were running on the lost tasktracker are rescheduled
  - the reduces may finish earlier than the rescheduled map tasks, so the
job is blocked waiting for the maps to finish, although their output is
simply discarded
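
The sequence above can be sketched as a toy model. This is only an
illustration, not the actual JobTracker code: the class ToyJob, its method
names, and the completion rule in is_complete() are assumptions chosen to
mirror the behaviour described in the list.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJob:
    """Hypothetical bookkeeping for one job (not a real Hadoop class)."""
    maps_done: set = field(default_factory=set)
    maps_pending: set = field(default_factory=set)
    reduces_pending: set = field(default_factory=set)

    def lose_tracker(self, maps_on_tracker):
        # Maps whose output lived on the lost node go back to pending,
        # i.e. they are rescheduled.
        lost = self.maps_done & set(maps_on_tracker)
        self.maps_done -= lost
        self.maps_pending |= lost

    def finish_map(self, task):
        self.maps_pending.discard(task)
        self.maps_done.add(task)

    def finish_reduce(self, task):
        self.reduces_pending.discard(task)

    def is_complete(self):
        # Assumed rule: the job is complete only when no task of either
        # kind is pending, even if every reducer already has its input.
        return not self.maps_pending and not self.reduces_pending


# All maps done, both reducers past shuffle and still reducing.
job = ToyJob(maps_done={"m0", "m1", "m2"}, reduces_pending={"r0", "r1"})
job.lose_tracker({"m1"})          # the lost tasktracker had run m1
job.finish_reduce("r0")
job.finish_reduce("r1")
blocked = not job.is_complete()   # reduces are done, job still waits on m1
job.finish_map("m1")              # rerun finishes; its output is unused
print(blocked, job.is_complete())
```

In this model the job stays incomplete after the last reduce finishes and
only completes once the rescheduled map has run again.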

Is this behaviour a bug or a feature? :) I haven't found any JIRA that
describes it; if one exists, can anyone point me to it?

Thanks,
  Jan


Re: Running map tasks after all reduces have finished

Posted by Harsh J <ha...@cloudera.com>.
Thanks Jan. I'm moving this to cdh-user@cloudera.org
(http://groups.google.com/a/cloudera.org/forum/?fromgroups#!forum/cdh-user)
since it may be CDH3-specific.

Can you share your JobTracker log and a Job ID (one that exhibited this
behavior) that we can track?

On Thu, Aug 23, 2012 at 4:15 PM, Jan Lukavský
<ja...@firma.seznam.cz> wrote:
> Hi,
>
> Sorry, I forgot to mention: we are using cdh3u3.
>
> Jan
>
>
> On 23.8.2012 12:08, Harsh J wrote:
>>
>> Hey Jan,
>>
>> What version/distribution of Hadoop are you noticing this on?
>>
>> On Thu, Aug 23, 2012 at 2:55 PM, Jan Lukavský
>> <ja...@firma.seznam.cz> wrote:
>>>
>>> Hi all,
>>>
>>> we are seeing strange behaviour of JobTracker in the following scenario:
>>>   - job finishes map phase and starts reduce
>>>   - after the shuffle phase of all reducers we lose a tasktracker that
>>> doesn't run any reducer - so all remaining reducers are still running in
>>> the reduce phase
>>>   - map tasks that were running on the lost tasktracker are rescheduled
>>>   - reduces may finish earlier than the rescheduled map tasks and so the
>>> job is blocked waiting for the maps to finish, although their output is
>>> simply discarded
>>>
>>> Is this behaviour a bug or feature? :) I haven't found any JIRA that
>>> would describe it; if one exists, can anyone point me to it?
>>>
>>> Thanks,
>>>   Jan
>>>
>>
>>
>
>
> --
>
> Jan Lukavský
> programmer
> Seznam.cz, a.s.
> Radlická 608/2
> 15000, Praha 5
>
> jan.lukavsky@firma.seznam.cz
> http://www.seznam.cz
>



-- 
Harsh J

Re: Running map tasks after all reduces have finished

Posted by Jan Lukavský <ja...@firma.seznam.cz>.
Hi,

Sorry, I forgot to mention: we are using cdh3u3.

Jan

On 23.8.2012 12:08, Harsh J wrote:
> Hey Jan,
>
> What version/distribution of Hadoop are you noticing this on?
>
> On Thu, Aug 23, 2012 at 2:55 PM, Jan Lukavský
> <ja...@firma.seznam.cz> wrote:
>> Hi all,
>>
>> we are seeing strange behaviour of JobTracker in the following scenario:
>>   - job finishes map phase and starts reduce
>>   - after the shuffle phase of all reducers we lose a tasktracker that
>> doesn't run any reducer - so all remaining reducers are still running in the
>> reduce phase
>>   - map tasks that were running on the lost tasktracker are rescheduled
>>   - reduces may finish earlier than the rescheduled map tasks and so the job
>> is blocked waiting for the maps to finish, although their output is simply
>> discarded
>>
>> Is this behaviour a bug or feature? :) I haven't found any JIRA that would
>> describe it; if one exists, can anyone point me to it?
>>
>> Thanks,
>>   Jan
>>
>
>


-- 

Jan Lukavský
programmer
Seznam.cz, a.s.
Radlická 608/2
15000, Praha 5

jan.lukavsky@firma.seznam.cz
http://www.seznam.cz


Re: Running map tasks after all reduces have finished

Posted by Harsh J <ha...@cloudera.com>.
Hey Jan,

What version/distribution of Hadoop are you noticing this on?

On Thu, Aug 23, 2012 at 2:55 PM, Jan Lukavský
<ja...@firma.seznam.cz> wrote:
> Hi all,
>
> we are seeing strange behaviour of JobTracker in the following scenario:
>  - job finishes map phase and starts reduce
>  - after the shuffle phase of all reducers we lose a tasktracker that
> doesn't run any reducer - so all remaining reducers are still running in the
> reduce phase
>  - map tasks that were running on the lost tasktracker are rescheduled
>  - reduces may finish earlier than the rescheduled map tasks and so the job
> is blocked waiting for the maps to finish, although their output is simply
> discarded
>
> Is this behaviour a bug or feature? :) I haven't found any JIRA that would
> describe it; if one exists, can anyone point me to it?
>
> Thanks,
>  Jan
>



-- 
Harsh J
