Posted to user@hadoop.apache.org by Jan Lukavský <ja...@firma.seznam.cz> on 2012/08/23 11:25:01 UTC
Running map tasks after all reduces have finished
Hi all,
we are seeing strange behaviour of the JobTracker in the following scenario:
- the job finishes the map phase and starts the reduce phase
- after all reducers have finished the shuffle phase, we lose a tasktracker
that doesn't run any reducer, so all remaining reducers are still in the
reduce phase
- map tasks that were running on the lost tasktracker are rescheduled
- the reduces may finish earlier than the rescheduled map tasks, so the
job is blocked waiting for the maps to finish, although their output is
simply discarded
Is this behaviour a bug or a feature? :) I haven't found any JIRA that
would describe it; if one exists, can anyone point me to it?
Thanks,
Jan
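The completion rule the question is about can be illustrated with a toy model. This is a hypothetical Python sketch, not JobTracker code: `ToyJob`, its methods and both completion predicates are invented for illustration. The first predicate mirrors the behaviour reported above (the job waits for every map and every reduce). The second is only the alternative the question implicitly asks about, not something Hadoop is known to implement.

```python
from dataclasses import dataclass, field
from typing import Dict

RUNNING, SUCCEEDED = "RUNNING", "SUCCEEDED"


@dataclass
class ToyJob:
    """Hypothetical model of one MapReduce job's task states; not Hadoop code."""
    maps: Dict[str, str] = field(default_factory=dict)     # map task id -> state
    reduces: Dict[str, str] = field(default_factory=dict)  # reduce task id -> state

    def lose_tracker(self, map_ids):
        # Assumption: the maps' output lived on the lost node, so they are rescheduled.
        for task_id in map_ids:
            self.maps[task_id] = RUNNING

    def finish_reduces(self):
        for task_id in self.reduces:
            self.reduces[task_id] = SUCCEEDED

    def is_complete_as_observed(self) -> bool:
        # Behaviour reported in the thread: every map AND every reduce must succeed.
        return all(s == SUCCEEDED for s in self.maps.values()) and all(
            s == SUCCEEDED for s in self.reduces.values()
        )

    def is_complete_if_reduces_done(self) -> bool:
        # Hypothetical alternative: once every reduce has succeeded, nobody will
        # fetch the re-run map output, so the job could finish. Only meaningful
        # when the job has at least one reduce.
        return bool(self.reduces) and all(s == SUCCEEDED for s in self.reduces.values())


if __name__ == "__main__":
    job = ToyJob(
        maps={"m0": SUCCEEDED, "m1": SUCCEEDED, "m2": SUCCEEDED},
        reduces={"r0": RUNNING, "r1": RUNNING},
    )
    # All reducers are past the shuffle; then the tracker that ran m1 and m2 is lost.
    job.lose_tracker(["m1", "m2"])
    job.finish_reduces()
    print(job.is_complete_as_observed())      # False: still waiting on m1 and m2
    print(job.is_complete_if_reduces_done())  # True
```

The map-only guard in the second predicate matters because a job without reduces writes its final output from the maps themselves, so their completion cannot be skipped.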
Re: Running map tasks after all reduces have finished
Posted by Harsh J <ha...@cloudera.com>.
Thanks Jan. I'm moving this to cdh-user@cloudera.org
(http://groups.google.com/a/cloudera.org/forum/?fromgroups#!forum/cdh-user)
since it may be CDH3-specific.
Can you share your JobTracker log and the ID of a job that exhibited this
behavior, so we can track it down?
On Thu, Aug 23, 2012 at 4:15 PM, Jan Lukavský
<ja...@firma.seznam.cz> wrote:
> Hi,
>
> sorry, I forgot to mention: we are using cdh3u3.
>
> Jan
>
>
> On 23.8.2012 12:08, Harsh J wrote:
>>
>> Hey Jan,
>>
>> What version/distribution of Hadoop are you noticing this on?
>>
>> On Thu, Aug 23, 2012 at 2:55 PM, Jan Lukavský
>> <ja...@firma.seznam.cz> wrote:
>>>
>>> Hi all,
>>>
>>> we are seeing strange behaviour of the JobTracker in the following scenario:
>>> - the job finishes the map phase and starts the reduce phase
>>> - after all reducers have finished the shuffle phase, we lose a tasktracker
>>> that doesn't run any reducer, so all remaining reducers are still in the
>>> reduce phase
>>> - map tasks that were running on the lost tasktracker are rescheduled
>>> - the reduces may finish earlier than the rescheduled map tasks, so the
>>> job is blocked waiting for the maps to finish, although their output is
>>> simply discarded
>>>
>>> Is this behaviour a bug or a feature? :) I haven't found any JIRA that
>>> would describe it; if one exists, can anyone point me to it?
>>>
>>> Thanks,
>>> Jan
>>>
>>
>>
>
>
> --
>
> Jan Lukavský
> programmer
> Seznam.cz, a.s.
> Radlická 608/2
> 15000, Praha 5
>
> jan.lukavsky@firma.seznam.cz
> http://www.seznam.cz
>
--
Harsh J
Re: Running map tasks after all reduces have finished
Posted by Jan Lukavský <ja...@firma.seznam.cz>.
Hi,
sorry, I forgot to mention: we are using cdh3u3.
Jan
On 23.8.2012 12:08, Harsh J wrote:
> Hey Jan,
>
> What version/distribution of Hadoop are you noticing this on?
>
> On Thu, Aug 23, 2012 at 2:55 PM, Jan Lukavský
> <ja...@firma.seznam.cz> wrote:
>> Hi all,
>>
>> we are seeing strange behaviour of the JobTracker in the following scenario:
>> - the job finishes the map phase and starts the reduce phase
>> - after all reducers have finished the shuffle phase, we lose a tasktracker
>> that doesn't run any reducer, so all remaining reducers are still in the
>> reduce phase
>> - map tasks that were running on the lost tasktracker are rescheduled
>> - the reduces may finish earlier than the rescheduled map tasks, so the
>> job is blocked waiting for the maps to finish, although their output is
>> simply discarded
>>
>> Is this behaviour a bug or a feature? :) I haven't found any JIRA that
>> would describe it; if one exists, can anyone point me to it?
>>
>> Thanks,
>> Jan
>>
>
>
--
Jan Lukavský
programmer
Seznam.cz, a.s.
Radlická 608/2
15000, Praha 5
jan.lukavsky@firma.seznam.cz
http://www.seznam.cz
Re: Running map tasks after all reduces have finished
Posted by Harsh J <ha...@cloudera.com>.
Hey Jan,
What version/distribution of Hadoop are you noticing this on?
On Thu, Aug 23, 2012 at 2:55 PM, Jan Lukavský
<ja...@firma.seznam.cz> wrote:
> Hi all,
>
> we are seeing strange behaviour of the JobTracker in the following scenario:
> - the job finishes the map phase and starts the reduce phase
> - after all reducers have finished the shuffle phase, we lose a tasktracker
> that doesn't run any reducer, so all remaining reducers are still in the
> reduce phase
> - map tasks that were running on the lost tasktracker are rescheduled
> - the reduces may finish earlier than the rescheduled map tasks, so the
> job is blocked waiting for the maps to finish, although their output is
> simply discarded
>
> Is this behaviour a bug or a feature? :) I haven't found any JIRA that
> would describe it; if one exists, can anyone point me to it?
>
> Thanks,
> Jan
>
--
Harsh J