Posted to user@hadoop.apache.org by Majid Azimi <ma...@gmail.com> on 2012/12/09 13:09:12 UTC

can local disk of reduce task cause the job to fail?

Hi guys,

"Hadoop: The Definitive Guide" says that reduce tasks start only when all
maps have finished their work. Also, this link
<http://hadoop.apache.org/docs/mapreduce/current/mapred_tutorial.html#Reducer> says:

>> The shuffle and sort phases occur simultaneously; while map-outputs are
being fetched they are merged.

What I have understood is that when a reduce task starts, all the data it
needs (a key and its associated values) has already been transferred to its
local node. Am I right? If this is true, then the node running the reduce task
must have enough storage to hold all values associated with that key, or else
the job will fail.

If not, then the reduce task starts with whatever data is available and the
shuffle + sort phase feeds the reduce task continuously, so low storage on the
node does not cause a problem because data arrives on demand.

Which of the two cases actually happens?

Re: can local disk of reduce task cause the job to fail?

Posted by jamal sasha <ja...@gmail.com>.

I am new to Hadoop, but I think the output of completed map tasks is
transferred (copied, shuffled and sorted) to the reducer nodes even while some
of the mappers are still running. The reduce code itself starts executing only
when all the map tasks have finished.
That's why you see a small percentage of the reducers shown as completed even
though mappers are still running.
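
To illustrate where that small percentage comes from, here is a minimal sketch in plain Python (not Hadoop code). It assumes the usual convention that a reduce task's reported progress is split into equal thirds for the copy, sort and reduce phases; the function name and the numbers are made up for illustration.

```python
def reduce_task_progress(copy_done, sort_done, reduce_done):
    """Combine per-phase fractions (each in [0, 1]) into one progress value.

    Assumption: copy, sort and reduce each account for one third of the
    reported progress of a reduce task.
    """
    for fraction in (copy_done, sort_done, reduce_done):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("phase fractions must be between 0 and 1")
    return (copy_done + sort_done + reduce_done) / 3.0


# Hypothetical job: 40 of 100 map outputs already fetched,
# nothing sorted or reduced yet because maps are still running.
progress = reduce_task_progress(copy_done=40 / 100, sort_done=0.0, reduce_done=0.0)
print(f"reduce progress: {progress:.0%}")
```

Under that assumption the reduce side can show progress of up to roughly a third from copying alone, before any user reduce code has run.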

On Sun, Dec 9, 2012 at 12:15 PM, Mohit Anchlia <mo...@gmail.com> wrote:

> Reducer will not start executing until shuffle and sort phase is complete
>
> Sent from my iPhone
>
> On Dec 9, 2012, at 4:09 AM, Majid Azimi <ma...@gmail.com> wrote:
>
> Hi guys,
>
> Hadoop the definitive guide says: reduce tasks will start only when all
> maps has done their work.  Also this link<http://hadoop.apache.org/docs/mapreduce/current/mapred_tutorial.html#Reducer>says:
>
> >> The shuffle and sort phases occur simultaneously; while map-outputs are
> being fetched they are merged.
>
> What I have understood is that when a reducer task starts then all data it
> needs(including a key and associated values) have been transferred to its
> local node. Am I right? if this is true then, the node running reduce
> task must have enough storage to hold all values associated with that
> key, else The job will fail.
>
> If no, then reduce job starts with some available data and shuffle + sort
> phase feed reduce task contiguously, thus low storage on node does not
> cause problem because data is coming on demand.
>
> which of the two cases actually happen?
>
>

Re: can local disk of reduce task cause the job to fail?

Posted by Mohit Anchlia <mo...@gmail.com>.

The reducer will not start executing until the shuffle and sort phases are complete.
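
A rough sketch of what that means for the original question, in plain Python rather than Hadoop code: the sorted outputs fetched from each map task are merged, and the reduce function then receives the values for each key through an iterator. As far as I understand, this means the values for one key do not all have to sit in memory at once, but the fetched map outputs for the reducer's partition still have to fit in that node's memory plus local disk. The helper names and sample data below are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_map_outputs(sorted_runs):
    """Lazily merge already-sorted (key, value) runs, one run per map task."""
    return heapq.merge(*sorted_runs, key=itemgetter(0))


def run_reduce(sorted_runs, reduce_fn):
    """Call reduce_fn(key, values) once per key; values is a lazy iterator."""
    merged = merge_map_outputs(sorted_runs)
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, reduce_fn(key, (value for _, value in group))


# Hypothetical sorted outputs of three map tasks for one reduce partition.
map_outputs = [
    [("apple", 1), ("cherry", 1)],
    [("apple", 1), ("banana", 1)],
    [("banana", 1), ("cherry", 1), ("cherry", 1)],
]

result = list(run_reduce(map_outputs, lambda key, values: sum(values)))
print(result)  # [('apple', 2), ('banana', 2), ('cherry', 3)]
```

In this sketch the runs are in-memory lists; in a real job they would be spill files on the reduce node's local disk, which is why running out of local disk there can fail the task.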

On Dec 9, 2012, at 4:09 AM, Majid Azimi <ma...@gmail.com> wrote:

> Hi guys,
> 
> Hadoop the definitive guide says: reduce tasks will start only when all maps has done their work.  Also this link says:
> 
> >> The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.
> 
> What I have understood is that when a reducer task starts then all data it needs(including a key and associated values) have been transferred to its local node. Am I right? if this is true then, the node running reduce task must have enough storage to hold all values associated with that key, else The job will fail.
> 
> If no, then reduce job starts with some available data and shuffle + sort phase feed reduce task contiguously, thus low storage on node does not cause problem because data is coming on demand.
> 
> which of the two cases actually happen?
