Posted to user@hadoop.apache.org by jeremy p <at...@gmail.com> on 2013/03/13 21:01:46 UTC

Will hadoop always spread the work evenly between nodes?

Say I have 200 input files and 20 nodes, and each node has 10 mapper slots.
 Will Hadoop always allocate the work evenly, such that each node will get
10 input files and simultaneously start 10 mappers?  Is there a way to
force this behavior?

--Jeremy
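
To make the arithmetic in the question concrete: the scheduler creates one map task per input split, not one batch of files per node. If each of the 200 files is small enough to form a single split, the job has 200 map tasks, which exactly cover the 20 x 10 = 200 map slots. Below is a minimal old-API (MR1) job setup along those lines; the class name and the input/output paths are placeholders, and setNumMapTasks is only a hint (the actual count always comes from the number of splits the InputFormat produces).

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class EvenSpreadSketch {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(EvenSpreadSketch.class);
        conf.setJobName("even-spread-sketch");

        // One map task is created per input split, not per node.  With 200
        // small (single-split) files under the input path, the job gets
        // about 200 map tasks, which happens to fill 20 nodes x 10 map
        // slots exactly once.
        conf.setInputFormat(TextInputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path("/user/jeremy/input"));   // placeholder path
        FileOutputFormat.setOutputPath(conf, new Path("/user/jeremy/output")); // placeholder path

        // A hint only; the real map task count equals the number of splits.
        conf.setNumMapTasks(200);

        JobClient.runJob(conf);
      }
    }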

Re: Will hadoop always spread the work evenly between nodes?

Posted by jeremy p <at...@gmail.com>.
Thank you, Jeff.  Actually, it turns out the scenario you outlined is much
closer to the situation I'm dealing with.  I'm going to ask a separate
question about that.

--Jeremy

On Wed, Mar 13, 2013 at 1:47 PM, Jeffrey Buell <jb...@vmware.com> wrote:

> I think in your case it will have to be even, because all the slots will
> get filled.  A more interesting case is if you have 40 nodes, will you get
> exactly 5 slots used for each of the nodes?  Or will some nodes get more
> than 5 mappers, and others fewer?  I don't remember the details, but I've
> had problems with unevenness in such scenarios.  At least in MR1, you can
> usually force evenness by adjusting the number of map and reduce slots per
> node.  In MR2 the slots are combined, so achieving evenness will be more
> difficult.
>
> Jeff
>
> ------------------------------
> *From: *"jeremy p" <at...@gmail.com>
> *To: *user@hadoop.apache.org
> *Sent: *Wednesday, March 13, 2013 1:01:46 PM
> *Subject: *Will hadoop always spread the work evenly between nodes?
>
>
> Say I have 200 input files and 20 nodes, and each node has 10 mapper
> slots.  Will Hadoop always allocate the work evenly, such that each node
> will get 10 input files and simultaneously start 10 mappers?  Is there a
> way to force this behavior?
>
> --Jeremy
>
>

Re: Will hadoop always spread the work evenly between nodes?

Posted by Jeffrey Buell <jb...@vmware.com>.
I think in your case it will have to be even, because all the slots will get filled. A more interesting case is if you have 40 nodes, will you get exactly 5 slots used for each of the nodes? Or will some nodes get more than 5 mappers, and others fewer? I don't remember the details, but I've had problems with unevenness in such scenarios. At least in MR1, you can usually force evenness by adjusting the number of map and reduce slots per node. In MR2 the slots are combined, so achieving evenness will be more difficult.

Jeff 

----- Original Message -----

From: "jeremy p" <at...@gmail.com> 
To: user@hadoop.apache.org 
Sent: Wednesday, March 13, 2013 1:01:46 PM 
Subject: Will hadoop always spread the work evenly between nodes? 

Say I have 200 input files and 20 nodes, and each node has 10 mapper slots. Will Hadoop always allocate the work evenly, such that each node will get 10 input files and simultaneously start 10 mappers? Is there a way to force this behavior? 


--Jeremy 
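
For reference, the per-node slot counts discussed above are MR1 TaskTracker settings, set in mapred-site.xml on every node and read when the TaskTracker starts (so a restart is needed after changing them). A sketch of the relevant entries, with illustrative values only:

    <!-- mapred-site.xml on each TaskTracker node (MR1); values are illustrative -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>10</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>

In MR2/YARN there are no fixed map and reduce slots; containers are sized by memory and vcores (for example yarn.nodemanager.resource.memory-mb on the node and mapreduce.map.memory.mb per task), which is why per-node evenness is harder to pin down there.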
