Posted to mapreduce-user@hadoop.apache.org by Steve Lewis <lo...@gmail.com> on 2013/08/29 04:47:39 UTC

Some jobs seem to run forever

I am running a Hadoop job on a 40-node cluster with about 300 map tasks and
about 300 reduce tasks. Most tasks complete within 20 minutes, but a few,
typically fewer than 10, run for many hours.
When they do complete, I see nothing to suggest that the number of bytes or
records read or written is significantly different from that of the tasks
that run much faster. I sometimes see multiple attempts, usually only two,
and the cluster is doing nothing else.

Any suggested tuning?

Re: Some jobs seem to run forever

Posted by Steve Lewis <lo...@gmail.com>.
Bad partitioning was my original theory. I have had issues with it in the
past and am careful with my partitioner. It was the first thing I looked
for, but, as I said in the original message, I do not see any evidence that
the slower tasks have significantly more data than the faster ones, and
certainly not enough to justify a radically different running time.
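
The check described above, comparing how much data each task handled against how long it ran, can be sketched as below. This is only an illustration: the task IDs, record counts, and timings are made-up placeholders standing in for numbers copied from the job's counter pages, and the "more than twice the median" threshold is an arbitrary assumption.

```python
from statistics import median


def find_outliers(per_task, factor=2.0):
    """Return sorted (task_id, value) pairs whose value exceeds factor * median."""
    mid = median(per_task.values())
    return sorted((task, v) for task, v in per_task.items() if v > factor * mid)


# Hypothetical per-reducer figures, not taken from the job discussed here.
input_records = {
    "r_000": 1_010_000,
    "r_001": 990_000,
    "r_002": 1_000_000,
    "r_003": 1_020_000,
}
elapsed_minutes = {"r_000": 18, "r_001": 19, "r_002": 240, "r_003": 20}

skewed_by_data = find_outliers(input_records)    # tasks with unusually many records
skewed_by_time = find_outliers(elapsed_minutes)  # tasks with unusually long runtimes

print("data outliers:", skewed_by_data)
print("time outliers:", skewed_by_time)
```

With these placeholder numbers, one task is an outlier by runtime but none is an outlier by record count, which is the pattern reported in this thread: the slow tasks are not explained by the amount of data they received.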


On Thu, Aug 29, 2013 at 9:29 AM, Charles Baker <cb...@sdl.com> wrote:

> Hi Steve. Sounds like a classic case of uneven data distribution among
> the reducers. Most of your data is probably going to those 10 reducers that
> are taking many hours. You may want to adjust your key and/or partitioning
> strategy to better distribute the data among the reducers. If you’re
> using a hash-based partitioning strategy, think about using a prime
> number of reducers. A prime count tends to give a more even distribution
> with a hash-based strategy, and this alone may get you pretty far. I have
> no idea what your workflow or cluster configuration is like, but 300
> reducers for 300 mappers doesn’t sound right. Try using a (prime) number of
> reducers that’s roughly equal to 95% of the total reducer slots allocated
> on the cluster and go from there. Usually, the cluster should be configured
> for fewer reducers than mappers. If you have 12 cores per node (HT off),
> try 8 mappers and 3 reducers per node.
>
> Good luck!
>
> Chuck



-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com

RE: Some jobs seem to run forever

Posted by Charles Baker <cb...@sdl.com>.
Hi Steve. Sounds like a classic case of uneven data distribution among the
reducers. Most of your data is probably going to those 10 reducers that are
taking many hours. You may want to adjust your key and/or partitioning
strategy to better distribute the data among the reducers. If you're using a
hash-based partitioning strategy, think about using a prime number of
reducers. A prime count tends to give a more even distribution with a
hash-based strategy, and this alone may get you pretty far. I have no idea
what your workflow or cluster configuration is like, but 300 reducers for
300 mappers doesn't sound right. Try using a (prime) number of reducers
that's roughly equal to 95% of the total reducer slots allocated on the
cluster and go from there. Usually, the cluster should be configured for
fewer reducers than mappers. If you have 12 cores per node (HT off), try 8
mappers and 3 reducers per node.

Good luck!

Chuck
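
The sizing rule above (a prime reducer count near 95% of the reducer slots) can be sketched in a few lines. The snippet assumes the 40 nodes from the original question and the 3 reducer slots per node suggested above, and it models Hadoop's default HashPartitioner, which assigns a key to partition (hashCode & Integer.MAX_VALUE) % numReduceTasks. The hash codes are a contrived worst case (all multiples of 100) chosen only to show why a count that shares factors with the hash values can starve most reducers. It does not show that a prime count always helps, and the function names are illustrative rather than part of any Hadoop API.

```python
def is_prime(n):
    """Trial-division primality test; fine for cluster-sized numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def largest_prime_at_most(n):
    """Largest prime p with p <= n."""
    while n >= 2 and not is_prime(n):
        n -= 1
    if n < 2:
        raise ValueError("no prime at or below the requested bound")
    return n


def suggested_reducers(nodes, reduce_slots_per_node, percent=95):
    """Largest prime not exceeding `percent` of the total reducer slots."""
    total_slots = nodes * reduce_slots_per_node
    return largest_prime_at_most(total_slots * percent // 100)


def partition(hash_code, num_reducers):
    """Same arithmetic as Hadoop's default HashPartitioner."""
    return (hash_code & 0x7FFFFFFF) % num_reducers


def partitions_used(hash_codes, num_reducers):
    """Number of distinct reducers that receive at least one key."""
    return len({partition(h, num_reducers) for h in hash_codes})


reducers = suggested_reducers(nodes=40, reduce_slots_per_node=3)

# Contrived hash codes that are all multiples of 100.
hash_codes = [100 * k for k in range(3000)]
used_with_300 = partitions_used(hash_codes, 300)  # 300 shares the factor 100
used_with_293 = partitions_used(hash_codes, 293)  # 293 is prime

print(reducers, used_with_300, used_with_293)
```

Under these assumptions the suggested count is 113. With the contrived keys, 300 reducers leave all data on only 3 partitions, while 293 reducers receive keys on every partition. Real key hash codes are rarely this regular, so the benefit in practice depends on the keys and the partitioner in use.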

