Posted to common-user@hadoop.apache.org by Doug Cook <na...@candiru.com> on 2009/03/13 14:51:01 UTC

Reduce task going away for 10 seconds at a time

Hi folks,

I've been debugging a severe performance problem with a Hadoop-based
application (a highly modified version of Nutch). I've recently upgraded to
Hadoop 0.19.1 from a much, much older version, and a reduce that used to
work just fine is now running orders of magnitude more slowly. 

From the logs I can see that progress of my reduce stops for periods that
average almost exactly 10 seconds (with a very narrow distribution around 10
seconds), and it does so in various places in my code, but more or less in
proportion to how much time I'd expect the task would normally spend in that
particular place in the code, i.e. the behavior seems like my code is
randomly being interrupted for 10 seconds at a time. 

I'm planning to keep digging, but thought that these symptoms might sound
familiar to someone on this list. Ring any bells? Any help would be much
appreciated.

Thanks!

Doug Cook
-- 
View this message in context: http://www.nabble.com/Reduce-task-going-away-for-10-seconds-at-a-time-tp22496810p22496810.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

Re: Reduce task going away for 10 seconds at a time

Posted by Aaron Kimball <aa...@cloudera.com>.

If you jstack the process in the middle of one of these pauses, can you see
where it's sticking?
- Aaron
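
For anyone following along: the direct way to do this is to run `jstack <pid>` against the reduce task's child JVM while it is stalled. If attaching from outside is awkward, a rough in-process equivalent can be sketched with the standard JDK call Thread.getAllStackTraces(). The class name StackSampler and its method are hypothetical, not part of Hadoop, and the output format only approximates what jstack prints.

```java
import java.util.Map;

// Hypothetical helper, not part of Hadoop: prints the stack of every live
// thread in the current JVM, roughly like a jstack dump.
public class StackSampler {

    /** Returns the name, state, and stack frames of each live thread. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Take one sample; in a real task this would be called from a
        // separate watchdog thread whenever progress appears to stall.
        System.err.println(dumpAllThreads());
    }
}
```

Taking several dumps a second or two apart during a single pause should show whether the task thread is parked in the same frame each time or is not running at all.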
