Posted to common-user@hadoop.apache.org by Keith Wiley <kw...@keithwiley.com> on 2011/03/23 18:33:15 UTC

Streaming mappers frequently time out

My streaming mappers frequently die with this error:

Task attempt_201103101623_12864_m_000032_1 failed to report status for 602 seconds. Killing!

A repeated attempt of the same task generally succeeds, but it is wasteful that the task is first held up for ten minutes.  My mapper (and reducer) are C++ and use pthreads.  I start a reporter thread as soon as the task starts, and that reporter thread sends periodic reporter and status messages to cout using the streaming reporter syntax, but I still get these errors occasionally.

Also, the task logs for such failed mappers are always either empty or unretrievable.  They don't show ten minutes of actual work on the worker thread while the reporter should have been reporting; rather, they are empty (or, like I said, totally unretrievable).  It seems to me that Hadoop is failing to even start these tasks.  If the C++ binary had actually been kicked off, the logs would show SOME kind of output on cerr even if the reporter thread had not been started properly, because I send output to cerr before starting the reporter thread, in fact before any pthread-related wonkery at all.  I write to cerr right at the entry to main(), yet the logs are empty.  So I really think Hadoop isn't even starting the binary, but then waits ten minutes to kill the task anyway.
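
For concreteness, the shape of what I'm doing is roughly this (a minimal sketch, not my actual code; names and the heartbeat interval are illustrative, and per the streaming docs the reporter lines go on stderr):

#include <pthread.h>
#include <unistd.h>
#include <iostream>

// Heartbeat thread: emits a streaming status line on stderr every few
// seconds so the TaskTracker sees progress and doesn't hit the timeout.
static volatile bool g_done = false;

static void* reporterThread(void*) {
    while (!g_done) {
        std::cerr << "reporter:status:mapper alive" << std::endl;
        sleep(5);  // heartbeat interval (illustrative)
    }
    return NULL;
}

int main() {
    // First thing in main(), before any pthread work: if the binary
    // launched at all, this line should appear in the task's stderr log.
    std::cerr << "mapper started" << std::endl;

    pthread_t tid;
    pthread_create(&tid, NULL, reporterThread, NULL);

    // ... real map work goes here: read records from stdin, write
    // key<TAB>value lines to stdout ...

    g_done = true;
    pthread_join(tid, NULL);
    return 0;
}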

Has anyone else seen anything like this?

Thanks.

________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
                                           --  Edwin A. Abbott, Flatland
________________________________________________________________________________


Re: Streaming mappers frequently time out

Posted by Keith Wiley <kw...@keithwiley.com>.
Maybe I could just turn it down to two or three minutes.  It wouldn't fix the problem where the task doesn't start, but it would kill it and restart it more quickly.
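
Something like this on the streaming command line ought to do it, assuming the 0.20-era property name mapred.task.timeout (mapreduce.task.timeout on newer releases, per Jim's note); the value is in milliseconds, so three minutes is 180000, and the jar path and mapper/reducer names below are placeholders:

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -D mapred.task.timeout=180000 \
    -input /user/me/in -output /user/me/out \
    -mapper my_mapper -reducer my_reducer \
    -file my_mapper -file my_reducer

(The -D generic option has to come before the streaming-specific options.)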

Thanks.

On Mar 23, 2011, at 11:46 AM, Jim Falgout wrote:

> I've run into that before. Try setting mapreduce.task.timeout. I seem to remember that setting it to zero may turn off the timeout, but of course can be dangerous if you have a runaway task. The default is 600 seconds ;-)
> 
> Check out http://hadoop.apache.org/mapreduce/docs/current/mapred-default.html. It lists a bunch of map reduce properties. 


________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"I used to be with it, but then they changed what it was.  Now, what I'm with
isn't it, and what's it seems weird and scary to me."
                                           --  Abe (Grandpa) Simpson
________________________________________________________________________________


RE: Streaming mappers frequently time out

Posted by Jim Falgout <ji...@pervasive.com>.
I've run into that before. Try setting mapreduce.task.timeout. I seem to remember that setting it to zero may turn off the timeout, but of course can be dangerous if you have a runaway task. The default is 600 seconds ;-)

Check out http://hadoop.apache.org/mapreduce/docs/current/mapred-default.html. It lists a bunch of map reduce properties. 
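
For a cluster-wide default, the mapred-site.xml entry would look something like this (shown with the 0.20-era name mapred.task.timeout; newer releases call it mapreduce.task.timeout; the value is in milliseconds, and 0 disables the timeout):

<property>
  <name>mapred.task.timeout</name>
  <value>600000</value>
  <description>The number of milliseconds before a task will be
  terminated if it neither reads an input, writes an output, nor
  updates its status string.</description>
</property>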

-----Original Message-----
From: Keith Wiley [mailto:kwiley@keithwiley.com] 
Sent: Wednesday, March 23, 2011 12:33 PM
To: common-user@hadoop.apache.org
Subject: Streaming mappers frequently time out


Re: Streaming mappers frequently time out

Posted by Keith Wiley <kw...@keithwiley.com>.
On Mar 23, 2011, at 10:33 AM, Keith Wiley wrote:

> I start a reporter thread as soon as the task starts and that reporter thread sends periodic reporter and status messages to cout using the streaming reporter syntax, but I still get these errors occasionally.


Sorry, I meant to say my reporter thread sends counter and status messages to cerr, not cout.
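
I.e., the two forms look like this (group/counter names made up):

std::cerr << "reporter:status:processing tile 42" << std::endl;
std::cerr << "reporter:counter:MyCounters,TilesProcessed,1" << std::endl;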

________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use."
                                           --  Galileo Galilei
________________________________________________________________________________