Posted to user@nutch.apache.org by Ned Rockson <nr...@stanford.edu> on 2007/09/23 11:17:09 UTC

Parse reduce task fails to respond?

I often get this message while running the parse reduce: "Task failed
to report status for 604 seconds. Killing."  Usually this would be
because the machine went down, but the heartbeats are always up to
date.  Also, the task will fail numerous times and the jobtracker will
list it as failed, but if I try to re-parse the segment it throws an
error saying it's already parsed.  Has anyone else had this problem?
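
For background: Hadoop kills a task that stays silent longer than
mapred.task.timeout (600,000 ms by default, which lines up with the 604
seconds above).  A long-running reduce can keep itself alive by pinging
the reporter.  Below is a minimal sketch against the old
org.apache.hadoop.mapred API; the class is illustrative, not Nutch's
actual parse reducer:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class KeepAliveReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

      public void reduce(Text key, Iterator<Text> values,
                         OutputCollector<Text, Text> output,
                         Reporter reporter) throws IOException {
        while (values.hasNext()) {
          Text value = values.next();
          // ... slow per-record work (e.g. parsing) would go here ...
          output.collect(key, value);
          // Tell the tasktracker we are still alive so it does not
          // kill the task after mapred.task.timeout of silence.
          reporter.progress();
        }
      }
    }

If the reduce is legitimately slow rather than hung, raising
mapred.task.timeout (a millisecond value) in hadoop-site.xml is the
other common workaround, at the cost of detecting dead tasks later.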

On a side note, I've had a problem with the parse phase before - it
would try to parse extremely long URLs.  I fixed that by adding URL
filter rules that reject URLs containing control characters or longer
than a few hundred characters.
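
For reference, such rules might look like this in Nutch's
conf/regex-urlfilter.txt, assuming the urlfilter-regex plugin is
enabled (the 300-character cutoff is an arbitrary example):

    # reject URLs containing control characters
    -[\x00-\x1f\x7f]
    # reject URLs longer than about 300 characters
    -.{300,}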

RE: Parse reduce task fails to respond?

Posted by Vishal Shah <vi...@rediff.co.in>.
Hi Ned,

   I have seen this error before as well. For me, one of the reduce tasks
always used to get stuck and cause the error you mentioned. The reason
you see the message saying the segment is already parsed is that the
remaining reduces finished successfully and dumped their output into the
crawl_parse, parse_data and parse_text dirs in the segment folder. If you
want to reparse the segment, you can delete/rename these directories
from that segment before retrying the parse.
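
Concretely, the cleanup might look like this (the segment name below is
made up - substitute your own path, and double-check it before deleting
anything):

    bin/hadoop dfs -rmr crawl/segments/20070923111709/crawl_parse
    bin/hadoop dfs -rmr crawl/segments/20070923111709/parse_data
    bin/hadoop dfs -rmr crawl/segments/20070923111709/parse_text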

-vishal.
