Posted to dev@nutch.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2010/08/03 14:31:16 UTC
[jira] Resolved: (NUTCH-696) Timeout for Parser
[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche resolved NUTCH-696.
---------------------------------
Resolution: Fixed
Trunk : Committed revision 981829.
NutchBase : Committed revision 981835.
1.2 : Committed revision 981844.
> Timeout for Parser
> ------------------
>
> Key: NUTCH-696
> URL: https://issues.apache.org/jira/browse/NUTCH-696
> Project: Nutch
> Issue Type: Wish
> Components: parser
> Reporter: Julien Nioche
> Assignee: Julien Nioche
> Priority: Minor
> Fix For: 1.2, 2.0, nutchbase
>
> Attachments: timeout.patch
>
>
> I found that the parsing sometimes crashes due to a problem on a specific document, which is a bit of a shame, as this blocks the rest of the segment and Hadoop ends up finding that the node does not respond. I was wondering whether it would make sense to have a timeout mechanism for the parsing, so that if a document is not parsed after a time t, it is simply treated as an exception and we can get on with the rest of the process.
> Does that make sense? Where do you think we should implement that, in ParseUtil?
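
The attached timeout.patch is not reproduced in this archive. As a rough sketch of the idea only (class and method names here are hypothetical, not Nutch's actual ParseUtil API), one common way to bound a parse call is to submit it to an executor and wait on the Future with a time limit, treating a timeout like any other parse failure:

```java
import java.util.concurrent.*;

public class TimeoutParseSketch {

    // Hypothetical stand-in for a parser; workMillis simulates how long
    // the parse of a pathological document would take.
    static String slowParse(String doc, long workMillis) throws InterruptedException {
        Thread.sleep(workMillis);
        return "parsed:" + doc;
    }

    // Run a parse with a hard time limit; a timeout is reported as a
    // failure instead of blocking the rest of the segment.
    static String parseWithTimeout(String doc, long workMillis, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(() -> slowParse(doc, workMillis));
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck parser thread
            return "failed:timeout";
        } catch (Exception e) {
            return "failed:" + e.getClass().getSimpleName();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseWithTimeout("fast.html", 10, 500));
        System.out.println(parseWithTimeout("stuck.html", 5000, 200));
    }
}
```

With this pattern a document whose parse exceeds the limit simply yields a failure status, and processing continues with the next document rather than stalling the whole task until Hadoop kills the node.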
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.