
Posted to dev@nutch.apache.org by "stack@archive.org (JIRA)" <ji...@apache.org> on 2006/01/26 23:43:09 UTC

[jira] Commented: (NUTCH-190) ParseUtil drops reason for failed parse

    [ http://issues.apache.org/jira/browse/NUTCH-190?page=comments#action_12364145 ] 

stack@archive.org commented on NUTCH-190:
-----------------------------------------

Here's an example of failure output after the patch is applied:

060126 141413 task_m_bx2ifn  Error parsing: http://techreports.jpl.nasa.gov/2000/00-1147.pdf: failed(2,202): Content truncated at 102013 bytes. Parser can't handle incomplete application/pdf file

> ParseUtil drops reason for failed parse
> ---------------------------------------
>
>          Key: NUTCH-190
>          URL: http://issues.apache.org/jira/browse/NUTCH-190
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Versions: 0.8-dev
>  Environment: linux
>     Reporter: stack@archive.org
>     Priority: Minor
>  Attachments: ParseUtil_drops_failure_reason.patch
>
> With the following code:
>     Parse parse;
>     ParseStatus parseStatus;
>     try {
>       parse = ParseUtil.parse(content);
>       parseStatus = parse.getData().getStatus();
>     } catch (Exception e) {
>       parseStatus = new ParseStatus(e);
>     }
>     if (!parseStatus.isSuccess()) {
>       LOG.warning("Error parsing: " + url + ": " + parseStatus);
>       parse = null;
>     }
> ...on failure, the LOG.warning call never prints the reason for the failure. Here's an example: "Error parsing: http://www.dfrc.nasa.gov/DTRS/1967/PDF/H-478.pdf: failed(0,0)".
> ParseUtil is dropping messages lovingly crafted by parsers.
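ParseUtil's source isn't quoted in this thread, so what follows is only a minimal sketch, under stated assumptions, of the behaviour the issue asks for. When every parser fails, the sketch returns the failure status the last parser produced instead of a fresh status that carries no reason. `ParseFallbackSketch`, `Status`, `parse`, the major-code names and the "no parser available" message are all hypothetical stand-ins, not Nutch's ParseStatus or ParseUtil API. The attached patch may do this differently.

```java
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical, self-contained sketch of the fix discussed in NUTCH-190.
 * Status stands in for Nutch's ParseStatus, and each "parser" is just a
 * function from content to Status; neither is the real Nutch API.
 */
public class ParseFallbackSketch {

  /** Stand-in status: major code, minor code and parser-supplied messages. */
  static final class Status {
    // Assumed major-code names, chosen to mirror the "failed(...)" log format.
    static final String[] MAJOR_NAMES = {"notparsed", "success", "failed"};
    static final int SUCCESS = 1;
    static final int FAILED = 2;

    final int major;
    final int minor;
    final String[] messages;

    Status(int major, int minor, String... messages) {
      this.major = major;
      this.minor = minor;
      this.messages = messages.clone();
    }

    boolean isSuccess() {
      return major == SUCCESS;
    }

    /** Renders e.g. "failed(2,202): reason", omitting ": ..." when there are no messages. */
    @Override
    public String toString() {
      String head = MAJOR_NAMES[major] + "(" + major + "," + minor + ")";
      return messages.length == 0 ? head : head + ": " + String.join(", ", messages);
    }
  }

  /**
   * Tries each parser in turn. On success, that parser's status is returned.
   * If all of them fail, the LAST failure is returned so its messages survive,
   * rather than a freshly constructed status that carries no reason.
   */
  static Status parse(String content, List<Function<String, Status>> parsers) {
    Status lastFailure = null;
    for (Function<String, Status> parser : parsers) {
      Status status = parser.apply(content);
      if (status.isSuccess()) {
        return status;
      }
      lastFailure = status; // keep the reason instead of discarding it
    }
    // Hypothetical fallback for when no parser was configured at all.
    return lastFailure != null ? lastFailure : new Status(Status.FAILED, 0, "no parser available");
  }

  public static void main(String[] args) {
    // Hypothetical parser that rejects truncated PDFs, echoing the log line above.
    Function<String, Status> pdfParser = content -> new Status(Status.FAILED, 202,
        "Content truncated at 102013 bytes. Parser can't handle incomplete application/pdf file");
    String url = "http://techreports.jpl.nasa.gov/2000/00-1147.pdf";
    Status status = parse("%PDF-1.4 ...", List.of(pdfParser));
    if (!status.isSuccess()) {
      System.out.println("Error parsing: " + url + ": " + status);
    }
  }
}
```

Returning only the last failure is the simplest choice. An alternative is to collect every parser's messages, so that no reason is lost when several parsers are tried on the same content.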

-- 
This message is automatically generated by JIRA.