Posted to dev@nutch.apache.org by "stack@archive.org (JIRA)" <ji...@apache.org> on 2006/01/26 23:43:09 UTC
[jira] Commented: (NUTCH-190) ParseUtil drops reason for failed parse
[ http://issues.apache.org/jira/browse/NUTCH-190?page=comments#action_12364145 ]
stack@archive.org commented on NUTCH-190:
-----------------------------------------
Here's an example of the failure output after the patch is applied:
060126 141413 task_m_bx2ifn Error parsing: http://techreports.jpl.nasa.gov/2000/00-1147.pdf: failed(2,202): Content truncated at 102013 bytes. Parser can't handle incomplete application/pdf file
> ParseUtil drops reason for failed parse
> ---------------------------------------
>
> Key: NUTCH-190
> URL: http://issues.apache.org/jira/browse/NUTCH-190
> Project: Nutch
> Type: Bug
> Components: fetcher
> Versions: 0.8-dev
> Environment: linux
> Reporter: stack@archive.org
> Priority: Minor
> Attachments: ParseUtil_drops_failure_reason.patch
>
> Doing the below:
> Parse parse;
> ParseStatus parseStatus;
> try {
> parse = ParseUtil.parse(content);
> parseStatus = parse.getData().getStatus();
> } catch (Exception e) {
> parseStatus = new ParseStatus(e);
> }
> if (!parseStatus.isSuccess()) {
> LOG.warning("Error parsing: " + url + ": " + parseStatus);
> parse = null;
> }
> ...on failure, the LOG.warning call never prints the reason for the failure. Here's an example: "Error parsing: http://www.dfrc.nasa.gov/DTRS/1967/PDF/H-478.pdf: failed(0,0)".
> ParseUtil is dropping messages lovingly crafted by parsers.
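The difference between dropping and keeping a parser's reason can be illustrated with a minimal, self-contained Java sketch. It does not use Nutch's real classes: `ParseSketch`, `Status`, `Parser`, the two methods, and the code values are hypothetical stand-ins, and the sketch assumes ParseUtil tries a list of parsers in turn. The buggy variant builds a fresh, empty status once every parser has failed, so only the codes reach the log; the fixed variant returns the last parser's own status, so its message survives.

```java
import java.util.List;

// Hypothetical sketch; none of these names are Nutch's real API.
class ParseSketch {

    /** Buggy variant: once every parser fails, it reports a fresh, empty status. */
    static Status parseDroppingReason(List<Parser> parsers, byte[] content) {
        for (Parser p : parsers) {
            Status s = p.parse(content);
            if (s.isSuccess()) {
                return s;
            }
        }
        // Bare status with no reason, like the failed(0,0) in the report.
        return new Status(0, 0, null);
    }

    /** Fixed variant: it returns the last parser's own failure status. */
    static Status parseKeepingReason(List<Parser> parsers, byte[] content) {
        Status last = new Status(Status.FAILED, 0, "no parser available");
        for (Parser p : parsers) {
            Status s = p.parse(content);
            if (s.isSuccess()) {
                return s;
            }
            last = s; // keep the parser's codes and message intact
        }
        return last;
    }

    public static void main(String[] args) {
        // Hypothetical PDF parser that rejects truncated input with a specific reason.
        Parser pdf = bytes -> new Status(Status.FAILED, 202,
                "Content truncated at " + bytes.length + " bytes");
        List<Parser> parsers = List.of(pdf);
        byte[] data = new byte[102013];
        System.out.println("dropping: " + parseDroppingReason(parsers, data));
        // prints "dropping: failed(0,0)"
        System.out.println("keeping:  " + parseKeepingReason(parsers, data));
        // prints "keeping:  failed(2,202): Content truncated at 102013 bytes"
    }
}

/** Hypothetical parse status: two numeric codes plus a human-readable reason. */
final class Status {
    static final int SUCCESS = 1; // illustrative code values only
    static final int FAILED = 2;

    final int major;
    final int minor;
    final String message; // the parser's reason; null if none was given

    Status(int major, int minor, String message) {
        this.major = major;
        this.minor = minor;
        this.message = message;
    }

    boolean isSuccess() {
        return major == SUCCESS;
    }

    @Override
    public String toString() {
        String head = (isSuccess() ? "success" : "failed") + "(" + major + "," + minor + ")";
        return message == null ? head : head + ": " + message;
    }
}

/** Hypothetical parser contract: each parser reports its own status. */
interface Parser {
    Status parse(byte[] content);
}
```

Returning the parser's own status object, rather than constructing a new one, means whatever reason the parser wrote reaches the caller's log line unchanged. The actual patch attached to the issue may differ in detail.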
--
This message is automatically generated by JIRA.