Posted to dev@nutch.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2011/02/09 12:23:57 UTC

[jira] Commented: (NUTCH-965) Parsing takes up 100% CPU

    [ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992440#comment-12992440 ] 

Julien Nioche commented on NUTCH-965:
-------------------------------------

This should be optional but activated by default.
The parsing is also done within the fetching, so it would need modifying there as well.
It would be nice to have that in 1.3.
Note: changing the title to something like "Skip parsing for truncated documents" would give a more accurate description.
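
To make the suggestion concrete, here is a rough sketch of what an optional, on-by-default "skip truncated documents" check might look like. It is not taken from parserJob.patch or from the Nutch code base: the class name, the property name and the use of a plain Map in place of the real configuration object are assumptions made for illustration only. It treats a document as truncated when the declared Content-Length is larger than the number of bytes actually stored.

```java
import java.util.Map;

// Illustrative sketch only; names below are hypothetical, not Nutch API.
final class TruncationCheck {

    // Hypothetical property name used for this example.
    static final String SKIP_TRUNCATED_KEY = "parser.skip.truncated";

    // The option is on unless it is explicitly set to something other than "true".
    static boolean skipTruncatedEnabled(Map<String, String> conf) {
        String value = conf.get(SKIP_TRUNCATED_KEY);
        return value == null || Boolean.parseBoolean(value.trim());
    }

    // A document counts as truncated when the server declared more bytes
    // than were actually stored at fetch time.
    static boolean isTruncated(String contentLengthHeader, byte[] fetched) {
        if (contentLengthHeader == null) {
            return false; // size unknown: cannot tell, so do not skip
        }
        try {
            long declared = Long.parseLong(contentLengthHeader.trim());
            return declared > fetched.length;
        } catch (NumberFormatException e) {
            return false; // malformed header: treat as unknown
        }
    }

    // Single decision point that both the parse job and the fetcher could call.
    static boolean shouldParse(Map<String, String> conf,
                               String contentLengthHeader,
                               byte[] fetched) {
        return !(skipTruncatedEnabled(conf)
                && isTruncated(contentLengthHeader, fetched));
    }
}
```

Since the parsing can happen in two places, keeping the decision in one helper like shouldParse would let the parse job and the fetcher share the same behaviour.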

> Parsing takes up 100% CPU
> -------------------------
>
>                 Key: NUTCH-965
>                 URL: https://issues.apache.org/jira/browse/NUTCH-965
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Alexis
>         Attachments: parserJob.patch
>
>
> The issue you're likely to run into when parsing truncated FLV files is described here:
> http://www.mail-archive.com/user@nutch.apache.org/msg01880.html
> The parser library gets stuck in an infinite loop when it encounters corrupted data, for example after big binary files have been truncated at fetch time.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira