Posted to dev@nutch.apache.org by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/06/22 10:34:25 UTC

[jira] Commented: (NUTCH-504) NUTCH-443 broke parsing during fetching

    [ https://issues.apache.org/jira/browse/NUTCH-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507162 ] 

Doğacan Güney commented on NUTCH-504:
-------------------------------------

Also, should we actually index documents whose parses have failed? When parsing a URL fails, we replace its parse with an empty parse anyway, so it may be a good idea to skip such documents.
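A minimal sketch of the proposed check follows, written in Java because Nutch is. The types and names in it (`ParsedPage`, `SEGMENT_KEY`, `selectIndexable`) are hypothetical stand-ins and not Nutch's actual indexer API. The sketch skips pages whose parse failed. It also rejects any surviving page that lacks a segment name, instead of letting the indexer trip over it later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: ParsedPage, SEGMENT_KEY and selectIndexable are
 * illustrative names only, not part of Nutch's real API.
 */
public class SkipFailedParses {

    // Hypothetical metadata key standing in for the segment name property.
    static final String SEGMENT_KEY = "segment";

    // Minimal stand-in for a fetched URL plus the outcome of parsing it.
    record ParsedPage(String url, boolean parseSucceeded, Map<String, String> metadata) {}

    /**
     * Keeps only pages worth indexing. A failed parse has been replaced by an
     * empty parse, so it has no useful text, and it may also lack the segment
     * name that the indexer expects.
     */
    static List<ParsedPage> selectIndexable(List<ParsedPage> pages) {
        List<ParsedPage> out = new ArrayList<>();
        for (ParsedPage p : pages) {
            if (!p.parseSucceeded()) {
                continue; // skip failed parses rather than indexing an empty document
            }
            if (!p.metadata().containsKey(SEGMENT_KEY)) {
                // Fail early and clearly instead of inside the indexer.
                throw new IllegalStateException("missing segment name for " + p.url());
            }
            out.add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        List<ParsedPage> pages = List.of(
            new ParsedPage("http://a.example/", true, Map.of(SEGMENT_KEY, "hypothetical-segment")),
            // Failed parse: empty parse and no segment name in its metadata.
            new ParsedPage("http://b.example/", false, Map.of()));
        for (ParsedPage p : selectIndexable(pages)) {
            System.out.println(p.url()); // only the successfully parsed URL is printed
        }
    }
}
```

Skipping at selection time keeps the indexer's assumption (every parse it sees carries a segment name) true, while the explicit exception makes any remaining metadata gap easy to spot.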

> NUTCH-443 broke parsing during fetching
> ---------------------------------------
>
>                 Key: NUTCH-504
>                 URL: https://issues.apache.org/jira/browse/NUTCH-504
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Doğacan Güney
>             Fix For: 1.0.0
>
>         Attachments: parse_in_fetchers.patch
>
>
> After NUTCH-443, if one is parsing during fetching and parsing of a URL fails, that URL doesn't get the segment name or similar properties in its metadata. Because of this, the indexer fails, since it expects to see a segment name for every parse, even those that failed.
