Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/01/17 12:10:14 UTC

[jira] [Updated] (NUTCH-1520) SegmentMerger loses records

     [ https://issues.apache.org/jira/browse/NUTCH-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1520:
---------------------------------

    Attachment: NUTCH-1520-1.7-1.patch

Patch that only lets through CrawlDatum records that have a fetch status.
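The patch note describes a filter: during the merge, only CrawlDatum records that carry a fetch status are kept. The following is a minimal, self-contained Java sketch of that idea. It is not the attached patch. The Datum class, the hasFetchStatus and keepFetched helpers, and the status byte values are hypothetical stand-ins, loosely modeled on the way Nutch 1.x separates CrawlDb statuses from fetch statuses.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the filtering idea in the patch note: keep only
 * CrawlDatum-like records that carry a fetch status. Class names and status
 * values are illustrative assumptions, not Nutch's actual code.
 */
public class FetchStatusFilterSketch {

    // Assumed status values: CrawlDb statuses in a low range,
    // fetch statuses in a separate, higher range.
    static final byte STATUS_DB_UNFETCHED = 0x01;
    static final byte STATUS_DB_FETCHED = 0x02;
    static final byte STATUS_FETCH_SUCCESS = 0x21;
    static final byte STATUS_FETCH_GONE = 0x25;
    static final byte STATUS_FETCH_MAX = 0x3f;

    /** Minimal stand-in for a CrawlDatum: a URL and a status byte. */
    static final class Datum {
        final String url;
        final byte status;

        Datum(String url, byte status) {
            this.url = url;
            this.status = status;
        }
    }

    /** True if the status falls in the assumed fetch-status range. */
    static boolean hasFetchStatus(Datum d) {
        return d.status >= STATUS_FETCH_SUCCESS && d.status <= STATUS_FETCH_MAX;
    }

    /** Returns only the records that carry a fetch status; all others are dropped. */
    static List<Datum> keepFetched(List<Datum> records) {
        List<Datum> kept = new ArrayList<>();
        for (Datum d : records) {
            if (hasFetchStatus(d)) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Datum> records = List.of(
                new Datum("http://example.com/a", STATUS_DB_UNFETCHED),
                new Datum("http://example.com/a", STATUS_FETCH_SUCCESS),
                new Datum("http://example.com/b", STATUS_FETCH_GONE));
        // Prints:
        // http://example.com/a status=0x21
        // http://example.com/b status=0x25
        for (Datum d : keepFetched(records)) {
            System.out.println(d.url + " status=0x" + Integer.toHexString(d.status));
        }
    }
}
```

In this sketch the db-status record for http://example.com/a is discarded, so it cannot take the place of that URL's fetch record in the merged output.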
                
> SegmentMerger loses records
> ---------------------------
>
>                 Key: NUTCH-1520
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1520
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.6
>            Reporter: Markus Jelsma
>            Priority: Critical
>             Fix For: 1.7
>
>         Attachments: NUTCH-1520-1.7-1.patch
>
>
> It seems the SegmentMerger tool loses documents. If you index one or more already merged segments, you are likely to see fewer documents in the index than if you index all of the unmerged segments.
> This is really nasty!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira