Posted to dev@nutch.apache.org by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2009/01/21 16:02:01 UTC

[jira] Updated: (NUTCH-664) Possibility to update already stored documents.

     [ https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney updated NUTCH-664:
--------------------------------

    Fix Version/s: 1.1

Moving this issue to 1.1.

> Possibility to update already stored documents.
> -----------------------------------------------
>
>                 Key: NUTCH-664
>                 URL: https://issues.apache.org/jira/browse/NUTCH-664
>             Project: Nutch
>          Issue Type: Wish
>            Reporter: Sergey Khilkov
>            Priority: Minor
>             Fix For: 1.1
>
>
> We have a huge index of stored documents. Fetching a page and merging indexes every time some information about that page changes is expensive, and the information can change 1-3 times per day. At the moment we have to store the changed information in a database, but that causes many problems with sorting, search restrictions, and so on. Lucene itself allows deleting a single document and adding a new one to an existing index, but there is a problem with Hadoop: as I understand it, the Hadoop filesystem cannot write at random positions. It would be a great feature if Nutch were able to update an already created index.
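
The delete-then-add behaviour the reporter attributes to Lucene can be modelled in a few lines. The sketch below is a toy in-memory inverted index written in plain Python. It is not Lucene or Nutch code, and every name in it (TinyIndex, add, delete_by_term, update, search) is hypothetical. It only illustrates the semantics of replacing one stored document in place, keyed on a unique term such as the page URL. In Lucene itself the comparable call is, to my knowledge, IndexWriter.updateDocument(Term, Document). The sketch also says nothing about the Hadoop filesystem constraint raised in the issue.

```python
from collections import defaultdict


class TinyIndex:
    """Toy in-memory inverted index (hypothetical; a model, not Lucene)."""

    def __init__(self):
        self.docs = {}                    # internal doc number -> stored fields
        self.postings = defaultdict(set)  # (field, token) -> internal doc numbers
        self.next_doc = 0

    @staticmethod
    def _tokens(value):
        # Assumption: whitespace tokenization with lowercasing is enough here.
        return str(value).lower().split()

    def add(self, fields):
        """Store a document and index every token of every field."""
        doc_no = self.next_doc
        self.next_doc += 1
        self.docs[doc_no] = dict(fields)
        for field, value in fields.items():
            for token in self._tokens(value):
                self.postings[(field, token)].add(doc_no)
        return doc_no

    def delete_by_term(self, field, token):
        """Remove every document whose `field` contains `token`."""
        doomed = set(self.postings.get((field, token.lower()), ()))
        for doc_no in doomed:
            fields = self.docs.pop(doc_no)
            for f, value in fields.items():
                for t in self._tokens(value):
                    self.postings[(f, t)].discard(doc_no)
        return len(doomed)

    def update(self, field, token, fields):
        """Delete by a (unique) term, then add the replacement document."""
        self.delete_by_term(field, token)
        return self.add(fields)

    def search(self, field, token):
        """Return stored fields of documents matching one term."""
        hits = self.postings.get((field, token.lower()), ())
        return [self.docs[n] for n in sorted(hits)]


# Usage: replace the stored entry for one URL without rebuilding the index.
index = TinyIndex()
index.add({"url": "http://example.com/a", "title": "old title"})
index.add({"url": "http://example.com/b", "title": "other page"})
index.update("url", "http://example.com/a",
             {"url": "http://example.com/a", "title": "new title"})
```

Keying the update on the URL field assumes that the URL is unique per document. If it were not, delete_by_term would remove every match before the single replacement is added.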

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.