Posted to java-user@lucene.apache.org by Vijay <vi...@gmail.com> on 2009/06/02 00:32:23 UTC

Question on Efficient field updates in the Lucene index in Nutch

Hi all,

I have a question about field updates to the Lucene index in Nutch.

Suppose I am indexing web pages with tags as an extra field, and I want
to add a tag to a page that is already indexed. Is there a clean way to
do this without re-indexing the page with the updated tags field and
then deleting duplicates?

For example, could I create a new document with the same doc id as the
document created earlier from the relevant URL, containing only the new
tag as a field and no content section? Would that add the correct doc id
to the new tag's postings list, giving the same result as indexing the
web page afresh with all the given tags?

Alternatively, is there any other efficient way to do this?


Thanks a ton,
Vijay

Re: Question on Efficient field updates in the Lucene index in Nutch

Posted by Otis Gospodnetic <og...@yahoo.com>.
Unfortunately Lucene doesn't allow that.  You have to reindex the whole doc.
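
To make "reindex the whole doc" concrete, here is a minimal sketch of the
delete-then-re-add pattern. It is a toy in-memory index written with plain
Java collections, not Lucene or Nutch code; the class ToyTagIndex and all of
its methods are hypothetical names made up for illustration. In Lucene itself,
IndexWriter.updateDocument(Term, Document) performs the same two steps: it
deletes the documents matching the term and adds the new document.

```java
import java.util.*;

// Toy model only: documents get a fresh internal number on every (re)index,
// so a tag cannot be attached to an existing document in place.
class ToyTagIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // tag -> doc numbers
    private final Map<String, Integer> liveDocByUrl = new HashMap<>();  // url -> live doc number
    private final Map<Integer, String> urlByDoc = new HashMap<>();      // doc number -> url
    private final Map<Integer, Set<String>> tagsByDoc = new HashMap<>(); // doc number -> its tags
    private int nextDocId = 0;

    // Delete any existing document for this URL, then add a complete new one.
    void update(String url, Set<String> tags) {
        Integer old = liveDocByUrl.remove(url);
        if (old != null) {
            urlByDoc.remove(old);
            for (String tag : tagsByDoc.remove(old)) {
                Set<Integer> docs = postings.get(tag);
                docs.remove(old);
                if (docs.isEmpty()) {
                    postings.remove(tag);
                }
            }
        }
        int id = nextDocId++;
        liveDocByUrl.put(url, id);
        urlByDoc.put(id, url);
        tagsByDoc.put(id, new TreeSet<>(tags));
        for (String tag : tags) {
            postings.computeIfAbsent(tag, k -> new TreeSet<>()).add(id);
        }
    }

    // Adding one tag means rebuilding the full tag set and re-adding the document.
    void addTag(String url, String tag) {
        Integer id = liveDocByUrl.get(url);
        if (id == null) {
            throw new IllegalArgumentException("URL is not indexed: " + url);
        }
        Set<String> allTags = new TreeSet<>(tagsByDoc.get(id));
        allTags.add(tag);
        update(url, allTags);
    }

    // Look up the URLs of all live documents carrying the given tag.
    Set<String> urlsWithTag(String tag) {
        Set<String> urls = new TreeSet<>();
        for (int id : postings.getOrDefault(tag, Collections.<Integer>emptySet())) {
            urls.add(urlByDoc.get(id));
        }
        return urls;
    }

    int liveDocCount() {
        return liveDocByUrl.size();
    }
}
```

Because the old entry is deleted in the same step as the re-add, no
duplicates are left behind to clean up afterwards. The cost is that the
caller must be able to rebuild the complete document, including the fields
that did not change.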

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Vijay <vi...@gmail.com>
> To: nutch-user@lucene.apache.org; java-user@lucene.apache.org
> Sent: Monday, June 1, 2009 6:32:23 PM
> Subject: Question on Efficient field updates in the Lucene index in Nutch