Posted to dev@lucene.apache.org by Alexander Kern <al...@gmail.com> on 2007/03/20 18:11:33 UTC

future releases: Append Function for Indexing

As of now, whenever the index entry for a document needs to be updated,
the complete document has to be deleted, re-indexed, and finally added
back to the index repository. If, however, information merely needs to
be added to the existing document (i.e. appended), this procedure
creates considerable overhead. So far, Lucene does not provide an
'append' function.

My question is: Is an 'append' function (or something similar) planned
for future releases? If not, which classes would be the most suitable
place to implement one (i.e. which contain the functions needed to
support it) if I wanted to edit the source to fit my needs?

Many thanks in advance - Alexander

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: future releases: Append Function for Indexing

Posted by Doron Cohen <DO...@il.ibm.com>.
> read each document from the db
> add the field
> search lucene based on the UID
> remove document from lucene based on UID
> build new lucene document that has only indexed fields and the UID
> add lucene document
>
> There are some newer Lucene 2+ features that might make this process
> faster, but I am not intimately familiar with them. Someone else
> might be able to elaborate.

IndexWriter.updateDocument(Term term, Document doc [, Analyzer a])
could simplify this to:
  - create the updated document from the db
  - writer.updateDocument(idTerm, updatedDocument), where idTerm is
    the Term that holds the document's UID

But regarding the original question: here, too, update == remove + add,
i.e. no "append/update-in-place" logic is supported (or planned).
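
To make the update == remove + add point concrete, here is a minimal
sketch in plain Java. It is a toy model, not Lucene code: ToyIndex and
all of its methods are hypothetical names invented for illustration, and
a real index stores far more than a term-to-document map. It only shows
why appending a single term still means re-indexing the whole document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: ToyIndex is a hypothetical class, not part of Lucene.
class ToyIndex {
    // term -> internal ids of the live documents containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // internal doc id -> UID, for live documents only
    private final Map<Integer, String> liveDocs = new HashMap<>();
    private int nextDocId = 0;

    // Index a document's terms under a fresh internal id.
    int addDocument(String uid, List<String> terms) {
        int docId = nextDocId++;
        liveDocs.put(docId, uid);
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
        return docId;
    }

    // Remove every live document that carries this UID.
    void deleteDocuments(String uid) {
        Set<Integer> dead = new HashSet<>();
        for (Map.Entry<Integer, String> e : liveDocs.entrySet()) {
            if (e.getValue().equals(uid)) {
                dead.add(e.getKey());
            }
        }
        liveDocs.keySet().removeAll(dead);
        for (Set<Integer> ids : postings.values()) {
            ids.removeAll(dead);
        }
    }

    // "Update" is delete followed by add: every term is indexed again.
    int updateDocument(String uid, List<String> allTerms) {
        deleteDocuments(uid);
        return addDocument(uid, allTerms);
    }

    // UIDs of the live documents containing the term.
    Set<String> search(String term) {
        Set<String> hits = new HashSet<>();
        Set<Integer> ids = postings.getOrDefault(term, Collections.<Integer>emptySet());
        for (int docId : ids) {
            hits.add(liveDocs.get(docId));
        }
        return hits;
    }
}
```

Calling updateDocument("doc-1", ...) with one extra term drops the old
entry and re-adds all terms under a new internal id, which is the
overhead the original question describes.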





Re: future releases: Append Function for Indexing

Posted by robert engels <re...@ix.netcom.com>.
We have a 'reindex' process for this that works like this:

read each document from the db
add the field
search lucene based on the UID
remove document from lucene based on UID
build new lucene document that has only indexed fields and the UID
add lucene document

There are some newer Lucene 2+ features that might make this process  
faster, but I am not intimately familiar with them. Someone else  
might be able to elaborate.

R

On May 3, 2007, at 8:58 AM, Alexander Kern wrote:

> Many thanks for your answer - sorry I didn't get around to asking
> earlier, but that brings me to a new question: What if I do(!) need to
> append a field that needs to be indexed?




Re: future releases: Append Function for Indexing

Posted by Alexander Kern <al...@gmail.com>.
Many thanks for your answer - sorry I didn't get around to asking
earlier, but that brings me to a new question: What if I do(!) need to
append a field that needs to be indexed?

On 3/20/07, robert engels <re...@ix.netcom.com> wrote:
> If you have "unique ids" available to you, I think the best solution
> to accomplish a lot of this would be to use a very simple embedded db
> to store the documents (we use a version of JDBM). Just store the key
> as a stored field in the Lucene document, and the document in JDBM.
>
> This has the added benefit that merges are much faster with large
> documents, since there is no rewriting/copying of document data
> during a merge.
>
> This solution also makes "append" trivial, unless you are appending
> fields that need to be indexed.


-- 

_________________________________________
Alexander Kern
Leibenfrostgasse 8/11
A-1040 Vienna, Austria
::phone:: +43 650 487 63 59
::mail:: alex.kern@gmail.com



Re: future releases: Append Function for Indexing

Posted by robert engels <re...@ix.netcom.com>.
If you have "unique ids" available to you, I think the best solution  
to accomplish a lot of this would be to use a very simple embedded db  
to store the documents (we use a version of JDBM). Just store the key  
as a stored field in the Lucene document, and the document in JDBM.

This has the added benefit that merges are much faster with large
documents, since there is no rewriting/copying of document data
during a merge.

This solution also makes "append" trivial, unless you are appending
fields that need to be indexed.
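
A minimal sketch of this layout in plain Java, under loud assumptions:
one HashMap stands in for the embedded db (JDBM above) and another for
the search index, and SideStore with all of its members is a
hypothetical name, not a JDBM or Lucene API. It only illustrates which
kind of append touches the index and which does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for the embedded db and the index.
class SideStore {
    // UID -> stored (non-indexed) fields, kept outside the index
    private final Map<String, Map<String, String>> db = new HashMap<>();
    // UID -> indexed terms; the index holds nothing else about the document
    private final Map<String, List<String>> index = new HashMap<>();
    // number of times a document was written to the index
    int indexWrites = 0;

    // Store the document body in the db and only its terms in the index.
    void add(String uid, List<String> indexedTerms, Map<String, String> storedFields) {
        index.put(uid, new ArrayList<>(indexedTerms));
        db.put(uid, new HashMap<>(storedFields));
        indexWrites++;
    }

    // Appending a stored-only field touches the db alone (uid must exist).
    void appendStoredField(String uid, String name, String value) {
        db.get(uid).put(name, value);
    }

    // Appending an indexed term still means delete + add in the index.
    void appendIndexedTerm(String uid, String term) {
        List<String> terms = new ArrayList<>(index.remove(uid)); // delete
        terms.add(term);
        index.put(uid, terms);                                   // add
        indexWrites++;
    }

    // Load the stored fields by UID, as a search hit would.
    Map<String, String> load(String uid) {
        return db.get(uid);
    }

    List<String> indexedTerms(String uid) {
        return index.get(uid);
    }
}
```

After add("doc-1", ...), a call to appendStoredField leaves indexWrites
unchanged, while appendIndexedTerm increments it: the second case is
the one that remains expensive.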

On Mar 20, 2007, at 12:11 PM, Alexander Kern wrote:

> As of now, whenever the index entry for a document needs to be updated,
> the complete document has to be deleted, re-indexed, and finally added
> back to the index repository. If, however, information merely needs to
> be added to the existing document (i.e. appended), this procedure
> creates considerable overhead. So far, Lucene does not provide an
> 'append' function.
>
> My question is: Is an 'append' function (or something similar) planned
> for future releases? If not, which classes would be the most suitable
> place to implement one (i.e. which contain the functions needed to
> support it) if I wanted to edit the source to fit my needs?
>
> Many thanks in advance - Alexander