Posted to java-user@lucene.apache.org by ChadDavis <ch...@gmail.com> on 2008/11/10 20:22:45 UTC

incremental update of index

The FAQ says that you have to do a manual incremental update:

> How do I update a document or a set of documents that are already indexed?
>
> There is no direct update procedure in Lucene. To update an index
> incrementally you must first *delete* the documents that were updated, and
> *then re-add* them to the index.

How do I determine the existing document that matches the document I am
updating?

Re: incremental update of index

Posted by Erick Erickson <er...@gmail.com>.
It all depends on how many updates you're doing, which
you haven't told us <G>.

If a large majority of your index is being updated, there's
no particular reason to update incrementally; I'd just build a new one.

Best
Erick
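
The rule of thumb above can be sketched as a small decision helper. This is only an illustration: the class name ReindexPlanner, the method choose, and the threshold parameter are all hypothetical, and the thread gives no concrete number for "a large majority", so the caller has to supply one.

```java
/** Hypothetical helper; not part of Lucene. The threshold is the caller's assumption. */
class ReindexPlanner {
    enum Strategy { INCREMENTAL, FULL_REBUILD }

    /**
     * Pick a strategy from the fraction of documents that changed.
     * rebuildThreshold is an assumed cut-off between 0.0 and 1.0.
     */
    static Strategy choose(long changedDocs, long totalDocs, double rebuildThreshold) {
        if (totalDocs <= 0) {
            return Strategy.FULL_REBUILD; // nothing indexed yet, so build from scratch
        }
        double changedFraction = (double) changedDocs / totalDocs;
        return changedFraction >= rebuildThreshold
                ? Strategy.FULL_REBUILD
                : Strategy.INCREMENTAL;
    }
}
```

For example, ReindexPlanner.choose(900, 1000, 0.8) selects FULL_REBUILD, while ReindexPlanner.choose(10, 1000, 0.8) selects INCREMENTAL. Where the cut-off really lies depends on the index and the hardware, so it is worth measuring both approaches.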

Re: incremental update of index

Posted by ChadDavis <ch...@gmail.com>.
That's what I thought.

So that leads me to this: is it necessarily all that much faster to index
in an incremental update fashion, rather than just clobbering the old index?

Re: incremental update of index

Posted by Erick Erickson <er...@gmail.com>.
You have to have indexed something that uniquely identifies the
document in order to know what the old one is. Really, this is
the same question as updating, isn't it? If you could update
a document in place, you'd have to know what document
that was. If you know that information, you know which
document to delete.

Note that Lucene has no built-in document recognition. If I
add the same document to the index twice, Lucene will
happily consider them two *separate* documents. You have
to code your own notion of document meta-id (as distinct
from the Lucene doc id). It could be the URL, the file path
on disk, a document ID from your organization... the
possibilities are endless, which is why Lucene can't do that
for you.

Best
Erick
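
As a minimal sketch of the idea above, the following plain-Java toy model shows why a unique key is needed. It does not use Lucene at all: ToyIndex, its methods, and the example file path are hypothetical names chosen for illustration. The toy index accepts duplicates blindly, and an "update" is simply a delete by key followed by a re-add. In Lucene itself the same two steps go through the library's own delete and add calls; newer releases also provide IndexWriter.updateDocument(Term, Document), which performs the delete-then-add in one call.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for an index; NOT Lucene's API. All names here are hypothetical. */
class ToyIndex {
    /** One stored document: an application-chosen unique key plus its content. */
    static final class Doc {
        final String key;
        final String body;
        Doc(String key, String body) { this.key = key; this.body = body; }
    }

    private final List<Doc> docs = new ArrayList<>();

    /** Blind add: no duplicate detection is performed. */
    void add(String key, String body) {
        docs.add(new Doc(key, body));
    }

    /** Delete every document carrying the given key; returns how many were removed. */
    int deleteByKey(String key) {
        int before = docs.size();
        docs.removeIf(d -> d.key.equals(key));
        return before - docs.size();
    }

    /** "Update" = delete the old version(s) by key, then re-add the new one. */
    void update(String key, String body) {
        deleteByKey(key);
        add(key, body);
    }

    /** Total number of stored documents. */
    int size() { return docs.size(); }

    /** Count the documents stored under the given key. */
    long count(String key) {
        return docs.stream().filter(d -> d.key.equals(key)).count();
    }
}

class ToyIndexDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("/docs/report.txt", "first version");
        index.add("/docs/report.txt", "first version");       // same document added twice
        System.out.println(index.count("/docs/report.txt"));  // prints 2

        index.update("/docs/report.txt", "second version");
        System.out.println(index.count("/docs/report.txt"));  // prints 1
    }
}
```

The key here is a file path, but any value that stays stable across versions of the same document (a URL, a database ID) would serve equally well.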

Re: incremental update of index

Posted by komali <ko...@gmail.com>.

If you want to reindex something that was already indexed, just pass the
create flag as false.

Re: incremental update of index

Posted by Donna L Gresh <gr...@us.ibm.com>.
Most applications of this sort include a unique identifier for a document, so
that it is easy to delete the document (by issuing a query on the identifier)