Posted to java-user@lucene.apache.org by oh...@cox.net on 2009/07/31 18:42:58 UTC

Seeking guidance for updating indexes

Hi,

I am still new to Lucene, but I think I have an initial indexer app (based on the demo IndexFiles app) working, as well as a web app based on the demo luceneweb web app.

I'm still busy tweaking both, but I am starting to think ahead about operational issues, especially updating indexes.

My situation is a little specific.  In particular, once a document is indexed via Lucene, we will, theoretically, never need or want to remove that document.  But we will have new documents that need to be added periodically.

In other words, I think the terminology would be that we would just be "inserting" documents (and updating the Lucene index), never "updating" or "deleting" documents.

From some research I've done, it seems like the way to accomplish this would be to just add the new documents, using Document.add(), as I did with the initial indexer, but having a new "update" app that makes sure that it is only adding documents that have not been added previously.

Is this correct?

Assuming that the above is correct, is it going to be possible to keep the search web app running while the new update app is doing its job?  

Are there things that I need to worry about in the update app, such as locking, etc.?   Note that we would only have a single update app running, i.e., we won't have any situations where we'd have multiple updates running simultaneously.

If so, what are they?

Specifically, what I'm looking for is: other than making sure previously added documents are not added again, what is different between the original indexer code and the update indexer code?

Thanks,
Jim

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Seeking guidance for updating indexes

Posted by oh...@cox.net.
Hi,

Phil and Ian,

Thanks for the responses and confirmations about this.  

Assuming that our requirements (as I described earlier) don't change, it looks like this updating/inserting thing should be pretty easy :)!

Later, and have a great weekend!

Jim





Re: Seeking guidance for updating indexes

Posted by Phil Whelan <ph...@gmail.com>.
Hi Jim,

There should not be much difference on the Lucene side between creating
a new index and adding more documents to an existing one. As stated
in the Lucene docs, IndexWriter will create the index "if it does not
already exist".

   http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/IndexWriter.html
   IndexWriter(Directory d, Analyzer a, IndexWriter.MaxFieldLength mfl)
          Constructs an IndexWriter for the index in d, first creating
          it if it does not already exist.

Yes, you can search an index while adding to the index. But you will
see a snapshot of the index at the time when you opened the searcher.
You will need to re-open it to see changes that have been added since
you last opened the searcher.
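
The snapshot behaviour described above can be modelled with a small toy in plain Java. This is only a sketch of the idea and does not use the Lucene API: `ToyIndex` and `ToySearcher` are hypothetical names invented for illustration, and the "index" is just a list of strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy stand-in for an index (an assumption for this sketch, not Lucene). */
class ToyIndex {
    private final List<String> docs = new ArrayList<String>();

    /** Adds a document; searchers opened earlier will not see it. */
    synchronized void add(String doc) {
        docs.add(doc);
    }

    /** Opens a view fixed at the current contents of the index. */
    synchronized ToySearcher openSearcher() {
        List<String> copy = new ArrayList<String>(docs);
        return new ToySearcher(Collections.unmodifiableList(copy));
    }
}

/** Hypothetical searcher that only ever sees the snapshot it was opened on. */
class ToySearcher {
    private final List<String> snapshot;

    ToySearcher(List<String> snapshot) {
        this.snapshot = snapshot;
    }

    /** Counts the documents in this snapshot that contain the given term. */
    int hits(String term) {
        int count = 0;
        for (String doc : snapshot) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }
}
```

A searcher opened before an add keeps reporting the old hit count; only a searcher opened afterwards sees the new document. In a real application the equivalent step is to open a new searcher once the updater has finished; check the API documentation for your Lucene version for the exact call.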

Lucene is very tolerant of most things. Just be careful not to have two
index writers writing to the same index and you should be OK. Even in
that situation Lucene will just throw an Exception. I've been playing
with Lucene for a long time and I've never corrupted an index yet,
even when I do stupid things.
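
To illustrate why a second writer is refused instead of damaging anything, here is a minimal sketch of a lock-file guard in plain Java. It is not Lucene's own code: `SingleWriterLock` is a hypothetical helper, and the lock file name is an assumption made for this sketch. It relies only on `File.createNewFile()`, which atomically creates the file only if it does not already exist.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper: lets only one writer at a time hold a directory. */
class SingleWriterLock {
    private final File lockFile;
    private boolean held = false;

    SingleWriterLock(File indexDir) {
        // The lock file name is an assumption made for this sketch.
        this.lockFile = new File(indexDir, "write.lock");
    }

    /** Returns true if this instance holds the lock, false if someone else does. */
    boolean tryAcquire() throws IOException {
        if (held) {
            return true;
        }
        // createNewFile() succeeds only if the file did not exist before the call.
        held = lockFile.createNewFile();
        return held;
    }

    /** Releases the lock if this instance holds it. */
    void release() {
        if (held && lockFile.delete()) {
            held = false;
        }
    }
}
```

An updater built this way would call `tryAcquire()` before writing and `release()` in a `finally` block. Note that in this sketch a lock file left behind by a crashed process has to be removed by hand.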

Thanks,
Phil



Re: Seeking guidance for updating indexes

Posted by Ian Lea <ia...@gmail.com>.
You're pretty much spot on.  Read the FAQ entry "Does Lucene allow
searching and indexing simultaneously?" for one of your questions (the
answer is yes btw).  With only a single update app running there won't
be any locking issues.  When the updater code opens the index, you'll
need to ensure that it doesn't create a new index; otherwise, carry on
calling Document.add() as before.
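
One way the updater could make sure it only adds documents it has not seen before is to keep a manifest of already-indexed paths next to the index. The following is a rough sketch in plain Java under that assumption; `IndexedManifest`, its methods, and the one-path-per-line file layout are all hypothetical and are not part of Lucene.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: remembers which document paths were already indexed. */
class IndexedManifest {
    private final File manifestFile;
    private final Set<String> indexed = new LinkedHashSet<String>();

    IndexedManifest(File manifestFile) throws IOException {
        this.manifestFile = manifestFile;
        if (manifestFile.exists()) {
            // Assumed layout: one previously indexed path per line.
            indexed.addAll(Files.readAllLines(manifestFile.toPath(), StandardCharsets.UTF_8));
        }
    }

    /** Returns the candidates not yet in the manifest, in their original order. */
    List<String> selectNew(List<String> candidates) {
        List<String> fresh = new ArrayList<String>();
        for (String path : candidates) {
            if (!indexed.contains(path) && !fresh.contains(path)) {
                fresh.add(path);
            }
        }
        return fresh;
    }

    /** Records the paths as indexed and rewrites the manifest file. */
    void markIndexed(List<String> paths) throws IOException {
        indexed.addAll(paths);
        Files.write(manifestFile.toPath(), indexed, StandardCharsets.UTF_8);
    }
}
```

The updater would call `selectNew()` on the files it finds, index only the returned paths, and call `markIndexed()` once they have been written to the index successfully.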


--
Ian.
