Posted to dev@lucene.apache.org by Simon Willnauer <si...@googlemail.com> on 2006/07/23 19:10:47 UTC

Gdata - opening/closing index

Hello everyone,

You might have read some mails about the gdata server and what it does,
so I assume that you are somewhat familiar with it. I need to index
every change to any entry in any feed to make the modifications
searchable. I'm especially worried about updates and inserts. So if I
index every change immediately I have to open and close the index
reader and writer all the time. This is not very efficient. I guess it
wouldn't be too bad to have a little delay between the modification
and indexing, i.e. the modification will be available for search a bit
later. Now the question is how does the indexer handle this? I could
index into a second index while the first index is used for searching.
The indexer could index all entries in the queue and after a certain
number of new index entries both indexes could be merged together. But
what happens if there is just one modification in 30 minutes? The
entry would not be searchable for a long time. I could also search the
second index using a MultiSearcher, but in that case I have to
close the IndexWriter as well, and it would be quite tricky with
updates occurring in both index instances.
This is quite an interesting problem, but I guess some of you have run
into similar situations using Lucene.
I'm looking forward to hearing from you and your suggestions. I know that
this seems to be a question for the user list, but gdata is a Lucene
project and should be discussed on the dev list :)


regards Simon

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Gdata - opening/closing index

Posted by Yonik Seeley <ys...@gmail.com>.
I think a lag between adding a new document and being able to find it
in a search is acceptable.

You could optionally provide better latency for low-volume indexes by
having a maximumCommitFrequency parameter (say 60 seconds). When the
first document is added in a while, a new IndexSearcher can be opened
immediately. If more documents are added right after that, a new
IndexSearcher won't be opened until 60 seconds have passed.
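
The throttling rule described above could be sketched roughly as
follows. This is only an illustration, not code from Solr or Lucene:
the class ReopenThrottle and its methods are hypothetical names, and
the caller is assumed to pass in the current time (for example from
System.nanoTime()) and to open the new IndexSearcher itself whenever
the method returns true.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that decides when a new searcher may be opened. */
final class ReopenThrottle {
    private final long minIntervalNanos;   // e.g. 60 seconds
    private long lastReopenNanos;          // time of the last reopen
    private boolean everReopened = false;  // false until the first reopen
    private boolean pending = false;       // changes not yet visible to searches

    ReopenThrottle(long minInterval, TimeUnit unit) {
        this.minIntervalNanos = unit.toNanos(minInterval);
    }

    /** Call after a document was added; true means "reopen the searcher now". */
    synchronized boolean onDocumentAdded(long nowNanos) {
        pending = true;
        return tryReopen(nowNanos);
    }

    /** Call periodically from a timer so that deferred changes become visible. */
    synchronized boolean onTick(long nowNanos) {
        return tryReopen(nowNanos);
    }

    private boolean tryReopen(long nowNanos) {
        if (!pending) {
            return false; // nothing new to show
        }
        if (everReopened && nowNanos - lastReopenNanos < minIntervalNanos) {
            return false; // reopened too recently, defer
        }
        pending = false;
        everReopened = true;
        lastReopenNanos = nowNanos;
        return true;
    }
}
```

With a 60 second interval, the first add after a quiet period returns
true at once, while adds that follow shortly afterwards return false
until a later tick finds that the interval has elapsed.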

-Yonik


Re: Gdata - opening/closing index

Posted by Simon Willnauer <si...@googlemail.com>.
On 7/23/06, karl wettin <ka...@gmail.com> wrote:
> On Sun, 2006-07-23 at 19:10 +0200, Simon Willnauer wrote:
>
> > So if I index every change immediately I have to open and close the
> > index reader and writer all the time. This is not very efficient.
>
> How often do you plan to close the readers and writers?

Well, as rarely as possible, but this depends on how frequently updates,
i.e. modifications, occur.
I would make the closing threshold configurable so that users can
choose how many documents can be added to the index before it is
closed. This could / should be combined with an idle time.
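
A minimal sketch of such a policy, combining a document-count
threshold with an idle time. The class CommitPolicy and all of its
names are made up for illustration and are not part of the gdata
server or Lucene; the caller is assumed to supply the current time in
milliseconds and to close and reopen the writer itself.

```java
/** Hypothetical policy: close the writer after N documents or after an idle period. */
final class CommitPolicy {
    private final int maxPendingDocs;   // configurable document threshold
    private final long maxIdleMillis;   // configurable idle time
    private int pendingDocs = 0;        // documents indexed since the last close
    private long lastChangeMillis = 0;  // time of the most recent change

    CommitPolicy(int maxPendingDocs, long maxIdleMillis) {
        this.maxPendingDocs = maxPendingDocs;
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Record one indexed document. */
    synchronized void onDocumentIndexed(long nowMillis) {
        pendingDocs++;
        lastChangeMillis = nowMillis;
    }

    /** True if the writer should be closed so that changes become searchable. */
    synchronized boolean shouldCommit(long nowMillis) {
        if (pendingDocs == 0) {
            return false; // nothing to make visible
        }
        boolean enoughDocs = pendingDocs >= maxPendingDocs;
        boolean idleTooLong = nowMillis - lastChangeMillis >= maxIdleMillis;
        return enoughDocs || idleTooLong;
    }

    /** Reset the counter after the writer has been closed. */
    synchronized void onCommitted() {
        pendingDocs = 0;
    }
}
```

The idle check is what keeps a single modification from staying
unsearchable for 30 minutes when the document threshold is never
reached.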

> You will find the code in directory /index of the tar-ball I attached to
> issue 550 earlier today. It is a big fat layer of facade and decorators,
> and you will have to use NotifiableIndex.openWhatNot and
> AutofreshedSearcher.getSearcher instead of creating your own instances.
>
> I'm sure someone has a reason not to use this solution, but it works
> great for me.

This is a lot of code without comments or Javadoc, but it seems quite
useful for that purpose. I will have a look at your code to grab some
ideas from it.
I needed a similar thing for the storage, so I keep track of the
references to each searcher and decrement the count when I don't
need the searcher anymore. If the IndexWriter has been closed, the last
remaining reference is decremented; otherwise there is at least one
reference remaining. So the searcher destroys itself once there is no
reference to it anymore.
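
Roughly, the reference counting described above might look like the
sketch below. It is not the actual gdata storage code: the class
RefCountedSearcher is a hypothetical name, and the searcher is modelled
as any java.io.Closeable so that the example does not depend on Lucene.
The count starts at 1 for the owner's reference, which is the one
released when the IndexWriter is closed.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper that closes a searcher when its last reference is released. */
final class RefCountedSearcher<S extends Closeable> {
    private final S searcher;
    // Starts at 1: the owner's reference, released when the writer is closed.
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCountedSearcher(S searcher) {
        this.searcher = searcher;
    }

    /** Returns the searcher and increments the count, or null if already closed. */
    S acquire() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                return null; // already closed, caller must fetch a newer searcher
            }
            if (refs.compareAndSet(n, n + 1)) {
                return searcher;
            }
        }
    }

    /** Decrements the count and closes the searcher when it reaches zero. */
    void release() {
        int remaining = refs.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released more often than acquired");
        }
        if (remaining == 0) {
            try {
                searcher.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Each search calls acquire() before using the searcher and release()
afterwards, so a searcher that is still in use survives the writer
being closed until the last search finishes.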

regards Simon


Re: Gdata - opening/closing index

Posted by karl wettin <ka...@gmail.com>.
On Sun, 2006-07-23 at 19:10 +0200, Simon Willnauer wrote:

> So if I index every change immediately I have to open and close the
> index reader and writer all the time. This is not very efficient.

How often do you plan to close the readers and writers?

> Now the question is how does the indexer handle this? I could index
> into a second index while the first index is used for searching. The
> indexer could index all entries in the queue and after a certain
> number of new index entries both indexes could be merged together.

I have spinal problems with this.

> But what happens if there is just one modification in 30 minutes?
> The entry would not be searchable for a long time.

I use a facade that supplies the IndexSearcher. All code calls this facade
every time it needs a searcher and is never bound to a particular instance.
When an IndexReader or IndexWriter commits changes, a new searcher is
created in the facade and the old searcher is placed in a queue for later
closing to avoid errors for any thread currently using it.
 
The only thing I have to worry about is how often I commit changes. I
close my index writer after one minute of idleness or so.
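
The facade idea can be sketched along the following lines. This is
not the code attached to issue 550: SearcherFacade and its methods are
hypothetical names, the searcher is modelled as a plain
java.io.Closeable, and the rule for when a retired searcher is safe to
close (a fixed grace period) is an assumption made for the example.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical facade that hands out the current searcher and retires old ones. */
final class SearcherFacade<S extends Closeable> {
    /** An old searcher together with the time it was replaced. */
    private static final class Retired<S> {
        final S searcher;
        final long retiredAtMillis;

        Retired(S searcher, long retiredAtMillis) {
            this.searcher = searcher;
            this.retiredAtMillis = retiredAtMillis;
        }
    }

    private S current;
    private final Deque<Retired<S>> retired = new ArrayDeque<>();

    SearcherFacade(S initial) {
        this.current = initial;
    }

    /** Callers ask for the searcher every time and never cache it. */
    synchronized S getSearcher() {
        return current;
    }

    /** After a commit: publish the fresh searcher, queue the old one for closing. */
    synchronized void onCommit(S fresh, long nowMillis) {
        retired.addLast(new Retired<>(current, nowMillis));
        current = fresh;
    }

    /** Closes searchers retired at least graceMillis ago; returns how many were closed. */
    synchronized int closeRetired(long nowMillis, long graceMillis) {
        int closed = 0;
        while (!retired.isEmpty()
                && nowMillis - retired.peekFirst().retiredAtMillis >= graceMillis) {
            try {
                retired.removeFirst().searcher.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            closed++;
        }
        return closed;
    }
}
```

A grace period is the simplest rule but only works if no search runs
longer than it; counting references per searcher is a stricter
alternative.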

You will find the code in directory /index of the tar-ball I attached to
issue 550 earlier today. It is a big fat layer of facade and decorators,
and you will have to use NotifiableIndex.openWhatNot and
AutofreshedSearcher.getSearcher instead of creating your own instances.

I'm sure someone has a reason not to use this solution, but it works
great for me.



Re: Gdata - opening/closing index

Posted by Yonik Seeley <ys...@gmail.com>.
On 7/25/06, Simon Willnauer <si...@googlemail.com> wrote:
> I was wondering how the Solr server handles updates to the index.
> I have to deal with inserts, deletes and updates in no specific order.
> So to delete and insert an entry is no problem as the ids are unique,
> but for updating a specific document in the index I have to close and
> reopen the index writer.

http://www.nabble.com/-jira--Created%3A-%28LUCENE-565%29-Supporting-deleteDocuments-in-IndexWriter-%28Code-and-Performance-Results-Provided%29-tf1580652.html#a5316811

For good performance, deletes/overwrites aren't actually done until a
"commit" is done (when you want a new view of the index...)

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
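
As a toy model of deferring deletes and overwrites until a commit
(not Solr's actual implementation), the sketch below buffers changes
by unique id and applies them in one batch. The class PendingUpdates is
a hypothetical name, and the "index" is stood in for by a plain Map
from id to document text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical buffer: changes are collected by id and applied only on commit. */
final class PendingUpdates {
    // id -> latest pending document; a null value means "delete only".
    private final Map<String, String> pending = new LinkedHashMap<>();

    /** Buffer an insert or an overwrite of the document with this id. */
    void addOrOverwrite(String id, String doc) {
        pending.put(id, doc);
    }

    /** Buffer a delete of the document with this id. */
    void delete(String id) {
        pending.put(id, null);
    }

    /** Number of ids with buffered changes. */
    int size() {
        return pending.size();
    }

    /** Apply all buffered changes to the index (a Map stand-in) in one batch. */
    void commit(Map<String, String> index) {
        for (Map.Entry<String, String> change : pending.entrySet()) {
            index.remove(change.getKey()); // drop the old version, if any
            if (change.getValue() != null) {
                index.put(change.getKey(), change.getValue());
            }
        }
        pending.clear();
    }
}
```

Until commit() runs, searches against the index still see the old
documents, and several changes to the same id collapse into one.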


Re: Gdata - opening/closing index

Posted by Simon Willnauer <si...@googlemail.com>.
I was wondering how the Solr server handles updates to the index.
I have to deal with inserts, deletes and updates in no specific order.
So to delete and insert an entry is no problem as the ids are unique,
but for updating a specific document in the index I have to close and
reopen the index writer. In the worst case I have to do that for each
document (if they are only updates). I could treat the update as an
insert and delete the old file via a timestamp, but that might have
some, I'll call it, recovery overhead if for instance the server crashes
during indexing.

I'm just curious how Solr or Nutch solves this problem, if you have
these problems at all.

regards Simon


Re: Gdata - opening/closing index

Posted by Yonik Seeley <ys...@gmail.com>.
> Now the question is how does the indexer handle this? I could
> index into a second index while the first index is used for searching.

That's not necessary. You can open an IndexSearcher for doing
searches, and go ahead and add documents directly to the same index.
The operations are not exclusive; it's just that changes won't be
visible until you close the IndexWriter and open a new IndexSearcher.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
