Posted to java-user@lucene.apache.org by Kay Kay <ka...@gmail.com> on 2009/01/08 18:27:57 UTC
IndexSearcher - architecture - shortest possible latency between
update of index (via IndexWriter/IndexReader) and querying the same using
IndexSearcher
Hi-
For one of our apps - we are doing a lot of additions and deletions
(high frequency) at any given time. Assuming the same index directory
under discussion between the writers ( IndexWriter and IndexReader, the
latter for deletions) and the readers (IndexSearcher to begin with) - we
want the IndexSearcher to retrieve the most updated index at the
shortest possible time (with more priority on the most updated data).
So when an IndexWriter and an IndexReader update a particular index
directory (with proper locking between themselves) and we then search
using an IndexSearcher (that could have been initialized / warmed up in the
past) - will another search query initiated from the same IndexSearcher
instance see the updated index in real time?
Are the IndexSearcher instances 'watching' the index directories for
changes and updating their data structures internally?
What would be the best / fastest way to make sure that IndexSearcher
instances return data that are semantically data ( assuming the
sequential order of data)? What are the trade-offs that we can make here
when it comes to design decisions of a Lucene-based application? Thanks.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: IndexSearcher - architecture - shortest possible latency between
update of index (via IndexWriter/IndexReader) and querying the same using
IndexSearcher
Posted by Kay Kay <ka...@gmail.com>.
Thanks Erick for the clarifications regarding the same.
Assuming we have a RAMDirectory based inverted index (along with an
FSDirectory for a secondary storage index) - what would be the
limitation on the RAMDirectory capacity in terms of the size of the
index (other than the main memory, i.e. the JRE allocated memory, of
course)? What are some of the ways we could tune it? There seems to be a
sizeInBytes method available to retrieve the capacity. Is there a
flexible way to set the capacity though?
Apologies for the typo on the last paragraph. I meant to ask - what
would be the best / fastest way to make sure that IndexSearcher
instances return data that are semantically correct in the order of time.
1) we insert document A with term t1
2) search for all documents with term t1 - should return A
3) delete A
4) repeat the same query again - (all documents with term t1) - should
not return A among its results.
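The four steps above can be sketched with a toy in-memory model. This is only an illustration, assuming (as Erick's reply states) that a searcher sees the index as it was when the searcher was opened. ToyIndex and ToySearcher are hypothetical names made up for this sketch and are not Lucene classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: hypothetical classes, NOT Lucene's API.
public class SnapshotDemo {

    /** A writable "index" mapping each term to the ids of documents containing it. */
    static class ToyIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(String docId, String term) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }

        void delete(String docId) {
            for (Set<String> docs : postings.values()) {
                docs.remove(docId);
            }
        }

        /** Opens a searcher over a copy of the current state (a point-in-time view). */
        ToySearcher openSearcher() {
            Map<String, Set<String>> copy = new HashMap<>();
            for (Map.Entry<String, Set<String>> e : postings.entrySet()) {
                copy.put(e.getKey(), new TreeSet<>(e.getValue()));
            }
            return new ToySearcher(copy);
        }
    }

    /** Searches only the snapshot it was opened on; later index changes are invisible. */
    static class ToySearcher {
        private final Map<String, Set<String>> snapshot;

        ToySearcher(Map<String, Set<String>> snapshot) {
            this.snapshot = snapshot;
        }

        Set<String> search(String term) {
            return snapshot.getOrDefault(term, Collections.emptySet());
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("A", "t1");                       // 1) insert A with term t1
        ToySearcher searcher = index.openSearcher();
        System.out.println(searcher.search("t1"));  // 2) prints [A]
        index.delete("A");                          // 3) delete A
        System.out.println(searcher.search("t1"));  // stale searcher still prints [A]
        searcher = index.openSearcher();            // open a new searcher
        System.out.println(searcher.search("t1"));  // 4) prints []
    }
}
```

In this model, step 4 gives the expected result only if a new searcher is opened after step 3. A searcher opened before the delete keeps returning A.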
Erick Erickson wrote:
> This topic has been discussed *very* extensively, so I'd recommend you
> search the mail archive (see
> http://wiki.apache.org/lucene-java/MailingListArchives )
> since there are more good ideas there than I can remember. But the short
> answer is that you must open a new searcher for modifications to be seen.
>
> There are schemes for real-time updating, (see the archive) but they all
> take work. There are no out-of-the-box solutions that I know of.
>
> I don't understand your last paragraph at all.
>
> Best
> Erick
Re: IndexSearcher - architecture - shortest possible latency between update of index (via IndexWriter/IndexReader) and querying the same using IndexSearcher
Posted by Erick Erickson <er...@gmail.com>.
This topic has been discussed *very* extensively, so I'd recommend you
search the mail archive (see
http://wiki.apache.org/lucene-java/MailingListArchives )
since there are more good ideas there than I can remember. But the short
answer is that you must open a new searcher for modifications to be seen.
There are schemes for real-time updating (see the archive), but they all
take work. There are no out-of-the-box solutions that I know of.
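As a rough sketch of what "open a new searcher" can look like in application code, one common shape is a small holder that publishes a freshly opened searcher after each batch of updates. SearcherHolder and its opener callback are hypothetical names for this sketch, not part of Lucene. The type parameter S stands in for whatever searcher type the application uses.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper, not part of Lucene: holds the "current" searcher and
// swaps in a freshly opened one on request.
public class SearcherHolder<S> {
    private final Supplier<S> opener;          // assumed callback that opens a new searcher
    private final AtomicReference<S> current;

    public SearcherHolder(Supplier<S> opener) {
        this.opener = opener;
        this.current = new AtomicReference<>(opener.get());
    }

    /** Returns the searcher that new queries should use. */
    public S get() {
        return current.get();
    }

    /**
     * Opens a new searcher and publishes it atomically. Returns the previous
     * searcher so the caller can close it once in-flight queries have finished.
     */
    public S refresh() {
        return current.getAndSet(opener.get());
    }

    public static void main(String[] args) {
        int[] generation = {0};
        SearcherHolder<String> holder =
                new SearcherHolder<>(() -> "searcher-" + (++generation[0]));
        System.out.println(holder.get());                 // prints searcher-1
        String old = holder.refresh();
        System.out.println(old + " -> " + holder.get());  // prints searcher-1 -> searcher-2
    }
}
```

The trade-off sits in how often refresh() is called. Refreshing after every update gives the freshest results but pays the cost of opening (and warming) a searcher each time. Refreshing on a timer or after a batch of updates lowers that cost and lets searches see slightly older data.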
I don't understand your last paragraph at all.
Best
Erick