Posted to java-user@lucene.apache.org by Kay Kay <ka...@gmail.com> on 2009/01/08 18:27:57 UTC

IndexSearcher - architecture - shortest possible latency between update of index (via IndexWriter/IndexReader) and querying the same using IndexSearcher

Hi-
  For one of our apps we are doing a lot of additions and deletions
(at high frequency) at any given time.  Assume the same index directory
is shared between the writers (IndexWriter and IndexReader, the latter
for deletions) and the readers (IndexSearcher to begin with). We want the
IndexSearcher to see the most up-to-date index in the shortest possible
time (with priority on the freshness of the data).
So when an IndexWriter and an IndexReader update a particular index
directory (with proper locking between themselves) and we then search
using an IndexSearcher (that could have been initialized / warmed up in
the past), will another search query initiated from the same
IndexSearcher instance see the updated index in real time?
  Are the IndexSearcher instances 'watching' the index directories for
changes and updating their data structures internally?
 
  What would be the best / fastest way to make sure that IndexSearcher 
instances return data that are semantically data ( assuming the 
sequential order of data). What are the trade-offs that we can make here
when it comes to the design decisions of a Lucene-based application? Thanks.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IndexSearcher - architecture - shortest possible latency between update of index (via IndexWriter/IndexReader) and querying the same using IndexSearcher

Posted by Kay Kay <ka...@gmail.com>.
Thanks Erick for the clarifications regarding the same.

Assuming we have a RAMDirectory-based inverted index (along with an
FSDirectory for a secondary storage index), what would be the
limit on the RAMDirectory capacity in terms of the size of the
index (other than main memory, i.e. the memory allocated to the JRE, of
course)? What are some of the ways we could tune it?  There seems to be a
sizeInBytes method available to retrieve the capacity. Is there a
flexible way to set the capacity, though?


Apologies for the typo in the last paragraph. I meant to ask: what
would be the best / fastest way to make sure that IndexSearcher
instances return data that are semantically correct in order of time?

1) we insert document A with term t1
2) search for all documents with term t1 - should return A
3) delete A
4) repeat the same query again - (all documents with term t1) - should 
not return A among its results. 
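
The four steps above can be pictured with a small toy model. This is only a
sketch under stated assumptions: ToyIndex and ToySearcher are hypothetical
stand-ins written for illustration, not Lucene classes, and the model assumes
that a searcher works on a frozen, point-in-time view of the index taken when
it was opened. Under that assumption, step 4 only behaves as expected if the
query runs on a searcher opened after the delete.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical writer side: maps each term to the ids of documents containing it. */
final class ToyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    void add(String docId, String term) {
        postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
    }

    void delete(String docId) {
        for (Set<String> ids : postings.values()) {
            ids.remove(docId);
        }
    }

    /** Opens a searcher over a deep copy of the postings as they are right now. */
    ToySearcher openSearcher() {
        Map<String, Set<String>> frozen = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : postings.entrySet()) {
            frozen.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        return new ToySearcher(frozen);
    }
}

/** Hypothetical reader side: only ever sees the snapshot it was opened on. */
final class ToySearcher {
    private final Map<String, Set<String>> snapshot;

    ToySearcher(Map<String, Set<String>> snapshot) {
        this.snapshot = snapshot;
    }

    Set<String> search(String term) {
        return snapshot.getOrDefault(term, Collections.emptySet());
    }
}

final class SnapshotDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("A", "t1");                       // 1) insert document A with term t1
        ToySearcher first = index.openSearcher();
        System.out.println(first.search("t1"));     // 2) prints [A]
        index.delete("A");                          // 3) delete A
        System.out.println(first.search("t1"));     // old searcher still prints [A]
        ToySearcher second = index.openSearcher();  // reopen after the delete
        System.out.println(second.search("t1"));    // 4) prints []
    }
}
```

In this model the cost of correctness in time order is one openSearcher()
call after every change that must be visible; how expensive the equivalent
step is in a real index is exactly the trade-off being asked about.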




Erick Erickson wrote:
> This topic has been discussed *very* extensively, so I'd recommend you
> search the mail archive (see
> http://wiki.apache.org/lucene-java/MailingListArchives )
> since there are more good ideas there than I can remember. But the short
> answer is that you must open a new searcher for modifications to be seen.
>
> There are schemes for real-time updating, (see the archive) but they all
> take work. There are no out-of-the-box solutions that I know of.
>
> I don't understand your last paragraph at all.
>
> Best
> Erick
>




Re: IndexSearcher - architecture - shortest possible latency between update of index (via IndexWriter/IndexReader) and querying the same using IndexSearcher

Posted by Erick Erickson <er...@gmail.com>.
This topic has been discussed *very* extensively, so I'd recommend you
search the mail archive (see
http://wiki.apache.org/lucene-java/MailingListArchives )
since there are more good ideas there than I can remember. But the short
answer is that you must open a new searcher for modifications to be seen.

There are schemes for real-time updating, (see the archive) but they all
take work. There are no out-of-the-box solutions that I know of.
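
As a rough illustration of what such a scheme involves, here is a minimal
sketch of the "open a new searcher and swap it in" idea. It is not a Lucene
API: RefreshingHolder and the opener function are hypothetical names made up
for this example, and the "searcher" in the demo is just an immutable copy of
a list. Queries keep using the current searcher until refresh() publishes a
newer one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical holder that publishes the most recently opened searcher of type S. */
final class RefreshingHolder<S> {
    private final Supplier<S> opener;          // opens a searcher over the latest index state
    private final AtomicReference<S> current;  // searcher that new queries should use

    RefreshingHolder(Supplier<S> opener) {
        this.opener = opener;
        this.current = new AtomicReference<>(opener.get());
    }

    /** Returns the searcher to query with; it stays the same until the next refresh. */
    S acquire() {
        return current.get();
    }

    /** Opens a fresh searcher, publishes it, and returns the one it replaced. */
    S refresh() {
        return current.getAndSet(opener.get());
    }
}

final class RefreshDemo {
    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        index.add("A");
        // The "searcher" here is a read-only copy of the list at open time.
        RefreshingHolder<List<String>> holder = new RefreshingHolder<List<String>>(
                () -> Collections.unmodifiableList(new ArrayList<String>(index)));

        index.remove("A");
        System.out.println(holder.acquire());  // prints [A]: the change is not visible yet
        holder.refresh();
        System.out.println(holder.acquire());  // prints []: the new searcher sees the delete
    }
}
```

The work is in deciding when to call refresh() (after every write, or on a
timer) and in retiring the searcher it returns once queries still running on
it have finished. Refreshing more often shortens the window of stale results
but pays the cost of opening and warming a searcher more often.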

I don't understand your last paragraph at all.

Best
Erick
