Posted to java-user@lucene.apache.org by codetester <in...@gmail.com> on 2008/02/01 23:04:33 UTC

Concurrent Indexing + Searching

Hi All,

I'm new to Lucene and am using version 2.3.0. I need Lucene to
perform live searching and indexing. To achieve that, I tried the following:

FSDirectory directory = FSDirectory.getDirectory(location);
IndexReader reader = IndexReader.open(directory );
IndexWriter writer = new IndexWriter(directory , new SimpleAnalyzer(),
true); // <- I want to recreate the index every time
IndexSearcher searcher = new IndexSearcher( reader );

For Searching, I have the following code
QueryParser queryParser = new QueryParser("xyz", new StandardAnalyzer());
Hits hits = searcher .search(queryParser.parse(displayName + "*"));

And for adding records, I have the following code
 // Create doc object
 writer.addDocument(doc);

 IndexReader newIndexReader = reader.reopen() ;
 if ( newIndexReader != reader ) {
       reader.close() ;
 }
 reader = newIndexReader ;
 searcher.close() ;
 searcher = new IndexSearcher(reader );
        
The issues I face are:

1) The newly added record is not reflected in the search (even though I
have re-initialized the IndexSearcher).

2) Obviously, the add-record code is not thread safe: I am closing and
replacing the reference to the IndexSearcher object. I could add a
synchronized block, but the bigger question is: what is the ideal way to
add and search records in real time?

Thanks !






---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Concurrent Indexing + Searching

Posted by Mark Miller <ma...@gmail.com>.
P.S.

About that write bombardment... it's still very difficult for that to be a 
problem. Take a look at the tests. I start a bunch of threads searching 
as fast as they can, and a bunch of threads writing as fast as they can, 
nonstop. And still there are plenty of moments where the references 
hit 0 and things refresh. You would basically have to be getting DoS'd 
for this to matter that much.

ajay_garg wrote:
> Thanks Mark.
>
> Ok, I got your point. So it happens like this :
>
> a) If it is me, who is re-opening an IndexReader, at any time, but
> "manually-programmatically". That is, I don't want
> a-sort-of-automatic-reopening-of-IndexWriter, then I am fine.
>
> b) If I do wish this automatic-reopening of index (using IndexAccessor),
> then I am forced to rely on all the indexer threads releasing the reference
> to IndexWriter, which by the way, as a developer, can never be sure of (that
> is, I don't have any control, as to when exactly all the threads leave the
> reference ).
>
> Will be obliged if you could give a confirmation to my understanding.
>
> Thanks
> Ajay Garg
>
> markrmiller wrote:
>   
>> You are right that if auto-commit=true and a user reopens an 
>> IndexReader, the docs will absolutely be visible as they are flushed. I 
>> think the part you are missing is that you need to be cooperating with 
>> the IndexAccessor: a user should not be reopening an IndexReader. The 
>> whole point of IndexAccessor is to coordinate these things...when a 
>> Writer is released, we know the index has changed, so that is when the 
>> IndexReaders are reopened for you. Because the IndexWriter is cached and 
>> shared by Threads, a thread might release the Writer while another is 
>> still using it...that is why things are not reopened and the Writer not 
>> closed until the last thread releases its reference to it. Essentially, 
>> IndexAccessor controls visibility by controlling how current the view of 
>> the Readers is, by controlling their reopening -- a user should agree 
>> not to reopen -- just like he must agree not to use a ReadingReader to 
>> delete.
>>
>> If you want to just set an IndexWriter to indexing for eternity and then 
>> have some Readers that you occasionally reopen, you don't need 
>> IndexAccessor. Its purpose is to coordinate ReadingReaders, 
>> WritingReaders, Searchers, and Writers for you. You are proposing to 
>> coordinate them yourself. IndexAccessor reopens Readers for you after a 
>> Writer has been used, and enforces Lucene requirements, like a 
>> WritingReader cannot be used at the same time as a Writer...etc.
>>
>> Technically, IndexAccessor could reopen the readers every 2 
>> seconds...and then you would see your changes...instead it only tries to 
>> reopen them if a change has been made to the index...and it does not 
>> want to get greedy if a Writer is batch loading, so it waits for you to 
>> release the Writer. You can control how often the 'view' is updated by 
>> releasing the Writer more often -- say every 50 docs. Write 50 docs, 
>> release, get, write 50 docs.
>>
>> - Mark
>>
>> ajay_garg wrote:
>>     
>>> @Mark.
>>>
>>> I am sorry, but I need a bit more of explanation. So you mean to say ::
>>>
>>> "If auto-commit is false, then of course, docs will not be visible in the
>>> index, until all the threads release themselves out of a particular
>>> IndexWriter instance, and close() the IndexWriter instance.
>>> If auto-commit is true, even then the above holds true. In particular,
>>> let's say I need an application with the following requirements ::
>>>
>>> a) There are multiple indexer threads indexing on a SINGLE IndexWriter
>>> instance with auto-commit true.
>>> b) Each thread 'flushes' according to a pre-defined criterion at some
>>> point in time.
>>> c) The index should be updated immediately, that is, if any user re-opens
>>> the IndexSearcher, then the documents added till-that-snapshot-of-index
>>> must be visible. Note that the IndexWriter instance hasn't been closed
>>> as yet; the indexer threads will be indexing till eternity, so that
>>> IndexWriter instance will never be closed.
>>>
>>> So, you presume that building an application with the above requirements
>>> is impossible, even with auto-commit set to true. "
>>>
>>> ( If I sound ambiguous at any point, kindly forgive me for my lack of
>>> language skills. I will try to explain better, if need arises ).
>>>
>>> Looking forward to a reply
>>> Ajay Garg
>>>
>>> markrmiller wrote:
>>>   
>>>       
>>>> You are correct that autocommit=true means that docs will be in the 
>>>> index before the last thread releases its concurrent hold on a Writer, 
>>>> *but because IndexAccessor controls* *when the IndexSearchers are 
>>>> reopened*, those docs will still not be visible until the last thread 
>>>> holding a Writer releases it...that is when the reopening of Searchers 
>>>> occurs as well as when the Writer is closed.
>>>>
>>>> - Mark
>>>>
>>>> ajay_garg wrote:
>>>>     
>>>>         
>>>>> Hi. Sorry if I seem a stranger in this thread, but there is something
>>>>> that I can't resist clearing myself on.
>>>>>
>>>>> Mark, you say that the additional documents added to an index won't
>>>>> show up until the # of threads accessing the index hits 0, and
>>>>> subsequently the IndexWriter instance is closed.
>>>>>
>>>>> But I suppose that autocommit=true asserts that all flushed (added)
>>>>> documents are immediately committed ( and hence visible ) in the index,
>>>>> and no explicit closing ( releasing ) of the IndexWriter instance is
>>>>> required.
>>>>> ( Of course, re-opening an IndexSearcher instance is required ).
>>>>>
>>>>> Am I being dumb ?
>>>>>
>>>>> Looking eagerly for you to shed some light on my doubt.
>>>>>
>>>>> Thanks
>>>>> Ajay Garg
>>>>>
>>>>>



Re: Concurrent Indexing + Searching

Posted by ajay_garg <ga...@gmail.com>.
Thanks a ton, Mark. I am really obliged to interact with you, who never
hesitates to reply to even the slightest of queries.

Thanks again.
Ajay Garg



Re: Concurrent Indexing + Searching

Posted by Mark Miller <ma...@gmail.com>.
>  Again, if the
> indexerThreads are bombarding the writer continuously, then the moment, when
> no indexer is accessing the writer, may never come. Thus, I invested some of
> my time, and wrote my own code, to control the sleeping of indexerThreads.
>   
I don't know how much of a concern this is. All you can really do is 
juggle the capabilities of Lucene, and Lucene was not designed to allow 
continuous writes to the database that are instantly available. That is 
one of the compromises of doing full text search over a db. If you reopen 
the index in the face of constant write bombardment, it will already 
need to be reopened again immediately, and so on. You still need to 
consider the cost of reopening huge indexes...it's not going to be fast 
enough to keep up with this kind of bombardment. I think you have to 
limit the use case.

I suppose you could refresh the readers occasionally in a long line of 
Writer get/release bombardment, but Lucene is just not in a position to 
handle such an interactive index, and I don't think it will be too 
fruitful trying to force it. If you correctly batch load, this is not 
that big of a limitation. Updates generally come in two ways...random 
updates here and there or a batch of updates at once - neither of these 
cases will cause bombardment.
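
As a rough illustration of refreshing the readers only occasionally during a long run of writes, here is a minimal Java sketch of a time-throttled refresh. The class name ThrottledRefresher, its constructor parameters, and the Runnable that stands in for the actual reader reopen are all hypothetical; none of this is IndexAccessor or Lucene API, and the interval you pick is an assumption to tune against the cost of reopening your index.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper: runs a refresh action at most once per interval. */
final class ThrottledRefresher {
    private final long minIntervalMillis;   // minimum time between two refreshes
    private final LongSupplier clock;       // e.g. System::currentTimeMillis
    private final Runnable refresh;         // stand-in for "reopen the readers"
    private final AtomicLong lastRefresh;   // time of the last refresh (or of creation)

    ThrottledRefresher(long minIntervalMillis, LongSupplier clock, Runnable refresh) {
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
        this.refresh = refresh;
        this.lastRefresh = new AtomicLong(clock.getAsLong());
    }

    /** Call after each write; returns true only if this call ran the refresh. */
    boolean maybeRefresh() {
        long now = clock.getAsLong();
        long last = lastRefresh.get();
        if (now - last < minIntervalMillis) {
            return false;                    // refreshed too recently, skip
        }
        if (!lastRefresh.compareAndSet(last, now)) {
            return false;                    // another thread claimed this refresh
        }
        refresh.run();
        return true;
    }
}
```

Writer threads would call maybeRefresh() after each add; under heavy load most calls return immediately, so the reopen cost is paid at most once per interval instead of once per write.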

- Mark



Re: Concurrent Indexing + Searching

Posted by ajay_garg <ga...@gmail.com>.
Thanks Mark. 

Just one last thing: this issue seems similar to the case where the
Lucene source code says that if an explicit "flush" method is called on an
IndexWriter instance, it will again wait for all the indexerThreads to
release the writer, and only then will the flush happen. Again, if the
indexerThreads are bombarding the writer continuously, then the moment when
no indexer is accessing the writer may never come. Thus, I invested some of
my time and wrote my own code to control the sleeping of indexerThreads.

Thanks Mark for your help.

Ajay Garg



Re: Concurrent Indexing + Searching

Posted by Mark Miller <ma...@gmail.com>.

ajay_garg wrote:
> Thanks Mark.
>
> Ok, I got your point. So it happens like this :
>
> a) If it is me, who is re-opening an IndexReader, at any time, but
> "manually-programmatically". That is, I don't want
> a-sort-of-automatic-reopening-of-IndexWriter, then I am fine.
>   
Sure...you're kind of doing what IndexAccessor does...choosing when to 
reopen the views using some metric. Just follow Lucene access rules (no 
writing ops with a Reader while another thread uses a Writer, etc.). Also, 
you want to share Searchers and Writers across threads.
> b) If I do wish this automatic-reopening of index (using IndexAccessor),
> then I am forced to rely on all the indexer threads releasing the reference
> to IndexWriter, which by the way, as a developer, can never be sure of (that
> is, I don't have any control, as to when exactly all the threads leave the
> reference ).
>   
You have fairly decent control...it's all running on the server. A client 
would be making a call to the server, which would run the code. First, 
release in a finally block, and second, avoid any infinite loops 
or whatnot, and you have a fair amount of control here. As long as your 
computer can compute and make forward progress, even if an exception is 
thrown, things will get released. One year plus at many sites and I have 
never seen anything not get released unless the whole server went down, 
in which case I cannot do anything anyway. Now if you're constantly 
bombarded with write operations that just never let up...sure - but you're 
still the code behind the curtain...you can write some code that looks 
for such a bombardment. I think the control is pretty good. I guess the 
point is that the client is not what's using IndexAccessor...it's making a 
request to the server which then uses IndexAccessor.
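
To make the "release in a finally block" advice concrete, here is a minimal Java sketch of the reference-counting idea: every thread that takes the shared writer bumps a counter, and the last one to give it back triggers the refresh. The class WriterLease and its methods are hypothetical names chosen for illustration and are not the IndexAccessor API; the counter increment marked below is only a stand-in for closing the Writer and reopening the Readers.

```java
/** Hypothetical sketch of reference-counted access to one shared writer. */
final class WriterLease {
    private int holders = 0;        // threads currently holding the writer
    private boolean dirty = false;  // true once the writer has been handed out
    private int refreshes = 0;      // how many times the "view" was refreshed

    /** Stand-in for a "get writer" call. */
    synchronized void acquire() {
        holders++;
        dirty = true;
    }

    /** Stand-in for a "release writer" call; the last holder out refreshes. */
    synchronized void release() {
        if (holders == 0) {
            throw new IllegalStateException("release without acquire");
        }
        holders--;
        if (holders == 0 && dirty) {
            dirty = false;
            refreshes++;            // placeholder: close writer, reopen readers
        }
    }

    synchronized int refreshCount() {
        return refreshes;
    }

    /** Typical caller: releasing in finally means an exception cannot leak the reference. */
    void write(Runnable work) {
        acquire();
        try {
            work.run();
        } finally {
            release();
        }
    }
}
```

With this shape, a thread that fails halfway through its work still decrements the count, so the only way the refresh never happens is if some holder is always present.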
> Will be obliged if you could give a confirmation to my understanding.
>
> Thanks
> Ajay Garg
>
> markrmiller wrote:
>   
>> You are right that if auto-commit=true and a user reopens an 
>> IndexReader, the docs will absolutely be visible as they are flushed. I 
>> think the part you are missing is that you need to be cooperating with 
>> the IndexAccessor: a user should not be reopening an IndexReader. The 
>> whole point of IndexAccessor is to coordinate these things...when a 
>> Writer is released, we know the index has changed, so that is when the 
>> IndexReaders are reopened for you. Because the IndexWriter is cached and 
>> shared by Threads, a thread might release the Writer while another is 
>> still using it...that is why things are not reopened and the Writer not 
>> closed until the last thread releases its reference to it. Essentially, 
>> IndexAccessor control visibility by controlling how current the view of 
>> the Readers is, by controlling their reopening -- a user should agree 
>> not to reopen -- just like he must agree not to use a ReadingWriter to 
>> delete.
>>
>> If you want to just set an IndexWriter to indexing for eternity and then 
>> have some Readers that you occasionally reopen, you don't need 
>> IndexAccessor. Its purpose is to coordinate ReaderReaders, 
>> WritingReaders, Searchers, and Writers for you. You are proposing to 
>> coordinate them yourself. IndexAccess reopens Readers for you after a 
>> Writer has been used, and enforces Lucene requirements, like a 
>> WritingReader cannot be used at the same time as a Writer...etc.
>>
>> Technically, IndexAccessor could reopen the readers every 2 
>> seconds...and then you would see your changes...instead it only tries to 
>> reopen them if a change has been made to the index...and it does not 
>> want to get greedy if a Writer is batch loading, so it waits for you to 
>> release the Writer. You can control how often the 'view' is updated by 
>> releasing the Writer more often -- say every 50 docs. Write 50 docs, 
>> release, get, write 50 docs.
>>
>> - Mark
>>
>> ajay_garg wrote:
>>     
>>> @Mark.
>>>
>>> I am sorry, but I need a bit more of explanation. So you mean to say ::
>>>
>>> "If auto-commit is false, then of course, docs will not be visible in the
>>> index, until all the threads release themselves out of a particular
>>> IndexWriter instance, and close() the IndexWriter instance.
>>> If auto-commit is true, even then the above holds true. In particular,
>>> let's
>>> say iI need an application 
>>> with the following requirements ::
>>>
>>> a) There are multiple indexer threads indexing on a SINGLE indexwriter
>>> instance with auto-commit true
>>> b) Each thread 'flushes' according to a pre-defined criteria at some
>>> point
>>> of time.
>>> c) The index should be updated immediately, that is, if any user re-opens
>>> the IndexSearcher, then the 
>>>     documents added till-that-snapshot-of-index must be visible. Note
>>> that
>>> the IndexWriter instance hasn't 
>>>     been closed as yet, the indexer threads will be indexing till
>>> eternity,
>>> so that IndexWriter instance will 
>>>     never be closed.
>>>
>>> So, you presume that building an application with the above requirements
>>> is
>>> impossible, even with auto-commit set to true. "
>>>
>>> ( If I sound ambiguous at any point, kindly forgive me for my lack of
>>> language skills. I will try to explain better, if need arises ).
>>>
>>> Looking forward to a reply
>>> Ajay Garg
>>>


Re: Concurrent Indexing + Searching

Posted by ajay_garg <ga...@gmail.com>.
Thanks Mark.

Ok, I got your point. So it works like this:

a) If I am the one reopening the IndexReader, at any time, "manually" in my
own code (that is, I don't want any sort of automatic reopening of the
IndexReader), then I am fine.

b) If I do want this automatic reopening of the index (using IndexAccessor),
then I am forced to rely on all the indexer threads releasing their reference
to the IndexWriter, which, as a developer, I can never be sure of (that is,
I have no control over exactly when all the threads release the reference).

I would be obliged if you could confirm my understanding.

Thanks
Ajay Garg


-- 
View this message in context: http://www.nabble.com/Concurrent-Indexing-%2B-Searching-tp15234463p15288452.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: Concurrent Indexing + Searching

Posted by Mark Miller <ma...@gmail.com>.
You are right that if auto-commit=true and a user reopens an
IndexReader, the docs will absolutely be visible as they are flushed. I
think the part you are missing is that you need to cooperate with
the IndexAccessor: a user should not be reopening an IndexReader. The
whole point of IndexAccessor is to coordinate these things...when a
Writer is released, we know the index has changed, so that is when the
IndexReaders are reopened for you. Because the IndexWriter is cached and
shared by Threads, a thread might release the Writer while another is
still using it...that is why things are not reopened and the Writer is
not closed until the last thread releases its reference to it.
Essentially, IndexAccessor controls visibility by controlling how current
the view of the Readers is, by controlling their reopening -- a user
should agree not to reopen -- just like he must agree not to use a
ReadingReader to delete.
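
The "last thread out refreshes the view" rule described above can be
sketched as a plain reference count. This is only an illustration under
stated assumptions, not the IndexAccessor source: SharedWriterGate,
acquire, release, and currentView are hypothetical names, and a counter
stands in for the real work of closing the Writer and reopening the
Readers.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the coordination described above;
// this is not Lucene or IndexAccessor API.
class SharedWriterGate {
    private int holders = 0; // threads currently holding the shared writer
    private final AtomicInteger viewGeneration = new AtomicInteger();

    // A thread takes a reference to the shared writer.
    synchronized void acquire() {
        holders++;
    }

    // A thread returns its reference; only the last one out refreshes the view.
    synchronized void release() {
        if (holders == 0) {
            throw new IllegalStateException("release without acquire");
        }
        holders--;
        if (holders == 0) {
            // Stands in for "close the writer and reopen the readers".
            viewGeneration.incrementAndGet();
        }
    }

    // Which version of the index searches would currently see.
    int currentView() {
        return viewGeneration.get();
    }
}
```

With two threads holding the writer, the first release leaves the view
unchanged; only the second release advances it, which is why documents
stay invisible while any thread still holds the Writer.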

If you want to just set an IndexWriter to indexing for eternity and then
have some Readers that you occasionally reopen, you don't need
IndexAccessor. Its purpose is to coordinate ReadingReaders,
WritingReaders, Searchers, and Writers for you. You are proposing to
coordinate them yourself. IndexAccessor reopens Readers for you after a
Writer has been used, and enforces Lucene requirements, such as that a
WritingReader cannot be used at the same time as a Writer...etc.

Technically, IndexAccessor could reopen the readers every 2 
seconds...and then you would see your changes...instead it only tries to 
reopen them if a change has been made to the index...and it does not 
want to get greedy if a Writer is batch loading, so it waits for you to 
release the Writer. You can control how often the 'view' is updated by 
releasing the Writer more often -- say every 50 docs. Write 50 docs, 
release, get, write 50 docs.
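
The "write 50 docs, release, get, write 50 docs" pattern might look
roughly like the sketch below. The WriterSource and DocSink interfaces and
the BatchIndexer class are hypothetical names made up for this example
(they are not the IndexAccessor API), and documents are plain strings to
keep the sketch self-contained.

```java
import java.util.List;

// Hypothetical abstraction over "get the shared writer / release it";
// not the IndexAccessor API.
interface WriterSource {
    DocSink get();
    void release(DocSink writer);
}

// Hypothetical minimal writer: something documents can be added to.
interface DocSink {
    void add(String doc);
}

class BatchIndexer {
    // Adds docs in batches, releasing the writer after each batch so that
    // the coordinator gets a chance to refresh the searchable view.
    // Returns how many times the writer was released.
    static int indexInBatches(WriterSource source, List<String> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int releases = 0;
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            DocSink writer = source.get();
            try {
                for (String doc : docs.subList(start, end)) {
                    writer.add(doc);
                }
            } finally {
                // Release even if an add fails, so other threads are not held up.
                source.release(writer);
                releases++;
            }
        }
        return releases;
    }
}
```

A smaller batch size means the view is refreshed more often at the cost of
more get/release cycles; 50 is just the arbitrary figure from the text.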

- Mark



Re: Concurrent Indexing + Searching

Posted by ajay_garg <ga...@gmail.com>.
@Mark.

I am sorry, but I need a bit more explanation. So you mean to say:

"If auto-commit is false, then of course docs will not be visible in the
index until all the threads have released a particular IndexWriter instance
and that IndexWriter instance has been closed.
If auto-commit is true, the above still holds. In particular, let's say I
need an application with the following requirements:

a) There are multiple indexer threads indexing on a SINGLE IndexWriter
instance with auto-commit true.
b) Each thread 'flushes' according to a pre-defined criterion at some point
in time.
c) The index should be updated immediately, that is, if any user re-opens
the IndexSearcher, then the documents added up to that snapshot of the index
must be visible. Note that the IndexWriter instance hasn't been closed yet;
the indexer threads will be indexing till eternity, so that IndexWriter
instance will never be closed.

So, you presume that building an application with the above requirements is
impossible, even with auto-commit set to true."

( If I sound ambiguous at any point, kindly forgive me for my lack of
language skills. I will try to explain better, if need arises ).

Looking forward to a reply
Ajay Garg


-- 
View this message in context: http://www.nabble.com/Concurrent-Indexing-%2B-Searching-tp15234463p15262305.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: Concurrent Indexing + Searching

Posted by Mark Miller <ma...@gmail.com>.
You are correct that autocommit=true means that docs will be in the
index before the last thread releases its concurrent hold on a Writer,
*but because IndexAccessor controls when the IndexSearchers are
reopened*, those docs will still not be visible until the last thread
holding a Writer releases it...that is when the reopening of Searchers
occurs as well as when the Writer is closed.

- Mark



Re: Concurrent Indexing + Searching

Posted by ajay_garg <ga...@gmail.com>.
Hi. Sorry if I seem a stranger in this thread, but there is something that I
can't resist getting clear on.

Mark, you say that additional documents added to an index won't show up
until the # of threads accessing the index hits 0 and the IndexWriter
instance is subsequently closed.

But I thought that autocommit=true guarantees that all flushed (added)
documents are immediately committed (and hence visible) in the index, and
that no explicit closing (releasing) of the IndexWriter instance is required.
(Of course, re-opening an IndexSearcher instance is required.)

Am I being dumb ?

Looking eagerly for you to shed some light on my doubt.

Thanks
Ajay Garg



-- 
View this message in context: http://www.nabble.com/Concurrent-Indexing-%2B-Searching-tp15234463p15255394.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: Concurrent Indexing + Searching

Posted by Mark Miller <ma...@gmail.com>.
> 1) I should be calling release of writer and searcher after every call. Is
> it always mandatory in cases like searcher, when I am sure that I haven't
> written anything since the last search ?
>   
You have to be careful here. It works like this: a single Searcher is
cached and returned every time. Once all references to the cached Writer
are returned, all of the cached Searchers (one per Similarity you're
using) are reopened -- but only after all the Searcher references are
returned. So you must return the Searcher as soon as you are done with
the search...otherwise when you return the last reference to the cached
Writer it will wait around until you do return that Searcher. Use it and
return it as quickly as you can. The cost is very small; it's just a
reference count decrement to release. You do have to pay the sync cost,
but that's the cost of sharing resources across threads. Test for speed
if you're worried...it's beyond anything I have needed.

Be careful with the Writer -- you want to return it fairly often as
well, but you will want to batch load if you are adding a lot of
docs at once. Get the Writer, add all the docs, release the Writer. But
keep in mind that you won't see the added docs until the # of threads
referencing the Writer hits 0 -- you might want to release it every 50
docs or something (arbitrary there). If you're just updating a doc or
adding a doc randomly, get it, update/add, release it.

Always release the Writers and Searchers in a finally block to ensure 
they get released regardless of exceptions.
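
As a rough illustration of the finally-block advice, here is a minimal
sketch. CountedResource and SafeUse are hypothetical names invented for
this example; CountedResource merely stands in for a cached Searcher or
Writer whose references are counted, and nothing here is the actual
IndexAccessor API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical shared resource with a reference count; stands in for a
// cached Searcher or Writer.
class CountedResource {
    private final AtomicInteger refs = new AtomicInteger();

    void acquire() { refs.incrementAndGet(); }
    void release() { refs.decrementAndGet(); }
    int outstanding() { return refs.get(); }
}

class SafeUse {
    // Hook for the work done while the resource is held.
    interface Work {
        void run() throws Exception;
    }

    // Runs the work while holding a reference and always gives the
    // reference back, whether the work returns normally or throws.
    static void withResource(CountedResource resource, Work work) throws Exception {
        resource.acquire();
        try {
            work.run();
        } finally {
            resource.release();
        }
    }
}
```

If the release were placed after work.run() instead of in the finally
block, a failed search or add would leave the count above zero and every
later reopen would wait on a reference that is never returned.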
> 2) Based on 1), is it okay to cache the instance of writer and Searcher
> object locally ?
>   
I wouldn't, but you can. You will hold things up though...everything
works based on them getting released. The IndexAccessor code properly
caches them for you. That is one of its main goals...properly caching
Writers/Searchers and reopening Searchers when a Writer has made a
change. If you hold a Searcher out, when a Writer is released by the
last thread that had a reference to it, the thread that released the
Writer will be hung up waiting around for that Searcher to get released.
You wouldn't want this to be a long time.
> 3) Are there any plans to push these to the trunk? Also, are there any
> blocking/critical issues  before we can start using it in production ?
>   
It's doubtful. The original code has been around for years and has yet to
see any trunk excitement. I think the committers prefer to keep this type
of thing out of the core and generally prefer Solr. I think that since
many of the committers work on/use Solr, there hasn't been much
incentive for them to use LuceneIndexAccessor. Who knows really though.
I only know that I have no say in the matter <g>

No blocking or critical issues that I know of. This is based on work I
did over a year ago (based on the original LuceneIndexAccessor code of
course), and while it's not the same code, I have been using that code at
6 24/7 sites for about a year now on index sizes ranging from 200,000 to
3 million article sized documents. I did this based on my experience
with that.

This is the code that I plan to use for any future projects, so feel 
free to email me with any questions or suggestions. I have had a great 
experience with this model of operating an interactive, multi-threaded, 
Lucene index. I'll be on any bugs like white on rice <g> I am very 
confident in the code though. Feel free to extend the test classes if 
you are worried about anything in particular.

- Mark



Re: Concurrent Indexing + Searching

Posted by Infinite Tester <in...@gmail.com>.
Thanks Mark!

Option D looks great. Regarding that option, I have a couple of questions
based on my first glance at the code (more specifically, SimpleSearchServer):

1) I should be calling release on the writer and searcher after every call. Is
that always mandatory, even for a searcher, when I am sure that I haven't
written anything since the last search?

2) Based on 1), is it okay to cache the writer and searcher instances
locally?

3) Are there any plans to push these to the trunk? Also, are there any
blocking/critical issues before we can start using it in production?


Thanks!




Re: Concurrent Indexing + Searching

Posted by Mark Miller <ma...@gmail.com>.
You are not seeing the doc because you need to close the IndexWriter first.
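
To make that visibility rule concrete, here is a toy model in plain Java. It
is not Lucene code: ToyIndex, ToyWriter and ToyReader are hypothetical
stand-ins, and the model assumes the simplest possible behavior (added
documents sit in the writer's buffer until close, and a reader only sees what
was published when it was opened). Real Lucene has more ways to make buffered
documents visible, so treat this only as a sketch of why reopening the reader
alone did not help.

```java
import java.util.ArrayList;
import java.util.List;

public class VisibilitySketch {
    /** Published state shared by writers and readers (stand-in for the index directory). */
    static class ToyIndex {
        private List<String> committed = new ArrayList<>();

        synchronized List<String> snapshot() { return committed; }

        synchronized void publish(List<String> pending) {
            if (pending.isEmpty()) return;
            List<String> next = new ArrayList<>(committed);
            next.addAll(pending);
            committed = next; // replace, never mutate, so older snapshots stay stable
        }
    }

    /** Buffers documents in memory and publishes them only on close(). */
    static class ToyWriter {
        private final ToyIndex index;
        private final List<String> pending = new ArrayList<>();

        ToyWriter(ToyIndex index) { this.index = index; }

        void addDocument(String doc) { pending.add(doc); }

        void close() {
            index.publish(pending);
            pending.clear();
        }
    }

    /** Sees the published state as of the moment it was opened. */
    static class ToyReader {
        private final ToyIndex index;
        private final List<String> view;

        ToyReader(ToyIndex index) {
            this.index = index;
            this.view = index.snapshot();
        }

        int numDocs() { return view.size(); }

        /** Returns this reader if nothing was published since it was opened, else a new one. */
        ToyReader reopen() {
            return index.snapshot() == view ? this : new ToyReader(index);
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        ToyWriter writer = new ToyWriter(index);
        ToyReader reader = new ToyReader(index);

        writer.addDocument("doc-1");
        ToyReader same = reader.reopen();   // nothing published yet
        System.out.println("before close: " + same.numDocs() + " docs, new reader? " + (same != reader));

        writer.close();                     // publishes the buffered document
        ToyReader fresh = reader.reopen();
        System.out.println("after close: " + fresh.numDocs() + " docs, new reader? " + (fresh != reader));
    }
}
```

In this model, reopening before the writer is closed hands back the same
reader with zero documents; only after close() does reopen return a new reader
that sees the added document.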

To have an interactive index you can:

A: roll your own.
B: use Solr.
C: use the original LuceneIndexAccessor 
https://issues.apache.org/jira/browse/LUCENE-390
D: use my updated IndexAccessor 
https://issues.apache.org/jira/browse/LUCENE-1026

I have actually just added the ability to warm searchers before putting 
them into use for option D, but I haven't gotten around to posting the 
new code yet.
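
For anyone wondering what an accessor in the spirit of options C and D might
look like, here is a minimal sketch in plain Java with no Lucene dependency.
It assumes a reference-counting design, which is one common way to build such
a thing; it is not the code attached to LUCENE-390 or LUCENE-1026, and
ToySearcher, ToyAccessor, getSearcher, release and writerReleased are all
hypothetical names. Callers get a searcher and release it when done; after a
write, a new searcher is built and warmed before it replaces the current one,
and the old one is closed only once its last user has released it.

```java
public class AccessorSketch {
    /** Stand-in for a searcher: counts its users and closes when the last one lets go. */
    static class ToySearcher {
        final int generation;
        private int refCount = 1;       // the accessor itself holds one reference
        private boolean closed = false;
        private boolean warmed = false;

        ToySearcher(int generation) { this.generation = generation; }

        /** Placeholder for running a few queries to fill caches before real use. */
        synchronized void warm() { warmed = true; }

        synchronized boolean isWarmed() { return warmed; }

        synchronized void incRef() {
            if (closed) throw new IllegalStateException("searcher already closed");
            refCount++;
        }

        synchronized void decRef() {
            if (--refCount == 0) closed = true; // a real searcher would free its files here
        }

        synchronized boolean isClosed() { return closed; }
    }

    /** Hands out the current searcher and swaps in a new one after writes. */
    static class ToyAccessor {
        private ToySearcher current = new ToySearcher(0);
        private int generation = 0;

        /** Every call must be paired with release(), or the searcher can never close. */
        synchronized ToySearcher getSearcher() {
            current.incRef();
            return current;
        }

        void release(ToySearcher searcher) { searcher.decRef(); }

        /** Called once a writer has finished and its changes are visible. */
        synchronized void writerReleased() {
            ToySearcher next = new ToySearcher(++generation);
            next.warm();                // warm before anyone can obtain it
            ToySearcher old = current;
            current = next;
            old.decRef();               // drop the accessor's own reference
        }
    }

    public static void main(String[] args) {
        ToyAccessor accessor = new ToyAccessor();
        ToySearcher inFlight = accessor.getSearcher();  // a search is running
        accessor.writerReleased();                      // index changed meanwhile
        System.out.println("old searcher closed while in use? " + inFlight.isClosed());
        accessor.release(inFlight);                     // search finishes
        System.out.println("old searcher closed after release? " + inFlight.isClosed());
    }
}
```

The design point is that a search already in progress keeps a consistent view
of the index, while new searches pick up the fresh searcher. It also shows why
a missing release matters in a design like this: the old searcher would stay
open indefinitely.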


- Mark Miller




