Posted to java-user@lucene.apache.org by Jay Yu <yu...@AI.SRI.COM> on 2007/09/19 19:22:41 UTC

thread safe shared IndexSearcher

In a multithreaded app such as a web app, a shared IndexSearcher can
throw an AlreadyClosedException when one thread closes it (to pick up a
fresh IndexReader after the index has been updated) while other threads
are still searching with it. Searching past discussions on this mailing
list, I found several approaches to the problem:
1. use Solr
2. use DelayCloseIndexSearcher
3. use LuceneIndexAccessor


The first one is not feasible for us; some people seem to have had
problems with No. 2, and I could not find much discussion of No. 3.

Does anyone have good experience with No. 2 or No. 3?
Or am I missing a better solution?

Thanks for any suggestion/comment!

Jay
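
One way to avoid the exception, and the idea the IndexAccessor discussed below is built on, is reference counting. A minimal sketch, assuming only plain Java (no Lucene classes): the thread that swaps in a fresh searcher merely drops the holder's reference to the old one, and the old one is closed when the last in-flight search releases it. RefCountedResource and SearcherHolder are hypothetical names, and any AutoCloseable stands in for an IndexSearcher.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical: a shared resource (e.g. a wrapped IndexSearcher) closed only by its last user. */
final class RefCountedResource<T extends AutoCloseable> {
    private final T resource;
    // Starts at 1: this reference belongs to the holder that publishes the resource.
    private final AtomicInteger refCount = new AtomicInteger(1);

    RefCountedResource(T resource) {
        this.resource = resource;
    }

    T get() {
        return resource;
    }

    void incRef() {
        refCount.incrementAndGet();
    }

    /** Drops one reference; whoever drops the last one closes the underlying resource. */
    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            try {
                resource.close();
            } catch (Exception e) {
                throw new IllegalStateException("failed to close retired resource", e);
            }
        }
    }
}

/** Hypothetical holder for the "current" searcher that request threads share. */
final class SearcherHolder<T extends AutoCloseable> {
    private RefCountedResource<T> current; // guarded by this

    SearcherHolder(T initial) {
        current = new RefCountedResource<>(initial);
    }

    /** Borrows the current searcher; always pair with release() in a finally block. */
    synchronized RefCountedResource<T> acquire() {
        current.incRef(); // safe: while it is current, the holder's own reference keeps it open
        return current;
    }

    void release(RefCountedResource<T> borrowed) {
        borrowed.decRef();
    }

    /** Publishes a fresh searcher after an index update and retires the old one. */
    void swap(T fresh) {
        RefCountedResource<T> old;
        synchronized (this) {
            old = current;
            current = new RefCountedResource<>(fresh);
        }
        old.decRef(); // closes immediately only if no search is still using it
    }
}
```

A request thread calls acquire(), searches with get() inside a try block, and calls release() in the matching finally; the indexing thread calls swap() after reopening, and never closes the old searcher directly.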

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: thread safe shared IndexSearcher

Posted by Mark Miller <ma...@gmail.com>.

Agreed. Perhaps I will abandon the static init. I really only offered it
as an option because of your concerns about synchronization cost (a
preload allows unsynchronized, read-only access to the IndexAccessor
cache). Do keep in mind that you don't have to use it, though: if you
don't preload, accessors are created on demand, but lookups have to go
through a synchronized block.

I have some ideas and I will be making an attempt to smooth this all out 
tonight. Thanks for your input.

- Mark
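
A minimal sketch of the two lookup modes described above, assuming accessors are keyed by index directory path and built by a caller-supplied factory. AccessorCache is a hypothetical class, not the IndexAccessorFactory code: preloaded accessors sit in a map that is never modified after construction (and is published through a final field), so reads need no lock, while any other directory gets its accessor created on first use inside a synchronized block.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache of per-directory accessors with a preloaded, lock-free fast path. */
final class AccessorCache<A> {
    // Built once in the constructor and never modified, so concurrent reads need no lock.
    private final Map<String, A> preloaded;
    // Accessors for directories that were not preloaded; guarded by this.
    private final Map<String, A> onDemand = new HashMap<>();
    private final Function<String, A> factory;

    AccessorCache(Iterable<String> preloadDirs, Function<String, A> factory) {
        this.factory = factory;
        Map<String, A> map = new HashMap<>();
        for (String dir : preloadDirs) {
            map.put(dir, factory.apply(dir));
        }
        this.preloaded = Collections.unmodifiableMap(map);
    }

    A get(String dir) {
        A accessor = preloaded.get(dir); // fast path: no synchronization
        if (accessor != null) {
            return accessor;
        }
        synchronized (this) { // slow path: create on first use, then reuse
            return onDemand.computeIfAbsent(dir, factory);
        }
    }
}
```

The trade-off is the one discussed in the thread: preloading needs every directory (and its settings) known at startup, while the on-demand path accepts a short synchronized section per lookup.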

Jay Yu wrote:
> I agree with you on the compromise aspect of the design.
> In particular, I think it's hard to preload all the index accessors in
> the static init while allowing users to specify the analyzer for each
> dir without requiring a complicated config file and reflection.
> So a good compromise might be to abandon preloading the accessors.
> After all, the accessors are cached and not created often.
>
> Thanks!
>
> Jay
>
>
> Mark Miller wrote:
>> I think it's just a compromise in the design, though it could be
>> improved. You only ever want a single Writer at a time on the index.
>> Those two flags are really just hints for when a Writer is first
>> opened: should it auto-commit, and should it overwrite/create? If a
>> thread tries to write concurrently with another thread, they will
>> briefly share a Writer, but generally a new Writer is created fairly
>> often.
>>
>> The general strategy should be to pick constant values and always
>> pass them. There is one gap: you have a Writer and are adding a doc,
>> and before you release that Writer, another thread asks for a Writer
>> with create=true to clear the index. That won't work, because the
>> second thread is handed the already-open Writer. That's not a big
>> concern, though.
>>
>> So the real problem is that these params control what happens when
>> a new Writer is created, but you're not guaranteed to be creating a
>> Writer; it may be cached. You really should pass the same autocommit
>> flag every time, though it's not strictly necessary. I am open to
>> suggestions for a more coherent design, but functionally, it does
>> work. I am also thinking about how to handle the Analyzer, and I
>> think the solution (the need to init some IndexAccessor params) might
>> involve all of these issues.
>>
>> - Mark
>>
>> Jay Yu wrote:
>>> Mark,
>>>
>>> Looking at your implementation of the DefaultIndexAccessor regarding
>>> the writer, I think there could be a problem: you have only one
>>> cached writer, but getWriter(boolean, boolean) takes 2 booleans, so
>>> ideally you would need 4 cached writers. Otherwise, if one starts
>>> with a writer that overwrites the existing index, one cannot later
>>> append docs to the index.
>>> Am I missing something here, or have you not finished the
>>> implementation of getWriter yet?
>>>
>>> Thanks!
>>>
>>> Jay
>>>
>>> Mark Miller wrote:
>>>> Ah, thanks for catching that. It's one of the pieces I did not
>>>> finish; the KeywordAnalyzer was placeholder code.
>>>>
>>>> I will take your comments into account and update the code.
>>>>
>>>> I have some other pieces to polish as well. Previously, I extended
>>>> and built upon the original code, but I can't give that version
>>>> away, so this is my attempt at something lesser, but cleaner.
>>>>
>>>> Jay Yu wrote:
>>>>> Thanks for the tip.
>>>>> One small improvement to the IndexAccessorFactory might be to
>>>>> allow users to specify the Analyzer instead of using a default
>>>>> KeywordAnalyzer. Of course, that will make your static init of the
>>>>> cached accessors difficult unless you add more methods to the
>>>>> accessor to allow resetting the analyzer/Dir, as in my own version.
>>>>>
>>>>> Jay
>>>>>
>>>>> Mark Miller wrote:
>>>>>> One final note: if you are using the IndexAccessor and are only
>>>>>> accessing the index from one JVM, you can use the NoLockFactory
>>>>>> and save some sync cost there.
>>>>>>
>>>>>> Jay Yu wrote:
>>>>>>> Mark,
>>>>>>>
>>>>>>> Great effort getting the original Lucene index accessor package
>>>>>>> into this shape. I am sure this will benefit a lot of people using
>>>>>>> Lucene in a multithreaded environment.
>>>>>>> I have a quick question for you:
>>>>>>> Do you have to use the core Lucene 2.3-dev in order to use the
>>>>>>> accessor?
>>>>>>>
>>>>>>> I will take a look at your code to see if I can help. I used a
>>>>>>> slightly modified version of the original package in my project,
>>>>>>> but it breaks some of my tests. I hope your version works better.
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>>
>>>>>>> Jay
>>>>>>>
>>>>>>>
>>>>>>> Mark Miller wrote:
>>>>>>>> I have sat down and rewritten IndexAccessor from scratch. I
>>>>>>>> copied in the same reference counting logic, pruned some
>>>>>>>> things, and tried to make the whole package a bit simpler to
>>>>>>>> use. I have a few things to do, but it's pretty solid already.
>>>>>>>> The only major thing I'd still like to do is add an option to
>>>>>>>> warm searchers before putting them in the Searcher cache. I'd
>>>>>>>> like to write some more tests as well. Any help is greatly
>>>>>>>> appreciated if you're interested in using the thing.
>>>>>>>>
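
Warming before publishing, listed above as still to do, could look roughly like this. A minimal sketch, assuming warm-up means running a fixed list of query strings through a caller-supplied callback; WarmingSearcherCache is hypothetical, and a type parameter stands in for IndexSearcher so it runs without Lucene. Request threads keep getting the old searcher until the new one has been warmed and swapped in.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiConsumer;

/** Hypothetical: warm a new searcher with sample queries before other threads can see it. */
final class WarmingSearcherCache<S> {
    private final AtomicReference<S> current;
    private final List<String> warmupQueries;
    private final BiConsumer<S, String> runQuery; // e.g. parse the query and search with S

    WarmingSearcherCache(S initial, List<String> warmupQueries, BiConsumer<S, String> runQuery) {
        this.current = new AtomicReference<>(initial);
        this.warmupQueries = warmupQueries;
        this.runQuery = runQuery;
    }

    /** The searcher that request threads should use right now. */
    S get() {
        return current.get();
    }

    /**
     * Runs the warm-up queries against the fresh searcher while everyone else still
     * uses the old one, then publishes it. Returns the old searcher so the caller can
     * retire it (for example by dropping a reference count rather than closing it outright).
     */
    S warmAndPublish(S fresh) {
        for (String query : warmupQueries) {
            runQuery.accept(fresh, query);
        }
        return current.getAndSet(fresh);
    }
}
```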
>>>>>>>>
>>>>>>>> http://myhardshadow.com/indexaccessor/trunk/src/test/com/mhs/indexaccessor/SimpleSearchServer.java 
>>>>>>>>
>>>>>>>>
>>>>>>>> Here is an example of a class that can be instantiated in any
>>>>>>>> of several threads and read/modify a single index without
>>>>>>>> worrying about what any of the other threads are doing to the
>>>>>>>> index at any given time. This is a very simple example of how to
>>>>>>>> use the IndexAccessor and not necessarily an example of best
>>>>>>>> practices. The main idea is that you get your Writer, Searcher,
>>>>>>>> or Reader, and then be sure to release it in a finally block as
>>>>>>>> soon as you're done with it. For loading, you will want to add
>>>>>>>> many docs with a Writer (batch them) before releasing it, but
>>>>>>>> remember that Readers will not get a new view of the index until
>>>>>>>> you release all of the Writers. So beware of hogging a Writer
>>>>>>>> unless that's what you intend.
>>>>>>>>
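
The contract described above (release in a finally block; readers see new documents only after every Writer is released) can be modeled in a few lines. This is a hypothetical sketch, not the IndexAccessor API: writers and searchers are reduced to a counter and version numbers so the visibility rule is easy to check, and loadBatch shows the batch-then-release-in-finally pattern.

```java
/** Hypothetical model of the get/release contract; versions stand in for real index views. */
final class ViewAccessor {
    private int writersOut;       // writers currently checked out
    private long indexVersion;    // bumped by every (simulated) document add
    private long searcherVersion; // index version that searchers currently see

    synchronized void getWriter() {
        writersOut++;
    }

    synchronized void addDocument() {
        if (writersOut == 0) {
            throw new IllegalStateException("no writer checked out");
        }
        indexVersion++;
    }

    /** Searchers get a new view only once the last outstanding writer is released. */
    synchronized void releaseWriter() {
        if (writersOut == 0) {
            throw new IllegalStateException("writer released more times than acquired");
        }
        if (--writersOut == 0) {
            searcherVersion = indexVersion;
        }
    }

    synchronized long searcherView() {
        return searcherVersion;
    }

    /** Batch many documents under one writer; release in finally even if an add fails. */
    static void loadBatch(ViewAccessor accessor, int docs) {
        accessor.getWriter();
        try {
            for (int i = 0; i < docs; i++) {
                accessor.addDocument();
            }
        } finally {
            accessor.releaseWriter();
        }
    }
}
```

For example, if one thread holds a writer while loadBatch adds five more documents through a second writer, searcherView() keeps its old value until the first writer is also released, and then jumps to include all of them.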
>>>>>>>> JavaDoc:
>>>>>>>> http://myhardshadow.com/indexaccessorapi/
>>>>>>>>
>>>>>>>> Code:
>>>>>>>> http://myhardshadow.com/indexaccessor/trunk/
>>>>>>>>
>>>>>>>> Jar:
>>>>>>>> http://myhardshadow.com/indexaccessorreleases/indexaccessor.jar
>>>>>>>>
>>>>>>>>
>>>>>>>> Your synchronized block concerns:
>>>>>>>>
>>>>>>>> The synchronized blocks that control access to the
>>>>>>>> IndexAccessor do not have a huge impact on performance. Keep in
>>>>>>>> mind that only the retrieval of the Searcher, Writer, or Reader
>>>>>>>> happens in a synchronized block, not the actual work. Even if the
>>>>>>>> synchronization makes retrieval twice as expensive, it is still
>>>>>>>> dwarfed by the cost of parsing queries and searching the index.
>>>>>>>> This applies with or without contention. I wrote a simple test
>>>>>>>> and included the output below. You might use the IBM Lock
>>>>>>>> Analyzer for Java to further analyze these costs. Trust me, this
>>>>>>>> thing is speedy. It's many times better than using IndexModifier.
>>>>>>>>
>>>>>>>> Without Contention
>>>>>>>> Just retrieve and release Searcher 100000 times
>>>>>>>> ----
>>>>>>>> avg time:6.3E-4 ms
>>>>>>>> total time:63 ms
>>>>>>>>
>>>>>>>> Parse query and search on 1 doc 100000 times
>>>>>>>> ----
>>>>>>>> avg time:0.03107 ms
>>>>>>>> total time:3107 ms
>>>>>>>>
>>>>>>>>
>>>>>>>> With Contention (40 other threads running 80000 searches)
>>>>>>>> Just retrieve and release Searcher 100000 times
>>>>>>>> ----
>>>>>>>> avg time:0.04643 ms
>>>>>>>> total time:4643 ms
>>>>>>>>
>>>>>>>> Parse query and search on 1 doc 100000 times
>>>>>>>> ----
>>>>>>>> avg time:0.64337 ms
>>>>>>>> total time:64337 ms
>>>>>>>>
>>>>>>>>
>>>>>>>> - Mark
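
A rough sketch of the kind of retrieve/release timing loop quoted above, assuming a synchronized, reference-counted getter; it is not Mark's test, and RetrieveReleaseBenchmark is hypothetical. A single System.nanoTime() loop is sensitive to JIT warm-up, so treat its numbers as indicative only; a dedicated harness such as JMH gives more trustworthy figures.

```java
/** Hypothetical micro-benchmark: time synchronized retrieve/release pairs of a shared object. */
final class RetrieveReleaseBenchmark {
    private final Object searcher = new Object(); // stand-in for a cached IndexSearcher
    private int refCount;                         // guarded by this

    synchronized Object getSearcher() {
        refCount++;
        return searcher;
    }

    synchronized void releaseSearcher(Object s) {
        if (s != searcher) {
            throw new IllegalArgumentException("not the cached searcher");
        }
        refCount--;
    }

    synchronized int outstanding() {
        return refCount;
    }

    /** Average wall-clock cost of one retrieve/release pair, in milliseconds. */
    double averageMillis(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Object s = getSearcher();
            try {
                // real code would parse a query and search here
            } finally {
                releaseSearcher(s);
            }
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / iterations;
    }
}
```

For scale, the quoted averages put retrieve-and-release at about 2% of a parse-and-search round trip without contention (6.3E-4 ms vs 0.03107 ms) and about 7% with contention (0.04643 ms vs 0.64337 ms).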

Re: thread safe shared IndexSearcher

Posted by Jay Yu <yu...@AI.SRI.COM>.

I agree with you on the compromise aspect of the design.
In particular, I think it's hard to preload all the index accessors in
the static init while allowing users to specify the analyzer for each dir
without requiring a complicated config file and reflection.
So a good compromise might be to abandon preloading the accessors. After
all, the accessors are cached and not created often.

Thanks!

Jay
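
Without preloading, the analyzer can travel with the first request for each directory. A minimal sketch, assuming analyzers are identified by a name string (a stand-in for a real Lucene Analyzer) and assuming a later request that names a different analyzer for an already-cached directory should fail loudly rather than be silently ignored; LazyAccessorFactory is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical factory: accessors are created lazily with the analyzer their first caller names. */
final class LazyAccessorFactory {
    /** Stand-in for an index accessor bound to one directory and one analyzer. */
    static final class Accessor {
        final String dir;
        final String analyzer; // stand-in for an org.apache.lucene.analysis.Analyzer

        Accessor(String dir, String analyzer) {
            this.dir = dir;
            this.analyzer = analyzer;
        }
    }

    private final Map<String, Accessor> accessors = new HashMap<>(); // guarded by this

    synchronized Accessor getAccessor(String dir, String analyzer) {
        Accessor accessor = accessors.get(dir);
        if (accessor == null) {
            // First request for this directory: its analyzer becomes the accessor's analyzer.
            accessor = new Accessor(dir, analyzer);
            accessors.put(dir, accessor);
        } else if (!accessor.analyzer.equals(analyzer)) {
            // The analyzer only matters at creation time, so refuse a conflicting one.
            throw new IllegalArgumentException(
                    "accessor for " + dir + " already uses analyzer " + accessor.analyzer);
        }
        return accessor;
    }
}
```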


Re: thread safe shared IndexSearcher

Posted by Mark Miller <ma...@gmail.com>.

I think it's just a compromise in the design, though it could be
improved. You only ever want a single Writer at a time on the index.
Those two flags are really just hints for when a Writer is first
opened: should it auto-commit, and should it overwrite/create? If a
thread tries to write concurrently with another thread, they will
briefly share a Writer, but generally a new Writer is created fairly often.

The general strategy should be to pick constant values and always pass
them. There is one gap: you have a Writer and are adding a doc, and
before you release that Writer, another thread asks for a Writer with
create=true to clear the index. That won't work, because the second
thread is handed the already-open Writer. That's not a big concern,
though.

So the real problem is that these params control what happens when a
new Writer is created, but you're not guaranteed to be creating a Writer;
it may be cached. You really should pass the same autocommit flag every
time, though it's not strictly necessary. I am open to suggestions for a
more coherent design, but functionally, it does work. I am also thinking
about how to handle the Analyzer, and I think the solution (the need to
init some IndexAccessor params) might involve all of these issues.

- Mark
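
A minimal model of the behavior described above, using a hypothetical stand-in Writer class rather than Lucene's IndexWriter: concurrent callers share one reference-counted writer per index, the autoCommit/create flags are consulted only when a new writer has to be opened, and the last release closes it.

```java
/** Hypothetical model of one shared, reference-counted writer per index. */
final class SharedWriterSlot {
    /** Stand-in for an IndexWriter; remembers the flags it was opened with. */
    static final class Writer {
        final boolean autoCommit;
        final boolean create;
        boolean closed;

        Writer(boolean autoCommit, boolean create) {
            this.autoCommit = autoCommit;
            this.create = create;
        }
    }

    private Writer current; // guarded by this
    private int users;      // guarded by this

    /** The flags matter only when no writer is open; otherwise callers share the open one. */
    synchronized Writer getWriter(boolean autoCommit, boolean create) {
        if (current == null) {
            current = new Writer(autoCommit, create);
        }
        users++;
        return current;
    }

    /** The last release closes the writer, so the next getWriter() opens a fresh one. */
    synchronized void releaseWriter(Writer writer) {
        if (writer != current || users == 0) {
            throw new IllegalStateException("writer is not checked out");
        }
        if (--users == 0) {
            current.closed = true; // a real implementation would close the IndexWriter here
            current = null;
        }
    }
}
```

For example, getWriter(false, false) followed by getWriter(false, true) before any release returns the same writer, still opened with create=false. That is the create=true gap mentioned above, and why passing the same constant flags on every call is the safe strategy.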

Re: thread safe shared IndexSearcher

Posted by Jay Yu <yu...@AI.SRI.COM>.

Mark,

Looking at your implementation of the DefaultIndexAccessor regarding the
writer, I think there could be a problem: you have only one cached
writer, but getWriter(boolean, boolean) takes 2 booleans, so ideally you
would need 4 cached writers. Otherwise, if one starts with a writer that
overwrites the existing index, one cannot later append docs to the index.
Am I missing something here, or have you not finished the implementation
of getWriter yet?

Thanks!

Jay
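
Caching four writers, one per flag combination, would run into the rule stated in the reply above: only one writer may be open on an index at a time. Lucene enforces this with a write lock in the index directory, so a second IndexWriter on the same index fails to obtain the lock. The sketch below models that lock in memory (WriteLocks is hypothetical and is not Lucene's LockFactory API) to show that a second writer is refused whatever its flags, until the first one releases the lock.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory model of a per-index write lock; not Lucene's LockFactory API. */
final class WriteLocks {
    private final Set<String> held = new HashSet<>(); // guarded by this

    /** Returns false if another writer already holds the lock for this index directory. */
    synchronized boolean tryObtain(String indexDir) {
        return held.add(indexDir);
    }

    synchronized void release(String indexDir) {
        if (!held.remove(indexDir)) {
            throw new IllegalStateException("no writer holds the lock for " + indexDir);
        }
    }

    /** Opens a stand-in writer; the flags play no part in whether the lock is granted. */
    String openWriter(String indexDir, boolean autoCommit, boolean create) {
        if (!tryObtain(indexDir)) {
            throw new IllegalStateException("write lock for " + indexDir + " is already held");
        }
        return "writer on " + indexDir + " (autoCommit=" + autoCommit + ", create=" + create + ")";
    }
}
```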

Mark Miller wrote:
> Ah, thanks for catching that. One of the pieces I did not finish...the 
> keyword analyzer was placeholder code.
> 
> I will take your comments into account and update the code.
> 
> I have some other pieces to polish as well. Previously, I extended and 
> built upon the original code, but I can't give it away, so this is my 
> attempt at something lessor, but cleaner.
> 
> Jay Yu wrote:
>> Thanks for the tip.
>> One small improvement on the IndexAccessorFactory might be to allow 
>> user to specify the Analyzer instead of using a default 
>> KeywordAnalyzer, which of course will make your static init of the 
>> cached accessors difficult unless you add more interfaces to the 
>> accessor to allow reset analyzer/Dir as in my own version.
>>
>>
>>
>>
>> Jay
>>
>> Mark Miller wrote:
>>> One final note....if you are using the IndexAccessor and you are only 
>>> accessing the index from one JVM, you can use the NoLockFactory and 
>>> save some sync cost there.
>>>
>>> Jay Yu wrote:
>>>> Mark,
>>>>
>>>> Great effort getting the original lucene index accessor package in 
>>>> this shape. I am sure this will benefit a lot of people using Lucene 
>>>> in a multithread env.
>>>> I have a quick question to ask you:
>>>> Do you have to use the core Lucene 2.3-dev in order to use the 
>>>> accessor?
>>>>
>>>> I will take a look at your codes to see if I could help. I used a 
>>>> slightly modified version of the original package in my project but 
>>>> it breaks some of my tests. I hope your version works better.
>>>>
>>>> Thanks a lot!
>>>>
>>>> Jay

Re: thread safe shared IndexSearcher

Posted by Mark Miller <ma...@gmail.com>.

Ah, thanks for catching that. It's one of the pieces I did not finish...the
KeywordAnalyzer was placeholder code.

I will take your comments into account and update the code.

I have some other pieces to polish as well. Previously, I extended and
built upon the original code, but I can't give that version away, so this
is my attempt at something lesser, but cleaner.

Jay Yu wrote:
> Thanks for the tip.
> One small improvement on the IndexAccessorFactory might be to allow the
> user to specify the Analyzer instead of using a default
> KeywordAnalyzer. That would, of course, make your static init of the
> cached accessors difficult unless you add more interfaces to the
> accessor to allow resetting the analyzer/Dir, as in my own version.
>
> Jay

Re: thread safe shared IndexSearcher

Posted by Jay Yu <yu...@AI.SRI.COM>.

Thanks for the tip.
One small improvement on the IndexAccessorFactory might be to allow the
user to specify the Analyzer instead of using a default KeywordAnalyzer.
That would, of course, make your static init of the cached accessors
difficult unless you add more interfaces to the accessor to allow
resetting the analyzer/Dir, as in my own version.

Jay
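A minimal sketch of the suggestion above, assuming hypothetical `AccessorFactory` and `ConfiguredAccessor` classes (not the actual IndexAccessorFactory API) and using a plain string as a stand-in for a Lucene `Analyzer` so that it runs without Lucene. Indexes known up front are preloaded, each with its own analyzer, into an immutable map that is read without locking; any other index gets its accessor created on demand, with the caller's analyzer, inside a synchronized method.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-index accessor that remembers which analyzer its writers use.
final class ConfiguredAccessor {
    final String dir;
    final String analyzer; // stand-in for an org.apache.lucene.analysis.Analyzer

    ConfiguredAccessor(String dir, String analyzer) {
        this.dir = dir;
        this.analyzer = analyzer;
    }
}

// Hypothetical factory: preloaded accessors are read without locking,
// all others are created on demand inside a synchronized method.
final class AccessorFactory {
    static final String DEFAULT_ANALYZER = "KeywordAnalyzer";

    // Built once in the constructor and never modified afterwards.
    private final Map<String, ConfiguredAccessor> preloaded;
    // Guarded by `this`: accessors for indexes not known at startup.
    private final Map<String, ConfiguredAccessor> onDemand = new HashMap<>();

    // `known` maps each index directory to the analyzer it should use.
    AccessorFactory(Map<String, String> known) {
        Map<String, ConfiguredAccessor> m = new HashMap<>();
        for (Map.Entry<String, String> e : known.entrySet()) {
            m.put(e.getKey(), new ConfiguredAccessor(e.getKey(), e.getValue()));
        }
        preloaded = Collections.unmodifiableMap(m);
    }

    // The analyzer is only used if this call creates the accessor.
    ConfiguredAccessor getAccessor(String dir, String analyzer) {
        ConfiguredAccessor a = preloaded.get(dir); // no lock needed
        return a != null ? a : getOrCreate(dir, analyzer);
    }

    // Keeps the old default for callers that don't care.
    ConfiguredAccessor getAccessor(String dir) {
        return getAccessor(dir, DEFAULT_ANALYZER);
    }

    private synchronized ConfiguredAccessor getOrCreate(String dir, String analyzer) {
        return onDemand.computeIfAbsent(dir, d -> new ConfiguredAccessor(d, analyzer));
    }
}
```

In this sketch the analyzer argument only matters the first time an index's accessor is created; later callers get the cached accessor unchanged, which is where a reset method like the one in Jay's own version would come in.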

Mark Miller wrote:
> One final note: if you are using the IndexAccessor and you are only
> accessing the index from one JVM, you can use the NoLockFactory and save
> some sync cost there.

Re: thread safe shared IndexSearcher

Posted by Mark Miller <ma...@gmail.com>.

One final note: if you are using the IndexAccessor and you are only
accessing the index from one JVM, you can use the NoLockFactory and save
some sync cost there.

Jay Yu wrote:
> Mark,
>
> Great effort getting the original Lucene index accessor package into 
> this shape. I am sure this will benefit a lot of people using Lucene 
> in a multithreaded environment.
> I have a quick question for you:
> Do you have to use the core Lucene 2.3-dev in order to use the accessor?
>
> I will take a look at your code to see if I can help. I used a 
> slightly modified version of the original package in my project, but it 
> breaks some of my tests. I hope your version works better.
>
> Thanks a lot!
>
> Jay

Re: thread safe shared IndexSearcher

Posted by Mark Miller <ma...@gmail.com>.

2.3-dev is not required. I'd have to look to see exactly what is; I'd
guess around 2.1. I did program for Java 1.5, though, but converting to
1.4 should be minimal work.

Jay Yu wrote:
> Mark,
>
> Great effort getting the original lucene index accessor package in 
> this shape. I am sure this will benefit a lot of people using Lucene 
> in a multithread env.
> I have a quick question to ask you:
> Do you have to use the core Lucene 2.3-dev in order to use the accessor?
>
> I will take a look at your codes to see if I could help. I used a 
> slightly modified version of the original package in my project but it 
> breaks some of my tests. I hope your version works better.
>
> Thanks a lot!
>
> Jay

Re: thread safe shared IndexSearcher

Posted by Jay Yu <yu...@AI.SRI.COM>.

Mark,

Great effort getting the original Lucene index accessor package into this
shape. I am sure this will benefit a lot of people using Lucene in a
multithreaded environment.
I have a quick question for you:
Do you have to use the core Lucene 2.3-dev in order to use the accessor?

I will take a look at your code to see if I can help. I used a
slightly modified version of the original package in my project, but it
breaks some of my tests. I hope your version works better.

Thanks a lot!

Jay

Re: thread safe shared IndexSearcher

Posted by Mark Miller <ma...@gmail.com>.

I sat down and rewrote IndexAccessor from scratch. I copied in the
same reference counting logic, pruned some things, and tried to make the
whole package a bit simpler to use. I have a few things left to do, but
it's pretty solid already. The only major thing I'd still like to do is
add an option to warm searchers before putting them in the Searcher cache.
I'd like to write some more tests as well. Any help is greatly appreciated
if you're interested in using the thing.
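The reference counting mentioned above can be modeled roughly as follows. This is a simplified sketch under stated assumptions, not the IndexAccessor source: `Snapshot` is a hypothetical stand-in for a cached Searcher, `generation` stands in for the index version a freshly opened Searcher would see, and writers are not modeled beyond the moment one is released. Searchers are shared and counted; releasing a writer waits until the count drops to zero, then clears the cache so the next `getSearcher()` opens a fresh one.

```java
// Hypothetical stand-in for a Searcher opened on one version of the index.
final class Snapshot {
    final int generation;

    Snapshot(int generation) {
        this.generation = generation;
    }
}

final class RefCountedSearchers {
    private Snapshot cached; // shared by all callers until invalidated
    private int refCount;    // searchers currently checked out
    private int generation;  // bumped each time a writer is released

    synchronized Snapshot getSearcher() {
        if (cached == null) {
            cached = new Snapshot(generation); // a real accessor would open a Searcher here
        }
        refCount++;
        return cached;
    }

    synchronized void releaseSearcher(Snapshot s) {
        refCount--;
        if (refCount == 0) {
            notifyAll(); // wake a writer release waiting for searchers to return
        }
    }

    // Called when a writer is released: wait for outstanding searchers,
    // then drop the cached one so later searches see the new index.
    synchronized void onWriterReleased() throws InterruptedException {
        while (refCount > 0) {
            wait();
        }
        cached = null; // a real accessor would close the old Searcher here
        generation++;
    }
}
```

One simplification to be aware of: new `getSearcher()` calls can keep joining the old snapshot while a writer release is waiting, so under heavy search load the refresh could be delayed; a fuller version might stop handing out the old searcher once a refresh is pending.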

http://myhardshadow.com/indexaccessor/trunk/src/test/com/mhs/indexaccessor/SimpleSearchServer.java

Here is an example of a class that can be instantiated in any of several
threads and read/modify a single index without worrying about what any
of the other threads are doing to the index at any given time. This is a
very simple example of how to use the IndexAccessor and not necessarily an
example of best practices. The main idea is that you get your Writer,
Searcher, or Reader, and then be sure to release it in a finally block as
soon as you're done with it. For loading, you will want to load many docs
with a Writer (batch them) before releasing it, but remember that Readers
will not get a new view of the index until you release all of the Writers.
So beware of hogging a Writer unless that's what you're intending.
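The get/use/release discipline described above can be sketched like this. `DemoAccessor` and `DemoSearcher` are hypothetical stand-ins, not the IndexAccessor API; the accessor only counts how many callers currently hold the shared searcher, which makes it easy to check that the count returns to zero even when the search throws.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a shared Searcher.
final class DemoSearcher {
    String search(String query) {
        if (query.isEmpty()) {
            throw new IllegalArgumentException("empty query");
        }
        return "hits for " + query;
    }
}

// Hypothetical accessor: hands out one shared searcher and counts holders.
final class DemoAccessor {
    private final DemoSearcher shared = new DemoSearcher();
    private final AtomicInteger holders = new AtomicInteger();

    DemoSearcher getSearcher() {
        holders.incrementAndGet();
        return shared;
    }

    void release(DemoSearcher s) {
        holders.decrementAndGet();
    }

    int holders() {
        return holders.get();
    }
}

final class SearchTask {
    // Get the searcher, use it briefly, and always give it back in finally.
    static String run(DemoAccessor accessor, String query) {
        DemoSearcher searcher = accessor.getSearcher();
        try {
            return searcher.search(query);
        } finally {
            accessor.release(searcher);
        }
    }
}
```

The same shape applies to a Writer used for bulk loading: add the whole batch between one get and one release, keeping in mind that Readers will not see the new documents until the Writer is released.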

JavaDoc:
http://myhardshadow.com/indexaccessorapi/

Code:
http://myhardshadow.com/indexaccessor/trunk/

Jar:
http://myhardshadow.com/indexaccessorreleases/indexaccessor.jar

Your synchronized block concerns:

The synchronized blocks that control access to the IndexAccessor do not
have a huge impact on performance. Keep in mind that the work is not all
done in a synchronized block; only the retrieval of the Searcher, Writer,
or Reader is. Even if the synchronization made that method twice as
expensive, it would still be dwarfed by the cost of parsing queries and
searching the index. This applies with or without contention. I wrote a
simple test and included the output below. You might use the IBM Lock
Analyzer for Java to further analyze these costs. Trust me, this thing
is speedy. It's many times better than using IndexModifier.

Without Contention
Just retrieve and release Searcher 100000 times
----
avg time:6.3E-4 ms
total time:63 ms

Parse query and search on 1 doc 100000 times
----
avg time:0.03107 ms
total time:3107 ms

With Contention (40 other threads running 80000 searches)
Just retrieve and release Searcher 100000 times
----
avg time:0.04643 ms
total time:4643 ms

Parse query and search on 1 doc 100000 times
----
avg time:0.64337 ms
total time:64337 ms
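Each average above is the total divided by the 100000 iterations (for example, 63 ms / 100000 = 6.3E-4 ms, and 64337 ms / 100000 = 0.64337 ms). Below is a rough sketch of a timing loop of this kind, assuming a hypothetical `LockedCounter` whose synchronized `acquire`/`release` stand in for retrieving and releasing a Searcher. It is not the test that produced these numbers, and a careful measurement would add JVM warm-up or use a harness such as JMH.

```java
// Hypothetical stand-in whose get/release bookkeeping runs under a lock,
// so the timed loop measures only that synchronized overhead.
final class LockedCounter {
    private int holders;

    synchronized void acquire() {
        holders++;
    }

    synchronized void release() {
        holders--;
    }

    synchronized int holders() {
        return holders;
    }
}

final class RetrieveReleaseBench {
    // Average cost per iteration in milliseconds, as printed in the post.
    static double avgMs(double totalMs, int iterations) {
        return totalMs / iterations;
    }

    // Times `iterations` acquire/release pairs and returns elapsed milliseconds.
    static double timeMs(LockedCounter c, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            c.acquire();
            c.release();
        }
        return (System.nanoTime() - start) / 1e6;
    }

    public static void main(String[] args) {
        int n = 100000;
        double total = timeMs(new LockedCounter(), n);
        System.out.println("avg time:" + avgMs(total, n) + " ms");
        System.out.println("total time:" + total + " ms");
    }
}
```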

- Mark

Re: thread safe shared IndexSearcher

Posted by Jay Yu <yu...@AI.SRI.COM>.

I'd be very interested to see your test results and code. Thanks!

Mark Miller wrote:
> I sat down over the weekend and rewrote the code from scratch so that I 
> could improve and simplify it somewhat. I also did some testing of the 
> synch costs, and they are insignificant compared to the total time to 
> parse a query and run a search. I'll try to get around to posting the 
> code tonight.
> 
> - Mark
> 
> Jay Yu wrote:
>>
>>
>> Mark Miller wrote:
>>> Good luck Jay. Keep in mind, pretty much all LuceneIndexAccessor does 
>>> is sync Readers with Writers and allow multiple threads to share the 
>>> same instances of them -- nothing more. The code just forces Readers 
>>> to refresh when Writers are used to change the index. There really 
>>> isn't any functionality beyond that. Since you want a multithreaded 
>>> system to access the same resources (which occasionally need to be 
>>> refreshed), it's not easy to get around a synchronized 
>>> block.
>>>
>>> If I am able to extract some usable code for you soon I will let you 
>>> know.
>> I will appreciate it!
>> Thanks for your help!
>>
>>>
>>> - Mark
>>>
>>> Jay Yu wrote:
>>>> Mark,
>>>>
>>>> Thanks for sharing your valuable experience and thoughts.
>>>> Frankly, our system already has most of the functionality 
>>>> LuceneIndexAccessor offers. The only thing I am looking for is to 
>>>> sync the searchers' close. That's why I am a little worried about the 
>>>> way the accessor handles the searcher sync.
>>>> I will probably give it a try to see how it performs in our system.
>>>>
>>>> Thanks!
>>>>
>>>> Jay
>>>>
>>>> Mark Miller wrote:
>>>>> The method is synched, but this is because each thread *does* share 
>>>>> the same Searcher. To maintain a cache of searchers across multiple 
>>>>> threads, you've got to sync -- to reference count, you've got to 
>>>>> sync. The performance hit of LuceneIndexAccessor is pretty minimal 
>>>>> for its functionality, and frankly, for the functionality you want, 
>>>>> you have to pay a cost. That's not even the end of it really...you're 
>>>>> going to need to maintain a cache of Accessor objects for each 
>>>>> index as well...and if you don't know all the indexes at startup 
>>>>> time, access to this will also need to be synched. I wouldn't worry 
>>>>> though -- searches are still lightning fast...that won't be the 
>>>>> bottleneck. I'll work on getting you some code, but if you're 
>>>>> worried, try some benchmarking on the original code.
>>>>>
>>>>> Also, to be clear, I don't have the code in front of me, but 
>>>>> getting a Searcher does not require waiting for a Writer to be 
>>>>> released. Searchers are cached and reused (and instantly 
>>>>> available) until a Writer is released. When this happens, the 
>>>>> release Writer method waits for all the Searchers to return 
>>>>> (which happens quickly, as searches are fast), the Searcher 
>>>>> cache is cleared, and then subsequent calls to getSearcher create 
>>>>> new Searchers that can see what the Writer added.
>>>>>
>>>>> The key is to use your Writer/Searcher/Reader quickly and then release 
>>>>> it (unless you're bulk loading). I've had such a system with 5+ 
>>>>> million docs on a standard machine, and searches were still well 
>>>>> below a second after the first Searcher was cached (and even the 
>>>>> first search is darn quick). And that includes a lot of extra crap 
>>>>> I am doing.
>>>>>
>>>>> - Mark
>>>>>
>>>>> Jay Yu wrote:
>>>>>> Mark,
>>>>>>
>>>>>> After reading the implementation of 
>>>>>> LuceneIndexAccessor.getSearcher(),
>>>>>> I realized that the method is synchronized and wait for 
>>>>>> writingDirector to be released. That means if we getSearcher for 
>>>>>> each query in each thread, there might be a contention and 
>>>>>> performance hit. In fact, even the method of release(searcher) is 
>>>>>> costly. On the other hand, if multiple threads share share one 
>>>>>> searcher then it'd defeat the
>>>>>> purpose of using LuceneIndexAccessor.
>>>>>> Do I miss sth here? What's your suggested use case for 
>>>>>> LuceneIndexAccessor?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Jay
>>>>>> Mark Miller wrote:
>>>>>>> Ill respond a point at a time:
>>>>>>>
>>>>>>> 1.
>>>>>>>
>>>>>>> ****************************** Hi Maik,
>>>>>>>
>>>>>>> So what happens in this case:
>>>>>>>
>>>>>>> IndexAccessProvider accessProvider = new 
>>>>>>> IndexAccessProvider(directory,
>>>>>>>
>>>>>>> analyzer);
>>>>>>>
>>>>>>> LuceneIndexAccessor accessor = new 
>>>>>>> LuceneIndexAccessor(accessProvider);
>>>>>>>
>>>>>>> accessor.open();
>>>>>>>
>>>>>>> IndexWriter writer = accessor.getWriter();
>>>>>>>
>>>>>>> // reference to the same instance?
>>>>>>>
>>>>>>> IndexWriter writer2 = accessor.getWriter();
>>>>>>>
>>>>>>> writer.addDocument(....);
>>>>>>>
>>>>>>> writer2.addDocument(....);
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> // I didn't release the writer yet
>>>>>>>
>>>>>>> // will this block?
>>>>>>>
>>>>>>> IndexReader reader = accessor.getReader();
>>>>>>>
>>>>>>> reader.delete(....);
>>>>>>>
>>>>>>> ************
>>>>>>>
>>>>>>> This is not really an issue. First, if you are going to delete 
>>>>>>> with a Reader
>>>>>>> you need to call getWritingReader and not getReader. When you do 
>>>>>>> that, the
>>>>>>> getWritingReader call will block until writer and writer2 are 
>>>>>>> released. If
>>>>>>> you are just adding a couple docs before releasing the writers, 
>>>>>>> this is no
>>>>>>> problem because the block will be very short. If you are loading 
>>>>>>> tons of
>>>>>>> docs and you want to be able to delete with a Reader in a timely 
>>>>>>> manner, you
>>>>>>> should release the writers every now and then (release and re-get 
>>>>>>> the Writer
>>>>>>> every 100 docs or something). An interactive index should not hog 
>>>>>>> the
>>>>>>> Writer, while something that is just loading a lot could hog the 
>>>>>>> Writer.
>>>>>>> This is no different than normal…you cannot delete with a Reader 
>>>>>>> while
>>>>>>> adding with a Writer with Lucene. This code just enforces those 
>>>>>>> semantics.
>>>>>>> The best solution is to just use a Writer to delete – I never get a
>>>>>>> ReadingWriter.
>>>>>>>
>>>>>>> 2. http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3
>>>>>>>
>>>>>>> This is no big deal either. I just added another getWriter call 
>>>>>>> that takes a
>>>>>>> create Boolean.
>>>>>>>
>>>>>>> 3. I don't think there is a latest release. This has never gotten 
>>>>>>> much
>>>>>>> official attention and is not in the sandbox. I worked straight 
>>>>>>> from the
>>>>>>> originally submitted code.
>>>>>>>
>>>>>>> 4. I will look into getting together some code that I can share. The
>>>>>>> multisearcher changes that are need are a couple of one liners 
>>>>>>> really, so at
>>>>>>> a minimum I will give you the changes needed.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -       Mark
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 9/19/07, Jay Yu <yu...@ai.sri.com> wrote:
>>>>>>>
>>>>>>> Mark,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> thanks for sharing your insight and experience about 
>>>>>>> LuceneIndexAccessor!
>>>>>>>
>>>>>>> I remember seeing some people reporting some issues about it, 
>>>>>>> such as:
>>>>>>>
>>>>>>> http://www.archivum.info/java-dev@lucene.apache.org/2005-05/msg00114.html 
>>>>>>>
>>>>>>>
>>>>>>> http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Have those issues been resolved?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Where did you get the latest release? It is not in the official 
>>>>>>> Lucene
>>>>>>>
>>>>>>> sandbox/contrib.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Finally, are you willing to share your extended version to 
>>>>>>> include your
>>>>>>>
>>>>>>> tweak relating to the MultiSearcher?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Jay
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Mark Miller wrote:
>>>>>>>
>>>>>>>> I use option 3 extensivley and find it very effective. There is 
>>>>>>>> a tweak or
>>>>>>>
>>>>>>>> two required to get it to work right with MultiSearchers, but 
>>>>>>>> other than
>>>>>>>
>>>>>>>> that, the code is great. I have built a lot on top of it. I'm on 
>>>>>>>> the list
>>>>>>>
>>>>>>>> all the time and would be happy to answer any questions you have in
>>>>>>> regards
>>>>>>>
>>>>>>>> to LuceneIndexAccessor. Frankly, I think its overlooked far too 
>>>>>>>> much.
>>>>>>>
>>>>>>>
>>>>>>>> - Mark
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On 9/19/07, Jay Yu <yu...@ai.sri.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>>> In a multithread app like web app, a shared IndexSearcher could 
>>>>>>>>> throw a
>>>>>>>
>>>>>>>>> AlreadyClosedException when another thread is trying to update the
>>>>>>>
>>>>>>>>> underlying IndexReader by closing the shared searcher after the 
>>>>>>>>> index is
>>>>>>>
>>>>>>>>> updated. Searching over the past discussions on this mailing 
>>>>>>>>> list, I
>>>>>>>
>>>>>>>>> found several approaches to solve the problem.
>>>>>>>
>>>>>>>>> 1. use solr
>>>>>>>
>>>>>>>>> 2. use DelayCloseIndexSearcher
>>>>>>>
>>>>>>>>> 3. use LuceneIndexAccessor
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>> the first one is not feasible for us; some people seemed to have
>>>>>>>
>>>>>>>>> problems with No. 2 and I do not find a lot of discussions 
>>>>>>>>> around No.3.
>>>>>>>
>>>>>>>
>>>>>>>>> I wonder if anyone has good experience on No 2 and 3?
>>>>>>>
>>>>>>>>> Or do I miss other better solutions?
>>>>>>>
>>>>>>>
>>>>>>>>> Thanks for any suggestion/comment!
>>>>>>>
>>>>>>>
>>>>>>>>> Jay
>>>>>>>
>>>>>>>
>>>>>>>>> --------------------------------------------------------------------- 
>>>>>>>>>
>>>>>>>
>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>
>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------- 
>>>>>>>
>>>>>>>
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: thread safe shared IndexSearcher

Posted by Mark Miller <ma...@gmail.com>.

I sat down over the weekend and rewrote the code from scratch so that I
could improve and simplify it somewhat. I also did some testing of the
synch costs, and they are insignificant compared to the total time to
parse a query and run a search. I'll try to get around to posting the
code tonight.

- Mark

Jay Yu wrote:
>
>
> Mark Miller wrote:
>> Good luck, Jay. Keep in mind, pretty much all LuceneIndexAccessor does
>> is sync Readers with Writers and allow multiple threads to share the
>> same instances of them; nothing more. The code just forces Readers to
>> refresh when Writers are used to change the index. There really isn't
>> any functionality beyond that. Since you want a multithreaded system to
>> access the same resources (which occasionally need to be refreshed),
>> it's not easy to get around a synchronized block.
>>
>> If I am able to extract some usable code for you soon, I will let you
>> know.
> I would appreciate it!
> Thanks for your help!
>
>>
>> - Mark

Re: thread safe shared IndexSearcher

Posted by Jay Yu <yu...@AI.SRI.COM>.


Mark Miller wrote:
> Good luck, Jay. Keep in mind, pretty much all LuceneIndexAccessor does is
> sync Readers with Writers and allow multiple threads to share the same
> instances of them; nothing more. The code just forces Readers to
> refresh when Writers are used to change the index. There really isn't
> any functionality beyond that. Since you want a multithreaded system to
> access the same resources (which occasionally need to be refreshed),
> it's not easy to get around a synchronized block.
> 
> If I am able to extract some usable code for you soon, I will let you know.
I would appreciate it!
Thanks for your help!

> 
> - Mark
> 
> Jay Yu wrote:
>> Mark,
>>
>> Thanks for sharing your valuable experience and thoughts.
>> Frankly, our system already has most of the functionality
>> LuceneIndexAccessor offers. The only thing I am looking for is to sync
>> the closing of the searchers. That's why I am a little worried about
>> the way the accessor handles the searcher sync.
>> I will probably give it a try to see how it performs in our system.
>>
>> Thanks!
>>
>> Jay
>>
>> Mark Miller wrote:
>>> The method is synched, but this is because each thread *does* share
>>> the same Searcher. To maintain a cache of searchers across multiple
>>> threads, you've got to sync; to reference count, you've got to sync.
>>> The performance hit of LuceneIndexAccessor is pretty minimal for its
>>> functionality, and frankly, for the functionality you want, you have
>>> to pay a cost. That's not even the end of it, really: you're going to
>>> need to maintain a cache of Accessor objects for each index as well,
>>> and if you don't know all the indexes at startup time, access to this
>>> will also need to be synched. I wouldn't worry, though; searches are
>>> still lightning fast, and that won't be the bottleneck. I'll work on
>>> getting you some code, but if you're worried, try some benchmarking
>>> on the original code.
>>>
>>> Also, to be clear: I don't have the code in front of me, but getting
>>> a Searcher does not require waiting for a Writer to be released.
>>> Searchers are cached and reused (and instantly available) until a
>>> Writer is released. When this happens, the release-Writer method
>>> waits for all the Searchers to return (which happens pretty quickly,
>>> as searches are quick), the Searcher cache is cleared, and subsequent
>>> calls to getSearcher create new Searchers that can see what the
>>> Writer added.
>>>
>>> The key is to use your Writer/Searcher/Reader quickly and then release
>>> it (unless you're bulk loading). I've had such a system with 5+ million
>>> docs on a standard machine, and searches were still well below a
>>> second after the first Searcher was cached (and even the first search
>>> is darn quick). And that includes a lot of extra crap I am doing.
>>>
>>> - Mark
>>>
>>> Jay Yu wrote:
>>>> Mark,
>>>>
>>>> After reading the implementation of LuceneIndexAccessor.getSearcher(),
>>>> I realized that the method is synchronized and waits for
>>>> writingDirector to be released. That means if we call getSearcher for
>>>> each query in each thread, there might be contention and a
>>>> performance hit. In fact, even the release(searcher) method is
>>>> costly. On the other hand, if multiple threads share one searcher,
>>>> it would defeat the purpose of using LuceneIndexAccessor.
>>>> Am I missing something here? What's your suggested use case for
>>>> LuceneIndexAccessor?
>>>>
>>>> Thanks!
>>>>
>>>> Jay
>>>> Mark Miller wrote:
>>>>> I'll respond one point at a time:
>>>>>
>>>>> 1.
>>>>>
>>>>> ****************************** Hi Maik,
>>>>>
>>>>> So what happens in this case:
>>>>>
>>>>> IndexAccessProvider accessProvider = new IndexAccessProvider(directory, analyzer);
>>>>> LuceneIndexAccessor accessor = new LuceneIndexAccessor(accessProvider);
>>>>> accessor.open();
>>>>>
>>>>> IndexWriter writer = accessor.getWriter();
>>>>> // reference to the same instance?
>>>>> IndexWriter writer2 = accessor.getWriter();
>>>>>
>>>>> writer.addDocument(....);
>>>>> writer2.addDocument(....);
>>>>>
>>>>> // I didn't release the writer yet
>>>>> // will this block?
>>>>> IndexReader reader = accessor.getReader();
>>>>> reader.delete(....);
>>>>>
>>>>> ************
>>>>>
>>>>> This is not really an issue. First, if you are going to delete with a
>>>>> Reader, you need to call getWritingReader and not getReader. When you
>>>>> do that, the getWritingReader call will block until writer and writer2
>>>>> are released. If you are just adding a couple of docs before releasing
>>>>> the writers, this is no problem because the block will be very short.
>>>>> If you are loading tons of docs and you want to be able to delete with
>>>>> a Reader in a timely manner, you should release the writers every now
>>>>> and then (release and re-get the Writer every 100 docs or something).
>>>>> An interactive index should not hog the Writer, while something that
>>>>> is just loading a lot could hog the Writer. This is no different than
>>>>> normal: you cannot delete with a Reader while adding with a Writer in
>>>>> Lucene. This code just enforces those semantics. The best solution is
>>>>> to just use a Writer to delete; I never get a WritingReader.
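A minimal sketch, in Java, of the "release and re-get the Writer every 100 docs" advice. WriterSource, DocumentAdder and BatchedLoader are hypothetical stand-ins, not LuceneIndexAccessor's actual API; the real accessor's method names and signatures may differ. The point is only the loop shape: hold the writer for one batch, give it back in a finally block, then re-acquire it, so anything blocked on the writer waits for at most one batch.

```java
import java.util.List;

// Hypothetical stand-in for an accessor's writer methods (not the real LuceneIndexAccessor API).
interface WriterSource<W> {
    W getWriter();
    void release(W writer);
}

// Hypothetical callback that adds one document using the given writer.
interface DocumentAdder<W, D> {
    void add(W writer, D doc);
}

final class BatchedLoader {
    private BatchedLoader() {}

    /**
     * Adds all docs, but gives the writer back after every batchSize documents,
     * so a thread waiting for the writer (e.g. to delete, or to get fresh
     * searchers) waits for at most one batch instead of the whole load.
     * Returns how many times the writer was released.
     */
    static <W, D> int load(WriterSource<W> source, DocumentAdder<W, D> adder,
                           List<D> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int releases = 0;
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            W writer = source.getWriter();
            try {
                for (D doc : docs.subList(start, end)) {
                    adder.add(writer, doc);
                }
            } finally {
                source.release(writer); // lets blocked readers/searchers in between batches
                releases++;
            }
        }
        return releases;
    }
}
```

The batch size is a tuning knob: smaller batches mean shorter waits for other threads but more writer open/close overhead.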
>>>>>
>>>>> 2. http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3
>>>>>
>>>>> This is no big deal either. I just added another getWriter call that
>>>>> takes a create boolean.
>>>>>
>>>>> 3. I don't think there is a latest release. This has never gotten much
>>>>> official attention and is not in the sandbox. I worked straight from
>>>>> the originally submitted code.
>>>>>
>>>>> 4. I will look into getting together some code that I can share. The
>>>>> MultiSearcher changes that are needed are really a couple of
>>>>> one-liners, so at a minimum I will give you the changes needed.
>>>>>
>>>>> - Mark
>>>>>
>>>>> On 9/19/07, Jay Yu <yu...@ai.sri.com> wrote:
>>>>>
>>>>> Mark,
>>>>>
>>>>> Thanks for sharing your insight and experience with LuceneIndexAccessor!
>>>>>
>>>>> I remember seeing some people report issues with it, such as:
>>>>>
>>>>> http://www.archivum.info/java-dev@lucene.apache.org/2005-05/msg00114.html
>>>>> http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3
>>>>>
>>>>> Have those issues been resolved?
>>>>>
>>>>> Where did you get the latest release? It is not in the official Lucene
>>>>> sandbox/contrib.
>>>>>
>>>>> Finally, are you willing to share your extended version, including your
>>>>> tweak for the MultiSearcher?
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>> Jay
>>>>>
>>>>> Mark Miller wrote:
>>>>>
>>>>>> I use option 3 extensively and find it very effective. There is a
>>>>>> tweak or two required to get it to work right with MultiSearchers,
>>>>>> but other than that, the code is great. I have built a lot on top of
>>>>>> it. I'm on the list all the time and would be happy to answer any
>>>>>> questions you have regarding LuceneIndexAccessor. Frankly, I think
>>>>>> it's overlooked far too much.
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>>> On 9/19/07, Jay Yu <yu...@ai.sri.com> wrote:
>>>>>>
>>>>>>> In a multithreaded app such as a web app, a shared IndexSearcher can
>>>>>>> throw an AlreadyClosedException when another thread updates the
>>>>>>> underlying IndexReader by closing the shared searcher after the
>>>>>>> index is updated. Searching past discussions on this mailing list,
>>>>>>> I found several approaches to solve the problem:
>>>>>>> 1. use Solr
>>>>>>> 2. use DelayCloseIndexSearcher
>>>>>>> 3. use LuceneIndexAccessor
>>>>>>>
>>>>>>> The first one is not feasible for us; some people seemed to have
>>>>>>> problems with No. 2, and I have not found much discussion of No. 3.
>>>>>>>
>>>>>>> I wonder if anyone has had good experience with No. 2 or No. 3?
>>>>>>> Or am I missing other, better solutions?
>>>>>>>
>>>>>>> Thanks for any suggestions/comments!
>>>>>>>
>>>>>>> Jay

Re: thread safe shared IndexSearcher

Posted by Mark Miller <ma...@gmail.com>.

Good luck, Jay. Keep in mind, pretty much all LuceneIndexAccessor does is
sync Readers with Writers and allow multiple threads to share the same
instances of them; nothing more. The code just forces Readers to
refresh when Writers are used to change the index. There really isn't
any functionality beyond that. Since you want a multithreaded system to
access the same resources (which occasionally need to be refreshed),
it's not easy to get around a synchronized block.
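One way to picture "share the same instances across threads, and refresh them when a Writer changes the index" is a reference-counted holder. This is a hedged sketch, not LuceneIndexAccessor's code: SharedSearcherHolder is a hypothetical name, and the resource is any Closeable (an IndexSearcher in practice). It also deliberately swaps in a different refresh strategy from the one described in this thread: instead of blocking until every searcher is returned, refresh() makes the next acquire() open a new instance and closes the old one only when its last user releases it. This is similar in spirit to the reference counting later Lucene versions build into SearcherManager. Only the bookkeeping is synchronized; the search itself runs outside the lock.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical holder (not LuceneIndexAccessor's code) that lets many threads
 * share one Closeable resource, e.g. a searcher, and reference-counts each
 * instance so a refresh never closes an instance that is still in use.
 */
final class SharedSearcherHolder<T extends Closeable> {
    private final Supplier<T> opener;                       // opens an instance on the latest index
    private final Map<T, Integer> refCounts = new IdentityHashMap<>();
    private T current;                                      // instance handed to new callers

    SharedSearcherHolder(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Returns the shared instance, opening a fresh one after a refresh. */
    public synchronized T acquire() {
        if (current == null) {
            current = opener.get();
            refCounts.put(current, 1);                      // the holder's own reference
        }
        refCounts.put(current, refCounts.get(current) + 1); // the caller's reference
        return current;
    }

    /** Call exactly once per acquire(), ideally in a finally block. */
    public synchronized void release(T instance) {
        decRef(instance);
    }

    /** Call after a writer has changed the index. */
    public synchronized void refresh() {
        if (current != null) {
            T old = current;
            current = null;   // the next acquire() opens a new instance
            decRef(old);      // drop the holder's reference; closes now only if unused
        }
    }

    private void decRef(T instance) {
        Integer count = refCounts.get(instance);
        if (count == null) {
            throw new IllegalStateException("instance not acquired from this holder or already closed");
        }
        if (count > 1) {
            refCounts.put(instance, count - 1);
            return;
        }
        refCounts.remove(instance);
        try {
            instance.close();  // last reference gone; note this runs while holding the lock
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each search would then follow the pattern `T s = holder.acquire(); try { /* search */ } finally { holder.release(s); }`, in line with the thread's advice to use a searcher quickly and then give it back.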

If I am able to extract some usable code for you soon, I will let you know.

- Mark

Jay Yu wrote:
> Mark,
>
> Thanks for sharing your valuable experience and thoughts.
> Frankly, our system already has most of the functionality
> LuceneIndexAccessor offers. The only thing I am looking for is to sync
> the closing of the searchers. That's why I am a little worried about
> the way the accessor handles the searcher sync.
> I will probably give it a try to see how it performs in our system.
>
> Thanks!
>
> Jay
>
> Mark Miller wrote:
>> The method is synched, but this is because each thread *does* share
>> the same Searcher. To maintain a cache of searchers across multiple
>> threads, you've got to sync; to reference count, you've got to sync.
>> The performance hit of LuceneIndexAccessor is pretty minimal for its
>> functionality, and frankly, for the functionality you want, you have
>> to pay a cost. That's not even the end of it, really: you're going to
>> need to maintain a cache of Accessor objects for each index as well,
>> and if you don't know all the indexes at startup time, access to this
>> will also need to be synched. I wouldn't worry, though; searches are
>> still lightning fast, and that won't be the bottleneck. I'll work on
>> getting you some code, but if you're worried, try some benchmarking
>> on the original code.
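For the per-index cache of Accessor objects mentioned here, a sketch assuming indexes are identified by a path string and that some factory can build an accessor for a path. AccessorRegistry and the factory are hypothetical. ConcurrentHashMap.computeIfAbsent creates an accessor on demand and, per its documentation, applies the mapping function at most once per key, so indexes that are not known at startup can be added without funnelling every lookup through one synchronized method.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical registry holding one accessor per index, keyed by index path.
 * Accessors are created on first use, so indexes unknown at startup are fine.
 */
final class AccessorRegistry<A> {
    private final ConcurrentHashMap<String, A> accessors = new ConcurrentHashMap<>();
    private final Function<String, A> factory; // builds an accessor for an index path

    AccessorRegistry(Function<String, A> factory) {
        this.factory = factory;
    }

    /** Returns the accessor for indexPath, creating it at most once. */
    A forIndex(String indexPath) {
        // ConcurrentHashMap applies the factory at most once per key, even if
        // several threads ask for a new index at the same moment.
        return accessors.computeIfAbsent(indexPath, factory);
    }

    int size() {
        return accessors.size();
    }
}
```

The factory should be quick and must not modify the registry itself, since it runs while the map is updating that key.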
>>
>> Also, to be clear: I don't have the code in front of me, but getting
>> a Searcher does not require waiting for a Writer to be released.
>> Searchers are cached and reused (and instantly available) until a
>> Writer is released. When this happens, the release-Writer method
>> waits for all the Searchers to return (which happens pretty quickly,
>> as searches are quick), the Searcher cache is cleared, and subsequent
>> calls to getSearcher create new Searchers that can see what the
>> Writer added.
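The release-Writer behaviour described here can be sketched with a Java monitor and wait/notifyAll. BlockingSearcherCache and its method names are hypothetical, and the real LuceneIndexAccessor code may well differ. This version also holds back new getSearcher() calls while a writer release is waiting, so a steady stream of searches cannot postpone the refresh indefinitely. It assumes only one writer release happens at a time.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the described behaviour (not the actual
 * LuceneIndexAccessor code). Searchers are cached and handed out freely;
 * releasing a writer waits until every checked-out searcher is returned,
 * then drops the cached searcher so the next getSearcher() opens a new one.
 * Assumes at most one writer release runs at a time.
 */
final class BlockingSearcherCache<S> {
    private final Supplier<S> opener;   // opens a searcher on the current index
    private S cached;                   // shared searcher, or null after a refresh
    private int checkedOut;             // searchers handed out but not yet returned
    private boolean refreshPending;     // a writer release is waiting to drop the cache

    BlockingSearcherCache(Supplier<S> opener) {
        this.opener = opener;
    }

    synchronized S getSearcher() throws InterruptedException {
        while (refreshPending) {
            wait();                     // don't hand out a searcher that is about to be dropped
        }
        if (cached == null) {
            cached = opener.get();
        }
        checkedOut++;
        return cached;
    }

    synchronized void releaseSearcher(S searcher) {
        if (searcher != cached || checkedOut == 0) {
            throw new IllegalStateException("searcher was not checked out from this cache");
        }
        checkedOut--;
        if (checkedOut == 0) {
            notifyAll();                // wake a writer release waiting in onWriterReleased()
        }
    }

    /** Called when a writer is released; blocks until in-flight searches finish. */
    synchronized void onWriterReleased() throws InterruptedException {
        refreshPending = true;
        try {
            while (checkedOut > 0) {
                wait();
            }
            cached = null;              // a real implementation would close the old searcher here
        } finally {
            refreshPending = false;
            notifyAll();                // let blocked getSearcher() calls continue
        }
    }
}
```

The cost of this design is that a writer release waits for the slowest in-flight search, which is why the thread stresses returning searchers quickly.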
>>
>> The key is to use your Writer/Searcher/Reader quickly and then release
>> it (unless you're bulk loading). I've had such a system with 5+ million
>> docs on a standard machine, and searches were still well below a
>> second after the first Searcher was cached (and even the first search
>> is darn quick). And that includes a lot of extra crap I am doing.
>>
>> - Mark
>>
>> Jay Yu wrote:
>>> Mark,
>>>
>>> After reading the implementation of LuceneIndexAccessor.getSearcher(),
>>> I realized that the method is synchronized and wait for 
>>> writingDirector to be released. That means if we getSearcher for 
>>> each query in each thread, there might be a contention and 
>>> performance hit. In fact, even the method of release(searcher) is 
>>> costly. On the other hand, if multiple threads share share one 
>>> searcher then it'd defeat the
>>> purpose of using LuceneIndexAccessor.
>>> Do I miss sth here? What's your suggested use case for 
>>> LuceneIndexAccessor?
>>>
>>> Thanks!
>>>
>>> Jay
>>> Mark Miller wrote:
>>>> Ill respond a point at a time:
>>>>
>>>> 1.
>>>>
>>>> ****************************** Hi Maik,
>>>>
>>>> So what happens in this case:
>>>>
>>>> IndexAccessProvider accessProvider = new 
>>>> IndexAccessProvider(directory,
>>>>
>>>> analyzer);
>>>>
>>>> LuceneIndexAccessor accessor = new 
>>>> LuceneIndexAccessor(accessProvider);
>>>>
>>>> accessor.open();
>>>>
>>>> IndexWriter writer = accessor.getWriter();
>>>>
>>>> // reference to the same instance?
>>>>
>>>> IndexWriter writer2 = accessor.getWriter();
>>>>
>>>> writer.addDocument(....);
>>>>
>>>> writer2.addDocument(....);
>>>>
>>>>
>>>>
>>>> // I didn't release the writer yet
>>>>
>>>> // will this block?
>>>>
>>>> IndexReader reader = accessor.getReader();
>>>>
>>>> reader.delete(....);
>>>>
>>>> ************
>>>>
>>>> This is not really an issue. First, if you are going to delete with 
>>>> a Reader
>>>> you need to call getWritingReader and not getReader. When you do 
>>>> that, the
>>>> getWritingReader call will block until writer and writer2 are 
>>>> released. If
>>>> you are just adding a couple docs before releasing the writers, 
>>>> this is no
>>>> problem because the block will be very short. If you are loading 
>>>> tons of
>>>> docs and you want to be able to delete with a Reader in a timely 
>>>> manner, you
>>>> should release the writers every now and then (release and re-get 
>>>> the Writer
>>>> every 100 docs or something). An interactive index should not hog the
>>>> Writer, while something that is just loading a lot could hog the 
>>>> Writer.
>>>> This is no different than normal…you cannot delete with a Reader while
>>>> adding with a Writer with Lucene. This code just enforces those 
>>>> semantics.
>>>> The best solution is to just use a Writer to delete – I never get a
>>>> ReadingWriter.
>>>>
>>>> 2. http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3
>>>>
>>>> This is no big deal either. I just added another getWriter call 
>>>> that takes a
>>>> create Boolean.
>>>>
>>>> 3. I don't think there is a latest release. This has never gotten much
>>>> official attention and is not in the sandbox. I worked straight 
>>>> from the
>>>> originally submitted code.
>>>>
>>>> 4. I will look into getting together some code that I can share. The
>>>> multisearcher changes that are need are a couple of one liners 
>>>> really, so at
>>>> a minimum I will give you the changes needed.
>>>>
>>>>
>>>>
>>>> -       Mark
>>>>
>>>>
>>>>
>>>> On 9/19/07, Jay Yu <yu...@ai.sri.com> wrote:
>>>>
>>>> Mark,
>>>>
>>>>
>>>>
>>>> thanks for sharing your insight and experience about 
>>>> LuceneIndexAccessor!
>>>>
>>>> I remember seeing some people reporting some issues about it, such as:
>>>>
>>>> http://www.archivum.info/java-dev@lucene.apache.org/2005-05/msg00114.html 
>>>>
>>>>
>>>> http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3
>>>>
>>>>
>>>>
>>>> Have those issues been resolved?
>>>>
>>>>
>>>>
>>>> Where did you get the latest release? It is not in the official Lucene
>>>>
>>>> sandbox/contrib.
>>>>
>>>>
>>>>
>>>> Finally, are you willing to share your extended version to include 
>>>> your
>>>>
>>>> tweak relating to the MultiSearcher?
>>>>
>>>>
>>>>
>>>> Thanks a lot!
>>>>
>>>>
>>>>
>>>> Jay
>>>>
>>>>
>>>>
>>>> Mark Miller wrote:
>>>>
>>>>> I use option 3 extensivley and find it very effective. There is a 
>>>>> tweak or
>>>>
>>>>> two required to get it to work right with MultiSearchers, but 
>>>>> other than
>>>>
>>>>> that, the code is great. I have built a lot on top of it. I'm on 
>>>>> the list
>>>>
>>>>> all the time and would be happy to answer any questions you have in
>>>> regards
>>>>
>>>>> to LuceneIndexAccessor. Frankly, I think its overlooked far too much.
>>>>
>>>>
>>>>> - Mark
>>>>
>>>>
>>>>
>>>>> On 9/19/07, Jay Yu <yu...@ai.sri.com> wrote:
>>>>
>>>>
>>>>>> In a multithread app like web app, a shared IndexSearcher could 
>>>>>> throw a
>>>>
>>>>>> AlreadyClosedException when another thread is trying to update the
>>>>
>>>>>> underlying IndexReader by closing the shared searcher after the 
>>>>>> index is
>>>>
>>>>>> updated. Searching over the past discussions on this mailing list, I
>>>>
>>>>>> found several approaches to solve the problem.
>>>>
>>>>>> 1. use solr
>>>>
>>>>>> 2. use DelayCloseIndexSearcher
>>>>
>>>>>> 3. use LuceneIndexAccessor
>>>>
>>>>
>>>>
>>>>>> the first one is not feasible for us; some people seemed to have
>>>>
>>>>>> problems with No. 2 and I do not find a lot of discussions around 
>>>>>> No.3.
>>>>
>>>>
>>>>>> I wonder if anyone has good experience on No 2 and 3?
>>>>
>>>>>> Or do I miss other better solutions?
>>>>
>>>>
>>>>>> Thanks for any suggestion/comment!
>>>>
>>>>
>>>>>> Jay
>>>>
>>>>
>>>>>> --------------------------------------------------------------------- 
>>>>>>
>>>>
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>>
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: thread safe shared IndexSearcher

Posted by Jay Yu <yu...@AI.SRI.COM>.

Mark,

Thanks for sharing your valuable experience and thoughts.
Frankly, our system already has most of the functionality 
LuceneIndexAccessor offers. The only thing I am looking for is a way to 
synchronize closing the searchers. That's why I am a little worried about 
the way the accessor handles searcher synchronization.
I will probably give it a try to see how it performs in our system.

Thanks!

Jay


Re: thread safe shared IndexSearcher

Posted by Mark Miller <ma...@gmail.com>.

The getSearcher() method is synchronized, but that is because each thread 
*does* share the same Searcher. To maintain a cache of Searchers across 
multiple threads, you've got to sync; to reference count, you've got to 
sync. The performance hit of LuceneIndexAccessor is pretty minimal for its 
functionality, and frankly, for the functionality you want, you have to 
pay some cost. That's not even the end of it, really: you're going to need 
to maintain a cache of Accessor objects, one per index, as well, and if you 
don't know all the indexes at startup time, access to that cache will also 
need to be synchronized. I wouldn't worry, though. Searches are still 
lightning fast, and that won't be the bottleneck. I'll work on getting you 
some code, but if you're worried, try some benchmarking on the original code.
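A minimal sketch of that per-index accessor cache, assuming one accessor per index directory keyed by its path. `AccessorRegistry` and its factory function are hypothetical names, not part of LuceneIndexAccessor, and `ConcurrentHashMap.computeIfAbsent` (Java 8+) is swapped in for an explicit synchronized block: it creates each accessor at most once on first use, so indexes unknown at startup are handled without one global lock around every lookup.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical registry: one shared accessor per index path, created lazily.
final class AccessorRegistry<A> {
    private final ConcurrentHashMap<String, A> accessors = new ConcurrentHashMap<>();
    private final Function<String, A> factory; // opens an accessor for a path

    AccessorRegistry(Function<String, A> factory) {
        this.factory = factory;
    }

    // Returns the accessor for indexPath; computeIfAbsent runs the factory
    // at most once per key, even when many threads ask at the same time.
    A forIndex(String indexPath) {
        return accessors.computeIfAbsent(indexPath, factory);
    }

    // Optional eager creation for indexes that are known at startup.
    void preload(Iterable<String> indexPaths) {
        for (String path : indexPaths) {
            forIndex(path);
        }
    }
}
```

Calling `preload` at startup gives the eager behaviour; skipping it gives on-demand creation with the same lookup method.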

Also, to be clear (I don't have the code in front of me), getting a 
Searcher does not require waiting for a Writer to be released. Searchers 
are cached and reused (and instantly available) until a Writer is 
released. When that happens, the release-Writer method waits for all the 
Searchers to be returned (which is quick, since searches are quick), 
the Searcher cache is cleared, and subsequent calls to getSearcher 
create new Searchers that can see what the Writer added.
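The release sequence described above can be sketched with a plain Java monitor. This is a hypothetical `CachedSearcherGate`, not LuceneIndexAccessor's code: it hands out one cached searcher, counts checkouts, and on `writerReleased()` waits for the count to reach zero before closing and dropping the cached instance. The `draining` flag is an added assumption that holds off new checkouts during the swap, so a steady search load cannot starve the writer.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical sketch of "cache searchers until a writer is released".
final class CachedSearcherGate<S extends Closeable> {
    private final Supplier<S> opener; // opens a searcher on the current index
    private S cached;                 // shared searcher, null until first use
    private int checkedOut;           // borrows not yet released
    private boolean draining;         // true while a writer release is swapping

    CachedSearcherGate(Supplier<S> opener) {
        this.opener = opener;
    }

    // Hands out the cached searcher, opening one if the cache is empty.
    synchronized S getSearcher() throws InterruptedException {
        while (draining) {
            wait(); // only blocks during the brief swap after a writer release
        }
        if (cached == null) {
            cached = opener.get();
        }
        checkedOut++;
        return cached;
    }

    // Must be called exactly once per getSearcher().
    synchronized void releaseSearcher() {
        checkedOut--;
        if (checkedOut == 0) {
            notifyAll(); // wake a writerReleased() waiting for borrowers
        }
    }

    // Waits for all borrowed searchers, then closes and drops the cache so
    // the next getSearcher() opens a searcher that sees the writer's changes.
    synchronized void writerReleased() throws InterruptedException, IOException {
        draining = true;
        try {
            while (checkedOut > 0) {
                wait();
            }
            if (cached != null) {
                cached.close();
                cached = null;
            }
        } finally {
            draining = false;
            notifyAll(); // let blocked getSearcher() calls continue
        }
    }
}
```

Searches only pay for a short synchronized increment and decrement; the waiting happens on the writer-release side.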

The key is to use your Writer/Searcher/Reader quickly and then release it 
(unless you're bulk loading). I've had such a system with 5+ million docs 
on a standard machine, and searches were still well below a second after 
the first Searcher was cached (and even the first search is darn quick). 
And that includes a lot of extra crap I am doing.
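If blocking on release is a concern, a common alternative is per-instance reference counting: the old searcher is closed by whichever thread returns it last, so searches never wait for a swap. The sketch below uses hypothetical names (`RefCounted`, `SearcherHolder`) and accepts any `Closeable` in place of an IndexSearcher; later Lucene releases added a `SearcherManager` class built around a similar acquire/release idea.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper: closes the resource when the last reference is dropped.
final class RefCounted<T extends Closeable> {
    private final T resource;
    // Starts at 1: the reference held by the SearcherHolder that owns it.
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCounted(T resource) { this.resource = resource; }

    T get() { return resource; }

    // Takes a reference unless the resource has already been closed.
    boolean tryIncRef() {
        while (true) {
            int n = refs.get();
            if (n <= 0) return false;
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    // Drops a reference; the thread that drops the last one closes it.
    void decRef() throws IOException {
        if (refs.decrementAndGet() == 0) resource.close();
    }
}

// Hypothetical holder that publishes the current searcher to many threads.
final class SearcherHolder<T extends Closeable> {
    private volatile RefCounted<T> current;

    SearcherHolder(T initial) { current = new RefCounted<>(initial); }

    // Borrows the current searcher; pair every acquire() with release().
    RefCounted<T> acquire() {
        while (true) {
            RefCounted<T> ref = current;
            if (ref.tryIncRef()) return ref;
            // Lost a race with swap() closing this one; retry on the new current.
        }
    }

    void release(RefCounted<T> ref) throws IOException { ref.decRef(); }

    // Publishes a fresh searcher (e.g. after the index changed) and drops the
    // holder's reference to the old one; it closes once all borrowers return it.
    synchronized void swap(T fresh) throws IOException {
        RefCounted<T> old = current;
        current = new RefCounted<>(fresh);
        old.decRef();
    }
}
```

Each query would call `acquire()`, search via `get()`, and call `release()` in a `finally` block, keeping the borrow short.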

- Mark


Re: thread safe shared IndexSearcher

Posted by Jay Yu <yu...@AI.SRI.COM>.

Mark,

After reading the implementation of LuceneIndexAccessor.getSearcher(),
I realized that the method is synchronized and waits for writingDirector 
to be released. That means that if we call getSearcher for each query in 
each thread, there might be contention and a performance hit. In fact, 
even release(searcher) is costly. On the other hand, if multiple threads 
share one searcher, that would defeat the purpose of using 
LuceneIndexAccessor.
Am I missing something here? What's your suggested use case for 
LuceneIndexAccessor?

Thanks!

Jay

Re: thread safe shared IndexSearcher

Posted by Jay Yu <yu...@AI.SRI.COM>.

Thanks for your detailed explanation of the issues and your solutions.
It seems that LuceneIndexAccessor is worth trying first before I 
implement some other locking mechanism to ensure proper ordering.
I would appreciate it very much if you'd share your extension with us.

Jay

Mark Miller wrote:
> Ill respond a point at a time:
> 
> 1.
> 
> ****************************** Hi Maik,
> 
> So what happens in this case:
> 
> IndexAccessProvider accessProvider = new IndexAccessProvider(directory,
> 
> analyzer);
> 
> LuceneIndexAccessor accessor = new LuceneIndexAccessor(accessProvider);
> 
> accessor.open();
> 
> IndexWriter writer = accessor.getWriter();
> 
> // reference to the same instance?
> 
> IndexWriter writer2 = accessor.getWriter();
> 
> writer.addDocument(....);
> 
> writer2.addDocument(....);
> 
> 
> 
> // I didn't release the writer yet
> 
> // will this block?
> 
> IndexReader reader = accessor.getReader();
> 
> reader.delete(....);
> 
> ************
> 
> This is not really an issue. First, if you are going to delete with a Reader
> you need to call getWritingReader and not getReader. When you do that, the
> getWritingReader call will block until writer and writer2 are released. If
> you are just adding a couple docs before releasing the writers, this is no
> problem because the block will be very short. If you are loading tons of
> docs and you want to be able to delete with a Reader in a timely manner, you
> should release the writers every now and then (release and re-get the Writer
> every 100 docs or something). An interactive index should not hog the
> Writer, while something that is just loading a lot could hog the Writer.
> This is no different than normal…you cannot delete with a Reader while
> adding with a Writer with Lucene. This code just enforces those semantics.
> The best solution is to just use a Writer to delete – I never get a
> ReadingWriter.


Re: thread safe shared IndexSearcher

Posted by Mark Miller <ma...@gmail.com>.

I'll respond one point at a time:

1.

******************************
Hi Maik,

So what happens in this case:

IndexAccessProvider accessProvider = new IndexAccessProvider(directory, analyzer);
LuceneIndexAccessor accessor = new LuceneIndexAccessor(accessProvider);
accessor.open();

IndexWriter writer = accessor.getWriter();
// reference to the same instance?
IndexWriter writer2 = accessor.getWriter();

writer.addDocument(....);
writer2.addDocument(....);

// I didn't release the writer yet
// will this block?
IndexReader reader = accessor.getReader();
reader.delete(....);
************

This is not really an issue. First, if you are going to delete with a Reader,
you need to call getWritingReader, not getReader. When you do that, the
getWritingReader call will block until writer and writer2 are released. If
you are just adding a couple of docs before releasing the writers, this is no
problem because the block will be very short. If you are loading tons of
docs and you want to be able to delete with a Reader in a timely manner, you
should release the Writer every now and then (release and re-get the Writer
every 100 docs or so). An interactive index should not hog the Writer,
while something that is just bulk loading can afford to.
This is no different from plain Lucene: you cannot delete with a Reader while
adding with a Writer. This code just enforces those semantics.
The best solution is to just use a Writer to delete; I never get a
WritingReader.
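The locking rule described above (one shared, reference-counted writer; a deleting reader that waits until every writer checkout is released; bulk loaders that release and re-get the writer every N docs) can be modelled in a few lines of Java. This is a hypothetical toy, not LuceneIndexAccessor's actual code: the class ToyIndexAccessor and its methods (getWriter, releaseWriter, getWritingReader, releaseWritingReader, loadInBatches) are invented for illustration, and plain Objects stand in for IndexWriter and the deleting IndexReader, so no Lucene classes are used.

```java
/**
 * HYPOTHETICAL toy model of the accessor locking rules discussed in the thread.
 * Not LuceneIndexAccessor: plain Objects stand in for the shared IndexWriter
 * and for a deleting ("writing") reader.
 */
public class ToyIndexAccessor {
    private final Object sharedWriter = new Object(); // every getWriter() returns this
    private int writerRefs = 0;                 // writer checkouts not yet released
    private boolean writingReaderOut = false;   // is a deleting reader checked out?
    private int writingReadersWaiting = 0;      // deleters blocked in getWritingReader()

    /** Hands out the shared writer; blocks while a writing reader is checked out. */
    public synchronized Object getWriter() throws InterruptedException {
        // Once all writer refs are released, let a waiting deleter go first so a
        // loader that releases and re-gets cannot starve it. Nested gets
        // (writerRefs > 0) are never blocked by a waiting deleter, avoiding deadlock.
        while (writingReaderOut || (writingReadersWaiting > 0 && writerRefs == 0)) {
            wait();
        }
        writerRefs++;
        return sharedWriter;
    }

    public synchronized void releaseWriter(Object w) {
        if (w != sharedWriter || writerRefs == 0) {
            throw new IllegalStateException("writer was not checked out");
        }
        writerRefs--;
        notifyAll(); // a blocked getWritingReader() may be able to proceed now
    }

    /** Blocks until every outstanding writer checkout has been released. */
    public synchronized Object getWritingReader() throws InterruptedException {
        writingReadersWaiting++;
        try {
            while (writerRefs > 0 || writingReaderOut) {
                wait();
            }
        } finally {
            writingReadersWaiting--;
            notifyAll(); // wake writers that were yielding to this waiter
        }
        writingReaderOut = true;
        return new Object(); // placeholder for a reader used only to delete
    }

    public synchronized void releaseWritingReader() {
        writingReaderOut = false;
        notifyAll();
    }

    public synchronized int writerRefCount() {
        return writerRefs;
    }

    /** Bulk-load pattern: release and re-get the writer every batchSize docs. */
    public int loadInBatches(int docCount, int batchSize) throws InterruptedException {
        Object w = getWriter();
        int handoffs = 0;
        try {
            for (int i = 1; i <= docCount; i++) {
                // a real loader would call writer.addDocument(...) here
                if (i % batchSize == 0 && i < docCount) {
                    releaseWriter(w); // gives a waiting deleter its turn
                    w = null;         // so finally does not double-release on interrupt
                    handoffs++;
                    w = getWriter();
                }
            }
        } finally {
            if (w != null) {
                releaseWriter(w);
            }
        }
        return handoffs;
    }
}
```

The "waiting deleter goes first once the writer count reaches zero" rule is one design choice among several. Without it, a loader that releases and immediately re-gets the writer could keep winning the monitor, since Java's intrinsic locks make no fairness guarantee.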

2. http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3

This is no big deal either. I just added another getWriter overload that takes a
boolean create flag.

3. I don't think there is a latest release. This has never gotten much
official attention and is not in the sandbox. I worked straight from the
originally submitted code.

4. I will look into putting together some code that I can share. The
MultiSearcher changes that are needed are really just a couple of one-liners, so at
a minimum I will give you those changes.

- Mark


Re: thread safe shared IndexSearcher

Posted by Jay Yu <yu...@AI.SRI.COM>.

Mark,

Thanks for sharing your insight and experience with LuceneIndexAccessor!
I remember seeing some people report issues with it, such as:
http://www.archivum.info/java-dev@lucene.apache.org/2005-05/msg00114.html
http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3

Have those issues been resolved?

Where did you get the latest release? It is not in the official Lucene
sandbox/contrib.

Finally, would you be willing to share your extended version, including your
tweak for the MultiSearcher?

Thanks a lot!

Jay



Re: thread safe shared IndexSearcher

Posted by Mark Miller <ma...@gmail.com>.

I use option 3 extensively and find it very effective. There is a tweak or
two required to get it to work right with MultiSearchers, but other than
that, the code is great. I have built a lot on top of it. I'm on the list
all the time and would be happy to answer any questions you have regarding
LuceneIndexAccessor. Frankly, I think it's overlooked far too much.

- Mark

