Posted to solr-dev@lucene.apache.org by Yonik Seeley <yo...@lucidimagination.com> on 2009/04/20 22:51:31 UTC

Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley <ry...@gmail.com> wrote:
> This issue started on java-user, but I am moving it to solr-dev:
> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception
>
> I am using solr trunk and building an RTree from stored document fields.
>  This process worked fine until a recent change in 2.9 that has a different
> document id strategy than I was used to.
>
> In that thread, Yonik suggested:
> - pop back to the top level from the sub-reader, if you really need a single
> set
> - if a set-per-reader will work, then cache per segment (better for
> incremental updates anyway)
>
> I'm not quite sure what you mean by a "set-per-reader".

I meant RTree per reader (per segment reader).

>  Previously I was
> building a single RTree and using it until the last modified time had
> changed.  This avoided building an index anytime a new reader was opened and
> the index had not changed.

I *think* that our use of re-open will return the same IndexReader
instance if nothing has changed... so you shouldn't have to try and do
that yourself.
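
(For reference, the 2.9 reopen idiom looks roughly like this -- a
sketch, where "reader" is whatever IndexReader you are holding:)

    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
      // the index really changed: swap readers and rebuild any caches
      reader.close();
      reader = newReader;
    }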

> I'm fine building a new RTree for each reader if
> that is required.

If that works just as well, it will put you in a better position for
faster incremental updates... new RTrees will be built only for those
segments that have changed.

> Is there any existing code that deals with this situation?

To cache an RTree per reader, you could use the same logic as
FieldCache uses... a weak map with the reader as the key.

If a single top-level RTree that covers the entire index works better
for you, then you can cache the RTree based on the top level multi
reader and translate the ids... that was my fix for ExternalFileField.
 See FileFloatSource.getValues() for the implementation.
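
A rough sketch of the weak-map approach, mirroring FieldCache's
reader-keyed cache (RTree and buildRTree() are placeholders for your
own code):

    import java.io.IOException;
    import java.util.Collections;
    import java.util.Map;
    import java.util.WeakHashMap;
    import org.apache.lucene.index.IndexReader;

    public abstract class RTreeCache {
      // one RTree per segment reader; an entry disappears once its
      // reader is no longer referenced anywhere else
      private final Map<IndexReader, RTree> cache =
          Collections.synchronizedMap(new WeakHashMap<IndexReader, RTree>());

      public RTree getRTree(IndexReader reader) throws IOException {
        RTree tree = cache.get(reader);
        if (tree == null) {
          tree = buildRTree(reader);  // build from this one segment only
          cache.put(reader, tree);
        }
        return tree;
      }

      // your code: walk this segment's terms/stored fields
      protected abstract RTree buildRTree(IndexReader reader) throws IOException;
    }

(If you take the top-level route instead, the id translation is just
adding each sub-reader's docBase offset to its local ids.)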


> - - - -
>
> Yonik also suggested:
>
>  Relatively new in 2.9, you can pass null to enumerate over all non-deleted
> docs:
>  TermDocs td = reader.termDocs(null);
>
>  It would probably be a lot faster to iterate over indexed values though.
>
> If I iterate over indexed values (from the FieldCache I presume) then how do I
> get access to the document id?

IndexReader.terms(Term t) returns a TermEnum that can iterate over
terms, starting at t.
IndexReader.termDocs(Term t or TermEnum te) will give you the list of
documents that match a term.
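
Something along these lines, for example (a sketch; "myfield" stands in
for your field name, and reader is an IndexReader):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.index.TermEnum;

    TermEnum te = reader.terms(new Term("myfield", ""));  // first term of the field
    TermDocs td = reader.termDocs();
    try {
      do {
        Term t = te.term();
        if (t == null || !t.field().equals("myfield")) break;  // ran past the field
        td.seek(te);
        while (td.next()) {
          int docid = td.doc();  // id is relative to this reader
          // use t.text() and docid here
        }
      } while (te.next());
    } finally {
      td.close();
      te.close();
    }

And the termDocs(null) variant, to walk every non-deleted doc id:

    TermDocs all = reader.termDocs(null);
    while (all.next()) {
      int docid = all.doc();
      // ...
    }
    all.close();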


-Yonik

Re: lucene 2.9 migration issues

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Apr 24, 2009 at 2:42 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> This is LUCENE-1573 (properly responding to Thread.interrupt()), which
> is new in 2.9.
>
> Do you know what would have interrupted the thread in your context?

Hmmm, nothing in Solr proper that calls interrupt on another thread.

-Yonik
http://www.lucidimagination.com

Re: lucene 2.9 migration issues

Posted by Ryan McKinley <ry...@gmail.com>.
I found it...   it was something that I guess was just ignored before.

All fixed.

Thanks!
ryan


On Apr 24, 2009, at 2:42 PM, Michael McCandless wrote:

> This is LUCENE-1573 (properly responding to Thread.interrupt()), which
> is new in 2.9.
>
> Do you know what would have interrupted the thread in your context?
>
> Mike
>
> On Fri, Apr 24, 2009 at 2:25 PM, Ryan McKinley <ry...@gmail.com>  
> wrote:
>> thanks!   I'm not sure this is related to 2.9, but I have not seen  
>> this
>> before either....
>>
>> 2009-04-24 14:19:53,024 ERROR org.apache.solr.core.SolrCore -
>> java.lang.RuntimeException: java.lang.InterruptedException: sleep interrupted
>>        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:620)
>>        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:289)
>>        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1456)
>>        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1295)
>>        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:159)
>>        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:123)
>>        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:170)
>>        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:217)
>>        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
>>        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
>>        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>>        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
>>        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
>>        at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
>>        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)
>>
>> Any thoughts what could cause that?  This is using the embedded solr
>> server...
>>
>>
>> On Apr 24, 2009, at 12:10 PM, Shalin Shekhar Mangar wrote:
>>
>>> On Fri, Apr 24, 2009 at 9:07 PM, Ryan McKinley <ry...@gmail.com>  
>>> wrote:
>>>
>>>> Yes, that would be great!  the changes we need are in rev 768275:
>>>> http://svn.apache.org/viewvc?view=rev&revision=768275
>>>>
>>>
>>> Done. I upgraded to r768336.
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>
>>


Re: lucene 2.9 migration issues

Posted by Michael McCandless <lu...@mikemccandless.com>.
This is LUCENE-1573 (properly responding to Thread.interrupt()), which
is new in 2.9.

Do you know what would have interrupted the thread in your context?
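
(If it helps narrow it down: Thread.interrupted() returns true and
clears the current thread's pending interrupt status, so a quick check
just before opening the writer -- a sketch, not existing Solr code --
would at least tell you the flag was already set:)

    if (Thread.interrupted()) {
      System.err.println("interrupt flag was set before opening IndexWriter");
    }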

Mike

On Fri, Apr 24, 2009 at 2:25 PM, Ryan McKinley <ry...@gmail.com> wrote:
> thanks!   I'm not sure this is related to 2.9, but I have not seen this
> before either....
>
> 2009-04-24 14:19:53,024 ERROR org.apache.solr.core.SolrCore -
> java.lang.RuntimeException: java.lang.InterruptedException: sleep
> interrupted
>        at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:620)
>        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:289)
>        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1456)
>        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1295)
>        at
> org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:159)
>        at
> org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:123)
>        at
> org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:170)
>        at
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:217)
>        at
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
>        at
> org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
>        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>        at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>        at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
>        at
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
>        at
> org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
>        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)
>
> Any thoughts what could cause that?  This is using the embedded solr
> server...
>
>
> On Apr 24, 2009, at 12:10 PM, Shalin Shekhar Mangar wrote:
>
>> On Fri, Apr 24, 2009 at 9:07 PM, Ryan McKinley <ry...@gmail.com> wrote:
>>
>>> Yes, that would be great!  the changes we need are in rev 768275:
>>> http://svn.apache.org/viewvc?view=rev&revision=768275
>>>
>>
>> Done. I upgraded to r768336.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>
>

Re: lucene 2.9 migration issues

Posted by Ryan McKinley <ry...@gmail.com>.
thanks!   I'm not sure this is related to 2.9, but I have not seen  
this before either....

2009-04-24 14:19:53,024 ERROR org.apache.solr.core.SolrCore -
java.lang.RuntimeException: java.lang.InterruptedException: sleep interrupted
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:620)
	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:289)
	at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1456)
	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1295)
	at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:159)
	at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:123)
	at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:170)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:217)
	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
	at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)

Any thoughts what could cause that?  This is using the embedded solr  
server...


On Apr 24, 2009, at 12:10 PM, Shalin Shekhar Mangar wrote:

> On Fri, Apr 24, 2009 at 9:07 PM, Ryan McKinley <ry...@gmail.com>  
> wrote:
>
>> Yes, that would be great!  the changes we need are in rev 768275:
>> http://svn.apache.org/viewvc?view=rev&revision=768275
>>
>
> Done. I upgraded to r768336.
>
> -- 
> Regards,
> Shalin Shekhar Mangar.


Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Fri, Apr 24, 2009 at 9:07 PM, Ryan McKinley <ry...@gmail.com> wrote:

> Yes, that would be great!  the changes we need are in rev 768275:
> http://svn.apache.org/viewvc?view=rev&revision=768275
>

Done. I upgraded to r768336.

-- 
Regards,
Shalin Shekhar Mangar.

Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

Posted by Ryan McKinley <ry...@gmail.com>.
Yes, that would be great!  the changes we need are in rev 768275:
http://svn.apache.org/viewvc?view=rev&revision=768275

thanks



On Apr 24, 2009, at 11:23 AM, Shalin Shekhar Mangar wrote:

> Yes, I upgraded the lucene jars a few hours ago for trie api  
> updates. Do you
> want me to upgrade them again?
>
> On Fri, Apr 24, 2009 at 7:51 PM, Mark Miller <ma...@gmail.com>  
> wrote:
>
>> I think Shalin upgraded the jars this morning, so I'd just grab  
>> them again
>> real quick.
>>
>> 4/4 4:46 am : Upgraded to Lucene 2.9-dev r768228
>>
>>
>> Ryan McKinley wrote:
>>
>>> thanks Mark!
>>>
>>> how far is lucene /trunk from what is currently in solr?
>>>
>>> Is it something we should consider upgrading?
>>>
>>>
>>> On Apr 24, 2009, at 8:30 AM, Mark Miller wrote:
>>>
>>> I just committed a fix Ryan - should work with upgraded Lucene jars.
>>>>
>>>> - Mark
>>>>
>>>> Ryan McKinley wrote:
>>>>
>>>>> thanks!
>>>>>
>>>>>
>>>>> On Apr 23, 2009, at 6:32 PM, Mark Miller wrote:
>>>>>
>>>>> Looks like it's my fault. Auto resolution was moved up to
>>>>> IndexSearcher
>>>>>> in Lucene, and it looks like SolrIndexSearcher is not tickling  
>>>>>> it first.
>>>>>> I'll take a look.
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>>> Ryan McKinley wrote:
>>>>>>
>>>>>>> Ok, not totally resolved....
>>>>>>>
>>>>>>> Things work fine when I have my custom Filter alone or with  
>>>>>>> other
>>>>>>> Filters, however if I add a query string to the mix it breaks  
>>>>>>> with an
>>>>>>> IllegalStateException:
>>>>>>>
>>>>>>> java.lang.IllegalStateException: Auto should be resolved before now
>>>>>>>   at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216)
>>>>>>>   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73)
>>>>>>>   at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
>>>>>>>   at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
>>>>>>>   at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214)
>>>>>>>   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924)
>>>>>>>   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345)
>>>>>>>   at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171)
>>>>>>>   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>>>>>>>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>>>>>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
>>>>>>>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>>>>>>
>>>>>>>
>>>>>>> This is for a query:
>>>>>>> /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN
>>>>>>> bounds=XXX triggers my custom filter to kick in.
>>>>>>>
>>>>>>> Any thoughts where to look?  This error is new since upgrading  
>>>>>>> the
>>>>>>> lucene libs (in recent solr)
>>>>>>>
>>>>>>> Thanks!
>>>>>>> ryan
>>>>>>>
>>>>>>>
>>>>>>> On Apr 20, 2009, at 7:14 PM, Ryan McKinley wrote:
>>>>>>>
>>>>>>> thanks!
>>>>>>>>
>>>>>>>> everything got better when I removed my logic to cache based  
>>>>>>>> on the
>>>>>>>> index modification time.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 20, 2009, at 4:51 PM, Yonik Seeley wrote:
>>>>>>>>
>>>>>>>> On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley <ryantxu@gmail.com 
>>>>>>>> >
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> This issue started on java-user, but I am moving it to solr- 
>>>>>>>>>> dev:
>>>>>>>>>>
>>>>>>>>>> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception
>>>>>>>>>>
>>>>>>>>>> I am using solr trunk and building an RTree from stored  
>>>>>>>>>> document
>>>>>>>>>> fields.
>>>>>>>>>> This process worked fine until a recent change in 2.9 that has a
>>>>>>>>>> different document id strategy than I was used to.
>>>>>>>>>>
>>>>>>>>>> In that thread, Yonik suggested:
>>>>>>>>>> - pop back to the top level from the sub-reader, if you  
>>>>>>>>>> really need
>>>>>>>>>> a single
>>>>>>>>>> set
>>>>>>>>>> - if a set-per-reader will work, then cache per segment  
>>>>>>>>>> (better for
>>>>>>>>>> incremental updates anyway)
>>>>>>>>>>
>>>>>>>>>> I'm not quite sure what you mean by a "set-per-reader".
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I meant RTree per reader (per segment reader).
>>>>>>>>>
>>>>>>>>> Previously I was
>>>>>>>>>> building a single RTree and using it until the last
>>>>>>>>>> modified
>>>>>>>>>> time had
>>>>>>>>>> changed.  This avoided building an index anytime a new  
>>>>>>>>>> reader was
>>>>>>>>>> opened and
>>>>>>>>>> the index had not changed.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I *think* that our use of re-open will return the same  
>>>>>>>>> IndexReader
>>>>>>>>> instance if nothing has changed... so you shouldn't have to  
>>>>>>>>> try and
>>>>>>>>> do
>>>>>>>>> that yourself.
>>>>>>>>>
>>>>>>>>> I'm fine building a new RTree for each reader if
>>>>>>>>>> that is required.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If that works just as well, it will put you in a better  
>>>>>>>>> position for
>>>>>>>>> faster incremental updates... new RTrees will be built only  
>>>>>>>>> for
>>>>>>>>> those
>>>>>>>>> segments that have changed.
>>>>>>>>>
>>>>>>>>> Is there any existing code that deals with this situation?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> To cache an RTree per reader, you could use the same logic as
>>>>>>>>> FieldCache uses... a weak map with the reader as the key.
>>>>>>>>>
>>>>>>>>> If a single top-level RTree that covers the entire index works
>>>>>>>>> better
>>>>>>>>> for you, then you can cache the RTree based on the top level  
>>>>>>>>> multi
>>>>>>>>> reader and translate the ids... that was my fix for
>>>>>>>>> ExternalFileField.
>>>>>>>>> See FileFloatSource.getValues() for the implementation.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> - - - -
>>>>>>>>>>
>>>>>>>>>> Yonik also suggested:
>>>>>>>>>>
>>>>>>>>>> Relatively new in 2.9, you can pass null to enumerate over  
>>>>>>>>>> all
>>>>>>>>>> non-deleted
>>>>>>>>>> docs:
>>>>>>>>>> TermDocs td = reader.termDocs(null);
>>>>>>>>>>
>>>>>>>>>> It would probably be a lot faster to iterate over indexed  
>>>>>>>>>> values
>>>>>>>>>> though.
>>>>>>>>>>
>>>>>>>>>> If I iterate over indexed values (from the FieldCache I
>>>>>>>>>> presume) then
>>>>>>>>>> how do I
>>>>>>>>>> get access to the document id?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> IndexReader.terms(Term t) returns a TermEnum that can  
>>>>>>>>> iterate over
>>>>>>>>> terms, starting at t.
>>>>>>>>> IndexReader.termDocs(Term t or TermEnum te) will give you  
>>>>>>>>> the list
>>>>>>>>> of
>>>>>>>>> documents that match a term.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Yonik
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> - Mark
>>>>>>
>>>>>> http://www.lucidimagination.com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> - Mark
>>>>
>>>> http://www.lucidimagination.com
>>>>
>>>>
>>>>
>>>>
>>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>
>
> -- 
> Regards,
> Shalin Shekhar Mangar.


Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Yes, I upgraded the lucene jars a few hours ago for trie api updates. Do you
want me to upgrade them again?

On Fri, Apr 24, 2009 at 7:51 PM, Mark Miller <ma...@gmail.com> wrote:

> I think Shalin upgraded the jars this morning, so I'd just grab them again
> real quick.
>
> 4/4 4:46 am : Upgraded to Lucene 2.9-dev r768228
>
>
> Ryan McKinley wrote:
>
>> thanks Mark!
>>
>> how far is lucene /trunk from what is currently in solr?
>>
>> Is it something we should consider upgrading?
>>
>>
>> On Apr 24, 2009, at 8:30 AM, Mark Miller wrote:
>>
>>  I just committed a fix Ryan - should work with upgraded Lucene jars.
>>>
>>> - Mark
>>>
>>> Ryan McKinley wrote:
>>>
>>>> thanks!
>>>>
>>>>
>>>> On Apr 23, 2009, at 6:32 PM, Mark Miller wrote:
>>>>
>>>>  Looks like it's my fault. Auto resolution was moved up to IndexSearcher
>>>>> in Lucene, and it looks like SolrIndexSearcher is not tickling it first.
>>>>> I'll take a look.
>>>>>
>>>>> - Mark
>>>>>
>>>>> Ryan McKinley wrote:
>>>>>
>>>>>> Ok, not totally resolved....
>>>>>>
>>>>>> Things work fine when I have my custom Filter alone or with other
>>>>>> Filters, however if I add a query string to the mix it breaks with an
>>>>>> IllegalStateException:
>>>>>>
>>>>>> java.lang.IllegalStateException: Auto should be resolved before now
>>>>>>  at
>>>>>> org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216)
>>>>>>
>>>>>>  at
>>>>>> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73)
>>>>>>  at
>>>>>> org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
>>>>>>
>>>>>>  at
>>>>>> org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
>>>>>>
>>>>>>  at
>>>>>> org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214)
>>>>>>
>>>>>>  at
>>>>>> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924)
>>>>>>
>>>>>>  at
>>>>>> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345)
>>>>>>  at
>>>>>> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171)
>>>>>>
>>>>>>  at
>>>>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>>>>>>
>>>>>>  at
>>>>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>>>>>
>>>>>>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
>>>>>>  at
>>>>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>>>>>
>>>>>>
>>>>>> This is for a query:
>>>>>> /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN
>>>>>> bounds=XXX triggers my custom filter to kick in.
>>>>>>
>>>>>> Any thoughts where to look?  This error is new since upgrading the
>>>>>> lucene libs (in recent solr)
>>>>>>
>>>>>> Thanks!
>>>>>> ryan
>>>>>>
>>>>>>
>>>>>> On Apr 20, 2009, at 7:14 PM, Ryan McKinley wrote:
>>>>>>
>>>>>>  thanks!
>>>>>>>
>>>>>>> everything got better when I removed my logic to cache based on the
>>>>>>> index modification time.
>>>>>>>
>>>>>>>
>>>>>>> On Apr 20, 2009, at 4:51 PM, Yonik Seeley wrote:
>>>>>>>
>>>>>>>  On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley <ry...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> This issue started on java-user, but I am moving it to solr-dev:
>>>>>>>>>
>>>>>>>>> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception
>>>>>>>>>
>>>>>>>>> I am using solr trunk and building an RTree from stored document
>>>>>>>>> fields.
>>>>>>>>> This process worked fine until a recent change in 2.9 that has a
>>>>>>>>> different document id strategy than I was used to.
>>>>>>>>>
>>>>>>>>> In that thread, Yonik suggested:
>>>>>>>>> - pop back to the top level from the sub-reader, if you really need
>>>>>>>>> a single
>>>>>>>>> set
>>>>>>>>> - if a set-per-reader will work, then cache per segment (better for
>>>>>>>>> incremental updates anyway)
>>>>>>>>>
>>>>>>>>> I'm not quite sure what you mean by a "set-per-reader".
>>>>>>>>>
>>>>>>>>
>>>>>>>> I meant RTree per reader (per segment reader).
>>>>>>>>
>>>>>>>>  Previously I was
>>>>>>>>> building a single RTree and using it until the last
>>>>>>>>> time had
>>>>>>>>> changed.  This avoided building an index anytime a new reader was
>>>>>>>>> opened and
>>>>>>>>> the index had not changed.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I *think* that our use of re-open will return the same IndexReader
>>>>>>>> instance if nothing has changed... so you shouldn't have to try and
>>>>>>>> do
>>>>>>>> that yourself.
>>>>>>>>
>>>>>>>>  I'm fine building a new RTree for each reader if
>>>>>>>>> that is required.
>>>>>>>>>
>>>>>>>>
>>>>>>>> If that works just as well, it will put you in a better position for
>>>>>>>> faster incremental updates... new RTrees will be built only for
>>>>>>>> those
>>>>>>>> segments that have changed.
>>>>>>>>
>>>>>>>>  Is there any existing code that deals with this situation?
>>>>>>>>>
>>>>>>>>
>>>>>>>> To cache an RTree per reader, you could use the same logic as
>>>>>>>> FieldCache uses... a weak map with the reader as the key.
>>>>>>>>
>>>>>>>> If a single top-level RTree that covers the entire index works
>>>>>>>> better
>>>>>>>> for you, then you can cache the RTree based on the top level multi
>>>>>>>> reader and translate the ids... that was my fix for
>>>>>>>> ExternalFileField.
>>>>>>>> See FileFloatSource.getValues() for the implementation.
>>>>>>>>
>>>>>>>>
>>>>>>>>  - - - -
>>>>>>>>>
>>>>>>>>> Yonik also suggested:
>>>>>>>>>
>>>>>>>>> Relatively new in 2.9, you can pass null to enumerate over all
>>>>>>>>> non-deleted
>>>>>>>>> docs:
>>>>>>>>> TermDocs td = reader.termDocs(null);
>>>>>>>>>
>>>>>>>>> It would probably be a lot faster to iterate over indexed values
>>>>>>>>> though.
>>>>>>>>>
>>>>>>>>> If I iterate over indexed values (from the FieldCache I presume) then
>>>>>>>>> how do I
>>>>>>>>> get access to the document id?
>>>>>>>>>
>>>>>>>>
>>>>>>>> IndexReader.terms(Term t) returns a TermEnum that can iterate over
>>>>>>>> terms, starting at t.
>>>>>>>> IndexReader.termDocs(Term t or TermEnum te) will give you the list
>>>>>>>> of
>>>>>>>> documents that match a term.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Yonik
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://www.lucidimagination.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>
>>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>


-- 
Regards,
Shalin Shekhar Mangar.

Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

Posted by Mark Miller <ma...@gmail.com>.
I think Shalin upgraded the jars this morning, so I'd just grab them 
again real quick.

4/4 4:46 am : Upgraded to Lucene 2.9-dev r768228

Ryan McKinley wrote:
> thanks Mark!
>
> how far is lucene /trunk from what is currently in solr?
>
> Is it something we should consider upgrading?
>
>
> On Apr 24, 2009, at 8:30 AM, Mark Miller wrote:
>
>> I just committed a fix Ryan - should work with upgraded Lucene jars.
>>
>> - Mark
>>
>> Ryan McKinley wrote:
>>> thanks!
>>>
>>>
>>> On Apr 23, 2009, at 6:32 PM, Mark Miller wrote:
>>>
>>>> Looks like it's my fault. Auto resolution was moved up to
>>>> IndexSearcher in Lucene, and it looks like SolrIndexSearcher is not 
>>>> tickling it first. I'll take a look.
>>>>
>>>> - Mark
>>>>
>>>> Ryan McKinley wrote:
>>>>> Ok, not totally resolved....
>>>>>
>>>>> Things work fine when I have my custom Filter alone or with other 
>>>>> Filters, however if I add a query string to the mix it breaks with 
>>>>> an IllegalStateException:
>>>>>
>>>>> java.lang.IllegalStateException: Auto should be resolved before now
>>>>>   at 
>>>>> org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216) 
>>>>>
>>>>>   at 
>>>>> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73) 
>>>>>
>>>>>   at 
>>>>> org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168) 
>>>>>
>>>>>   at 
>>>>> org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58) 
>>>>>
>>>>>   at 
>>>>> org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214) 
>>>>>
>>>>>   at 
>>>>> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924) 
>>>>>
>>>>>   at 
>>>>> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345) 
>>>>>
>>>>>   at 
>>>>> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171) 
>>>>>
>>>>>   at 
>>>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) 
>>>>>
>>>>>   at 
>>>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) 
>>>>>
>>>>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
>>>>>   at 
>>>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) 
>>>>>
>>>>>
>>>>> This is for a query:
>>>>> /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN
>>>>> bounds=XXX triggers my custom filter to kick in.
>>>>>
>>>>> Any thoughts where to look?  This error is new since upgrading the 
>>>>> lucene libs (in recent solr)
>>>>>
>>>>> Thanks!
>>>>> ryan
>>>>>
>>>>>
>>>>> On Apr 20, 2009, at 7:14 PM, Ryan McKinley wrote:
>>>>>
>>>>>> thanks!
>>>>>>
>>>>>> everything got better when I removed my logic to cache based on 
>>>>>> the index modification time.
>>>>>>
>>>>>>
>>>>>> On Apr 20, 2009, at 4:51 PM, Yonik Seeley wrote:
>>>>>>
>>>>>>> On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley 
>>>>>>> <ry...@gmail.com> wrote:
>>>>>>>> This issue started on java-user, but I am moving it to solr-dev:
>>>>>>>> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception 
>>>>>>>>
>>>>>>>>
>>>>>>>> I am using solr trunk and building an RTree from stored 
>>>>>>>> document fields.
>>>>>>>> This process worked fine until a recent change in 2.9 that has a
>>>>>>>> different document id strategy than I was used to.
>>>>>>>>
>>>>>>>> In that thread, Yonik suggested:
>>>>>>>> - pop back to the top level from the sub-reader, if you really 
>>>>>>>> need a single
>>>>>>>> set
>>>>>>>> - if a set-per-reader will work, then cache per segment (better 
>>>>>>>> for
>>>>>>>> incremental updates anyway)
>>>>>>>>
>>>>>>>> I'm not quite sure what you mean by a "set-per-reader".
>>>>>>>
>>>>>>> I meant RTree per reader (per segment reader).
>>>>>>>
>>>>>>>> Previously I was
>>>>>>>> building a single RTree and using it until the last
>>>>>>>> modified time had
>>>>>>>> changed.  This avoided building an index anytime a new reader 
>>>>>>>> was opened and
>>>>>>>> the index had not changed.
>>>>>>>
>>>>>>> I *think* that our use of re-open will return the same IndexReader
>>>>>>> instance if nothing has changed... so you shouldn't have to try 
>>>>>>> and do
>>>>>>> that yourself.
>>>>>>>
>>>>>>>> I'm fine building a new RTree for each reader if
>>>>>>>> that is required.
>>>>>>>
>>>>>>> If that works just as well, it will put you in a better position 
>>>>>>> for
>>>>>>> faster incremental updates... new RTrees will be built only for 
>>>>>>> those
>>>>>>> segments that have changed.
>>>>>>>
>>>>>>>> Is there any existing code that deals with this situation?
>>>>>>>
>>>>>>> To cache an RTree per reader, you could use the same logic as
>>>>>>> FieldCache uses... a weak map with the reader as the key.
>>>>>>>
>>>>>>> If a single top-level RTree that covers the entire index works 
>>>>>>> better
>>>>>>> for you, then you can cache the RTree based on the top level multi
>>>>>>> reader and translate the ids... that was my fix for 
>>>>>>> ExternalFileField.
>>>>>>> See FileFloatSource.getValues() for the implementation.
>>>>>>>
>>>>>>>
>>>>>>>> - - - -
>>>>>>>>
>>>>>>>> Yonik also suggested:
>>>>>>>>
>>>>>>>> Relatively new in 2.9, you can pass null to enumerate over all 
>>>>>>>> non-deleted
>>>>>>>> docs:
>>>>>>>> TermDocs td = reader.termDocs(null);
>>>>>>>>
>>>>>>>> It would probably be a lot faster to iterate over indexed 
>>>>>>>> values though.
>>>>>>>>
>>>>>>>> If I iterate over indexed values (from the FieldCache I presume)
>>>>>>>> then how do I
>>>>>>>> get access to the document id?
>>>>>>>
>>>>>>> IndexReader.terms(Term t) returns a TermEnum that can iterate over
>>>>>>> terms, starting at t.
>>>>>>> IndexReader.termDocs(Term t or TermEnum te) will give you the 
>>>>>>> list of
>>>>>>> documents that match a term.
>>>>>>>
>>>>>>>
>>>>>>> -Yonik
>>>>>>
>>>>>
>>>>
>>>>
>>>> -- 
>>>> - Mark
>>>>
>>>> http://www.lucidimagination.com
>>>>
>>>>
>>>>
>>>
>>
>>
>> -- 
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>


-- 
- Mark

http://www.lucidimagination.com




Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

Posted by Ryan McKinley <ry...@gmail.com>.
thanks Mark!

how far is lucene /trunk from what is currently in solr?

Is it something we should consider upgrading?


On Apr 24, 2009, at 8:30 AM, Mark Miller wrote:

> I just committed a fix Ryan - should work with upgraded Lucene jars.
>
> - Mark
>
> Ryan McKinley wrote:
>> thanks!
>>
>>
>> On Apr 23, 2009, at 6:32 PM, Mark Miller wrote:
>>
>>> Looks like it's my fault. Auto resolution was moved up to
>>> IndexSearcher in Lucene, and it looks like SolrIndexSearcher is  
>>> not tickling it first. I'll take a look.
>>>
>>> - Mark
>>>
>>> Ryan McKinley wrote:
>>>> Ok, not totally resolved....
>>>>
>>>> Things work fine when I have my custom Filter alone or with other  
>>>> Filters, however if I add a query string to the mix it breaks  
>>>> with an IllegalStateException:
>>>>
>>>> java.lang.IllegalStateException: Auto should be resolved before now
>>>>   at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216)
>>>>   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73)
>>>>   at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
>>>>   at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
>>>>   at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214)
>>>>   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924)
>>>>   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345)
>>>>   at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171)
>>>>   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>>>>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
>>>>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>>>
>>>> This is for a query:
>>>> /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN
>>>> bounds=XXX triggers my custom filter to kick in.
>>>>
>>>> Any thoughts where to look?  This error is new since upgrading  
>>>> the lucene libs (in recent solr)
>>>>
>>>> Thanks!
>>>> ryan
>>>>
>>>>
>>>> On Apr 20, 2009, at 7:14 PM, Ryan McKinley wrote:
>>>>
>>>>> thanks!
>>>>>
>>>>> everything got better when I removed my logic to cache based on  
>>>>> the index modification time.
>>>>>
>>>>>
>>>>> On Apr 20, 2009, at 4:51 PM, Yonik Seeley wrote:
>>>>>
>>>>>> On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley  
>>>>>> <ry...@gmail.com> wrote:
>>>>>>> This issue started on java-user, but I am moving it to solr-dev:
>>>>>>> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception
>>>>>>>
>>>>>>> I am using solr trunk and building an RTree from stored  
>>>>>>> document fields.
>>>>>>> This process worked fine until a recent change in 2.9 that has a
>>>>>>> different document id strategy than I was used to.
>>>>>>>
>>>>>>> In that thread, Yonik suggested:
>>>>>>> - pop back to the top level from the sub-reader, if you really  
>>>>>>> need a single
>>>>>>> set
>>>>>>> - if a set-per-reader will work, then cache per segment  
>>>>>>> (better for
>>>>>>> incremental updates anyway)
>>>>>>>
>>>>>>> I'm not quite sure what you mean by a "set-per-reader".
>>>>>>
>>>>>> I meant RTree per reader (per segment reader).
>>>>>>
>>>>>>> Previously I was
>>>>>>> building a single RTree and using it until the last
>>>>>>> modified time had
>>>>>>> changed.  This avoided building an index anytime a new reader  
>>>>>>> was opened and
>>>>>>> the index had not changed.
>>>>>>
>>>>>> I *think* that our use of re-open will return the same  
>>>>>> IndexReader
>>>>>> instance if nothing has changed... so you shouldn't have to try  
>>>>>> and do
>>>>>> that yourself.
>>>>>>
>>>>>>> I'm fine building a new RTree for each reader if
>>>>>>> that is required.
>>>>>>
>>>>>> If that works just as well, it will put you in a better  
>>>>>> position for
>>>>>> faster incremental updates... new RTrees will be built only for  
>>>>>> those
>>>>>> segments that have changed.
>>>>>>
>>>>>>> Is there any existing code that deals with this situation?
>>>>>>
>>>>>> To cache an RTree per reader, you could use the same logic as
>>>>>> FieldCache uses... a weak map with the reader as the key.
>>>>>>
>>>>>> If a single top-level RTree that covers the entire index works  
>>>>>> better
>>>>>> for you, then you can cache the RTree based on the top level  
>>>>>> multi
>>>>>> reader and translate the ids... that was my fix for  
>>>>>> ExternalFileField.
>>>>>> See FileFloatSource.getValues() for the implementation.
>>>>>>
>>>>>>
>>>>>>> - - - -
>>>>>>>
>>>>>>> Yonik also suggested:
>>>>>>>
>>>>>>> Relatively new in 2.9, you can pass null to enumerate over all  
>>>>>>> non-deleted
>>>>>>> docs:
>>>>>>> TermDocs td = reader.termDocs(null);
>>>>>>>
>>>>>>> It would probably be a lot faster to iterate over indexed  
>>>>>>> values though.
>>>>>>>
>>>>>>> If I iterate over indexed values (from the FieldCache I presume)
>>>>>>> then how do I
>>>>>>> get access to the document id?
>>>>>>
>>>>>> IndexReader.terms(Term t) returns a TermEnum that can iterate  
>>>>>> over
>>>>>> terms, starting at t.
>>>>>> IndexReader.termDocs(Term t or TermEnum te) will give you the  
>>>>>> list of
>>>>>> documents that match a term.
>>>>>>
>>>>>>
>>>>>> -Yonik
>>>>>
>>>>
>>>
>>>
>>> -- 
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>
>
>
> -- 
> - Mark
>
> http://www.lucidimagination.com
>
>
>


Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

Posted by Mark Miller <ma...@gmail.com>.
I just committed a fix Ryan - should work with upgraded Lucene jars.

- Mark

Ryan McKinley wrote:
> thanks!
>
>
> On Apr 23, 2009, at 6:32 PM, Mark Miller wrote:
>
>> Looks like it's my fault. Auto resolution was moved up to IndexSearcher
>> in Lucene, and it looks like SolrIndexSearcher is not tickling it 
>> first. I'll take a look.
>>
>> - Mark
>>
>> Ryan McKinley wrote:
>>> Ok, not totally resolved....
>>>
>>> Things work fine when I have my custom Filter alone or with other 
>>> Filters, however if I add a query string to the mix it breaks with 
>>> an IllegalStateException:
>>>
>>> java.lang.IllegalStateException: Auto should be resolved before now
>>>    at 
>>> org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216) 
>>>
>>>    at 
>>> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73) 
>>>
>>>    at 
>>> org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168) 
>>>
>>>    at 
>>> org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58) 
>>>
>>>    at 
>>> org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214) 
>>>
>>>    at 
>>> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924) 
>>>
>>>    at 
>>> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345) 
>>>
>>>    at 
>>> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171) 
>>>
>>>    at 
>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) 
>>>
>>>    at 
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) 
>>>
>>>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
>>>    at 
>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) 
>>>
>>>
>>> This is for a query:
>>>  /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN
>>> bounds=XXX triggers my custom filter to kick in.
>>>
>>> Any thoughts where to look?  This error is new since upgrading the 
>>> lucene libs (in recent solr)
>>>
>>> Thanks!
>>> ryan
>>>
>>>
>>> On Apr 20, 2009, at 7:14 PM, Ryan McKinley wrote:
>>>
>>>> thanks!
>>>>
>>>> everything got better when I removed my logic to cache based on the 
>>>> index modification time.
>>>>
>>>>
>>>> On Apr 20, 2009, at 4:51 PM, Yonik Seeley wrote:
>>>>
>>>>> On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley <ry...@gmail.com> 
>>>>> wrote:
>>>>>> This issue started on java-user, but I am moving it to solr-dev:
>>>>>> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception 
>>>>>>
>>>>>>
>>>>>> I am using solr trunk and building an RTree from stored document 
>>>>>> fields.
>>>>>> This process worked fine until a recent change in 2.9 that has a
>>>>>> different document id strategy than I was used to.
>>>>>>
>>>>>> In that thread, Yonik suggested:
>>>>>> - pop back to the top level from the sub-reader, if you really 
>>>>>> need a single
>>>>>> set
>>>>>> - if a set-per-reader will work, then cache per segment (better for
>>>>>> incremental updates anyway)
>>>>>>
>>>>>> I'm not quite sure what you mean by a "set-per-reader".
>>>>>
>>>>> I meant RTree per reader (per segment reader).
>>>>>
>>>>>> Previously I was
>>>>>> building a single RTree and using it until the last modified
>>>>>> time had
>>>>>> changed.  This avoided building an index anytime a new reader was 
>>>>>> opened and
>>>>>> the index had not changed.
>>>>>
>>>>> I *think* that our use of re-open will return the same IndexReader
>>>>> instance if nothing has changed... so you shouldn't have to try 
>>>>> and do
>>>>> that yourself.
>>>>>
>>>>>> I'm fine building a new RTree for each reader if
>>>>>> that is required.
>>>>>
>>>>> If that works just as well, it will put you in a better position for
>>>>> faster incremental updates... new RTrees will be built only for those
>>>>> segments that have changed.
>>>>>
>>>>>> Is there any existing code that deals with this situation?
>>>>>
>>>>> To cache an RTree per reader, you could use the same logic as
>>>>> FieldCache uses... a weak map with the reader as the key.
>>>>>
>>>>> If a single top-level RTree that covers the entire index works better
>>>>> for you, then you can cache the RTree based on the top level multi
>>>>> reader and translate the ids... that was my fix for 
>>>>> ExternalFileField.
>>>>> See FileFloatSource.getValues() for the implementation.
>>>>>
>>>>>
>>>>>> - - - -
>>>>>>
>>>>>> Yonik also suggested:
>>>>>>
>>>>>> Relatively new in 2.9, you can pass null to enumerate over all 
>>>>>> non-deleted
>>>>>> docs:
>>>>>> TermDocs td = reader.termDocs(null);
>>>>>>
>>>>>> It would probably be a lot faster to iterate over indexed values 
>>>>>> though.
>>>>>>
>>>>>> If I iterate over indexed values (from the FieldCache I presume)
>>>>>> then how do I
>>>>>> get access to the document id?
>>>>>
>>>>> IndexReader.terms(Term t) returns a TermEnum that can iterate over
>>>>> terms, starting at t.
>>>>> IndexReader.termDocs(Term t or TermEnum te) will give you the list of
>>>>> documents that match a term.
>>>>>
>>>>>
>>>>> -Yonik
>>>>
>>>
>>
>>
>> -- 
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>


-- 
- Mark

http://www.lucidimagination.com




Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

Posted by Ryan McKinley <ry...@gmail.com>.
thanks!


On Apr 23, 2009, at 6:32 PM, Mark Miller wrote:

> Looks like it's my fault. Auto resolution was moved up to
> IndexSearcher in Lucene, and it looks like SolrIndexSearcher is not  
> tickling it first. I'll take a look.
>
> - Mark
>
> Ryan McKinley wrote:
>> Ok, not totally resolved....
>>
>> Things work fine when I have my custom Filter alone or with other  
>> Filters, however if I add a query string to the mix it breaks with  
>> an IllegalStateException:
>>
>> java.lang.IllegalStateException: Auto should be resolved before now
>>    at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216)
>>    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73)
>>    at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
>>    at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
>>    at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214)
>>    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924)
>>    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345)
>>    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171)
>>    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
>>    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>
>> This is for a query:
>>  /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN
>> bounds=XXX triggers my custom filter to kick in.
>>
>> Any thoughts where to look?  This error is new since upgrading the  
>> lucene libs (in recent solr)
>>
>> Thanks!
>> ryan
>>
>>
>> On Apr 20, 2009, at 7:14 PM, Ryan McKinley wrote:
>>
>>> thanks!
>>>
>>> everything got better when I removed my logic to cache based on  
>>> the index modification time.
>>>
>>>
>>> On Apr 20, 2009, at 4:51 PM, Yonik Seeley wrote:
>>>
>>>> On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley  
>>>> <ry...@gmail.com> wrote:
>>>>> This issue started on java-user, but I am moving it to solr-dev:
>>>>> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception
>>>>>
>>>>> I am using solr trunk and building an RTree from stored document  
>>>>> fields.
>>>>> This process worked fine until a recent change in 2.9 that has a
>>>>> different document id strategy than I was used to.
>>>>>
>>>>> In that thread, Yonik suggested:
>>>>> - pop back to the top level from the sub-reader, if you really  
>>>>> need a single
>>>>> set
>>>>> - if a set-per-reader will work, then cache per segment (better  
>>>>> for
>>>>> incremental updates anyway)
>>>>>
>>>>> I'm not quite sure what you mean by a "set-per-reader".
>>>>
>>>> I meant RTree per reader (per segment reader).
>>>>
>>>>> Previously I was
>>>>> building a single RTree and using it until the last modified
>>>>> time had
>>>>> changed.  This avoided building an index anytime a new reader  
>>>>> was opened and
>>>>> the index had not changed.
>>>>
>>>> I *think* that our use of re-open will return the same IndexReader
>>>> instance if nothing has changed... so you shouldn't have to try  
>>>> and do
>>>> that yourself.
>>>>
>>>>> I'm fine building a new RTree for each reader if
>>>>> that is required.
>>>>
>>>> If that works just as well, it will put you in a better position  
>>>> for
>>>> faster incremental updates... new RTrees will be built only for  
>>>> those
>>>> segments that have changed.
>>>>
>>>>> Is there any existing code that deals with this situation?
>>>>
>>>> To cache an RTree per reader, you could use the same logic as
>>>> FieldCache uses... a weak map with the reader as the key.
>>>>
>>>> If a single top-level RTree that covers the entire index works  
>>>> better
>>>> for you, then you can cache the RTree based on the top level multi
>>>> reader and translate the ids... that was my fix for  
>>>> ExternalFileField.
>>>> See FileFloatSource.getValues() for the implementation.
>>>>
>>>>
>>>>> - - - -
>>>>>
>>>>> Yonik also suggested:
>>>>>
>>>>> Relatively new in 2.9, you can pass null to enumerate over all  
>>>>> non-deleted
>>>>> docs:
>>>>> TermDocs td = reader.termDocs(null);
>>>>>
>>>>> It would probably be a lot faster to iterate over indexed values  
>>>>> though.
>>>>>
>>>>> If I iterate over indexed values (from the FieldCache I presume)
>>>>> then how do I
>>>>> get access to the document id?
>>>>
>>>> IndexReader.terms(Term t) returns a TermEnum that can iterate over
>>>> terms, starting at t.
>>>> IndexReader.termDocs(Term t or TermEnum te) will give you the  
>>>> list of
>>>> documents that match a term.
>>>>
>>>>
>>>> -Yonik
>>>
>>
>
>
> -- 
> - Mark
>
> http://www.lucidimagination.com
>
>
>


Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

Posted by Mark Miller <ma...@gmail.com>.
Looks like it's my fault. Auto resolution was moved up to IndexSearcher in
Lucene, and it looks like SolrIndexSearcher is not tickling it first. 
I'll take a look.
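
(If you need a workaround meanwhile, one option at the Lucene level --
a sketch, "price" is a made-up field -- is to give the sort an explicit
type so nothing is left as AUTO:)

    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;

    // SortField.AUTO makes Lucene sniff the type from the first term;
    // an explicit type (INT, FLOAT, STRING, ...) never needs resolving
    Sort sort = new Sort(new SortField("price", SortField.FLOAT));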

- Mark

Ryan McKinley wrote:
> Ok, not totally resolved....
>
> Things work fine when I have my custom Filter alone or with other 
> Filters, however if I add a query string to the mix it breaks with an 
> IllegalStateException:
>
> java.lang.IllegalStateException: Auto should be resolved before now
>     at 
> org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216) 
>
>     at 
> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73)
>     at 
> org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168) 
>
>     at 
> org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58) 
>
>     at 
> org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214) 
>
>     at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924) 
>
>     at 
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345) 
>
>     at 
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171) 
>
>     at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) 
>
>     at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) 
>
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
>     at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) 
>
>
> This is for a query:
>   /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN
> bounds=XXX triggers my custom filter to kick in.
>
> Any thoughts where to look?  This error is new since upgrading the 
> lucene libs (in recent solr)
>
> Thanks!
> ryan
>
>
> On Apr 20, 2009, at 7:14 PM, Ryan McKinley wrote:
>
>> thanks!
>>
>> everything got better when I removed my logic to cache based on the 
>> index modification time.
>>
>>
>> On Apr 20, 2009, at 4:51 PM, Yonik Seeley wrote:
>>
>>> On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley <ry...@gmail.com> 
>>> wrote:
>>>> This issue started on java-user, but I am moving it to solr-dev:
>>>> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception 
>>>>
>>>>
>>>> I am using solr trunk and building an RTree from stored document 
>>>> fields.
>>>> This process worked fine until a recent change in 2.9 that has a
>>>> different document id strategy than I was used to.
>>>>
>>>> In that thread, Yonik suggested:
>>>> - pop back to the top level from the sub-reader, if you really need 
>>>> a single
>>>> set
>>>> - if a set-per-reader will work, then cache per segment (better for
>>>> incremental updates anyway)
>>>>
>>>> I'm not quite sure what you mean by a "set-per-reader".
>>>
>>> I meant RTree per reader (per segment reader).
>>>
>>>> Previously I was
>>>> building a single RTree and using it until the last modified
>>>> time had
>>>> changed.  This avoided building an index anytime a new reader was 
>>>> opened and
>>>> the index had not changed.
>>>
>>> I *think* that our use of re-open will return the same IndexReader
>>> instance if nothing has changed... so you shouldn't have to try and do
>>> that yourself.
>>>
>>>> I'm fine building a new RTree for each reader if
>>>> that is required.
>>>
>>> If that works just as well, it will put you in a better position for
>>> faster incremental updates... new RTrees will be built only for those
>>> segments that have changed.
>>>
>>>> Is there any existing code that deals with this situation?
>>>
>>> To cache an RTree per reader, you could use the same logic as
>>> FieldCache uses... a weak map with the reader as the key.
>>>
>>> If a single top-level RTree that covers the entire index works better
>>> for you, then you can cache the RTree based on the top level multi
>>> reader and translate the ids... that was my fix for ExternalFileField.
>>> See FileFloatSource.getValues() for the implementation.
>>>
>>>
>>>> - - - -
>>>>
>>>> Yonik also suggested:
>>>>
>>>> Relatively new in 2.9, you can pass null to enumerate over all 
>>>> non-deleted
>>>> docs:
>>>> TermDocs td = reader.termDocs(null);
>>>>
>>>> It would probably be a lot faster to iterate over indexed values 
>>>> though.
>>>>
>>>> If I iterate over indexed values (from the FieldCache I presume) then
>>>> how do I
>>>> get access to the document id?
>>>
>>> IndexReader.terms(Term t) returns a TermEnum that can iterate over
>>> terms, starting at t.
>>> IndexReader.termDocs(Term t or TermEnum te) will give you the list of
>>> documents that match a term.
>>>
>>>
>>> -Yonik
>>
>


-- 
- Mark

http://www.lucidimagination.com




Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

Posted by Ryan McKinley <ry...@gmail.com>.
Ok, not totally resolved....

Things work fine when I have my custom Filter alone or with other  
Filters, however if I add a query string to the mix it breaks with an  
IllegalStateException:

java.lang.IllegalStateException: Auto should be resolved before now
	at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216)
	at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73)
	at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
	at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
	at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)

This is for a query:
   /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN
bounds=XXX triggers my custom filter to kick in.

Any thoughts where to look?  This error is new since upgrading the  
lucene libs (in recent solr)

Thanks!
ryan


On Apr 20, 2009, at 7:14 PM, Ryan McKinley wrote:

> thanks!
>
> everything got better when I removed my logic to cache based on the  
> index modification time.
>
>
> On Apr 20, 2009, at 4:51 PM, Yonik Seeley wrote:
>
>> On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley <ry...@gmail.com>  
>> wrote:
>>> This issue started on java-user, but I am moving it to solr-dev:
>>> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception
>>>
>>> I am using solr trunk and building an RTree from stored document  
>>> fields.
>>> This process worked fine until a recent change in 2.9 that has a
>>> different document id strategy than I was used to.
>>>
>>> In that thread, Yonik suggested:
>>> - pop back to the top level from the sub-reader, if you really  
>>> need a single
>>> set
>>> - if a set-per-reader will work, then cache per segment (better for
>>> incremental updates anyway)
>>>
>>> I'm not quite sure what you mean by a "set-per-reader".
>>
>> I meant RTree per reader (per segment reader).
>>
>>> Previously I was
>>> building a single RTree and using it until the last modified
>>> time had
>>> changed.  This avoided building an index anytime a new reader was  
>>> opened and
>>> the index had not changed.
>>
>> I *think* that our use of re-open will return the same IndexReader
>> instance if nothing has changed... so you shouldn't have to try and  
>> do
>> that yourself.
>>
>>> I'm fine building a new RTree for each reader if
>>> that is required.
>>
>> If that works just as well, it will put you in a better position for
>> faster incremental updates... new RTrees will be built only for those
>> segments that have changed.
>>
>>> Is there any existing code that deals with this situation?
>>
>> To cache an RTree per reader, you could use the same logic as
>> FieldCache uses... a weak map with the reader as the key.
>>
>> If a single top-level RTree that covers the entire index works better
>> for you, then you can cache the RTree based on the top level multi
>> reader and translate the ids... that was my fix for  
>> ExternalFileField.
>> See FileFloatSource.getValues() for the implementation.
>>
>>
>>> - - - -
>>>
>>> Yonik also suggested:
>>>
>>> Relatively new in 2.9, you can pass null to enumerate over all non- 
>>> deleted
>>> docs:
>>> TermDocs td = reader.termDocs(null);
>>>
>>> It would probably be a lot faster to iterate over indexed values  
>>> though.
>>>
>>> If I iterate over indexed values (from the FieldCache I presume)
>>> then how do I
>>> get access to the document id?
>>
>> IndexReader.terms(Term t) returns a TermEnum that can iterate over
>> terms, starting at t.
>> IndexReader.termDocs(Term t or TermEnum te) will give you the list of
>> documents that match a term.
>>
>>
>> -Yonik
>


Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

Posted by Ryan McKinley <ry...@gmail.com>.
thanks!

everything got better when I removed my logic to cache based on the  
index modification time.


On Apr 20, 2009, at 4:51 PM, Yonik Seeley wrote:

> On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley <ry...@gmail.com>  
> wrote:
>> This issue started on java-user, but I am moving it to solr-dev:
>> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception
>>
>> I am using solr trunk and building an RTree from stored document  
>> fields.
>>  This process worked fine until a recent change in 2.9 that has a
>> different document id strategy than I was used to.
>>
>> In that thread, Yonik suggested:
>> - pop back to the top level from the sub-reader, if you really need  
>> a single
>> set
>> - if a set-per-reader will work, then cache per segment (better for
>> incremental updates anyway)
>>
>> I'm not quite sure what you mean by a "set-per-reader".
>
> I meant RTree per reader (per segment reader).
>
>>  Previously I was
>> building a single RTree and using it until the last modified
>> time had
>> changed.  This avoided building an index anytime a new reader was  
>> opened and
>> the index had not changed.
>
> I *think* that our use of re-open will return the same IndexReader
> instance if nothing has changed... so you shouldn't have to try and do
> that yourself.
>
>>  I'm fine building a new RTree for each reader if
>> that is required.
>
> If that works just as well, it will put you in a better position for
> faster incremental updates... new RTrees will be built only for those
> segments that have changed.
>
>> Is there any existing code that deals with this situation?
>
> To cache an RTree per reader, you could use the same logic as
> FieldCache uses... a weak map with the reader as the key.
>
> If a single top-level RTree that covers the entire index works better
> for you, then you can cache the RTree based on the top level multi
> reader and translate the ids... that was my fix for ExternalFileField.
> See FileFloatSource.getValues() for the implementation.
>
>
>> - - - -
>>
>> Yonik also suggested:
>>
>>  Relatively new in 2.9, you can pass null to enumerate over all non- 
>> deleted
>> docs:
>>  TermDocs td = reader.termDocs(null);
>>
>>  It would probably be a lot faster to iterate over indexed values  
>> though.
>>
>> If I iterate over indexed values (from the FieldCache I presume) then
>> how do I
>> get access to the document id?
>
> IndexReader.terms(Term t) returns a TermEnum that can iterate over
> terms, starting at t.
> IndexReader.termDocs(Term t or TermEnum te) will give you the list of
> documents that match a term.
>
>
> -Yonik