Posted to java-user@lucene.apache.org by Micah Jaffe <mi...@affinitycircles.com> on 2009/01/22 20:36:10 UTC

Problem with creating IndexReaders and understanding their implementation use

My environment:

Lucene 2.3.2 on Linux, Java 1.6.0_07-b06, running under Tomcat 5.5.26

What I'm trying to do seems pretty simple, but is causing a headache  
which I can't sleuth out.  When I try to build an IndexSearcher using  
an IndexReader, the IndexReader.open( String_to_index_dir ) call  
sometimes creates a reader with a SegmentReader impl.  Other times a  
MultiSegmentReader will be created.  The SegmentReader impl when used  
returns a much reduced or empty result set from searchers; the  
MultiSegmentReader returns the right and expected results from the  
index.

The indexes I'm using are optimized when first built, but they do not
stay optimized: they are constantly updated and searched in the Tomcat
environment, and are only optimized again when rebuilt from scratch
(usually weekly).

I noticed in the guts of DirectoryIndexReader that it seems to pick
SegmentReader over MultiSegmentReader if the index is optimized.  Is
there any way to make this code always return a MultiSegmentReader?
Why is one reader failing over the other?  This appears to be a bug  
(and a nasty one for me).

Any help or guidance would be much appreciated as I've already spent a  
day trying to dig through these mechanics and am a little lost.   
Please, no "Upgrade to 2.4 and then see what happens."  2.4 appears to
include a lot of interesting things but also deprecates a lot of API  
which I do not have time to dig into or refactor.

thanks,
Micah

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Problem with creating IndexReaders and understanding their implementation use

Posted by Michael McCandless <lu...@mikemccandless.com>.

OK, phew!  Thanks for bringing closure.

Mike

Micah Jaffe wrote:

> Hi, thanks for responding.  A quick follow-up: the problem turned
> out to be more complex than originally explained, and it was not in  
> fact a problem with buggy behavior in Lucene.  A static nested class  
> created a fugly point where I was holding/using the wrong reader at  
> the time of failure.
>
> -M

Re: Problem with creating IndexReaders and understanding their implementation use

Posted by Micah Jaffe <mi...@affinitycircles.com>.

Hi, thanks for responding.  A quick follow-up: the problem turned out
to be more complex than originally explained, and it was not in fact a  
problem with buggy behavior in Lucene.  A static nested class created  
a fugly point where I was holding/using the wrong reader at the time  
of failure.
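
The thread never shows the offending code, so the following is only a guess at the general shape of such a bug, not a reconstruction of it. It is a minimal, self-contained sketch with no Lucene classes: `Snapshot` is a hypothetical stand-in for an index reader (a point-in-time view), `index` stands in for the index on disk, and `CachedHolder` is a hypothetical static nested class that caches the first reader it creates and keeps handing it out after the index has changed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a stale cached reader; no Lucene classes are used.
final class StaleReaderSketch {

    // Stand-in for an index reader: a point-in-time copy of the documents.
    static final class Snapshot {
        private final List<String> docs;
        Snapshot(List<String> docs) { this.docs = new ArrayList<String>(docs); }
        int numDocs() { return docs.size(); }
    }

    // Stand-in for the index on disk, which keeps changing.
    static final List<String> index = new ArrayList<String>();

    // The trap: a static nested class that caches the first snapshot forever.
    static final class CachedHolder {
        private static Snapshot cached;
        static Snapshot get() {
            if (cached == null) {
                cached = new Snapshot(index);
            }
            return cached; // never refreshed after later updates
        }
    }

    // One possible remedy: take a fresh snapshot when the index has changed.
    static Snapshot openFresh() {
        return new Snapshot(index);
    }

    // Runs the scenario and returns {count before update, stale count, fresh count}.
    static int[] demo() {
        index.clear();
        CachedHolder.cached = null; // reset so the demo is repeatable
        index.add("doc1");
        int before = CachedHolder.get().numDocs();
        index.add("doc2");
        index.add("doc3");
        int stale = CachedHolder.get().numDocs(); // still the old snapshot
        int fresh = openFresh().numDocs();        // sees all three documents
        return new int[] { before, stale, fresh };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("before=" + r[0] + " stale=" + r[1] + " fresh=" + r[2]);
    }
}
```

Searching through the cached snapshot reports one document while a fresh one reports three, which resembles the "much reduced result set" symptom from the original post. In Lucene 2.3 itself, the corresponding check and refresh would presumably go through IndexReader.isCurrent() and IndexReader.reopen() rather than anything like `openFresh`.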

-M

On Jan 22, 2009, at 2:57 PM, Michael McCandless wrote:

>
> IndexReader.open opens the latest segments_N file.  If that file  
> references only 1 segment, a SegmentReader is returned, else a  
> MultiSegmentReader.
>
> I'm confused why you see the SegmentReader impl giving too few  
> results -- that should only be returned if your index legitimately  
> has only 1 segment.  Are you sure the SegmentReader impl is not  
> returning the right results?  Do you have an example index for which
> IndexReader.open consistently and incorrectly returns a SegmentReader?
>
> Is an IndexWriter changing the index when you call IndexReader.open?
>
> Mike

Re: Problem with creating IndexReaders and understanding their implementation use

Posted by Michael McCandless <lu...@mikemccandless.com>.

IndexReader.open opens the latest segments_N file.  If that file  
references only 1 segment, a SegmentReader is returned, else a  
MultiSegmentReader.
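
The rule above can be modeled in a few lines. This is a minimal, self-contained sketch and not Lucene's actual code: `Reader`, `SingleSegment`, `MultiSegment`, and `open` are hypothetical stand-ins for IndexReader, SegmentReader, MultiSegmentReader, and IndexReader.open, and a plain list of segment names stands in for what the latest segments_N file references.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the selection rule; none of these types are Lucene classes.
final class ReaderSelectionSketch {

    interface Reader {
        int segmentCount();
    }

    // Stand-in for SegmentReader: reads exactly one segment.
    static final class SingleSegment implements Reader {
        private final String segment;
        SingleSegment(String segment) { this.segment = segment; }
        public int segmentCount() { return 1; }
    }

    // Stand-in for MultiSegmentReader: covers every segment in the commit.
    static final class MultiSegment implements Reader {
        private final List<String> segments;
        MultiSegment(List<String> segments) { this.segments = segments; }
        public int segmentCount() { return segments.size(); }
    }

    // Stand-in for IndexReader.open: the argument plays the role of the
    // segment list recorded in the latest segments_N file.
    static Reader open(List<String> segmentsInLatestCommit) {
        if (segmentsInLatestCommit.size() == 1) {
            return new SingleSegment(segmentsInLatestCommit.get(0));
        }
        return new MultiSegment(segmentsInLatestCommit);
    }

    public static void main(String[] args) {
        Reader optimized = open(Arrays.asList("_a"));
        Reader updated = open(Arrays.asList("_a", "_b", "_c"));
        System.out.println(optimized.getClass().getSimpleName()); // SingleSegment
        System.out.println(updated.getClass().getSimpleName());   // MultiSegment
    }
}
```

Under this model the reader type depends only on how many segments the latest commit references, so a freshly optimized index yields the single-segment reader and the same index after further updates yields the multi-segment one. Neither type is expected to return different results for the same commit.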

I'm confused why you see the SegmentReader impl giving too few results  
-- that should only be returned if your index legitimately has only 1  
segment.  Are you sure the SegmentReader impl is not returning the  
right results?  Do you have an example index for which IndexReader.open
consistently and incorrectly returns a SegmentReader?

Is an IndexWriter changing the index when you call IndexReader.open?

Mike
