You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by sol myr <so...@gmail.com> on 2011/03/24 15:00:30 UTC

Should I use MultiSearcher?

Hi,

I need to search a Catalog.
Most users search *this* year's catalog, but on rare occasions they may ask
for old products (from previous years).
I'm trying to select between 2 options:

1) Keep huge big index for all years (where documents have a "year" field,
so I can filter out the current year, when needed)

2) Keep separate indexes - FSDirectory per year:
FSDirectory.open("c:/index_2009/"),  FSDirectory.open("c:/index_2010/") ...
Most searches will run on the current year's FSDirectory, but if I want old
product I can use MultiSearcher.

Which option sounds better?
The 1st seems easier to code.
But I thought the 2nd might have better performance - especially since most
searches are on the current year.
Moreover, since changes occur only on current year (old products never
change), I though the 2nd approach would be easier on the IndexWriter
(especially on heavy actions like "optimize()").

What do you thing?
Thanks :)

Re: Should I use MultiSearcher?

Posted by Ian Lea <ia...@gmail.com>.
https://issues.apache.org/jira/browse/LUCENE-2756.

--
Ian.


On Thu, Mar 24, 2011 at 2:13 PM, Devon H. O'Dell <de...@gmail.com> wrote:
> 2011/3/24 Uwe Schindler <uw...@thetaphi.de>:
>> Don't use MultiSearcher. Instead create a MultiReader around the separate
>> IndexReaders for each index and pass that MultiReader to a conventional
>> IndexSearcher as IndexReader. MultiSearcher is very buggy.
>
> Could you elaborate on this point at all, Uwe? I'm using
> ParallelMultiSearcher with 3.0.3 in an application right now and it
> doesn't seem to be very buggy to me, but perhaps I'm missing
> something. Switching to MultiReader / IndexSearcher wouldn't be a
> problem, but I'm curious as to what exactly is wrong with the
> MultiSearcher interface.
>
> Thanks!
>
> --dho
>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>>> -----Original Message-----
>>> From: sol myr [mailto:solmyr72@gmail.com]
>>> Sent: Thursday, March 24, 2011 3:01 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Should I use MultiSearcher?
>>>
>>> Hi,
>>>
>>> I need to search a Catalog.
>>> Most users search *this* year's catalog, but on rare occasions they may
>> ask
>>> for old products (from previous years).
>>> I'm trying to select between 2 options:
>>>
>>> 1) Keep huge big index for all years (where documents have a "year" field,
>> so
>>> I can filter out the current year, when needed)
>>>
>>> 2) Keep separate indexes - FSDirectory per year:
>>> FSDirectory.open("c:/index_2009/"),  FSDirectory.open("c:/index_2010/")
>> ...
>>> Most searches will run on the current year's FSDirectory, but if I want
>> old
>>> product I can use MultiSearcher.
>>>
>>> Which option sounds better?
>>> The 1st seems easier to code.
>>> But I thought the 2nd might have better performance - especially since
>> most
>>> searches are on the current year.
>>> Moreover, since changes occur only on current year (old products never
>>> change), I though the 2nd approach would be easier on the IndexWriter
>>> (especially on heavy actions like "optimize()").
>>>
>>> What do you thing?
>>> Thanks :)
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Should I use MultiSearcher?

Posted by "Devon H. O'Dell" <de...@gmail.com>.
2011/3/24 Uwe Schindler <uw...@thetaphi.de>:
> Don't use MultiSearcher. Instead create a MultiReader around the separate
> IndexReaders for each index and pass that MultiReader to a conventional
> IndexSearcher as IndexReader. MultiSearcher is very buggy.

Could you elaborate on this point at all, Uwe? I'm using
ParallelMultiSearcher with 3.0.3 in an application right now and it
doesn't seem to be very buggy to me, but perhaps I'm missing
something. Switching to MultiReader / IndexSearcher wouldn't be a
problem, but I'm curious as to what exactly is wrong with the
MultiSearcher interface.

Thanks!

--dho

> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: sol myr [mailto:solmyr72@gmail.com]
>> Sent: Thursday, March 24, 2011 3:01 PM
>> To: java-user@lucene.apache.org
>> Subject: Should I use MultiSearcher?
>>
>> Hi,
>>
>> I need to search a Catalog.
>> Most users search *this* year's catalog, but on rare occasions they may
> ask
>> for old products (from previous years).
>> I'm trying to select between 2 options:
>>
>> 1) Keep huge big index for all years (where documents have a "year" field,
> so
>> I can filter out the current year, when needed)
>>
>> 2) Keep separate indexes - FSDirectory per year:
>> FSDirectory.open("c:/index_2009/"),  FSDirectory.open("c:/index_2010/")
> ...
>> Most searches will run on the current year's FSDirectory, but if I want
> old
>> product I can use MultiSearcher.
>>
>> Which option sounds better?
>> The 1st seems easier to code.
>> But I thought the 2nd might have better performance - especially since
> most
>> searches are on the current year.
>> Moreover, since changes occur only on current year (old products never
>> change), I though the 2nd approach would be easier on the IndexWriter
>> (especially on heavy actions like "optimize()").
>>
>> What do you thing?
>> Thanks :)
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Should I use MultiSearcher?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Don't use MultiSearcher. Instead create a MultiReader around the separate
IndexReaders for each index and pass that MultiReader to a conventional
IndexSearcher as IndexReader. MultiSearcher is very buggy.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: sol myr [mailto:solmyr72@gmail.com]
> Sent: Thursday, March 24, 2011 3:01 PM
> To: java-user@lucene.apache.org
> Subject: Should I use MultiSearcher?
> 
> Hi,
> 
> I need to search a Catalog.
> Most users search *this* year's catalog, but on rare occasions they may
ask
> for old products (from previous years).
> I'm trying to select between 2 options:
> 
> 1) Keep huge big index for all years (where documents have a "year" field,
so
> I can filter out the current year, when needed)
> 
> 2) Keep separate indexes - FSDirectory per year:
> FSDirectory.open("c:/index_2009/"),  FSDirectory.open("c:/index_2010/")
...
> Most searches will run on the current year's FSDirectory, but if I want
old
> product I can use MultiSearcher.
> 
> Which option sounds better?
> The 1st seems easier to code.
> But I thought the 2nd might have better performance - especially since
most
> searches are on the current year.
> Moreover, since changes occur only on current year (old products never
> change), I though the 2nd approach would be easier on the IndexWriter
> (especially on heavy actions like "optimize()").
> 
> What do you thing?
> Thanks :)


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Should I use MultiSearcher?

Posted by Ian Lea <ia...@gmail.com>.
Care to define huge?  There often isn't a "best" solution but in this
case I think I'd vote for the index-per-year approach.

btw with recent versions of lucene you don't need to call optimize()
very often, if at all, although you might want to run it at the
beginning of each year against the previous year's index, now static.


--
Ian.


On Thu, Mar 24, 2011 at 2:00 PM, sol myr <so...@gmail.com> wrote:
> Hi,
>
> I need to search a Catalog.
> Most users search *this* year's catalog, but on rare occasions they may ask
> for old products (from previous years).
> I'm trying to select between 2 options:
>
> 1) Keep huge big index for all years (where documents have a "year" field,
> so I can filter out the current year, when needed)
>
> 2) Keep separate indexes - FSDirectory per year:
> FSDirectory.open("c:/index_2009/"),  FSDirectory.open("c:/index_2010/") ...
> Most searches will run on the current year's FSDirectory, but if I want old
> product I can use MultiSearcher.
>
> Which option sounds better?
> The 1st seems easier to code.
> But I thought the 2nd might have better performance - especially since most
> searches are on the current year.
> Moreover, since changes occur only on current year (old products never
> change), I though the 2nd approach would be easier on the IndexWriter
> (especially on heavy actions like "optimize()").
>
> What do you thing?
> Thanks :)
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org