You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Samarendra Pratap <sa...@gmail.com> on 2010/04/22 12:38:44 UTC
Reopening a Searcher for each request
Greetings to all.
I have read at so many places that we should not open a Searcher for each
request for the sake of performance, but I have always been wondering
whether it is actually Searcher or Reader?
I have a group of index amounting to 23G which actually contains of
different index directories. The structure is something like following
Main directory
|
|_________ country1
| |___ country1-time1 (actual index)
| |___ country1-time2 (actual index)
| |___ country1-time3 (actual index)
|
|_________ country2
|___ country2-time1 (actual index)
|___ country2-time2 (actual index)
|___ country2-time3 (actual index)
When application starts I open IndexReaders on all of actual index
directories (country1-time1, country1-tim2, .... country2-time3) and keep
them in a pool.
At the time of search, IndexSearchers are created by selecting the
appropriate IndexReaders from the pool. These IndexSearchers in turn are
used to create a ParallelMultiSearcher. Constructors of IndexSearcher and
ParallelMultiSearcher are run for every request.
Now I believe that creating a pool of ParallelMultiSearcher itself is a
good idea but* I wanted to know if reopening **IndexSearchers** will really
degrade performance irrespective of **IndexReaders** being opened once*.
In my performance tests (which may not be very comprehensive) I didn't find
any noticeable difference.
Please throw some light.
--
Regards,
Samar
Re: Reopening a Searcher for each request
Posted by Samarendra Pratap <sa...@gmail.com>.
No! It's not like this in my code. This code opens an IndexReader every time
I call newIndexSearcher().
In my code it is sometime like -
IndexReader[] irs;
// irs is a global array containing IndexReaders which are opened when the
application starts
.....
.....
IndexSearcher[] getIndexSearchers(IndexReader[] irs)
{
IndexSearcher[] iss = new IndexSearcher[irs.length];
for(int i=0;i<irs.length;i++)
{
iss[i] = new IndexSearcher(irs[i]);
iss[i].setSimilarity(new MySimilarity(new
String[]{"contents"}));
}
return iss;
}
ParallelMultiSearcher getParallelMultiSearcher(<country and time related
parameters>) throws IOException
{
// by checking country and time related parameters, correct elements
are chosen from complete array of IndexReaders (irs) to pass in the function
for(int i=0;i<irs.length;i++)
{
if(<country and time related parameters match for current
value of "i">)
{
return (new
ParallelMultiSearcher(getIndexSearchers((IndexReader[])irs[i])));
}
}
// ideally this code should never be executed
return (new
ParallelMultiSearcher(getIndexSearchers(prepareReaders(<country
and time related parameters>))));
}
2010/4/24 Ivan Liu <ja...@gmail.com>
> like this?
> public synchronized IndexSearcher newIndexSearcher() {
> try {
> // semaphore.acquire();
> if (null == indexSearcher) {
> Directory directory = FSDirectory.open(new
> File(Config.DB_DIR+"/rssindex"));
> indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
> } else {
> IndexReader indexReader = indexSearcher.getIndexReader();
> IndexReader newIndexReader = indexReader.reopen();
> if (newIndexReader!=indexReader) {
>
> indexReader.close();
> indexSearcher.close();
>
>
> indexSearcher = new IndexSearcher(newIndexReader);
> }
> }
> return indexSearcher;
> } catch (CorruptIndexException e) {
> log.error(e.getMessage(),e);
> return null;
> } catch (IOException e) {
> log.error(e.getMessage(),e);
> return null;
> }finally{
> // semaphore.release();
> }
> }
>
> 2010/4/22 Samarendra Pratap <sa...@gmail.com>
>
> > Thanks Mike.
> > That solved a query which was itching my mind for a long time.
> >
> > On Thu, Apr 22, 2010 at 4:41 PM, Michael McCandless <
> > lucene@mikemccandless.com> wrote:
> >
> > > It's the IndexReader that's costly to open/warm, so ideally it should
> > > be opened once and shared.
> > >
> > > The Searchers do very little on construction so re-creating per query
> > > should be OK.
> > >
> > > Mike
> > >
> > > On Thu, Apr 22, 2010 at 6:38 AM, Samarendra Pratap <
> samarzone@gmail.com>
> > > wrote:
> > > > Greetings to all.
> > > > I have read at so many places that we should not open a Searcher for
> > > each
> > > > request for the sake of performance, but I have always been wondering
> > > > whether it is actually Searcher or Reader?
> > > >
> > > > I have a group of index amounting to 23G which actually contains of
> > > > different index directories. The structure is something like
> following
> > > >
> > > > Main directory
> > > > |
> > > > |_________ country1
> > > > | |___ country1-time1 (actual index)
> > > > | |___ country1-time2 (actual index)
> > > > | |___ country1-time3 (actual index)
> > > > |
> > > > |_________ country2
> > > > |___ country2-time1 (actual index)
> > > > |___ country2-time2 (actual index)
> > > > |___ country2-time3 (actual index)
> > > >
> > > > When application starts I open IndexReaders on all of actual index
> > > > directories (country1-time1, country1-tim2, .... country2-time3) and
> > keep
> > > > them in a pool.
> > > >
> > > > At the time of search, IndexSearchers are created by selecting the
> > > > appropriate IndexReaders from the pool. These IndexSearchers in turn
> > are
> > > > used to create a ParallelMultiSearcher. Constructors of IndexSearcher
> > and
> > > > ParallelMultiSearcher are run for every request.
> > > >
> > > > Now I believe that creating a pool of ParallelMultiSearcher itself
> is
> > a
> > > > good idea but* I wanted to know if reopening **IndexSearchers** will
> > > really
> > > > degrade performance irrespective of **IndexReaders** being opened
> > once*.
> > > >
> > > > In my performance tests (which may not be very comprehensive) I
> didn't
> > > find
> > > > any noticeable difference.
> > > >
> > > > Please throw some light.
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Samar
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
> >
> > --
> > Regards,
> > Samar
> >
>
>
>
> --
> 冲浪板
>
> my blog:冲浪板 <http://chonglangban.appspot.com/>
> my site:Keji Technology <http://kejiblog.appspot.com/>
>
--
Regards,
Samar
Re: Reopening a Searcher for each request
Posted by Ivan Liu <ja...@gmail.com>.
like this?
public synchronized IndexSearcher newIndexSearcher() {
try {
// semaphore.acquire();
if (null == indexSearcher) {
Directory directory = FSDirectory.open(new
File(Config.DB_DIR+"/rssindex"));
indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
} else {
IndexReader indexReader = indexSearcher.getIndexReader();
IndexReader newIndexReader = indexReader.reopen();
if (newIndexReader!=indexReader) {
indexReader.close();
indexSearcher.close();
indexSearcher = new IndexSearcher(newIndexReader);
}
}
return indexSearcher;
} catch (CorruptIndexException e) {
log.error(e.getMessage(),e);
return null;
} catch (IOException e) {
log.error(e.getMessage(),e);
return null;
}finally{
// semaphore.release();
}
}
2010/4/22 Samarendra Pratap <sa...@gmail.com>
> Thanks Mike.
> That solved a query which was itching my mind for a long time.
>
> On Thu, Apr 22, 2010 at 4:41 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
> > It's the IndexReader that's costly to open/warm, so ideally it should
> > be opened once and shared.
> >
> > The Searchers do very little on construction so re-creating per query
> > should be OK.
> >
> > Mike
> >
> > On Thu, Apr 22, 2010 at 6:38 AM, Samarendra Pratap <sa...@gmail.com>
> > wrote:
> > > Greetings to all.
> > > I have read at so many places that we should not open a Searcher for
> > each
> > > request for the sake of performance, but I have always been wondering
> > > whether it is actually Searcher or Reader?
> > >
> > > I have a group of index amounting to 23G which actually contains of
> > > different index directories. The structure is something like following
> > >
> > > Main directory
> > > |
> > > |_________ country1
> > > | |___ country1-time1 (actual index)
> > > | |___ country1-time2 (actual index)
> > > | |___ country1-time3 (actual index)
> > > |
> > > |_________ country2
> > > |___ country2-time1 (actual index)
> > > |___ country2-time2 (actual index)
> > > |___ country2-time3 (actual index)
> > >
> > > When application starts I open IndexReaders on all of actual index
> > > directories (country1-time1, country1-tim2, .... country2-time3) and
> keep
> > > them in a pool.
> > >
> > > At the time of search, IndexSearchers are created by selecting the
> > > appropriate IndexReaders from the pool. These IndexSearchers in turn
> are
> > > used to create a ParallelMultiSearcher. Constructors of IndexSearcher
> and
> > > ParallelMultiSearcher are run for every request.
> > >
> > > Now I believe that creating a pool of ParallelMultiSearcher itself is
> a
> > > good idea but* I wanted to know if reopening **IndexSearchers** will
> > really
> > > degrade performance irrespective of **IndexReaders** being opened
> once*.
> > >
> > > In my performance tests (which may not be very comprehensive) I didn't
> > find
> > > any noticeable difference.
> > >
> > > Please throw some light.
> > >
> > >
> > > --
> > > Regards,
> > > Samar
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
> --
> Regards,
> Samar
>
--
冲浪板
my blog:冲浪板 <http://chonglangban.appspot.com/>
my site:Keji Technology <http://kejiblog.appspot.com/>
Re: Reopening a Searcher for each request
Posted by Samarendra Pratap <sa...@gmail.com>.
Thanks Mike.
That solved a query which was itching my mind for a long time.
On Thu, Apr 22, 2010 at 4:41 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:
> It's the IndexReader that's costly to open/warm, so ideally it should
> be opened once and shared.
>
> The Searchers do very little on construction so re-creating per query
> should be OK.
>
> Mike
>
> On Thu, Apr 22, 2010 at 6:38 AM, Samarendra Pratap <sa...@gmail.com>
> wrote:
> > Greetings to all.
> > I have read at so many places that we should not open a Searcher for
> each
> > request for the sake of performance, but I have always been wondering
> > whether it is actually Searcher or Reader?
> >
> > I have a group of index amounting to 23G which actually contains of
> > different index directories. The structure is something like following
> >
> > Main directory
> > |
> > |_________ country1
> > | |___ country1-time1 (actual index)
> > | |___ country1-time2 (actual index)
> > | |___ country1-time3 (actual index)
> > |
> > |_________ country2
> > |___ country2-time1 (actual index)
> > |___ country2-time2 (actual index)
> > |___ country2-time3 (actual index)
> >
> > When application starts I open IndexReaders on all of actual index
> > directories (country1-time1, country1-tim2, .... country2-time3) and keep
> > them in a pool.
> >
> > At the time of search, IndexSearchers are created by selecting the
> > appropriate IndexReaders from the pool. These IndexSearchers in turn are
> > used to create a ParallelMultiSearcher. Constructors of IndexSearcher and
> > ParallelMultiSearcher are run for every request.
> >
> > Now I believe that creating a pool of ParallelMultiSearcher itself is a
> > good idea but* I wanted to know if reopening **IndexSearchers** will
> really
> > degrade performance irrespective of **IndexReaders** being opened once*.
> >
> > In my performance tests (which may not be very comprehensive) I didn't
> find
> > any noticeable difference.
> >
> > Please throw some light.
> >
> >
> > --
> > Regards,
> > Samar
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
--
Regards,
Samar
Re: Reopening a Searcher for each request
Posted by Michael McCandless <lu...@mikemccandless.com>.
It's the IndexReader that's costly to open/warm, so ideally it should
be opened once and shared.
The Searchers do very little on construction so re-creating per query
should be OK.
Mike
On Thu, Apr 22, 2010 at 6:38 AM, Samarendra Pratap <sa...@gmail.com> wrote:
> Greetings to all.
> I have read at so many places that we should not open a Searcher for each
> request for the sake of performance, but I have always been wondering
> whether it is actually Searcher or Reader?
>
> I have a group of index amounting to 23G which actually contains of
> different index directories. The structure is something like following
>
> Main directory
> |
> |_________ country1
> | |___ country1-time1 (actual index)
> | |___ country1-time2 (actual index)
> | |___ country1-time3 (actual index)
> |
> |_________ country2
> |___ country2-time1 (actual index)
> |___ country2-time2 (actual index)
> |___ country2-time3 (actual index)
>
> When application starts I open IndexReaders on all of actual index
> directories (country1-time1, country1-tim2, .... country2-time3) and keep
> them in a pool.
>
> At the time of search, IndexSearchers are created by selecting the
> appropriate IndexReaders from the pool. These IndexSearchers in turn are
> used to create a ParallelMultiSearcher. Constructors of IndexSearcher and
> ParallelMultiSearcher are run for every request.
>
> Now I believe that creating a pool of ParallelMultiSearcher itself is a
> good idea but* I wanted to know if reopening **IndexSearchers** will really
> degrade performance irrespective of **IndexReaders** being opened once*.
>
> In my performance tests (which may not be very comprehensive) I didn't find
> any noticeable difference.
>
> Please throw some light.
>
>
> --
> Regards,
> Samar
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org