Posted to java-user@lucene.apache.org by Samarendra Pratap <sa...@gmail.com> on 2010/04/22 12:38:44 UTC

Reopening a Searcher for each request

Greetings to all.
I have read in many places that, for performance reasons, we should not open
a Searcher for each request, but I have always wondered whether that advice
really applies to the Searcher or to the Reader.

I have a group of indexes totalling 23 GB, spread across several index
directories. The structure is something like the following:

Main directory
|
|_________ country1
|          |___ country1-time1 (actual index)
|          |___ country1-time2 (actual index)
|          |___ country1-time3 (actual index)
|
|_________ country2
           |___ country2-time1 (actual index)
           |___ country2-time2 (actual index)
           |___ country2-time3 (actual index)

When the application starts, I open an IndexReader on each of the actual
index directories (country1-time1, country1-time2, ..., country2-time3) and
keep them in a pool.

At search time, IndexSearchers are created from the appropriate IndexReaders
selected from the pool. These IndexSearchers are in turn used to create a
ParallelMultiSearcher. The IndexSearcher and ParallelMultiSearcher
constructors therefore run on every request.
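A minimal sketch of the setup described above, assuming one pooled reader per country and time slice. To keep it runnable without Lucene on the classpath, PooledReader and RequestSearcher are hypothetical stand-ins for IndexReader and IndexSearcher, and ReaderPool, select and searchersFor are illustrative names, not Lucene API. Readers are created once and looked up by country and time range; only the lightweight wrappers are built per request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an IndexReader: costly to open, so opened once at startup.
final class PooledReader {
    final String name;

    PooledReader(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for an IndexSearcher: a cheap wrapper built per request.
final class RequestSearcher {
    final PooledReader reader;

    RequestSearcher(PooledReader reader) {
        this.reader = reader;
    }
}

// Long-lived readers keyed by country, then by a sortable time-slice number.
final class ReaderPool {
    private final Map<String, TreeMap<Integer, PooledReader>> byCountry = new HashMap<>();

    void add(String country, int timeSlice, PooledReader reader) {
        byCountry.computeIfAbsent(country, c -> new TreeMap<>()).put(timeSlice, reader);
    }

    // Readers of one country whose time slice lies in [fromSlice, toSlice], oldest first.
    List<PooledReader> select(String country, int fromSlice, int toSlice) {
        TreeMap<Integer, PooledReader> slices = byCountry.get(country);
        if (slices == null || fromSlice > toSlice) {
            return new ArrayList<>();
        }
        return new ArrayList<>(slices.subMap(fromSlice, true, toSlice, true).values());
    }

    // New wrappers on every call; the underlying readers are shared, never reopened here.
    List<RequestSearcher> searchersFor(String country, int fromSlice, int toSlice) {
        List<RequestSearcher> searchers = new ArrayList<>();
        for (PooledReader reader : select(country, fromSlice, toSlice)) {
            searchers.add(new RequestSearcher(reader));
        }
        return searchers;
    }
}
```

With real Lucene classes, the per-request IndexSearchers would then be passed as an array to the ParallelMultiSearcher constructor, as Samar's getParallelMultiSearcher() further down does. The map lookup simply replaces a linear scan over all reader groups.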

I believe that pooling the ParallelMultiSearchers themselves would be a good
idea, but I want to know whether re-creating IndexSearchers on every request
really degrades performance, given that the IndexReaders are opened only once.

In my performance tests (which may not be very comprehensive) I didn't find
any noticeable difference.

Could someone shed some light on this?


-- 
Regards,
Samar

Re: Reopening a Searcher for each request

Posted by Samarendra Pratap <sa...@gmail.com>.
No! It's not like this in my code. This code opens an IndexReader every time
I call newIndexSearcher().

In my code it is something like this:

IndexReader[][] irs;
// irs is a global array of IndexReader groups (one group per country/time
// combination); all readers are opened once, when the application starts
.....
.....
IndexSearcher[] getIndexSearchers(IndexReader[] readers)
{
        IndexSearcher[] iss = new IndexSearcher[readers.length];
        for(int i=0;i<readers.length;i++)
        {
                // a cheap wrapper around an already-open, shared reader
                iss[i] = new IndexSearcher(readers[i]);
                iss[i].setSimilarity(new MySimilarity(new String[]{"contents"}));
        }
        return iss;
}

ParallelMultiSearcher getParallelMultiSearcher(<country and time related parameters>)
        throws IOException
{
        // by checking the country and time related parameters, the correct
        // group is chosen from the complete array of IndexReader groups (irs)
        for(int i=0;i<irs.length;i++)
        {
                if(<country and time related parameters match for current value of "i">)
                {
                        return new ParallelMultiSearcher(getIndexSearchers(irs[i]));
                }
        }

        // ideally this code should never be executed
        return new ParallelMultiSearcher(getIndexSearchers(
                prepareReaders(<country and time related parameters>)));
}



2010/4/24 Ivan Liu <ja...@gmail.com>

> like this?



-- 
Regards,
Samar

Re: Reopening a Searcher for each request

Posted by Ivan Liu <ja...@gmail.com>.
like this?
public synchronized IndexSearcher newIndexSearcher() {
    try {
//      semaphore.acquire();
        if (null == indexSearcher) {
            // first call: open a read-only reader on the index directory
            Directory directory = FSDirectory.open(new File(Config.DB_DIR + "/rssindex"));
            indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
        } else {
            IndexReader indexReader = indexSearcher.getIndexReader();
            // reopen() returns the same instance if the index has not changed
            IndexReader newIndexReader = indexReader.reopen();
            if (newIndexReader != indexReader) {
                // caution: other threads may still be searching the old reader
                indexReader.close();
                indexSearcher.close();
                indexSearcher = new IndexSearcher(newIndexReader);
            }
        }
        return indexSearcher;
    } catch (CorruptIndexException e) {
        log.error(e.getMessage(), e);
        return null;
    } catch (IOException e) {
        log.error(e.getMessage(), e);
        return null;
    } finally {
//      semaphore.release();
    }
}
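One caveat with the code above: it closes the old IndexReader as soon as reopen() returns a new one, even though other threads may still be running queries on the previous IndexSearcher. A common remedy is reference counting, which Lucene's IndexReader supports through incRef() and decRef(). The sketch below models that idea with a hypothetical CountedReader stand-in so it runs without Lucene; ReaderManager, acquire, release and swap are illustrative names, not library API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a reference-counted IndexReader, loosely modelled
// on Lucene's IndexReader.incRef()/decRef(); not the Lucene class itself.
final class CountedReader {
    final int generation;
    // Starts at 1: the reference held by whoever opened the reader (the manager).
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed;

    CountedReader(int generation) {
        this.generation = generation;
    }

    void incRef() {
        refCount.incrementAndGet();
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            closed = true; // a real reader would release its files here
        }
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical helper: publishes the current reader and retires old ones safely.
final class ReaderManager {
    private CountedReader current;

    ReaderManager(CountedReader initial) {
        this.current = initial;
    }

    // Called at the start of each query; the caller must later call release().
    synchronized CountedReader acquire() {
        current.incRef();
        return current;
    }

    // Called when the query finishes, typically in a finally block.
    void release(CountedReader reader) {
        reader.decRef();
    }

    // Called when reopen() produced a new reader: make it current, then drop the
    // manager's own reference to the old one. The old reader only closes once the
    // last in-flight query has released it.
    synchronized void swap(CountedReader newer) {
        CountedReader old = current;
        current = newer;
        old.decRef();
    }
}
```

Because acquire() and swap() share one lock, no query can pick up a reader whose count has already dropped to zero. Later Lucene releases added a SearcherManager class that packages this pattern.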

2010/4/22 Samarendra Pratap <sa...@gmail.com>

> Thanks Mike.
> That solved a query which was itching my mind for a long time.



-- 
冲浪板

my blog:冲浪板 <http://chonglangban.appspot.com/>
my site:Keji Technology <http://kejiblog.appspot.com/>

Re: Reopening a Searcher for each request

Posted by Samarendra Pratap <sa...@gmail.com>.
Thanks, Mike.
That answers a question that had been nagging me for a long time.

On Thu, Apr 22, 2010 at 4:41 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> It's the IndexReader that's costly to open/warm, so ideally it should
> be opened once and shared.


-- 
Regards,
Samar

Re: Reopening a Searcher for each request

Posted by Michael McCandless <lu...@mikemccandless.com>.
It's the IndexReader that's costly to open/warm, so ideally it should
be opened once and shared.

The Searchers do very little on construction so re-creating per query
should be OK.

Mike
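Mike's point can be illustrated with a small counting sketch: the expensive object is opened once and shared, while a new lightweight wrapper is built for every query. CostlyReader, ThinSearcher, OpenCounter and SharedReaderService are hypothetical stand-ins, not Lucene classes, and the counters are plain ints, so the demo assumes single-threaded use.

```java
// Hypothetical counters so the demo can show how often each object is built.
final class OpenCounter {
    int readerOpens;
    int searcherCreations;
}

// Hypothetical stand-in for an IndexReader: its constructor represents the
// costly open/warm step.
final class CostlyReader {
    CostlyReader(OpenCounter counter) {
        counter.readerOpens++;
    }
}

// Hypothetical stand-in for an IndexSearcher: construction just stores a reference.
final class ThinSearcher {
    final CostlyReader reader;

    ThinSearcher(CostlyReader reader, OpenCounter counter) {
        this.reader = reader;
        counter.searcherCreations++;
    }
}

// Opens the reader once and hands out a fresh searcher for every query.
final class SharedReaderService {
    private final OpenCounter counter;
    private final CostlyReader reader;

    SharedReaderService(OpenCounter counter) {
        this.counter = counter;
        this.reader = new CostlyReader(counter);
    }

    ThinSearcher searcherForQuery() {
        return new ThinSearcher(reader, counter);
    }
}
```

After 1,000 calls to searcherForQuery(), readerOpens is 1 and searcherCreations is 1000: the per-query cost is only the thin wrapper.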

On Thu, Apr 22, 2010 at 6:38 AM, Samarendra Pratap <sa...@gmail.com> wrote:
> Greetings to all.
>  I have read at so many places that we should not open a Searcher for each
> request for the sake of performance, but I have always been wondering
> whether it is actually Searcher or Reader?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org