Posted to java-user@lucene.apache.org by Marc Sturlese <ma...@gmail.com> on 2011/06/20 11:13:22 UTC

About IndexReader.reopen with very similar indexes

Hey there,
I have a question about the behaviour of IndexReader.reopen.
I have a Tomcat server holding a Lucene index behind an IndexSearcher. If I
move index.folder to index.folder.old, move another index (say
index.folder.2) to index.folder, and then reopen the readers, something weird
happens when the first index and the second are very similar in size and were
both built from scratch. It seems that when I get the new reader and compare it
with the old one:

 IndexReader reader = ...
 ...
 IndexReader newReader = reader.reopen();  // same instance comes back if nothing changed
 if (newReader != reader) {
   ...     // reader was reopened
   reader.close();  // release the old reader
 }
 reader = newReader;
 ...

Lucene does not detect that they are different indexes.
Below you can see both indexes (they have the same number of files and the
file names match, but the sizes differ slightly, as the documents they contain
are not exactly the same).
*This does not happen if the indexes differ much more in size (because then the
file names are not equal, e.g. _4.fdt, etc.)
 
Index1:
-rw-r--r--   1 marc  admin  269289634 15 Jun 15:52 _3.fdt
-rw-r--r--   1 marc  admin    2066764 15 Jun 15:52 _3.fdx
-rw-r--r--   1 marc  admin        463 15 Jun 15:52 _3.fnm
-rw-r--r--   1 marc  admin   40358787 15 Jun 15:52 _3.frq
-rw-r--r--   1 marc  admin    1033384 15 Jun 15:52 _3.nrm
-rw-r--r--   1 marc  admin   27014923 15 Jun 15:52 _3.prx
-rw-r--r--   1 marc  admin     234797 15 Jun 15:52 _3.tii
-rw-r--r--   1 marc  admin   19322234 15 Jun 15:52 _3.tis
-rw-r--r--   1 marc  admin         20 15 Jun 15:52 segments.gen
-rw-r--r--   1 marc  admin        298 15 Jun 15:52 segments_2

Index2:
-rw-r--r--   1 marc  admin  269044254 15 Jun 15:52 _3.fdt
-rw-r--r--   1 marc  admin    2068116 15 Jun 15:52 _3.fdx
-rw-r--r--   1 marc  admin        463 15 Jun 15:52 _3.fnm
-rw-r--r--   1 marc  admin   40320465 15 Jun 15:52 _3.frq
-rw-r--r--   1 marc  admin    1034060 15 Jun 15:52 _3.nrm
-rw-r--r--   1 marc  admin   26967519 15 Jun 15:52 _3.prx
-rw-r--r--   1 marc  admin     235895 15 Jun 15:52 _3.tii
-rw-r--r--   1 marc  admin   19372446 15 Jun 15:52 _3.tis
-rw-r--r--   1 marc  admin         20 15 Jun 15:52 segments.gen
-rw-r--r--   1 marc  admin        298 15 Jun 15:52 segments_2

Can someone explain the criteria Lucene uses to decide whether a segment has
changed or not?
Thanks in advance.


--
View this message in context: http://lucene.472066.n3.nabble.com/About-IndexReader-reopen-with-very-similar-indexes-tp3085456p3085456.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: About IndexReader.reopen with very similar indexes

Posted by Michael McCandless <lu...@mikemccandless.com>.
Reopening is based entirely on the latest segments_N file present in the index.

Lucene loads that file and checks whether it refers to any new segments that
are not already open; if so, it opens those new ones.  Segments in common
with what the reader already has open (i.e. same segment name) are
simply reused.

Lucene doesn't look at file modification times, etc.
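
Applied to the two listings in the question, both indexes consist of a single segment named _3, so a reader that already has _3 open would find nothing new to open after the swap. The following is a minimal sketch of that name-based rule as a toy model in plain Java. It is not Lucene's code: ReopenByNameSketch, OpenSegment and reopen are hypothetical names invented for illustration, and the segment listing is passed in as a plain list instead of being read from a segments_N file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types or methods are Lucene classes.
final class ReopenByNameSketch {

    // Hypothetical stand-in for one open segment; remembers which index it was read from.
    static final class OpenSegment {
        final String name;
        final String openedFrom;

        OpenSegment(String name, String openedFrom) {
            this.name = name;
            this.openedFrom = openedFrom;
        }
    }

    // Keeps every already-open segment whose name appears in the latest
    // listing and opens only the names that are not open yet.
    static Map<String, OpenSegment> reopen(Map<String, OpenSegment> open,
                                           List<String> latestSegmentNames,
                                           String currentIndex) {
        Map<String, OpenSegment> result = new LinkedHashMap<String, OpenSegment>();
        for (String name : latestSegmentNames) {
            OpenSegment existing = open.get(name);
            result.put(name, existing != null ? existing : new OpenSegment(name, currentIndex));
        }
        return result;
    }

    public static void main(String[] args) {
        // First open: Index1 lists a single segment named _3.
        Map<String, OpenSegment> open = reopen(
                new LinkedHashMap<String, OpenSegment>(), Arrays.asList("_3"), "Index1");

        // After the directory swap, Index2 also lists a single segment named _3,
        // so the segment that is already open gets reused.
        Map<String, OpenSegment> afterSwap = reopen(open, Arrays.asList("_3"), "Index2");
        System.out.println("_3 read from: " + afterSwap.get("_3").openedFrom); // prints "_3 read from: Index1"

        // An index whose segment has a different name is opened fresh.
        Map<String, OpenSegment> other = reopen(open, Arrays.asList("_4"), "Index3");
        System.out.println("_4 read from: " + other.get("_4").openedFrom); // prints "_4 read from: Index3"
    }
}
```

In this model the swap goes unnoticed only because the segment names collide, which matches the observation in the question that the problem disappears when the file names differ (e.g. _4.fdt).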

Mike McCandless

http://blog.mikemccandless.com
