Posted to java-user@lucene.apache.org by Nicola Buso <nb...@ebi.ac.uk> on 2014/06/02 13:52:51 UTC

Re: MultiReader docid reliability

Hi Erick,

The good reason, for now, is caching: we use the docids to store results
in a cache, and I wanted a better explanation of "ephemeral" to
understand the possible lifetime of the cache.
From the answers, "ephemeral" can be tied to the opening of the
IndexReader (as a general precaution), while any kind of modification
to the index is another possible interpretation.

So it's not strictly necessary; it was just a matter of better
understanding the javadoc. I see the javadoc is the same for all
IndexReaders, so I presume there are no differences between the
various implementations.



nicola.


On Fri, 2014-05-30 at 12:50 -0700, Erick Erickson wrote:
> If you do an optimize, btw, the internal doc IDs may change.....
> 
> 
> But _why_ do you want to keep them? You may have very good reasons,
> but it's not clear that this is necessary/desirable from what you've
> said so far...
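A minimal plain-Java model of Erick's point, assuming all segments are merged into one in their existing order: deleting a document only marks it, and the numbers shift once a merge physically drops the deleted documents. `MergeRemapDemo` and `remapAfterMerge` are hypothetical names, not Lucene's merge code (in Lucene 3.5 and later, `optimize()` was deprecated in favor of `IndexWriter.forceMerge`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: each segment is a boolean[] where true marks a deleted doc.
class MergeRemapDemo {

    /**
     * For every global doc id before the merge (segment 0 first, then
     * segment 1, ...), returns its doc id after merging all segments into
     * one, or -1 if the doc was deleted and is dropped by the merge.
     */
    static int[] remapAfterMerge(List<boolean[]> segmentsDeleted) {
        int total = 0;
        for (boolean[] seg : segmentsDeleted) total += seg.length;
        int[] newIds = new int[total];
        int oldId = 0;   // global id before the merge
        int nextNew = 0; // next free id in the merged segment
        for (boolean[] seg : segmentsDeleted) {
            for (boolean deleted : seg) {
                // Deleted docs vanish; every survivor after them moves down.
                newIds[oldId++] = deleted ? -1 : nextNew++;
            }
        }
        return newIds;
    }

    public static void main(String[] args) {
        List<boolean[]> segments = new ArrayList<>();
        segments.add(new boolean[] {false, true, false}); // doc 1 deleted
        segments.add(new boolean[] {false, false});
        System.out.println(Arrays.toString(remapAfterMerge(segments)));
        // prints [0, -1, 1, 2, 3]: docs 2, 3 and 4 all changed number
    }
}
```

Nothing was added to the index in this example, yet three surviving documents got new numbers, which is why cached docids cannot outlive an optimize.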
> 
> 
> Best,
> Erick
> 
> 
> On Fri, May 30, 2014 at 7:49 AM, Nicola Buso <nb...@ebi.ac.uk> wrote:
>         Hi,
>         
>         Thanks Michael and Alan. It's enough to know that when
>         re-opening the index there is no guarantee that the docids are
>         maintained, even if the index does not change.
>         
>         I will also try the question on the Solr mailing list.
>         
>         
>         nicola.
>         
>         
>         On Fri, 2014-05-30 at 10:41 -0400, Michael Sokolov wrote:
>         > There is a Solr document cache that holds field values too; see:
>         > http://wiki.apache.org/solr/SolrCaching
>         >
>         > Maybe take this question over to the Solr mailing list?
>         >
>         > -Mike
>         >
>         > On 5/30/2014 10:32 AM, Alan Woodward wrote:
>         > > Solr caches hold Lucene docids, which are invalidated every
>         > > time a new searcher is opened. The various fields for a
>         > > response aren't cached as far as I know; they're reloaded on
>         > > each request. But loading the fields for 10 documents is
>         > > typically very fast compared to searching over a very large
>         > > collection.
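One way to read Alan's description as code: a result cache whose docids are tied to the reader that produced them and discarded as soon as a new reader (searcher) is opened. This is a hypothetical plain-Java sketch; `ReaderScopedCache`, `onNewReader`, and the use of an arbitrary `Object` as the reader token are assumptions for illustration, not Solr's actual cache implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a query-result cache whose docids are only valid
// for the reader ("searcher") that produced them.
class ReaderScopedCache {
    private Object currentReader;                          // token for the open reader
    private final Map<String, int[]> results = new HashMap<>();
    private int misses = 0;

    /** Call whenever a reader is (re)opened; docids from an older reader become invalid. */
    synchronized void onNewReader(Object reader) {
        if (reader != currentReader) {                     // identity check on purpose
            currentReader = reader;
            results.clear();                               // never reuse old docids
        }
    }

    /** Returns cached docids for the query, running `search` on a miss. */
    synchronized int[] get(String query, Function<String, int[]> search) {
        int[] hits = results.get(query);
        if (hits == null) {
            misses++;
            hits = search.apply(query);
            results.put(query, hits);
        }
        return hits;
    }

    synchronized int misses() {
        return misses;
    }

    public static void main(String[] args) {
        ReaderScopedCache cache = new ReaderScopedCache();
        Function<String, int[]> fakeSearch = q -> new int[] {3, 7, 42};

        cache.onNewReader(new Object());        // first reader
        cache.get("title:lucene", fakeSearch);  // miss
        cache.get("title:lucene", fakeSearch);  // hit
        cache.onNewReader(new Object());        // reopen: cache cleared
        cache.get("title:lucene", fakeSearch);  // miss again
        System.out.println("misses = " + cache.misses()); // prints "misses = 2"
    }
}
```

In real code the token could simply be the IndexReader instance itself, since the cached docids are only meaningful relative to that reader.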
>         > >
>         > > Alan Woodward
>         > > www.flax.co.uk
>         > >
>         > >
>         > > On 30 May 2014, at 11:20, Nicola Buso wrote:
>         > >
>         > >> Hi Alan,
>         > >>
>         > >> Just to move to the more typical case (and yes, there are no
>         > >> IndexWriters open on those indexes): how does Solr cache
>         > >> results? The first thing I would like to do is to store the
>         > >> doc ids and go back to the reader for the real content. Does
>         > >> Solr store the whole results with all values?
>         > >>
>         > >>
>         > >> nicola.
>         > >>
>         > >>
>         > >> On Fri, 2014-05-30 at 11:05 +0100, Alan Woodward wrote:
>         > >>> If the index is truly unchanging (i.e. there's no IndexWriter
>         > >>> open on it), then I guess the document numbers will be stable
>         > >>> across reopens. But this is a pretty specialized situation, and
>         > >>> the docs are really there to warn you off trying to rely on
>         > >>> this for more typical uses.
>         > >>>
>         > >>> Alan Woodward
>         > >>> www.flax.co.uk
>         > >>>
>         > >>>
>         > >>>
>         > >>> On 30 May 2014, at 10:39, Nicola Buso wrote:
>         > >>>
>         > >>>> Hi Alan,
>         > >>>>
>         > >>>> Thanks a lot for the reply.
>         > >>>>
>         > >>>> If I understood your reply correctly, when the index is not
>         > >>>> changing (no adds, deletes, or even updates), the doc ids seen
>         > >>>> by the MultiReader will not change if you open that unchanged
>         > >>>> index multiple times, even in different environments.
>         > >>>>
>         > >>>> If this is true (my understanding), the word "ephemeral" in
>         > >>>> the API could be elaborated a bit more.
>         > >>>>
>         > >>>>
>         > >>>> nicola
>         > >>>>
>         > >>>> On Fri, 2014-05-30 at 09:26 +0100, Alan Woodward wrote:
>         > >>>>> Hi Nicola,
>         > >>>>>
>         > >>>>>
>         > >>>>> 1) A session here means as long as you have that
>         > >>>>> MultiReader open. IndexReaders see a snapshot of the index
>         > >>>>> and so document ids shouldn't change over the lifetime of an
>         > >>>>> IndexReader, even if the index is being updated.
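The snapshot behavior Alan describes can be modeled with a toy index whose "reader" freezes the list of segments at open time, so a writer that keeps adding segments cannot change the docid space the old reader hands out. `SnapshotDemo`, `ToyIndex`, and `ToyReader` are hypothetical names; Lucene gets the same effect from write-once segments, not from this code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of point-in-time readers over write-once segments.
class SnapshotDemo {

    static final class ToyIndex {
        private final List<Integer> segmentSizes = new ArrayList<>();

        /** "Writer" side: each flush appends a new, immutable segment. */
        void flushSegment(int numDocs) {
            segmentSizes.add(numDocs);
        }

        /** Opening a reader captures an unmodifiable copy of the segment list. */
        ToyReader openReader() {
            return new ToyReader(List.copyOf(segmentSizes));
        }
    }

    static final class ToyReader {
        private final List<Integer> segments; // frozen at open time

        ToyReader(List<Integer> segments) {
            this.segments = segments;
        }

        /** Size of the docid space this reader exposes (ids 0..maxDoc-1). */
        int maxDoc() {
            int sum = 0;
            for (int size : segments) sum += size;
            return sum;
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.flushSegment(3);
        ToyReader reader = index.openReader();
        index.flushSegment(5);                            // writer keeps going
        System.out.println(reader.maxDoc());              // prints 3: old snapshot unaffected
        System.out.println(index.openReader().maxDoc());  // prints 8: only a new reader sees it
    }
}
```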
>         > >>>>>
>         > >>>>>
>         > >>>>> 2) MultiReader just takes an array of subindexes, so as long
>         > >>>>> as the subindexes are passed to the MultiReader constructor in
>         > >>>>> the same order on both machines, the docBase assigned to each
>         > >>>>> reader context should be the same.
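Alan's second point comes down to simple arithmetic: in a composite reader such as MultiReader, each sub-reader's docBase is the sum of the maxDoc() values of the sub-readers before it, and a top-level docid equals docBase plus the docid inside that sub-reader. The plain-Java sketch below reproduces that arithmetic (`DocBaseDemo`, `docBases`, and `subIndex` are hypothetical names, not Lucene's API) and shows that the same sub-indexes in the same order yield the same docBases, while a different order does not.

```java
import java.util.Arrays;

// Hypothetical sketch of composite-reader docid arithmetic.
class DocBaseDemo {

    /** docBase of sub-reader i = sum of maxDoc of sub-readers 0..i-1. */
    static int[] docBases(int[] subMaxDocs) {
        int[] bases = new int[subMaxDocs.length];
        int base = 0;
        for (int i = 0; i < subMaxDocs.length; i++) {
            bases[i] = base;
            base += subMaxDocs[i];
        }
        return bases;
    }

    /** Index of the last sub-reader whose docBase is <= docId (binary search). */
    static int subIndex(int docId, int[] bases) {
        int lo = 0, hi = bases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (bases[mid] <= docId) lo = mid; else hi = mid - 1;
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {100, 50, 20});
        System.out.println(Arrays.toString(bases)); // prints [0, 100, 150]
        int doc = 120;
        int sub = subIndex(doc, bases);
        System.out.println("sub=" + sub + " local=" + (doc - bases[sub])); // prints sub=1 local=20
        // Same sub-indexes, different order: different docBases.
        System.out.println(Arrays.toString(docBases(new int[] {50, 100, 20}))); // prints [0, 50, 150]
    }
}
```

Because nothing in this computation depends on the machine or JVM, two processes that open identical, unchanging sub-indexes in the same order should agree on every docid; any change to one sub-index's maxDoc() shifts every docBase after it.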
>         > >>>>>
>         > >>>>> Alan Woodward
>         > >>>>> www.flax.co.uk
>         > >>>>>
>         > >>>>>
>         > >>>>>
>         > >>>>> On 29 May 2014, at 14:29, Nicola Buso wrote:
>         > >>>>>
>         > >>>>>> Hi,
>         > >>>>>>
>         > >>>>>> from the javadocs:
>         > >>>>>>
>         > >>>>>> ----
>         > >>>>>> For efficiency, in this API documents are often referred to
>         > >>>>>> via document numbers, non-negative integers which each name a
>         > >>>>>> unique document in the index. These document numbers are
>         > >>>>>> ephemeral -- they may change as documents are added to and
>         > >>>>>> deleted from an index. Clients should thus not rely on a given
>         > >>>>>> document having the same number between sessions.
>         > >>>>>> ----
>         > >>>>>>
>         > >>>>>> What does "sessions" mean in this context? Are they search
>         > >>>>>> sessions?
>         > >>>>>>
>         > >>>>>> 1) If I have an index that does not change (no deletes or
>         > >>>>>> updates) and I'm keeping the MultiReader open, can the docid
>         > >>>>>> change when executing the same search multiple times on that
>         > >>>>>> reader?
>         > >>>>>>
>         > >>>>>> 2) Will opening the same set of indexes in a MultiReader on
>         > >>>>>> different machines assign different docids to the same
>         > >>>>>> document at runtime, or does the algorithm that calculates
>         > >>>>>> such docids in some way guarantee that static indexes will
>         > >>>>>> have the same docids on different machines (and thus
>         > >>>>>> separate JVMs)?
>         > >>>>>>
>         > >>>>>>
>         > >>>>>> nicola.
>         > >>>>>>
>         > >>>>>>
>         > >>>>>>
>         > >>>>>> --
>         > >>>>>> Nicola Buso <nb...@ebi.ac.uk>
>         > >>>>>> EMBL-EBI
>         > >>>>>>
>         > >>>>>>
>         > >>>>>>
>         > >>>>>> ---------------------------------------------------------------------
>         > >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>         > >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>         > >>>>>>
>         > >>>>>>
>         > >>>>>
>         > >>>>
>         > >>>>
>         > >>>
>         > >> --
>         > >> Nicola Buso
>         > >> Software Engineer - Web Production Team
>         > >>
>         > >> European Bioinformatics Institute (EMBL-EBI)
>         > >> European Molecular Biology Laboratory
>         > >>
>         > >> Wellcome Trust Genome Campus
>         > >> Hinxton
>         > >> Cambridge CB10 1SD
>         > >> United Kingdom
>         > >>
>         > >> URL: http://www.ebi.ac.uk
>         > >>
>         > >
>         >
>         
>         
>         
> 
> 
