Posted to java-user@lucene.apache.org by Nicola Buso <nb...@ebi.ac.uk> on 2014/05/29 15:29:18 UTC

MultiReader docid reliability

Hi,

from the javadocs:

----
For efficiency, in this API documents are often referred to via document
numbers, non-negative integers which each name a unique document in the
index. These document numbers are ephemeral -- they may change as
documents are added to and deleted from an index. Clients should thus
not rely on a given document having the same number between sessions. 
----

What does "sessions" mean in this context? Does it mean search sessions?

1) If I have an index that does not change (no deletes or updates) and
I keep the MultiReader open, can the docids change when I run the same
search several times on that reader?

2) Will opening the same set of indexes in a MultiReader on different
machines assign different docids to the same document at runtime, or
does the algorithm that calculates the docids guarantee that static
indexes get the same docids on different machines (and thus in separate
JVMs)?


nicola.



-- 
Nicola Buso <nb...@ebi.ac.uk>
EMBL-EBI


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
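
Note on question 2: as far as I understand Lucene's composite readers, a
MultiReader gives each sub-reader a starting offset (its docBase) equal
to the total maxDoc of the sub-readers before it, and a top-level docid
is that offset plus the document's number inside its sub-reader. The
sketch below is a toy model of that arithmetic in plain Java, not Lucene
code; the names DocBaseSketch, docBases and globalDocId are made up for
illustration.

```java
/** Toy model of composite-reader numbering; this is not Lucene code. */
final class DocBaseSketch {

    /** docBase of sub-reader i = sum of maxDoc over sub-readers 0..i-1. */
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int next = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = next;      // offset where this sub-reader starts
            next += maxDocs[i];   // the next sub-reader starts after it
        }
        return bases;
    }

    /** Top-level docid of the document numbered localDoc in sub-reader sub. */
    static int globalDocId(int[] maxDocs, int sub, int localDoc) {
        if (localDoc < 0 || localDoc >= maxDocs[sub]) {
            throw new IllegalArgumentException("localDoc out of range: " + localDoc);
        }
        return docBases(maxDocs)[sub] + localDoc;
    }
}
```

With sub-readers of sizes {3, 5, 2}, local document 4 of the second
sub-reader gets top-level docid 3 + 4 = 7. Pass the same sub-readers in
the order {5, 3, 2} and that document becomes docid 4, so the order in
which the sub-readers are supplied matters.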


Re: MultiReader docid reliability

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi Erick,

the good reason for now is caching: we use the docids to store results
in a cache, and I wanted a better explanation of "ephemeral" to
understand the possible lifetime of the cache.
From the answers, "ephemeral" can be tied to the opening of the
IndexReader (in general, as a precaution), and any kind of modification
to the index is another possible interpretation.

So it's not strictly necessary; it was just a matter of understanding
the javadoc better. I see the javadoc is the same for all IndexReaders,
so I presume there are no differences between the various
implementations.



nicola.


On Fri, 2014-05-30 at 12:50 -0700, Erick Erickson wrote:
> If you do an optimize, btw, the internal doc IDs may change.....
> 
> 
> But _why_ do you want to keep them? You may have very good reasons,
> but it's not clear that this is necessary/desirable from what you've
> said so far...
> 
> 
> Best,
> Erick

-- 
Nicola Buso
Software Engineer - Web Production Team

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory

Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

URL: http://www.ebi.ac.uk




Re: MultiReader docid reliability

Posted by Erick Erickson <er...@gmail.com>.
If you do an optimize, btw, the internal doc IDs may change.....

But _why_ do you want to keep them? You may have very good reasons, but
it's not clear that this is necessary/desirable from what you've said so
far...

Best,
Erick
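
To illustrate the point about optimize (named forceMerge in later Lucene
versions): when segments are merged, deleted documents are dropped and
the surviving documents are renumbered to close the gaps. The sketch
below is a toy model of that renumbering in plain Java, not Lucene code;
MergeRenumberSketch and merge are hypothetical names, and the "keys"
stand in for whatever stable identifier the application stores per
document.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of docid renumbering after a merge; this is not Lucene code. */
final class MergeRenumberSketch {

    /**
     * keys.get(i) is the application key of the document with docid i.
     * Returns the keys of the live documents; the index of a key in the
     * returned list is that document's docid after the merge.
     */
    static List<String> merge(List<String> keys, boolean[] deleted) {
        List<String> merged = new ArrayList<>();
        for (int docId = 0; docId < keys.size(); docId++) {
            if (!deleted[docId]) {
                merged.add(keys.get(docId)); // live docs shift down past deleted ones
            }
        }
        return merged;
    }
}
```

For keys [a, b, c, d] with b deleted, document c has docid 2 before the
merge and docid 1 after it, so a cached docid 2 would then point at d.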


On Fri, May 30, 2014 at 7:49 AM, Nicola Buso <nb...@ebi.ac.uk> wrote:

> Hi,
>
> thanks Michael and Alan. It is enough to know that when the index is
> re-opened there is no guarantee that the docids are preserved, even if
> the index does not change.
>
> I will also ask the question on the Solr mailing list.
>
>
> nicola.

Re: MultiReader docid reliability

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi,

thanks Michael and Alan. It is enough to know that when the index is
re-opened there is no guarantee that the docids are preserved, even if
the index does not change.

I will also ask the question on the Solr mailing list.


nicola.


On Fri, 2014-05-30 at 10:41 -0400, Michael Sokolov wrote:
> There is a Solr document cache that holds field values too, see: 
> http://wiki.apache.org/solr/SolrCaching
> 
> Maybe take this question over to the solr mailing list?
> 
> -Mike




Re: MultiReader docid reliability

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
There is a Solr document cache that holds field values too, see: 
http://wiki.apache.org/solr/SolrCaching

Maybe take this question over to the solr mailing list?

-Mike

On 5/30/2014 10:32 AM, Alan Woodward wrote:
> Solr caches hold lucene docids, which are invalidated every time a new searcher is opened.  The various fields for a response aren't cached as far as I know, they're reloaded on each request.  But loading the fields for 10 documents is typically very fast, compared to searching over a very large collection.
>
> Alan Woodward
> www.flax.co.uk




Re: MultiReader docid reliability

Posted by Alan Woodward <al...@flax.co.uk>.
Solr caches hold lucene docids, which are invalidated every time a new searcher is opened.  The various fields for a response aren't cached as far as I know, they're reloaded on each request.  But loading the fields for 10 documents is typically very fast, compared to searching over a very large collection.

Alan Woodward
www.flax.co.uk
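
A minimal sketch of the invalidation idea described above, assuming the
application keeps its own cache of docids: entries are tied to the
reader (searcher) they were computed against and are thrown away as soon
as a different reader is presented. This is plain Java, not Solr's or
Lucene's actual cache; ReaderScopedDocIdCache and its methods are
hypothetical names, and the reader is modelled as an opaque Object
compared by identity.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy docid cache scoped to one reader instance; not Solr or Lucene code. */
final class ReaderScopedDocIdCache {
    private Object currentReader;                        // reader the entries belong to
    private final Map<String, int[]> hits = new HashMap<>();

    /** Returns cached docids, or null if absent or cached under another reader. */
    synchronized int[] get(Object reader, String query) {
        if (reader != currentReader) {
            return null;                                 // docids from another reader are not trusted
        }
        int[] cached = hits.get(query);
        return cached == null ? null : cached.clone();
    }

    /** Stores docids; a put from a new reader first drops all older entries. */
    synchronized void put(Object reader, String query, int[] docIds) {
        if (reader != currentReader) {
            hits.clear();
            currentReader = reader;
        }
        hits.put(query, docIds.clone());
    }

    synchronized int size() {
        return hits.size();
    }
}
```

The design choice is to never compare docids across readers at all:
the cache lives exactly as long as the reader it was filled from.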


On 30 May 2014, at 11:20, Nicola Buso wrote:

> Hi Alan,
> 
> just to make it more typical (yes, there are no IndexWriters open on
> those indexes): how does Solr cache results? The first thing I would
> like to do is store the docids and go back to the reader for the real
> content. Does Solr store the whole results with all their values?
> 
> 
> nicola.


Re: MultiReader docid reliability

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi Alan,

just to make it more typical (yes, there are no IndexWriters open on
those indexes): how does Solr cache results? The first thing I would
like to do is store the docids and go back to the reader for the real
content. Does Solr store the whole results with all their values?


nicola.


On Fri, 2014-05-30 at 11:05 +0100, Alan Woodward wrote:
> If the index is truly unchanging (ie there's no IndexWriter open on
> it) then I guess the document numbers will be stable across reopens.
> But this is a pretty specialized situation, and the docs are really
> there to warn you off trying to rely on this for more typical uses.

-- 
Nicola Buso
Software Engineer - Web Production Team

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory

Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

URL: http://www.ebi.ac.uk



Re: MultiReader docid reliability

Posted by Alan Woodward <al...@flax.co.uk>.
If the index is truly unchanging (ie there's no IndexWriter open on it) then I guess the document numbers will be stable across reopens.  But this is a pretty specialized situation, and the docs are really there to warn you off trying to rely on this for more typical uses.
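
For the caching use case discussed in this thread, one conservative option is to treat cached document numbers as valid only for the reader instance that produced them, and to drop them as soon as a different reader is used. The following is a minimal sketch of that idea, not a description of how Solr or Lucene cache results. The class name ReaderScopedDocIdCache, the String query key, and the use of a plain Object as a stand-in for the reader are all assumptions made for illustration, so the sketch has no Lucene dependency.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical cache of document numbers that is only valid for one reader
 * instance. The reader is modelled as a plain Object (assumption), compared
 * by identity, so no Lucene classes are needed to run the sketch.
 */
final class ReaderScopedDocIdCache {
    // Reader instance the cached document numbers were computed against.
    private Object reader;
    private final Map<String, int[]> hits = new HashMap<>();

    /** Returns the cached doc ids, or null if absent or cached for another reader. */
    synchronized int[] get(Object currentReader, String queryKey) {
        if (currentReader != reader) {
            return null; // ids from another reader instance are not trusted
        }
        int[] ids = hits.get(queryKey);
        return ids == null ? null : ids.clone();
    }

    /** Stores doc ids; switching to a new reader discards everything cached before. */
    synchronized void put(Object currentReader, String queryKey, int[] docIds) {
        if (currentReader != reader) {
            hits.clear();
            reader = currentReader;
        }
        hits.put(queryKey, docIds.clone());
    }
}
```

A cache that has to survive reopens or be shared between machines would more safely be keyed on an application-level identifier stored in each document than on the document number.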

Alan Woodward
www.flax.co.uk


On 30 May 2014, at 10:39, Nicola Buso wrote:

> Hi Alan,
> 
> thanks a lot for the reply.
> 
> From what I understood from your reply, if the index is not changing (no
> adds, deletes, or updates), the doc ids seen by the MultiReader will
> not change if you open that unchanged index several times, even in
> different environments.
> 
> If my understanding is correct, the word "ephemeral" in the API docs could
> be explained a bit more.
> 
> 
> nicola


Re: MultiReader docid reliability

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi Alan,

thanks a lot for the reply.

From what I understood from your reply, if the index is not changing (no
adds, deletes, or updates), the doc ids seen by the MultiReader will
not change if you open that unchanged index several times, even in
different environments.

If my understanding is correct, the word "ephemeral" in the API docs could
be explained a bit more.


nicola

On Fri, 2014-05-30 at 09:26 +0100, Alan Woodward wrote:
> Hi Nicola,
> 
> 
> 1) A session here means as long as you have that MultiReader open.
> IndexReaders see a snapshot of the index and so document ids
> shouldn't change over the lifetime of an IndexReader, even if the
> index is being updated.
> 
> 
> 2) MultiReader just takes an array of subindexes, so as long as the
> subindexes are passed to the MultiReader constructor in the same order
> on both machines, the docBase assigned to each reader context should
> be the same.
> 
> Alan Woodward
> www.flax.co.uk
> 

-- 
Nicola Buso <nb...@ebi.ac.uk>
EMBL-EBI



Re: MultiReader docid reliability

Posted by Alan Woodward <al...@flax.co.uk>.
Hi Nicola,

1) A session here means as long as you have that MultiReader open.  IndexReaders see a snapshot of the index and so document ids shouldn't change over the lifetime of an IndexReader, even if the index is being updated.

2) MultiReader just takes an array of subindexes, so as long as the subindexes are passed to the MultiReader constructor in the same order on both machines, the docBase assigned to each reader context should be the same.
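
To illustrate point 2, here is a simplified model of how a composite reader numbers documents: each sub-reader gets a docBase equal to the total maxDoc of the sub-readers before it, and a top-level document number is docBase plus the number inside the sub-reader. This is a sketch of the arithmetic only, not Lucene's actual code. The class DocBaseSketch, its method names, and the sub-index sizes are made up for the example.

```java
import java.util.Arrays;

/** Simplified model of composite-reader document numbering (not Lucene code). */
final class DocBaseSketch {

    /** docBase of sub-reader i is the sum of maxDoc over the sub-readers before it. */
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int next = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = next;
            next += maxDocs[i];
        }
        return bases;
    }

    /** Top-level document number for localDoc inside the sub-reader at subIndex. */
    static int globalDoc(int[] maxDocs, int subIndex, int localDoc) {
        if (localDoc < 0 || localDoc >= maxDocs[subIndex]) {
            throw new IllegalArgumentException("localDoc out of range: " + localDoc);
        }
        return docBases(maxDocs)[subIndex] + localDoc;
    }

    public static void main(String[] args) {
        int[] orderA = {100, 250, 40}; // hypothetical sub-index sizes
        int[] orderB = {250, 100, 40}; // same sub-indexes, first two swapped

        System.out.println(Arrays.toString(docBases(orderA))); // [0, 100, 350]
        System.out.println(Arrays.toString(docBases(orderB))); // [0, 250, 350]

        // The same document (local number 7 in the 250-document sub-index)
        // gets a different top-level number when the order changes.
        System.out.println(globalDoc(orderA, 1, 7)); // 107
        System.out.println(globalDoc(orderB, 0, 7)); // 7
    }
}
```

In this model the top-level numbers match on two machines only if the sub-indexes are identical and are passed in the same order.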

Alan Woodward
www.flax.co.uk


On 29 May 2014, at 14:29, Nicola Buso wrote:

> Hi,
> 
> from the javadocs:
> 
> ----
> For efficiency, in this API documents are often referred to via document
> numbers, non-negative integers which each name a unique document in the
> index. These document numbers are ephemeral -- they may change as
> documents are added to and deleted from an index. Clients should thus
> not rely on a given document having the same number between sessions. 
> ----
> 
> What does it mean in this context "sessions"? Are search sessions?
> 
> 1) If I have an index that does not change (no deletes or updates) and
> I'm keeping the MultiReader open, can the docid change executing more
> times the same search on that reader?
> 
> 2) Opening the same set of indexes in a MultiReader on different
> machines will assign different docids to the same document at runtime or
> the algorithm to calculate such docids in some way can guarantee that
> static indexes will have the same docids in different machines (than
> separated JVMs)?
> 
> 
> nicola.