Posted to java-user@lucene.apache.org by John Wang <jo...@gmail.com> on 2009/09/22 06:56:11 UTC

2.9 NRT w.r.t. sorting and field cache

Looking at the code, there seems to be a disconnect between how and when the
field cache is loaded when IndexWriter.getReader() is called.

Is FieldCache updated? Otherwise, are we reloading FieldCache for each
reader instance?

For operations that lazily load the field cache, e.g. sorting, this seems to
be a significant performance issue.

Please advise.

Thanks

-John

Re: 2.9 NRT w.r.t. sorting and field cache

Posted by Michael McCandless <lu...@mikemccandless.com>.

The cat is awake!  And catching up on all these interesting emails...

I think Mark said it all very well :)

The warming you actually do in what you pass to
setMergedSegmentWarmer, I think, should look just like the "normal"
warming you'd do before bringing a newly opened reader into
production.

Lucene must load norms per searched field, so run one search per such
field.  It loads a FieldCache entry for sorting/function queries, so run
one search for each such field.  As of 2.9, the terms dict index is
loaded when you open the reader.  Then, if you want to warm the OS's
IO cache to match your "frequent queries", you need to carry over some
sort of cache that tracks such queries (like Solr).

The warming is the same, but when you do it in IndexWriter, when a
large segment merge completes, it won't block your ongoing streams of
updates which is usually crucial in a large scale NRT app.
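
As a rough illustration of the two points above, here is a toy model in plain
Java. It is not Lucene code: ToySegment, ToyFieldCache, ToyWarmer and ToyWriter
are made-up names, and the "cache load" is just an array copy standing in for
the expensive step. It only sketches the idea that a per-segment cache is
loaded once per segment, and that warming a merged segment before it is
published keeps the first search after the merge from paying the load cost.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for one index segment; this is NOT Lucene's SegmentReader. */
final class ToySegment {
    final String name;
    final int[] values; // one sortable value per document

    ToySegment(String name, int[] values) {
        this.name = name;
        this.values = values;
    }
}

/** Toy cache keyed by segment identity, loosely modelled on a per-segment field cache. */
final class ToyFieldCache {
    private final Map<ToySegment, int[]> entries = new IdentityHashMap<>();
    private int loads = 0;

    /** Returns the cached array, loading it on first use (the expensive step). */
    synchronized int[] get(ToySegment segment) {
        int[] cached = entries.get(segment);
        if (cached == null) {
            cached = segment.values.clone(); // stands in for reading values off disk
            entries.put(segment, cached);
            loads++;
        }
        return cached;
    }

    synchronized int loadCount() {
        return loads;
    }
}

/** Hypothetical callback, shaped like the IndexReaderWarmer class quoted in this thread. */
interface ToyWarmer {
    void warm(ToySegment merged);
}

/** Toy writer that warms a merged segment before making it visible to searches. */
final class ToyWriter {
    private final List<ToySegment> live = new ArrayList<>();
    private ToyWarmer warmer;

    void setMergedSegmentWarmer(ToyWarmer warmer) {
        this.warmer = warmer;
    }

    synchronized void addSegment(ToySegment segment) {
        live.add(segment);
    }

    /** Snapshot of the visible segments, standing in for a reopened reader. */
    synchronized List<ToySegment> getReader() {
        return new ArrayList<>(live);
    }

    /** Merges a and b; both stay visible until the warmed result replaces them. */
    ToySegment merge(ToySegment a, ToySegment b) {
        int[] mergedValues = new int[a.values.length + b.values.length];
        System.arraycopy(a.values, 0, mergedValues, 0, a.values.length);
        System.arraycopy(b.values, 0, mergedValues, a.values.length, b.values.length);
        ToySegment merged = new ToySegment(a.name + "+" + b.name, mergedValues);
        if (warmer != null) {
            warmer.warm(merged); // runs without holding the lock, so addSegment() is not blocked
        }
        synchronized (this) { // publish only after warming has finished
            live.remove(a);
            live.remove(b);
            live.add(merged);
        }
        return merged;
    }
}

final class WarmDemo {
    public static void main(String[] args) {
        ToyFieldCache cache = new ToyFieldCache();
        ToyWriter writer = new ToyWriter();
        writer.setMergedSegmentWarmer(segment -> { cache.get(segment); });

        ToySegment a = new ToySegment("a", new int[] {3, 1});
        ToySegment b = new ToySegment("b", new int[] {2});
        ToySegment c = new ToySegment("c", new int[] {5});
        writer.addSegment(a);
        writer.addSegment(b);
        writer.addSegment(c);

        for (ToySegment s : writer.getReader()) cache.get(s); // first "sorted search"
        int loadsBeforeMerge = cache.loadCount();

        writer.merge(b, c); // the warmer loads the merged segment here
        int loadsAfterWarm = cache.loadCount();

        for (ToySegment s : writer.getReader()) cache.get(s); // "search" after the merge
        System.out.println(loadsBeforeMerge + " " + loadsAfterWarm + " " + cache.loadCount());
    }
}
```

In this model the unchanged segment "a" is never reloaded, and the merged
segment is loaded by the warmer rather than by the first search that touches
it. In a real application the body of the warmer would be the same sorted
searches you would run against any newly opened reader.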

Nevertheless, the sheer CPU and IO cost of merging, and Java's
inability to down-prioritize merging IO and control the OS's IO cache,
will still impact search performance.  So it may still be necessary to
entirely avoid large segment merges.

Mike

On Tue, Sep 22, 2009 at 8:46 PM, Mark Miller <ma...@gmail.com> wrote:
> What I would do is:
>
> In the warm method, load a FieldCache for every field I was going to end
> up using a FieldCache for.
> If it's just for sorting, I might do a search with a sort on every field
> I was going to sort on.
> That will get the segment FieldCaches into RAM before the SegmentReader
> is put into use.
>
> I might also do a search or two that hits a lot of terms to get some of
> the index into RAM. Or maybe walk a TermEnum, or anything one normally
> does when warming Readers (like Solr does, or many other home-grown
> solutions).
>
> I don't think there is anything special in this case. You don't have to
> hit it with every unique search you expect it to see - you just get some
> key pieces (especially the FieldCaches) into RAM.
>
> Don't give mike a hard time about his valuable time - I'm sure he would
> have answered, but he's likely in bed (That cat wakes early it seems. ).
> He's a lot nicer than I am ;)
>
> John Wang wrote:
>> No worries.
>> Just trying to understand things.
>>
>> I wanted to double check but didn't want to write "My IDE told me that
>> was the case" to sound pissy.
>>
>> I did look at the code, sometimes too much actually, but I never want
>> to claim I understand the code 100%, hence going to the source is
>> probably the best; even at the expense of sounding dumb, it is usually
>> worth it ;)
>>
>> My question is more about how a person would do it at the public API
>> level without having to hack into the source code.
>>
>> My main misunderstanding at this point is that I had thought
>> IndexReaderWarmer can directly warm the field cache deterministically.
>>
>> Thanks
>>
>> -John
>>
>> On Wed, Sep 23, 2009 at 8:33 AM, Mark Miller <markrmiller@gmail.com> wrote:
>>
>>     Don't take me too seriously John - I doubt anyone does :)
>>
>>     And I wasn't implying Mike's time was more valuable than yours. I was
>>     being ... uh ... me :)
>>
>>     And I don't claim that all of your many questions could have been
>>     found in 5 seconds ;)
>>
>>     Just the ones you were asking - it's very quick (at least with Eclipse)
>>     to see that there is no default impl.
>>     It's also very quick to see that a segment reader is passed to the warm
>>     method every time. I think it's just a generic IndexReader because you
>>     would warm a multi-reader the same way as a SegmentReader.
>>
>>     I was just suggesting you look at the code a bit, because I think it's
>>     fairly easy to figure out the basics of the warmer (hey, if I can
>>     do it ;) ).
>>
>>     Again, don't take me too seriously. I send out my comments faster
>>     than I can think of them. And I've probably wasted more of Mike's
>>     time than anyone.
>>
>>     The only way you will load the entire FieldCache is to use a top-level
>>     Reader outside of the core API - the core API works per segment
>>     now. And the IndexReaderWarmer is always passed a SegmentReader from
>>     the readerPool.
>>
>>     - Mark
>>
>>     John Wang wrote:
>>     > Mark:
>>     >
>>     > I did spend at least a quarter of an ounce. :) And I am sure Mike's
>>     > time is more valuable than mine, but it was meant to be a
>>     > "double-check".
>>     >
>>     > I was under the impression there is a default impl from previous
>>     > email threads on how to handle field cache warming; perhaps I
>>     > misunderstood.
>>     >
>>     > The real question here is what "warms the reader" means. From a
>>     > public API point of view, I wasn't sure if passing in an IndexReader
>>     > impl is something we can do to avoid loading the entire field cache,
>>     > e.g. would I need to downcast? Can it be a filtered reader? etc.
>>     >
>>     > If you think there is something I could have done within 5 sec,
>>     > please point me in the right direction.
>>     >
>>     > Thanks
>>     >
>>     > -John
>>     >
>>     > On Wed, Sep 23, 2009 at 7:55 AM, Mark Miller <markrmiller@gmail.com> wrote:
>>     >
>>     >     Come on dude :) Spend a half ounce of effort first. Mike's
>>     >     time is too valuable!
>>     >
>>     >     Luckily mine is not.
>>     >
>>     >     There is no default impl - the class is dead simple (and the
>>     >     class has been pointed out like 3 times in this thread - I'm not
>>     >     even fully following and I know where to find it):
>>     >
>>     >      public static abstract class IndexReaderWarmer {
>>     >        public abstract void warm(IndexReader reader) throws IOException;
>>     >      }
>>     >
>>     >     Now pass something in that warms the reader. Load a
>>     >     fieldcache - do a search. Do the hokey pokey and turn yourself
>>     >     around ...
>>     >
>>     >     Investigation time: 5 seconds.
>>     >
>>     >     John Wang wrote:
>>     >     > Hi Michael:
>>     >     >
>>     >     >      Thanks for the pointer!
>>     >     >
>>     >     >       Pardon my ignorance, but I am still not seeing the
>>     >     > connection between this API and per-segment loading of
>>     >     > FieldCache. (The API takes in an IndexReader instead of maybe
>>     >     > SegmentReader[].)
>>     >     >
>>     >     >       Can you point me to maybe the default impl of
>>     >     > IndexReaderWarmer to help me understand?
>>     >     >
>>     >     > Thanks
>>     >     >
>>     >     > -John
>>     >     >
>>     >     > On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
>>     >     >
>>     >     >     This is exactly why we added
>>     >     >     IndexWriter.setMergedSegmentWarmer -- you can warm the
>>     >     >     reader w/o blocking ongoing updates.
>>     >     >
>>     >     >     Mike
>>     >     >
>>     >     >     On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>     >     >     > Right - when a large segment is invalidated, you will have
>>     >     >     > a bigger fieldcache piece to reload - pre 2.9, you'd be
>>     >     >     > reloading the *whole* field cache every time though. Sounds
>>     >     >     > like you are trying to deal with those large segments
>>     >     >     > changing anyway :) They are always an issue when doing RT,
>>     >     >     > it seems.
>>     >     >     >
>>     >     >     > I don't believe deletes invalidate a field cache - terms
>>     >     >     > from deleted docs stay in a field cache and segmentreaders
>>     >     >     > use their freqStream as the fieldcache key. Only when the
>>     >     >     > deletes are merged out would they invalidate - but because
>>     >     >     > you're writing a new segment anyway ...
>>     >     >     >
>>     >     >     > - Mark
>>     >     >     >
>>     >     >     > John Wang wrote:
>>     >     >     >> I understand what you are saying. Let me detail what I am
>>     >     >     >> trying to say:
>>     >     >     >>
>>     >     >     >> When "currently processed segments" are flushed down, a
>>     >     >     >> merge may happen. When merges happen, some of those
>>     >     >     >> "stable segments" will be invalidated, and so will the
>>     >     >     >> fieldcache data keyed by them.
>>     >     >     >>
>>     >     >     >> In a high-update environment, such scenarios can happen
>>     >     >     >> quite often.
>>     >     >     >>
>>     >     >     >> The way the default mergePolicy works is that small
>>     >     >     >> segments get merged into the larger segments. Eventually,
>>     >     >     >> what will be invalidated would be a large segment, and
>>     >     >     >> when that happens, a large chunk of the field cache would
>>     >     >     >> be invalidated.
>>     >     >     >>
>>     >     >     >> Furthermore, in the case where there are high updates, the
>>     >     >     >> stable segments can be invalidated much sooner when there
>>     >     >     >> are deletes in those segments, and I would guess the
>>     >     >     >> corresponding FieldCache needs to be adjusted. Not sure
>>     >     >     >> how it is handled right now.
>>     >     >     >>
>>     >     >     >> Just my two cents, and of course when I find the time I
>>     >     >     >> will need to run some tests to see.
>>     >     >     >>
>>     >     >     >> -John
>>     >     >     >>
>>     >     >     >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>     >     >     >>
>>     >     >     >>     The NRT reader coming from IndexWriter.getReader() has
>>     >     >     >>     changes only in the currently processed segments; the
>>     >     >     >>     other segments stay stable (and so do their IndexReader
>>     >     >     >>     keys used for the FieldCache). For the consumer it looks
>>     >     >     >>     like a normal reader (it is in fact a
>>     >     >     >>     ReadOnlyDirectoryReader) supporting
>>     >     >     >>     getSequentialSubReaders() and so on.
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>     -----
>>     >     >     >>     Uwe Schindler
>>     >     >     >>     H.-H.-Meier-Allee 63, D-28213 Bremen
>>     >     >     >>     http://www.thetaphi.de
>>     >     >     >>     eMail: uwe@thetaphi.de
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>     ------------------------------------------------------------------------
>>     >     >     >>
>>     >     >     >>     *From:* John Wang [mailto:john.wang@gmail.com]
>>     >     >     >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
>>     >     >     >>     *To:* java-dev@lucene.apache.org
>>     >     >     >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>     Thanks Mark for the pointer!
>>     >     >     >>
>>     >     >     >>     I guess my point is with NRT, and when segment files
>>     >     >     >>     change often, this would be an issue, no?
>>     >     >     >>
>>     >     >     >>     Anyway, I can run some tests.
>>     >     >     >>
>>     >     >     >>     Thanks
>>     >     >     >>
>>     >     >     >>     -John
>>     >     >     >>
>>     >     >     >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>     >     >     >>
>>     >     >     >>     1483 - IndexSearcher pulls out a reader's subreaders
>>     >     >     >>     (SegmentReaders) and sends a collector over them one
>>     >     >     >>     by one, rather than using the MultiReader. So only the
>>     >     >     >>     FieldCache for segment readers that change needs to be
>>     >     >     >>     reloaded.
>>     >     >     >>
>>     >     >     >>     - Mark
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>     http://www.lucidimagination.com (mobile)
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com> wrote:
>>     >     >     >>
>>     >     >     >>>     Hi Yonik:
>>     >     >     >>>
>>     >     >     >>>          Actually that is what I am looking for. Can you
>>     >     >     >>>     please point me to where/how sorting is done
>>     >     >     >>>     per-segment?
>>     >     >     >>>
>>     >     >     >>>          When heavy indexing introduces or modifies
>>     >     >     >>>     segments, would it cause reloading of FieldCache at
>>     >     >     >>>     query time and thus impact search performance?
>>     >     >     >>>
>>     >     >     >>>     thanks
>>     >     >     >>>
>>     >     >     >>>     -John
>>     >     >     >>>
>>     >     >     >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
>>     >     >     >>>
>>     >     >     >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com> wrote:
>>     >     >     >>>     > Looking at the code, seems there is a disconnect
>>     >     >     >>>     > between how/when field cache is loaded when
>>     >     >     >>>     > IndexWriter.getReader() is called.
>>     >     >     >>>
>>     >     >     >>>     I'm not sure what you mean by "disconnect"
>>     >     >     >>>
>>     >     >     >>>     > Is FieldCache updated?
>>     >     >     >>>
>>     >     >     >>>     FieldCache entries are populated on demand, as they
>>     >     >     >>>     always have been.
>>     >     >     >>>
>>     >     >     >>>
>>     >     >     >>>     > Otherwise, are we reloading FieldCache for each
>>     >     >     >>>     > reader instance?
>>     >     >     >>>
>>     >     >     >>>     Searching/sorting is now per-segment, and so is the
>>     >     >     >>>     use of the FieldCache.  Segments that don't change
>>     >     >     >>>     shouldn't have to reload their FieldCache entries.
>>     >     >     >>>
>>     >     >     >>>     -Yonik
>>     >     >     >>>     http://www.lucidimagination.com
>>     >     >     >>>
>>     >     >     >>>
>>     >     >     >>>     ---------------------------------------------------------------------
>>     >     >     >>>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>     >     >     >>>     For additional commands, e-mail: java-dev-help@lucene.apache.org
>>     >     >     >>>
>>     >     >     >>>
>>     >     >     >>>
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>
>>     >     >     >
>>     >     >     >
>>     >     >     > --
>>     >     >     > - Mark
>>     >     >     >
>>     >     >     > http://www.lucidimagination.com
>>     >     >     >
>>     >     >     >
>>     >     >     >
>>     >     >     >
>>     >     >     >
>>     >     >     >
>>     >     >     >
>>     >     >
>>     >     >
>>     >     >
>>     >     >
>>     >
>>     >
>>     >     --
>>     >     - Mark
>>     >
>>     >     http://www.lucidimagination.com
>>     >
>>     >
>>     >
>>     >
>>     >
>>     >
>>     >
>>
>>
>>     --
>>     - Mark
>>
>>     http://www.lucidimagination.com
>>
>>
>>
>>
>>
>>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>
>


Re: 2.9 NRT w.r.t. sorting and field cache

Posted by John Wang <jo...@gmail.com>.

Thanks Mark for the information!

-John

On Wed, Sep 23, 2009 at 8:46 AM, Mark Miller <ma...@gmail.com> wrote:

> What I would do is:
>
> In the warm method, load a FieldCache for every field I was going to end
> up using a FieldCache for.
> If it's just for sorting, I might do a search with a sort on every field
> I was going to sort on.
> That will get the segment FieldCaches into RAM before the SegmentReader
> is put into use.
>
> I might also do a search or two that hits a lot of terms to get some of
> the index into RAM. Or maybe walk a TermEnum, or anything one normally
> does when warming Readers (like Solr does, or many other home-grown
> solutions).
>
> I don't think there is anything special in this case. You don't have to
> hit it with every unique search you expect it to see - you just get some
> key pieces (especially the FieldCaches) into RAM.
>
> Don't give mike a hard time about his valuable time - I'm sure he would
> have answered, but he's likely in bed (That cat wakes early it seems. ).
> He's a lot nicer than I am ;)
>
> John Wang wrote:
> > No worries.
> > Just trying to understand things.
> >
> > I wanted to double check but didn't want to write "My IDE told me that
> > was the case" to sound pissy.
> >
> > I did look at the code, sometimes too much actually, but I never want
> > to claim I understand the code 100%, hence going to the source is
> > probably the best; even at the expense of sounding dumb, it is usually
> > worth it ;)
> >
> > My question is more about how a person would do it at the public API
> > level without having to hack into the source code.
> >
> > My main misunderstanding at this point is that I had thought
> > IndexReaderWarmer can directly warm the field cache deterministically.
> >
> > Thanks
> >
> > -John
> >
> > On Wed, Sep 23, 2009 at 8:33 AM, Mark Miller <markrmiller@gmail.com> wrote:
> >
> >     Don't take me too seriously John - I doubt anyone does :)
> >
> >     And I wasn't implying Mike's time was more valuable than yours. I was
> >     being ... uh ... me :)
> >
> >     And I don't claim that all of your many questions could have been
> >     found in 5 seconds ;)
> >
> >     Just the ones you were asking - it's very quick (at least with Eclipse)
> >     to see that there is no default impl.
> >     It's also very quick to see that a segment reader is passed to the warm
> >     method every time. I think it's just a generic IndexReader because you
> >     would warm a multi-reader the same way as a SegmentReader.
> >
> >     I was just suggesting you look at the code a bit, because I think it's
> >     fairly easy to figure out the basics of the warmer (hey, if I can
> >     do it ;) ).
> >
> >     Again, don't take me too seriously. I send out my comments faster
> >     than I can think of them. And I've probably wasted more of Mike's
> >     time than anyone.
> >
> >     The only way you will load the entire FieldCache is to use a top-level
> >     Reader outside of the core API - the core API works per segment
> >     now. And the IndexReaderWarmer is always passed a SegmentReader from
> >     the readerPool.
> >
> >     - Mark
> >
> >     John Wang wrote:
> >     > Mark:
> >     >
> >     > I did spend at least a quarter of an ounce. :) And I am sure Mike's
> >     > time is more valuable than mine, but it was meant to be a
> >     > "double-check".
> >     >
> >     > I was under the impression there is a default impl from previous
> >     > email threads on how to handle field cache warming; perhaps I
> >     > misunderstood.
> >     >
> >     > The real question here is what "warms the reader" means. From a
> >     > public API point of view, I wasn't sure if passing in an IndexReader
> >     > impl is something we can do to avoid loading the entire field cache,
> >     > e.g. would I need to downcast? Can it be a filtered reader? etc.
> >     >
> >     > If you think there is something I could have done within 5 sec,
> >     > please point me in the right direction.
> >     >
> >     > Thanks
> >     >
> >     > -John
> >     >
> >     > On Wed, Sep 23, 2009 at 7:55 AM, Mark Miller <markrmiller@gmail.com> wrote:
> >     >
> >     >     Come on dude :) Spend a half ounce of effort first. Mike's
> >     >     time is too valuable!
> >     >
> >     >     Luckily mine is not.
> >     >
> >     >     There is no default impl - the class is dead simple (and the
> >     >     class has been pointed out like 3 times in this thread - I'm not
> >     >     even fully following and I know where to find it):
> >     >
> >     >      public static abstract class IndexReaderWarmer {
> >     >        public abstract void warm(IndexReader reader) throws IOException;
> >     >      }
> >     >
> >     >     Now pass something in that warms the reader. Load a
> >     >     fieldcache - do a search. Do the hokey pokey and turn yourself
> >     >     around ...
> >     >
> >     >     Investigation time: 5 seconds.
> >     >
> >     >     John Wang wrote:
> >     >     > Hi Michael:
> >     >     >
> >     >     >      Thanks for the pointer!
> >     >     >
> >     >     >       Pardon my ignorance, but I am still not seeing the
> >     >     > connection between this API and per-segment loading of
> >     >     > FieldCache. (The API takes in an IndexReader instead of maybe
> >     >     > SegmentReader[].)
> >     >     >
> >     >     >       Can you point me to maybe the default impl of
> >     >     > IndexReaderWarmer to help me understand?
> >     >     >
> >     >     > Thanks
> >     >     >
> >     >     > -John
> >     >     >
> >     >     > On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
> >     >     >
> >     >     >     This is exactly why we added
> >     >     IndexWriter.setMergedSegmentWarmer -- you
> >     >     >     can warm the reader w/o blocking ongoing updates.
> >     >     >
> >     >     >     Mike
> >     >     >
> >     >     >     On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >     >     >     > Right - when a large segment is invalidated, you will have a bigger
> >     >     >     > fieldcache piece to reload - pre 2.9, you'd be reloading the *whole*
> >     >     >     > field cache every time though. Sounds like you are trying to deal with
> >     >     >     > those large segments changing anyway :) They are always an issue when
> >     >     >     > doing RT it seems.
> >     >     >     >
> >     >     >     > I don't believe deletes invalidate a field cache - terms from deleted
> >     >     >     > docs stay in a field cache and segmentreaders use their freqStream as
> >     >     >     > the fieldcache key. Only when the deletes are merged out would they
> >     >     >     > invalidate - but because you're writing a new segment anyway ...
> >     >     >     >
> >     >     >     > - Mark
> >     >     >     >
> >     >     >     > John Wang wrote:
> >     >     >     >> I understand what you are saying. Let me detail
> >     what I am
> >     >     >     trying to say:
> >     >     >     >>
> >     >     >     >> When "currently processed segments" are flushed down,
> >     >     merge may
> >     >     >     >> happen. When merges happen, some of those "stable
> >     >     segments" will be
> >     >     >     >> invalidated, and so will the fieldcache data keyed
> >     by them.
> >     >     >     >>
> >     >     >     >> In a high update environment, such scenarios can
> >     happen quite
> >     >     >     often.
> >     >     >     >>
> >     >     >     >> The way the default mergePolicy works is that small
> >     >     segments get
> >     >     >     >> merged into the larger segments. Eventually, what
> >     will be
> >     >     >     invalidated
> >     >     >     >> would be a large segment, and when that happens, a
> >     large
> >     >     chunk
> >     >     >     of the
> >     >     >     >> field cache would be invalidated.
> >     >     >     >>
> >     >     >     >> Furthermore, in the case where there are high updates,
> >     >     the stable
> >     >     >     >> segments can be invalidate much sooner when there
> >     are deletes
> >     >     >     in those
> >     >     >     >> segments, and I would guess the corresponding
> >     FieldCache
> >     >     needs
> >     >     >     to be
> >     >     >     >> adjusted. Not sure how it is handled right now.
> >     >     >     >>
> >     >     >     >> Just my two cents, and of course when I find the
> >     time I will
> >     >     >     need to
> >     >     >     >> run some tests to see.
> >     >     >     >>
> >     >     >     >> -John
> >     >     >     >>
> >     >     >     >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >     >     >     >>
> >     >     >     >>     The NRT reader coming from the
> >     >     IndexWriter.getReader() has only
> >     >     >     >>     changes in the currently processed segments, the
> >     >     other segments
> >     >     >     >>     keep stable (and even their IndexReader keys
> >     used for the
> >     >     >     >>     FieldCache). The rest of the segments keep stable.
> >     >     For the
> >     >     >     >>     consumer it looks like a normal reader (it is
> >     in fact a
> >     >     >     >>     ReadOnlyDirectoryReader) supporting
> >     >     >     getSequentialSubReaders() and
> >     >     >     >>     so on.
> >     >     >     >>
> >     >     >     >>
> >     >     >     >>
> >     >     >     >>     -----
> >     >     >     >>     Uwe Schindler
> >     >     >     >>     H.-H.-Meier-Allee 63, D-28213 Bremen
> >     >     >     >>     http://www.thetaphi.de
> >     >     >     >>     eMail: uwe@thetaphi.de
> >     >     >     >>
> >     >     >     >>
> >     >     >     >>     ------------------------------------------------------------------------
> >     >     >     >>
> >     >     >     >>     *From:* John Wang [mailto:john.wang@gmail.com]
> >     >     >     >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
> >     >     >     >>     *To:* java-dev@lucene.apache.org
> >     >     >     >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
> >     >     >     >>
> >     >     >     >>
> >     >     >     >>
> >     >     >     >>     Thanks Mark for the pointer!
> >     >     >     >>
> >     >     >     >>     I guess my point is with NRT, and when segment
> >     files
> >     >     change
> >     >     >     often,
> >     >     >     >>     this would be an issue, no?
> >     >     >     >>
> >     >     >     >>     Anyway, I can run some tests.
> >     >     >     >>
> >     >     >     >>     Thanks
> >     >     >     >>
> >     >     >     >>     -John
> >     >     >     >>
> >     >     >     >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >     >     >     >>
> >     >     >     >>     1483 - indexsearcher pulls out a readers
> subreaders
> >     >     >     >>     (segmentreaders) and sends a collector over
> >     them one
> >     >     by one,
> >     >     >     >>     rather than using the multireader. So only fc
> >     for seg
> >     >     >     readers that
> >     >     >     >>     change need to be reloaded.
> >     >     >     >>
> >     >     >     >>     - Mark
> >     >     >     >>
> >     >     >     >>
> >     >     >     >>
> >     >     >     >>     http://www.lucidimagination.com (mobile)
> >     >     >     >>
> >     >     >     >>
> >     >     >     >>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com> wrote:
> >     >     >     >>
> >     >     >     >>>     Hi Yonik:
> >     >     >     >>>
> >     >     >     >>>          Actually that is what I am looking for.
> >     Can you
> >     >     >     please point
> >     >     >     >>>     me to where/how sorting is done per-segment?
> >     >     >     >>>
> >     >     >     >>>          When heaving indexing introduces or modifies
> >     >     >     segments, would
> >     >     >     >>>     it cause reloading of FieldCache at query time
> and
> >     >     thus would
> >     >     >     >>>     impact search performance?
> >     >     >     >>>
> >     >     >     >>>     thanks
> >     >     >     >>>
> >     >     >     >>>     -John
> >     >     >     >>>
> >     >     >     >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
> >     >     >     >>>
> >     >     >     >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com> wrote:
> >     >     >     >>>     > Looking at the code, seems there is a
> disconnect
> >     >     between
> >     >     >     >>>     how/when field
> >     >     >     >>>     > cache is loaded when IndexWriter.getReader() is
> >     >     called.
> >     >     >     >>>
> >     >     >     >>>     I'm not sure what you mean by "disconnect"
> >     >     >     >>>
> >     >     >     >>>     > Is FieldCache updated?
> >     >     >     >>>
> >     >     >     >>>     FieldCache entries are populated on demand, as
> >     they
> >     >     always
> >     >     >     have been.
> >     >     >     >>>
> >     >     >     >>>
> >     >     >     >>>     > Otherwise, are we reloading FieldCache for each
> >     >     >     >>>     > reader instance?
> >     >     >     >>>
> >     >     >     >>>     Searching/sorting is now per-segment, and so
> >     is the
> >     >     use of the
> >     >     >     >>>     FieldCache.  Segments that don't change shouldn't
> >     >     have to
> >     >     >     reload
> >     >     >     >>>     their
> >     >     >     >>>     FieldCache entries.
> >     >     >     >>>
> >     >     >     >>>     -Yonik
> >     >     >     >>>     http://www.lucidimagination.com
> >     >     >     >>>
> >     >     >     >>>
> >     >     >     >>>
> >     >     >     >>>
> >     >     >     >>>
> >     >     >     >>
> >     >     >     >>
> >     >     >     >>
> >     >     >     >
> >     >     >     >
> >     >     >     > --
> >     >     >     > - Mark
> >     >     >     >
> >     >     >     > http://www.lucidimagination.com
> >     >     >     >
> >     >     >     >
> >     >     >     >
> >     >     >     >
> >     >     >     >
> >     >     >     >
> >     >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >
> >     >
> >     >     --
> >     >     - Mark
> >     >
> >     >     http://www.lucidimagination.com
> >     >
> >     >
> >     >
> >     >
> >     >
> >     >
> >     >
> >
> >
> >     --
> >     - Mark
> >
> >     http://www.lucidimagination.com
> >
> >
> >
> >
> >
> >
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: 2.9 NRT w.r.t. sorting and field cache

Posted by Mark Miller <ma...@gmail.com>.

What I would do is:

In the warm method, load a FieldCache for every field I was going to end
up using a FieldCache for.
If it's just for sorting, I might do a search with a sort on every field
I was going to sort on.
That will get the segment FieldCaches into RAM before the SegmentReader
is put into use.

I might also do a search or two that hits a lot of terms to get some of
the index into RAM. Or maybe walk a TermEnum - or anything one normally
does when warming Readers (like Solr does, or many other home-grown
solutions).

I don't think there is anything special in this case. You don't have to
hit it with every unique search you expect it to see - you just get some
key pieces (especially the FieldCaches) into RAM.
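
To make the per-segment point concrete, here is a toy Java model of the idea. It is not Lucene code: `Segment`, `SegmentCache` and `warm` are hypothetical stand-ins invented for illustration, and the weak map keyed by the segment object is only an assumption about how such a cache could be organized. It sketches why warming after a reopen only has to pay for segments that were not seen before.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Toy model only: none of these classes are Lucene classes.
class SegmentFieldCacheDemo {

    /** Hypothetical stand-in for a segment reader; object identity is the cache key. */
    static final class Segment {
        final String name;
        final int[] sortValues;

        Segment(String name, int[] sortValues) {
            this.name = name;
            this.sortValues = sortValues;
        }
    }

    /** Hypothetical per-segment cache, populated on demand. */
    static final class SegmentCache {
        // Weak keys: an entry can go away once its segment is no longer referenced.
        private final Map<Segment, int[]> entries = new WeakHashMap<Segment, int[]>();
        int loads = 0; // how many times a segment had to be loaded

        int[] get(Segment segment) {
            int[] cached = entries.get(segment);
            if (cached == null) {
                // Stand-in for the expensive step of building the cache entry.
                cached = segment.sortValues.clone();
                entries.put(segment, cached);
                loads++;
            }
            return cached;
        }
    }

    /** Touch every segment once, as a warmer would before the reader goes live. */
    static void warm(SegmentCache cache, List<Segment> segments) {
        for (Segment segment : segments) {
            cache.get(segment);
        }
    }

    public static void main(String[] args) {
        SegmentCache cache = new SegmentCache();
        Segment big = new Segment("_0", new int[] {3, 1, 2});
        Segment small = new Segment("_1", new int[] {9});

        warm(cache, Arrays.asList(big, small));
        System.out.println("loads after first open: " + cache.loads);

        // "Reopen": the big segment is unchanged, one new small segment appeared.
        Segment fresh = new Segment("_2", new int[] {7});
        warm(cache, Arrays.asList(big, small, fresh));
        System.out.println("loads after reopen: " + cache.loads);
    }
}
```

In this sketch the second `warm` call only loads the one new segment; the entries for the unchanged segments are reused. A merge would correspond to replacing several `Segment` objects with one new, larger one, which is the case where a bigger piece has to be loaded again.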

Don't give Mike a hard time about his valuable time - I'm sure he would
have answered, but he's likely in bed (that cat wakes early, it seems).
He's a lot nicer than I am ;)

John Wang wrote:
> No worries.
> Just trying to understand things.
>
> I wanted to double check but didn't want to write "My IDE told me that
> was the case" to sound pissy.
>
> I did look at the code, sometimes too much actually, but I never want
> to claim I understand the code 100%, hence going to the source is
> probably the best, even at the expense of sounding dumb, it is usually
> worth it ;)
>
> My question is more on how would a person do it on the public API
> level without having to hack into the source code.
>
> My main misunderstanding at this point is that I had thought
> IndexReaderWarmer can directly warm the field cache deterministically.
>
> Thanks
>
> -John
>
> On Wed, Sep 23, 2009 at 8:33 AM, Mark Miller <markrmiller@gmail.com> wrote:
>
>     Don't take me too seriously John - I doubt anyone does :)
>
>     And I wasn't implying Mike's time was more valuable than yours. I was
>     being ... uh ... me :)
>
>     And I don't claim that all of your many questions could have been found
>     in 5 seconds ;)
>
>     Just the ones you were asking - it's very quick (at least with Eclipse)
>     to see that there is no default impl.
>     It's also very quick to see that a segment reader is passed to the warm
>     method every time. I think it's just
>     a generic IndexReader because you would warm a multi-reader the same way
>     as a segmentreader.
>
>     I was just suggesting you look at the code a bit, because I think it's
>     fairly easy to figure out the basics of the warmer (hey, if I can do it
>     ;) ).
>
>     Again, don't take me too seriously. I send out my comments faster than I
>     can think of them. And I've probably wasted more of Mike's time
>     than anyone.
>
>     The only way you will load the entire FieldCache is to use a top level
>     Reader outside of the core API - the core API works per segment now. And
>     the IndexReaderWarmer is always passed a segmentreader from the
>     readerPool.
>
>     - Mark
>
>     John Wang wrote:
>     > Mark:
>     >
>     > I did spend at least a quarter of an ounce. :) And I am sure Mike's
>     > time is more valuable than mine, but it was meant to be a "double-check"
>     >
>     > I was under the impression there is a default impl from previous email
>     > threads on how to handle field cache warming, perhaps I misunderstood.
>     >
>     > The real question here is "warms the reader". From a public API point
>     > of view, I wasn't sure if passing in an IndexReader impl is something
>     > we can do to avoid loading the entire field cache. e.g. would I need
>     > to down cast? can it be a filtered reader? etc.
>     >
>     > If you think there is something I could have done within 5 sec, please
>     > point me to the right direction.
>     >
>     > Thanks
>     >
>     > -John
>     >
>     > On Wed, Sep 23, 2009 at 7:55 AM, Mark Miller <markrmiller@gmail.com> wrote:
>     >
>     >     Come on dude :) Spend a half ounce of effort first. Mike's time is too
>     >     valuable!
>     >
>     >     Luckily mine is not.
>     >
>     >     There is no default impl - the class is dead simple (and the class has
>     >     been pointed out like 3 times in this thread - I'm not even fully
>     >     following and I know where to find it):
>     >
>     >      public static abstract class IndexReaderWarmer {
>     >        public abstract void warm(IndexReader reader) throws IOException;
>     >      }
>     >
>     >     Now pass something in that warms the reader. Load a fieldcache - do a
>     >     search. Do the hokey pokey and turn yourself around ...
>     >
>     >     Investigation time: 5 seconds.
>     >
>     >     John Wang wrote:
>     >     > Hi Michael:
>     >     >
>     >     >      Thanks for the pointer!
>     >     >
>     >     >       Pardon my ignorance, but I am still not seeing the connection
>     >     > between this api to per/segment loading of FieldCache. (the api takes
>     >     > in an IndexReader instead of maybe SegmentReader[])
>     >     >
>     >     >       Can you point me to maybe the default impl of IndexReaderWarmer
>     >     > to help me understand?
>     >     >
>     >     > Thanks
>     >     >
>     >     > -John
>     >     >
>     >     > On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
>     >     >
>     >     >     This is exactly why we added IndexWriter.setMergedSegmentWarmer -- you
>     >     >     can warm the reader w/o blocking ongoing updates.
>     >     >
>     >     >     Mike
>     >     >
>     >     >     On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller <markrmiller@gmail.com> wrote:
>     >     >     > Right - when a large segment is invalidated, you will have a bigger
>     >     >     > fieldcache piece to reload - pre 2.9, you'd be reloading the *whole*
>     >     >     > field cache every time though. Sounds like you are trying to deal with
>     >     >     > those large segments changing anyway :) They are always an issue when
>     >     >     > doing RT it seems.
>     >     >     >
>     >     >     > I don't believe deletes invalidate a field cache - terms from deleted
>     >     >     > docs stay in a field cache and segmentreaders use their freqStream as
>     >     >     > the fieldcache key. Only when the deletes are merged out would they
>     >     >     > invalidate - but because you're writing a new segment anyway ...
>     >     >     >
>     >     >     > - Mark
>     >     >     >
>     >     >     > John Wang wrote:
>     >     >     >> I understand what you are saying. Let me detail
>     what I am
>     >     >     trying to say:
>     >     >     >>
>     >     >     >> When "currently processed segments" are flushed down,
>     >     merge may
>     >     >     >> happen. When merges happen, some of those "stable
>     >     segments" will be
>     >     >     >> invalidated, and so will the fieldcache data keyed
>     by them.
>     >     >     >>
>     >     >     >> In a high update environment, such scenarios can
>     happen quite
>     >     >     often.
>     >     >     >>
>     >     >     >> The way the default mergePolicy works is that small
>     >     segments get
>     >     >     >> merged into the larger segments. Eventually, what
>     will be
>     >     >     invalidated
>     >     >     >> would be a large segment, and when that happens, a
>     large
>     >     chunk
>     >     >     of the
>     >     >     >> field cache would be invalidated.
>     >     >     >>
>     >     >     >> Furthermore, in the case where there are high updates,
>     >     the stable
>     >     >     >> segments can be invalidate much sooner when there
>     are deletes
>     >     >     in those
>     >     >     >> segments, and I would guess the corresponding
>     FieldCache
>     >     needs
>     >     >     to be
>     >     >     >> adjusted. Not sure how it is handled right now.
>     >     >     >>
>     >     >     >> Just my two cents, and of course when I find the
>     time I will
>     >     >     need to
>     >     >     >> run some tests to see.
>     >     >     >>
>     >     >     >> -John
>     >     >     >>
>     >     >     >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>     >     >     >>
>     >     >     >>     The NRT reader coming from the
>     >     IndexWriter.getReader() has only
>     >     >     >>     changes in the currently processed segments, the
>     >     other segments
>     >     >     >>     keep stable (and even their IndexReader keys
>     used for the
>     >     >     >>     FieldCache). The rest of the segments keep stable.
>     >     For the
>     >     >     >>     consumer it looks like a normal reader (it is
>     in fact a
>     >     >     >>     ReadOnlyDirectoryReader) supporting
>     >     >     getSequentialSubReaders() and
>     >     >     >>     so on.
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>     -----
>     >     >     >>     Uwe Schindler
>     >     >     >>     H.-H.-Meier-Allee 63, D-28213 Bremen
>     >     >     >>     http://www.thetaphi.de
>     >     >     >>     eMail: uwe@thetaphi.de
>     >     >     >>
>     >     >     >>
>     >     >     >>     ------------------------------------------------------------------------
>     >     >     >>
>     >     >     >>     *From:* John Wang [mailto:john.wang@gmail.com]
>     >     >     >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
>     >     >     >>     *To:* java-dev@lucene.apache.org
>     >     >     >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>     Thanks Mark for the pointer!
>     >     >     >>
>     >     >     >>     I guess my point is with NRT, and when segment
>     files
>     >     change
>     >     >     often,
>     >     >     >>     this would be an issue, no?
>     >     >     >>
>     >     >     >>     Anyway, I can run some tests.
>     >     >     >>
>     >     >     >>     Thanks
>     >     >     >>
>     >     >     >>     -John
>     >     >     >>
>     >     >     >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller <markrmiller@gmail.com> wrote:
>     >     >     >>
>     >     >     >>     1483 - IndexSearcher pulls out a reader's subreaders
>     >     >     >>     (SegmentReaders) and sends a collector over them one by one,
>     >     >     >>     rather than using the MultiReader. So only the FieldCache
>     >     >     >>     entries for segment readers that change need to be reloaded.
>     >     >     >>
>     >     >     >>     - Mark
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>     http://www.lucidimagination.com (mobile)
>     >     >     >>
>     >     >     >>
>     >     >     >>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com> wrote:
>     >     >     >>
>     >     >     >>>     Hi Yonik:
>     >     >     >>>
>     >     >     >>>          Actually that is what I am looking for. Can you
>     >     >     >>>     please point me to where/how sorting is done per-segment?
>     >     >     >>>
>     >     >     >>>          When heavy indexing introduces or modifies segments,
>     >     >     >>>     would it cause reloading of FieldCache at query time and
>     >     >     >>>     thus impact search performance?
>     >     >     >>>
>     >     >     >>>     thanks
>     >     >     >>>
>     >     >     >>>     -John
>     >     >     >>>
>     >     >     >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
>     >     >     >>>
>     >     >     >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com> wrote:
>     >     >     >>>     > Looking at the code, seems there is a disconnect between
>     >     >     >>>     > how/when field cache is loaded when
>     >     >     >>>     > IndexWriter.getReader() is called.
>     >     >     >>>
>     >     >     >>>     I'm not sure what you mean by "disconnect"
>     >     >     >>>
>     >     >     >>>     > Is FieldCache updated?
>     >     >     >>>
>     >     >     >>>     FieldCache entries are populated on demand, as they always
>     >     >     >>>     have been.
>     >     >     >>>
>     >     >     >>>
>     >     >     >>>     > Otherwise, are we reloading FieldCache for each
>     >     >     >>>     > reader instance?
>     >     >     >>>
>     >     >     >>>     Searching/sorting is now per-segment, and so is the use of
>     >     >     >>>     the FieldCache.  Segments that don't change shouldn't have
>     >     >     >>>     to reload their FieldCache entries.
>     >     >     >>>
>     >     >     >>>     -Yonik
>     >     >     >>>     http://www.lucidimagination.com
>     >     >     >>>
>     >     >     >>>
>     >     >     >>>
>     >     >     >>>
>     >     >     >>>
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >
>     >     >     >
>     >     >     > --
>     >     >     > - Mark
>     >     >     >
>     >     >     > http://www.lucidimagination.com
>     >     >     >
>     >     >     >
>     >     >     >
>     >     >     >
>     >     >     >
>     >     >     >
>     >     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >
>     >
>     >     --
>     >     - Mark
>     >
>     >     http://www.lucidimagination.com
>     >
>     >
>     >
>     >
>     >
>     >
>
>
>     --
>     - Mark
>
>     http://www.lucidimagination.com
>
>
>
>
>
>


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: 2.9 NRT w.r.t. sorting and field cache

Posted by John Wang <jo...@gmail.com>.

No worries.
Just trying to understand things.

I wanted to double check but didn't want to write "My IDE told me that was
the case" and sound pissy.

I did look at the code, sometimes too much actually, but I never want to
claim I understand the code 100%. Hence going to the source is probably the
best; even at the expense of sounding dumb, it is usually worth it ;)

My question is more about how a person would do it at the public API level
without having to hack into the source code.

My main misunderstanding at this point is that I had thought
IndexReaderWarmer can directly warm the field cache deterministically.

Thanks

-John

On Wed, Sep 23, 2009 at 8:33 AM, Mark Miller <ma...@gmail.com> wrote:

> Don't take me too seriously John - I doubt anyone does :)
>
> And I wasn't implying Mike's time was more valuable than yours. I was
> being ... uh ... me :)
>
> And I don't claim that all of your many questions could have been found
> in 5 seconds ;)
>
> Just the ones you were asking - it's very quick (at least with Eclipse)
> to see that there is no default impl.
> It's also very quick to see that a segment reader is passed to the warm
> method every time. I think it's just a generic IndexReader because you
> would warm a multi-reader the same way as a segmentreader.
>
> I was just suggesting you look at the code a bit, because I think it's
> fairly easy to figure out the basics of the warmer (hey, if I can do it
> ;) ).
>
> Again, don't take me too seriously. I send out my comments faster than I
> can think of them. And I've probably wasted more of Mike's time than
> anyone.
>
> The only way you will load the entire FieldCache is to use a top level
> Reader outside of the core API - the core api works per segment now. And
> the IndexReaderWarmer is always passed a segmentreader from the readerPool.
>
> - Mark
>
> John Wang wrote:
> > Mark:
> >
> > I did spend at least a quarter of an ounce. :) And I am sure Mike's
> > time is more valuable than mine, but it was meant to be a "double-check"
> >
> > I was under the impression there is a default impl from previous email
> > threads on how to handle field cache warming, perhaps I misunderstood.
> >
> > The real question here is what "warms the reader" means. From a public
> > API point of view, I wasn't sure if passing in an IndexReader impl is
> > something we can do to avoid loading the entire field cache, e.g. would
> > I need to downcast? Can it be a filtered reader? etc.
> >
> > If you think there is something I could have done within 5 sec, please
> > point me in the right direction.
> >
> > Thanks
> >
> > -John
> >
> > On Wed, Sep 23, 2009 at 7:55 AM, Mark Miller <markrmiller@gmail.com> wrote:
> >
> >     Come on dude :) Spend a half ounce of effort first. Mike's time is too
> >     valuable !
> >
> >     Luckily mine is not.
> >
> >     There is no default impl - the class is dead simple (and the class has
> >     been pointed out like 3 times in this thread - I'm not even fully
> >     following and I know where to find it):
> >
> >      public static abstract class IndexReaderWarmer {
> >        public abstract void warm(IndexReader reader) throws IOException;
> >      }
> >
> >     Now pass something in that warms the reader. Load a fieldcache - do a
> >     search. Do the hokey pokey and turn your self around ...
> >
> >     Investigation time: 5 seconds.
> >
> >     John Wang wrote:
> >     > Hi Michael:
> >     >
> >     >      Thanks for the pointer!
> >     >
> >     >       Pardon my ignorance, but I am still not seeing the connection
> >     > between this API and per-segment loading of FieldCache (the API
> >     > takes in an IndexReader instead of maybe SegmentReader[]).
> >     >
> >     >       Can you point me to maybe the default impl of
> >     > IndexReaderWarmer to help me understand?
> >     >
> >     > Thanks
> >     >
> >     > -John
> >     >
> >     > On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
> >     >
> >     >     This is exactly why we added IndexWriter.setMergedSegmentWarmer --
> >     >     you can warm the reader w/o blocking ongoing updates.
> >     >
> >     >     Mike
> >     >
> >     >     On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >     >     > Right - when a large segment is invalidated, you will have a
> >     >     > bigger fieldcache piece to reload - pre 2.9, you'd be reloading
> >     >     > the *whole* field cache every time though. Sounds like you are
> >     >     > trying to deal with those large segments changing anyway :) They
> >     >     > are always an issue when doing RT it seems.
> >     >     >
> >     >     > I don't believe deletes invalidate a field cache - terms from
> >     >     > deleted docs stay in a field cache and segmentreaders use their
> >     >     > freqStream as the fieldcache key. Only when the deletes are
> >     >     > merged out would they invalidate - but because you're writing a
> >     >     > new segment anyway ...
> >     >     >
> >     >     > - Mark
> >     >     >
> >     >     > John Wang wrote:
> >     >     >> I understand what you are saying. Let me detail what I am
> >     >     >> trying to say:
> >     >     >>
> >     >     >> When "currently processed segments" are flushed down, merge may
> >     >     >> happen. When merges happen, some of those "stable segments" will
> >     >     >> be invalidated, and so will the fieldcache data keyed by them.
> >     >     >>
> >     >     >> In a high update environment, such scenarios can happen quite
> >     >     >> often.
> >     >     >>
> >     >     >> The way the default mergePolicy works is that small segments get
> >     >     >> merged into the larger segments. Eventually, what will be
> >     >     >> invalidated would be a large segment, and when that happens, a
> >     >     >> large chunk of the field cache would be invalidated.
> >     >     >>
> >     >     >> Furthermore, in the case where there are high updates, the stable
> >     >     >> segments can be invalidated much sooner when there are deletes in
> >     >     >> those segments, and I would guess the corresponding FieldCache
> >     >     >> needs to be adjusted. Not sure how it is handled right now.
> >     >     >>
> >     >     >> Just my two cents, and of course when I find the time I will need
> >     >     >> to run some tests to see.
> >     >     >>
> >     >     >> -John
> >     >     >>
> >     >     >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >     >     >>
> >     >     >>     The NRT reader coming from IndexWriter.getReader() has changes
> >     >     >>     only in the currently processed segments; the other segments
> >     >     >>     stay stable (and so do their IndexReader keys used for the
> >     >     >>     FieldCache). For the consumer it looks like a normal reader
> >     >     >>     (it is in fact a ReadOnlyDirectoryReader) supporting
> >     >     >>     getSequentialSubReaders() and so on.
> >     >     >>
> >     >     >>
> >     >     >>
> >     >     >>     -----
> >     >     >>     Uwe Schindler
> >     >     >>     H.-H.-Meier-Allee 63, D-28213 Bremen
> >     >     >>     http://www.thetaphi.de
> >     >     >>     eMail: uwe@thetaphi.de
> >     >     >>
> >     >     >>
> >     >
> >
> ------------------------------------------------------------------------
> >     >     >>
> >     >     >>     *From:* John Wang [mailto:john.wang@gmail.com]
> >     >     >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
> >     >     >>     *To:* java-dev@lucene.apache.org
> >     >     >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
> >     >     >>
> >     >     >>
> >     >     >>
> >     >     >>     Thanks Mark for the pointer!
> >     >     >>
> >     >     >>     I guess my point is with NRT, and when segment files change
> >     >     >>     often, this would be an issue, no?
> >     >     >>
> >     >     >>     Anyway, I can run some tests.
> >     >     >>
> >     >     >>     Thanks
> >     >     >>
> >     >     >>     -John
> >     >     >>
> >     >     >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >     >     >>
> >     >     >>     1483 - IndexSearcher pulls out a reader's subreaders
> >     >     >>     (SegmentReaders) and sends a collector over them one by one,
> >     >     >>     rather than using the MultiReader. So only the FieldCache
> >     >     >>     entries for segment readers that change need to be reloaded.
> >     >     >>
> >     >     >>     - Mark
> >     >     >>
> >     >     >>
> >     >     >>
> >     >     >>     http://www.lucidimagination.com (mobile)
> >     >     >>
> >     >     >>
> >     >     >>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com> wrote:
> >     >     >>
> >     >     >>>     Hi Yonik:
> >     >     >>>
> >     >     >>>          Actually that is what I am looking for. Can you
> >     >     >>>     please point me to where/how sorting is done per-segment?
> >     >     >>>
> >     >     >>>          When heavy indexing introduces or modifies segments,
> >     >     >>>     would it cause reloading of FieldCache at query time and
> >     >     >>>     thus impact search performance?
> >     >     >>>
> >     >     >>>     thanks
> >     >     >>>
> >     >     >>>     -John
> >     >     >>>
> >     >     >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
> >     >     >>>
> >     >     >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com> wrote:
> >     >     >>>     > Looking at the code, seems there is a disconnect between
> >     >     >>>     > how/when field cache is loaded when
> >     >     >>>     > IndexWriter.getReader() is called.
> >     >     >>>
> >     >     >>>     I'm not sure what you mean by "disconnect"
> >     >     >>>
> >     >     >>>     > Is FieldCache updated?
> >     >     >>>
> >     >     >>>     FieldCache entries are populated on demand, as they always
> >     >     >>>     have been.
> >     >     >>>
> >     >     >>>
> >     >     >>>     > Otherwise, are we reloading FieldCache for each
> >     >     >>>     > reader instance?
> >     >     >>>
> >     >     >>>     Searching/sorting is now per-segment, and so is the use of
> >     >     >>>     the FieldCache.  Segments that don't change shouldn't have
> >     >     >>>     to reload their FieldCache entries.
> >     >     >>>
> >     >     >>>     -Yonik
> >     >     >>>     http://www.lucidimagination.com
> >     >     >>>
> >     >     >>>
> >     >     >>>
> >     >     >>>
> >     >     >>>
> >     >     >>
> >     >     >>
> >     >     >>
> >     >     >
> >     >     >
> >     >     > --
> >     >     > - Mark
> >     >     >
> >     >     > http://www.lucidimagination.com
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >
> >     >
> >     >
> >     >
> >
> >
> >     --
> >     - Mark
> >
> >     http://www.lucidimagination.com
> >
> >
> >
> >
> >
> >
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: 2.9 NRT w.r.t. sorting and field cache

Posted by Mark Miller <ma...@gmail.com>.

Don't take me too seriously John - I doubt anyone does :)

And I wasn't implying Mike's time was more valuable than yours. I was
being ... uh ... me :)

And I don't claim that all of your many questions could have been found
in 5 seconds ;)

Just the ones you were asking - it's very quick (at least with Eclipse)
to see that there is no default impl.
It's also very quick to see that a segment reader is passed to the warm
method every time. I think it's just a generic IndexReader because you
would warm a multi-reader the same way as a segmentreader.

I was just suggesting you look at the code a bit, because I think it's
fairly easy to figure out the basics of the warmer (hey, if I can do it
;) ).

Again, don't take me too seriously. I send out my comments faster than I
can think of them. And I've probably wasted more of Mike's time than anyone.

The only way you will load the entire FieldCache is to use a top level
Reader outside of the core API - the core api works per segment now. And
the IndexReaderWarmer is always passed a segmentreader from the readerPool.
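
A toy model of the per-segment point, in plain Java with no Lucene
dependency. Everything in it is an assumption made for illustration:
Segment, PerSegmentCache, warm and sumAll are hypothetical names, not
Lucene classes or methods. The cache is keyed by segment identity, so a
reopened reader that shares segments with the previous one only loads
entries for the new segments, and a warmer handed a single merged segment
can populate that one entry before any search touches it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class PerSegmentCacheSketch {

    /** Hypothetical stand-in for a segment reader: an immutable block of values. */
    static final class Segment {
        final String name;
        final int[] values;

        Segment(String name, int... values) {
            this.name = name;
            this.values = values;
        }
    }

    /** Hypothetical cache keyed by segment identity (a toy analogue of a per-segment field cache). */
    static final class PerSegmentCache {
        private final Map<Segment, int[]> entries = new IdentityHashMap<Segment, int[]>();
        /** Names of segments whose entry was actually loaded, in load order. */
        final List<String> loads = new ArrayList<String>();

        int[] get(Segment seg) {
            int[] cached = entries.get(seg);
            if (cached == null) {
                loads.add(seg.name);          // the expensive load would happen here
                cached = seg.values.clone();
                entries.put(seg, cached);
            }
            return cached;
        }
    }

    /** Hypothetical warmer: touches the cache for one segment only. */
    static void warm(PerSegmentCache cache, Segment merged) {
        cache.get(merged);
    }

    /** A "search" that visits segments one by one, so it only needs per-segment entries. */
    static int sumAll(PerSegmentCache cache, List<Segment> reader) {
        int sum = 0;
        for (Segment seg : reader) {
            for (int v : cache.get(seg)) {
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        PerSegmentCache cache = new PerSegmentCache();
        Segment a = new Segment("a", 1, 2);
        Segment b = new Segment("b", 3);
        System.out.println(sumAll(cache, Arrays.asList(a, b)) + " loads=" + cache.loads);

        // "Reopen": segment a is unchanged, segment c is new, so only c is loaded.
        Segment c = new Segment("c", 4);
        System.out.println(sumAll(cache, Arrays.asList(a, c)) + " loads=" + cache.loads);

        // Warm a merged segment up front; the following search finds it already cached.
        Segment merged = new Segment("merged", 5, 6);
        warm(cache, merged);
        System.out.println(sumAll(cache, Arrays.asList(merged)) + " loads=" + cache.loads);
    }
}
```

With real Lucene the body of warm would presumably ask the FieldCache for
the fields you sort on, or run a sorted search, against the reader it is
handed; that part is not shown here.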

- Mark

John Wang wrote:
> Mark:
>
> I did spend at least a quarter of an ounce. :) And I am sure Mike's
> time is more valuable than mine, but it was meant to be a "double-check"
>
> I was under the impression there is a default impl from previous email
> threads on how to handle field cache warming, perhaps I misunderstood.
>
> The real question here is what "warms the reader" means. From a public
> API point of view, I wasn't sure if passing in an IndexReader impl is
> something we can do to avoid loading the entire field cache, e.g. would
> I need to downcast? Can it be a filtered reader? etc.
>
> If you think there is something I could have done within 5 sec, please
> point me in the right direction.
>
> Thanks
>
> -John
>
> On Wed, Sep 23, 2009 at 7:55 AM, Mark Miller <markrmiller@gmail.com> wrote:
>
>     Come on dude :) Spend a half ounce of effort first. Mike's time is too
>     valuable !
>
>     Luckily mine is not.
>
>     There is no default impl - the class is dead simple (and the class has
>     been pointed out like 3 times in this thread - I'm not even fully
>     following and I know where to find it):
>
>      public static abstract class IndexReaderWarmer {
>        public abstract void warm(IndexReader reader) throws IOException;
>      }
>
>     Now pass something in that warms the reader. Load a fieldcache - do a
>     search. Do the hokey pokey and turn your self around ...
>
>     Investigation time: 5 seconds.
>
>     John Wang wrote:
>     > Hi Michael:
>     >
>     >      Thanks for the pointer!
>     >
>     >       Pardon my ignorance, but I am still not seeing the connection
>     > between this API and per-segment loading of FieldCache (the API
>     > takes in an IndexReader instead of maybe SegmentReader[]).
>     >
>     >       Can you point me to maybe the default impl of
>     > IndexReaderWarmer to help me understand?
>     >
>     > Thanks
>     >
>     > -John
>     >
>     > On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
>     >
>     >     This is exactly why we added IndexWriter.setMergedSegmentWarmer --
>     >     you can warm the reader w/o blocking ongoing updates.
>     >
>     >     Mike
>     >
>     >     On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller <markrmiller@gmail.com> wrote:
>     >     > Right - when a large segment is invalidated, you will have a
>     >     > bigger fieldcache piece to reload - pre 2.9, you'd be reloading
>     >     > the *whole* field cache every time though. Sounds like you are
>     >     > trying to deal with those large segments changing anyway :) They
>     >     > are always an issue when doing RT it seems.
>     >     >
>     >     > I don't believe deletes invalidate a field cache - terms from
>     >     > deleted docs stay in a field cache and segmentreaders use their
>     >     > freqStream as the fieldcache key. Only when the deletes are
>     >     > merged out would they invalidate - but because you're writing a
>     >     > new segment anyway ...
>     >     >
>     >     > - Mark
>     >     >
>     >     > John Wang wrote:
>     >     >> I understand what you are saying. Let me detail what I am
>     >     >> trying to say:
>     >     >>
>     >     >> When "currently processed segments" are flushed down, merge may
>     >     >> happen. When merges happen, some of those "stable segments" will
>     >     >> be invalidated, and so will the fieldcache data keyed by them.
>     >     >>
>     >     >> In a high update environment, such scenarios can happen quite
>     >     >> often.
>     >     >>
>     >     >> The way the default mergePolicy works is that small segments get
>     >     >> merged into the larger segments. Eventually, what will be
>     >     >> invalidated would be a large segment, and when that happens, a
>     >     >> large chunk of the field cache would be invalidated.
>     >     >>
>     >     >> Furthermore, in the case where there are high updates, the stable
>     >     >> segments can be invalidated much sooner when there are deletes in
>     >     >> those segments, and I would guess the corresponding FieldCache
>     >     >> needs to be adjusted. Not sure how it is handled right now.
>     >     >>
>     >     >> Just my two cents, and of course when I find the time I will need
>     >     >> to run some tests to see.
>     >     >>
>     >     >> -John
>     >     >>
>     >     >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>     >     >>
>     >     >>     The NRT reader coming from IndexWriter.getReader() has changes
>     >     >>     only in the currently processed segments; the other segments
>     >     >>     stay stable (and so do their IndexReader keys used for the
>     >     >>     FieldCache). For the consumer it looks like a normal reader
>     >     >>     (it is in fact a ReadOnlyDirectoryReader) supporting
>     >     >>     getSequentialSubReaders() and so on.
>     >     >>
>     >     >>
>     >     >>
>     >     >>     -----
>     >     >>     Uwe Schindler
>     >     >>     H.-H.-Meier-Allee 63, D-28213 Bremen
>     >     >>     http://www.thetaphi.de
>     >     >>     eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>     <mailto:uwe@thetaphi.de <ma...@thetaphi.de>>
>     >     <mailto:uwe@thetaphi.de <ma...@thetaphi.de>
>     <mailto:uwe@thetaphi.de <ma...@thetaphi.de>>>
>     >     >>
>     >     >>
>     >    
>     ------------------------------------------------------------------------
>     >     >>
>     >     >>     *From:* John Wang [mailto:john.wang@gmail.com
>     <ma...@gmail.com>
>     >     <mailto:john.wang@gmail.com <ma...@gmail.com>>
>     >     >>     <mailto:john.wang@gmail.com
>     <ma...@gmail.com> <mailto:john.wang@gmail.com
>     <ma...@gmail.com>>>]
>     >     >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
>     >     >>     *To:* java-dev@lucene.apache.org
>     <ma...@lucene.apache.org>
>     >     <mailto:java-dev@lucene.apache.org
>     <ma...@lucene.apache.org>>
>     >     <mailto:java-dev@lucene.apache.org
>     <ma...@lucene.apache.org>
>     >     <mailto:java-dev@lucene.apache.org
>     <ma...@lucene.apache.org>>>
>     >     >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
>     >     >>
>     >     >>
>     >     >>
>     >     >>     Thanks Mark for the pointer!
>     >     >>
>     >     >>     I guess my point is with NRT, and when segment files
>     change
>     >     often,
>     >     >>     this would be an issue, no?
>     >     >>
>     >     >>     Anyway, I can run some tests.
>     >     >>
>     >     >>     Thanks
>     >     >>
>     >     >>     -John
>     >     >>
>     >     >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
>     >     >>     <markrmiller@gmail.com <ma...@gmail.com>
>     <mailto:markrmiller@gmail.com <ma...@gmail.com>>
>     >     <mailto:markrmiller@gmail.com <ma...@gmail.com>
>     <mailto:markrmiller@gmail.com <ma...@gmail.com>>>> wrote:
>     >     >>
>     >     >>     1483 - indexsearcher pulls out a readers subreaders
>     >     >>     (segmentreaders) and sends a collector over them one
>     by one,
>     >     >>     rather than using the multireader. So only fc for seg
>     >     readers that
>     >     >>     change need to be reloaded.
>     >     >>
>     >     >>     - Mark
>     >     >>
>     >     >>
>     >     >>
>     >     >>     http://www.lucidimagination.com (mobile)
>     >     >>
>     >     >>
>     >     >>     On Sep 22, 2009, at 1:27 AM, John Wang
>     <john.wang@gmail.com <ma...@gmail.com>
>     >     <mailto:john.wang@gmail.com <ma...@gmail.com>>
>     >     >>     <mailto:john.wang@gmail.com
>     <ma...@gmail.com> <mailto:john.wang@gmail.com
>     <ma...@gmail.com>>>>
>     >     wrote:
>     >     >>
>     >     >>>     Hi Yonik:
>     >     >>>
>     >     >>>          Actually that is what I am looking for. Can you
>     >     please point
>     >     >>>     me to where/how sorting is done per-segment?
>     >     >>>
>     >     >>>          When heaving indexing introduces or modifies
>     >     segments, would
>     >     >>>     it cause reloading of FieldCache at query time and
>     thus would
>     >     >>>     impact search performance?
>     >     >>>
>     >     >>>     thanks
>     >     >>>
>     >     >>>     -John
>     >     >>>
>     >     >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
>     >     >>>     <yonik@lucidimagination.com
>     <ma...@lucidimagination.com>
>     >     <mailto:yonik@lucidimagination.com
>     <ma...@lucidimagination.com>>
>     >     <mailto:yonik@lucidimagination.com
>     <ma...@lucidimagination.com>
>     >     <mailto:yonik@lucidimagination.com
>     <ma...@lucidimagination.com>>>>
>     >     >>>     wrote:
>     >     >>>
>     >     >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang
>     >     <john.wang@gmail.com <ma...@gmail.com>
>     <mailto:john.wang@gmail.com <ma...@gmail.com>>
>     >     >>>     <mailto:john.wang@gmail.com
>     <ma...@gmail.com> <mailto:john.wang@gmail.com
>     <ma...@gmail.com>>>>
>     >     wrote:
>     >     >>>     > Looking at the code, seems there is a disconnect
>     between
>     >     >>>     how/when field
>     >     >>>     > cache is loaded when IndexWriter.getReader() is
>     called.
>     >     >>>
>     >     >>>     I'm not sure what you mean by "disconnect"
>     >     >>>
>     >     >>>     > Is FieldCache updated?
>     >     >>>
>     >     >>>     FieldCache entries are populated on demand, as they
>     always
>     >     have been.
>     >     >>>
>     >     >>>
>     >     >>>     > Otherwise, are we reloading FieldCache for each
>     >     >>>     > reader instance?
>     >     >>>
>     >     >>>     Searching/sorting is now per-segment, and so is the
>     use of the
>     >     >>>     FieldCache.  Segments that don't change shouldn't
>     have to
>     >     reload
>     >     >>>     their
>     >     >>>     FieldCache entries.
>     >     >>>
>     >     >>>     -Yonik
>     >     >>>     http://www.lucidimagination.com
>     >     >>>
>     >     >>>
>     >    
>     ---------------------------------------------------------------------
>     >     >>>     To unsubscribe, e-mail:
>     >     java-dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>
>     >     <mailto:java-dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>>
>     >     >>>     <mailto:java-dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>
>     >     <mailto:java-dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>>>
>     >     >>>     For additional commands, e-mail:
>     >     java-dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>
>     >     <mailto:java-dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>>
>     >     >>>     <mailto:java-dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>
>     >     <mailto:java-dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>>>
>     >     >>>
>     >     >>>
>     >     >>>
>     >     >>
>     >     >>
>     >     >>
>     >     >
>     >     >
>     >     > --
>     >     > - Mark
>     >     >
>     >     > http://www.lucidimagination.com
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >    
>     ---------------------------------------------------------------------
>     >     > To unsubscribe, e-mail:
>     java-dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>
>     >     <mailto:java-dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>>
>     >     > For additional commands, e-mail:
>     java-dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>
>     >     <mailto:java-dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>>
>     >     >
>     >     >
>     >
>     >    
>     ---------------------------------------------------------------------
>     >     To unsubscribe, e-mail:
>     java-dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>
>     >     <mailto:java-dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>>
>     >     For additional commands, e-mail:
>     java-dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>
>     >     <mailto:java-dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>>
>     >
>     >
>
>
>     --
>     - Mark
>
>     http://www.lucidimagination.com
>
>
>
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>
>     For additional commands, e-mail: java-dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>
>
>


-- 
- Mark

http://www.lucidimagination.com





Re: 2.9 NRT w.r.t. sorting and field cache

Posted by John Wang <jo...@gmail.com>.

Mark:

I did spend at least a quarter of an ounce. :) And I am sure Mike's time is
more valuable than mine, but it was meant to be a "double-check".

I was under the impression, from previous email threads, that there is a
default impl for handling field cache warming; perhaps I misunderstood.

The real question here is what "warms the reader" means. From a public API
point of view, I wasn't sure whether passing in an IndexReader impl is
something we can do to avoid loading the entire field cache, e.g. would I
need to downcast? Can it be a filtered reader? Etc.

If you think there is something I could have done within 5 sec, please point
me in the right direction.

Thanks

-John

On Wed, Sep 23, 2009 at 7:55 AM, Mark Miller <ma...@gmail.com> wrote:

> Come on dude :) Spend a half ounce of effort first. Mike's time is too
> valuable!
>
> Luckily mine is not.
>
> There is no default impl - the class is dead simple (and the class has
> been pointed out like 3 times in this thread - I'm not even fully
> following and I know where to find it):
>
>  public static abstract class IndexReaderWarmer {
>    public abstract void warm(IndexReader reader) throws IOException;
>  }
>
> Now pass something in that warms the reader. Load a fieldcache - do a
> search. Do the hokey pokey and turn yourself around ...
>
> Investigation time: 5 seconds.
>

Re: 2.9 NRT w.r.t. sorting and field cache

Posted by John Wang <jo...@gmail.com>.

Thanks Mike for your valuable time!

Sorry to be a pest; I am trying to write a fair perf test and to understand
the feature. If there are other experts on the subject of index reader
warming, please chime in.

I am not seeing the connection between being given an IndexReader and the
FieldCacheImpl API, e.g. how do I warm up the FieldCache for this particular
segment?

Are you suggesting just doing an IndexSearcher.search on the given index to
warm up within the IndexReaderWarmer impl? In that case the searcher would
need to know the incoming searches pretty well, I guess.
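
A minimal sketch of one way to read that suggestion, assuming the set of sort
fields is known up front even if the queries are not. SegmentStub,
ToyFieldCache, ToyWarmer and SortFieldWarmer are hypothetical stand-ins written
for this example, not Lucene classes; in real 2.9 code the body of warm() would
presumably call something like FieldCache.DEFAULT.getInts(reader, field) on the
reader it is handed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Small demo: warm first, then check that a later lookup loads nothing. */
final class WarmerDemo {
    public static void main(String[] args) {
        ToyFieldCache cache = new ToyFieldCache();
        ToyWarmer warmer = new SortFieldWarmer(cache, Arrays.asList("price", "date"));

        SegmentStub merged = new SegmentStub()
                .with("price", 3, 1, 2)
                .with("date", 20090922, 20090923, 20090921);

        warmer.warm(merged);                 // done off the search path
        int loadsDuringWarm = merged.uninvertCount;

        cache.getInts(merged, "price");      // what a sorted search would ask for
        int loadsDuringSearch = merged.uninvertCount - loadsDuringWarm;

        System.out.println("loads during warm: " + loadsDuringWarm
                + ", loads during search: " + loadsDuringSearch);
    }
}

/** Hypothetical stand-in for one segment's reader; not a Lucene class. */
final class SegmentStub {
    private final Map<String, int[]> columns = new HashMap<>();
    int uninvertCount = 0; // how many times a field had to be loaded

    SegmentStub with(String field, int... values) {
        columns.put(field, values);
        return this;
    }

    /** Simulates the expensive pass that builds one field cache entry. */
    int[] uninvert(String field) {
        uninvertCount++;
        int[] values = columns.get(field);
        return values == null ? new int[0] : values.clone();
    }
}

/** Toy cache keyed by segment identity, then by field name. */
final class ToyFieldCache {
    private final Map<SegmentStub, Map<String, int[]>> entries = new IdentityHashMap<>();

    synchronized int[] getInts(SegmentStub segment, String field) {
        Map<String, int[]> byField = entries.computeIfAbsent(segment, s -> new HashMap<>());
        // Only the first request for a (segment, field) pair pays for the load.
        return byField.computeIfAbsent(field, segment::uninvert);
    }
}

/** Same shape as the IndexReaderWarmer quoted in this thread, over the stand-in type. */
abstract class ToyWarmer {
    abstract void warm(SegmentStub segment);
}

/** Warms by field list: it needs the sort fields, not the incoming queries. */
final class SortFieldWarmer extends ToyWarmer {
    private final ToyFieldCache cache;
    private final List<String> sortFields;

    SortFieldWarmer(ToyFieldCache cache, List<String> sortFields) {
        this.cache = cache;
        this.sortFields = sortFields;
    }

    @Override
    void warm(SegmentStub segment) {
        for (String field : sortFields) {
            cache.getInts(segment, field);
        }
    }
}
```

Run as written, this prints "loads during warm: 2, loads during search: 0".
The point of the sketch is only that the warmer has to know which fields get
sorted on; it does not have to replay real queries unless the goal is also to
warm the OS IO cache.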

Thanks

-John



On Wed, Sep 23, 2009 at 7:57 AM, Mark Miller <ma...@gmail.com> wrote:

> Oh - yeah - also - you'll be passed a segment reader if that's what makes
> sense. And since it does, you will be passed one every time. You can
> warm a multireader the same way though, so no reason to pin it down.
>
> Mark Miller wrote:
> > Come on dude :) Spend a half ounce of effort first. Mike's time is too
> > valuable!
> >
> > Luckily mine is not.
> >
> > There is no default impl - the class is dead simple (and the class has
> > been pointed out like 3 times in this thread - I'm not even fully
> > following and I know where to find it):
> >
> >   public static abstract class IndexReaderWarmer {
> >     public abstract void warm(IndexReader reader) throws IOException;
> >   }
> >
> > Now pass something in that warms the reader. Load a fieldcache - do a
> > search. Do the hokey pokey and turn yourself around ...
> >
> > Investigation time: 5 seconds.
> >

Re: 2.9 NRT w.r.t. sorting and field cache

Posted by Mark Miller <ma...@gmail.com>.

Oh - yeah - also - you'll be passed a segment reader if that's what makes
sense. And since it does, you will be passed one every time. You can
warm a multireader the same way though, so no reason to pin it down.
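
A rough sketch of why it does not matter which kind of reader the warmer gets,
assuming a cache keyed per segment. Every type here (ToyReader,
ToySegmentReader, ToyMultiReader, PerSegmentCache) is a hypothetical stand-in
for illustration, not a Lucene class, and the toy keys the cache on the reader
object itself, whereas 2.9 segment readers reportedly use their freqStream as
the key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

/** Small demo: flush a new segment, then merge, and count the loads. */
final class ReopenDemo {
    public static void main(String[] args) {
        PerSegmentCache cache = new PerSegmentCache();
        ToySegmentReader s1 = new ToySegmentReader("_1");
        ToySegmentReader s2 = new ToySegmentReader("_2");
        cache.warm(new ToyMultiReader(s1, s2), "price");      // loads _1 and _2

        ToySegmentReader s3 = new ToySegmentReader("_3");     // newly flushed segment
        cache.warm(new ToyMultiReader(s1, s2, s3), "price");  // loads only _3

        ToySegmentReader merged = new ToySegmentReader("_4"); // _1 and _2 merged
        cache.warm(merged, "price");                          // what a merge warmer would be handed
        cache.warm(new ToyMultiReader(merged, s3), "price");  // nothing left to load

        System.out.println("total loads: " + cache.loads);
    }
}

/** Hypothetical stand-in for a reader; not Lucene's IndexReader. */
abstract class ToyReader {
    /** Sub-readers of a composite reader, or null when this is a single segment. */
    List<ToyReader> subReaders() {
        return null;
    }
}

final class ToySegmentReader extends ToyReader {
    final String name;

    ToySegmentReader(String name) {
        this.name = name;
    }
}

final class ToyMultiReader extends ToyReader {
    private final List<ToyReader> subs;

    ToyMultiReader(ToyReader... subs) {
        this.subs = Arrays.asList(subs);
    }

    @Override
    List<ToyReader> subReaders() {
        return subs;
    }
}

/** Cache keyed by segment reader identity, so unchanged segments keep their entries. */
final class PerSegmentCache {
    // Weak keys: entries can go away once a segment reader is no longer referenced.
    private final Map<ToyReader, Map<String, String>> entries = new WeakHashMap<>();
    int loads = 0;

    /** Warms one segment directly, or every segment under a composite reader. */
    synchronized void warm(ToyReader reader, String field) {
        List<ToyReader> subs = reader.subReaders();
        if (subs != null) {
            for (ToyReader sub : subs) {
                warm(sub, field);
            }
            return;
        }
        Map<String, String> byField = entries.computeIfAbsent(reader, r -> new HashMap<>());
        if (!byField.containsKey(field)) {
            loads++; // stands in for the expensive per-segment load
            byField.put(field, "values of " + field);
        }
    }
}
```

Run as written, this should report four loads in total: two for the first
reader, one for the newly flushed segment, one for the merged segment, and
none for the final reopen.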

Mark Miller wrote:
> Come on dude :) Spend a half ounce of effort first. Mike's time is too
> valuable!
>
> Luckily mine is not.
>
> There is no default impl - the class is dead simple (and the class has
> been pointed out like 3 times in this thread - I'm not even fully
> following and I know where to find it):
>
>   public static abstract class IndexReaderWarmer {
>     public abstract void warm(IndexReader reader) throws IOException;
>   }
>
> Now pass something in that warms the reader. Load a fieldcache - do a
> search. Do the hokey pokey and turn yourself around ...
>
> Investigation time: 5 seconds.
>
> John Wang wrote:
>   
>> Hi Michael:
>>
>>      Thanks for the pointer!
>>
>>       Pardon my ignorance, but I am still not seeing the connection
>> between this API and per-segment loading of FieldCache. (The API takes
>> in an IndexReader instead of maybe SegmentReader[].)
>>
>>       Can you point me to maybe the default impl of IndexReaderWarmer
>> to help me understand?
>>
>> Thanks
>>
>> -John
>>
>> On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless
>> <lucene@mikemccandless.com> wrote:
>>
>>     This is exactly why we added IndexWriter.setMergedSegmentWarmer -- you
>>     can warm the reader w/o blocking ongoing updates.
>>
>>     Mike
>>
>>     On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller
>>     <markrmiller@gmail.com> wrote:
>>     > Right - when a large segment is invalidated, you will have a bigger
>>     > fieldcache piece to reload - pre 2.9, you'd be reloading the *whole*
>>     > field cache every time though. Sounds like you are trying to deal with
>>     > those large segments changing anyway :) They are always an issue when
>>     > doing RT it seems.
>>     >
>>     > I don't believe deletes invalidate a field cache - terms from deleted
>>     > docs stay in a field cache and segmentreaders use their freqStream as
>>     > the fieldcache key. Only when the deletes are merged out would they
>>     > invalidate - but because you're writing a new segment anyway ...
>>     >
>>     > - Mark
>>     >
>>     > John Wang wrote:
>>     >> I understand what you are saying. Let me detail what I am trying to say:
>>     >>
>>     >> When "currently processed segments" are flushed down, a merge may
>>     >> happen. When merges happen, some of those "stable segments" will be
>>     >> invalidated, and so will the fieldcache data keyed by them.
>>     >>
>>     >> In a high update environment, such scenarios can happen quite often.
>>     >>
>>     >> The way the default mergePolicy works is that small segments get
>>     >> merged into the larger segments. Eventually, what will be invalidated
>>     >> would be a large segment, and when that happens, a large chunk of the
>>     >> field cache would be invalidated.
>>     >>
>>     >> Furthermore, in the case where there are high updates, the stable
>>     >> segments can be invalidated much sooner when there are deletes in those
>>     >> segments, and I would guess the corresponding FieldCache needs to be
>>     >> adjusted. Not sure how it is handled right now.
>>     >>
>>     >> Just my two cents, and of course when I find the time I will need to
>>     >> run some tests to see.
>>     >>
>>     >> -John
>>     >>
>>     >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de
>>     <ma...@thetaphi.de>
>>     >> <mailto:uwe@thetaphi.de <ma...@thetaphi.de>>> wrote:
>>     >>
>>     >>     The NRT reader coming from the IndexWriter.getReader() has only
>>     >>     changes in the currently processed segments, the other segments
>>     >>     keep stable (and even their IndexReader keys used for the
>>     >>     FieldCache). The rest of the segments keep stable. For the
>>     >>     consumer it looks like a normal reader (it is in fact a
>>     >>     ReadOnlyDirectoryReader) supporting
>>     getSequentialSubReaders() and
>>     >>     so on.
>>     >>
>>     >>
>>     >>
>>     >>     -----
>>     >>     Uwe Schindler
>>     >>     H.-H.-Meier-Allee 63, D-28213 Bremen
>>     >>     http://www.thetaphi.de
>>     >>     eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>>     <mailto:uwe@thetaphi.de <ma...@thetaphi.de>>
>>     >>
>>     >>    
>>     ------------------------------------------------------------------------
>>     >>
>>     >>     *From:* John Wang [mailto:john.wang@gmail.com
>>     <ma...@gmail.com>
>>     >>     <mailto:john.wang@gmail.com <ma...@gmail.com>>]
>>     >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
>>     >>     *To:* java-dev@lucene.apache.org
>>     <ma...@lucene.apache.org>
>>     <mailto:java-dev@lucene.apache.org
>>     <ma...@lucene.apache.org>>
>>     >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
>>     >>
>>     >>
>>     >>
>>     >>     Thanks Mark for the pointer!
>>     >>
>>     >>     I guess my point is with NRT, and when segment files change
>>     often,
>>     >>     this would be an issue, no?
>>     >>
>>     >>     Anyway, I can run some tests.
>>     >>
>>     >>     Thanks
>>     >>
>>     >>     -John
>>     >>
>>     >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
>>     >>     <markrmiller@gmail.com <ma...@gmail.com>
>>     <mailto:markrmiller@gmail.com <ma...@gmail.com>>> wrote:
>>     >>
>>     >>     1483 - indexsearcher pulls out a readers subreaders
>>     >>     (segmentreaders) and sends a collector over them one by one,
>>     >>     rather than using the multireader. So only fc for seg
>>     readers that
>>     >>     change need to be reloaded.
>>     >>
>>     >>     - Mark
>>     >>
>>     >>
>>     >>
>>     >>     http://www.lucidimagination.com (mobile)
>>     >>
>>     >>
>>     >>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com
>>     <ma...@gmail.com>
>>     >>     <mailto:john.wang@gmail.com <ma...@gmail.com>>>
>>     wrote:
>>     >>
>>     >>>     Hi Yonik:
>>     >>>
>>     >>>          Actually that is what I am looking for. Can you
>>     please point
>>     >>>     me to where/how sorting is done per-segment?
>>     >>>
>>     >>>          When heaving indexing introduces or modifies
>>     segments, would
>>     >>>     it cause reloading of FieldCache at query time and thus would
>>     >>>     impact search performance?
>>     >>>
>>     >>>     thanks
>>     >>>
>>     >>>     -John
>>     >>>
>>     >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
>>     >>>     <yonik@lucidimagination.com
>>     <ma...@lucidimagination.com>
>>     <mailto:yonik@lucidimagination.com
>>     <ma...@lucidimagination.com>>>
>>     >>>     wrote:
>>     >>>
>>     >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang
>>     <john.wang@gmail.com <ma...@gmail.com>
>>     >>>     <mailto:john.wang@gmail.com <ma...@gmail.com>>>
>>     wrote:
>>     >>>     > Looking at the code, seems there is a disconnect between
>>     >>>     how/when field
>>     >>>     > cache is loaded when IndexWriter.getReader() is called.
>>     >>>
>>     >>>     I'm not sure what you mean by "disconnect"
>>     >>>
>>     >>>     > Is FieldCache updated?
>>     >>>
>>     >>>     FieldCache entries are populated on demand, as they always
>>     have been.
>>     >>>
>>     >>>
>>     >>>     > Otherwise, are we reloading FieldCache for each
>>     >>>     > reader instance?
>>     >>>
>>     >>>     Searching/sorting is now per-segment, and so is the use of the
>>     >>>     FieldCache.  Segments that don't change shouldn't have to
>>     reload
>>     >>>     their
>>     >>>     FieldCache entries.
>>     >>>
>>     >>>     -Yonik
>>     >>>     http://www.lucidimagination.com
>>     >>>
>>     >>>    
>>     ---------------------------------------------------------------------
>>     >>>     To unsubscribe, e-mail:
>>     java-dev-unsubscribe@lucene.apache.org
>>     <ma...@lucene.apache.org>
>>     >>>     <mailto:java-dev-unsubscribe@lucene.apache.org
>>     <ma...@lucene.apache.org>>
>>     >>>     For additional commands, e-mail:
>>     java-dev-help@lucene.apache.org
>>     <ma...@lucene.apache.org>
>>     >>>     <mailto:java-dev-help@lucene.apache.org
>>     <ma...@lucene.apache.org>>
>>     >>>
>>     >>>
>>     >>>
>>     >>
>>     >>
>>     >>
>>     >
>>     >
>>     > --
>>     > - Mark
>>     >
>>     > http://www.lucidimagination.com
>>     >
>>     >
>>     >
>>     >
>>     >
>>     ---------------------------------------------------------------------
>>     > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>     <ma...@lucene.apache.org>
>>     > For additional commands, e-mail: java-dev-help@lucene.apache.org
>>     <ma...@lucene.apache.org>
>>     >
>>     >
>>
>>     ---------------------------------------------------------------------
>>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>     <ma...@lucene.apache.org>
>>     For additional commands, e-mail: java-dev-help@lucene.apache.org
>>     <ma...@lucene.apache.org>
>>
>>
>>     
>
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: 2.9 NRT w.r.t. sorting and field cache

Posted by Mark Miller <ma...@gmail.com>.

Come on dude :) Spend a half ounce of effort first. Mike's time is too
valuable!

Luckily mine is not.

There is no default impl - the class is dead simple (and the class has
been pointed out like 3 times in this thread - I'm not even fully
following and I know where to find it):

  public static abstract class IndexReaderWarmer {
    public abstract void warm(IndexReader reader) throws IOException;
  }

Now pass something in that warms the reader. Load a fieldcache - do a
search. Do the hokey pokey and turn yourself around ...

Investigation time: 5 seconds.
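
For anyone reading the archive later, here is a rough, Lucene-free sketch
of the warmer idea described above, written as a toy so it runs on its
own. ToyReader, ToyWarmer, ToyFieldCache and ToyWriter are made-up
stand-ins, not Lucene classes. In real code the warmer would presumably
be an IndexWriter.IndexReaderWarmer handed to
IndexWriter.setMergedSegmentWarmer, and warm() would load a FieldCache
entry or run a search.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a merged-segment warmer. None of these types are Lucene classes. */
public class WarmerSketch {

    /** Hypothetical stand-in for a reader over one freshly merged segment. */
    static final class ToyReader {
        final String segmentName;
        ToyReader(String segmentName) { this.segmentName = segmentName; }
    }

    /** Same shape as the abstract class quoted above, over the toy reader. */
    abstract static class ToyWarmer {
        abstract void warm(ToyReader reader) throws IOException;
    }

    /** Hypothetical cache with one lazily loaded entry per reader. */
    static final class ToyFieldCache {
        private final Map<ToyReader, String> entries = new IdentityHashMap<>();
        int loads = 0;

        String get(ToyReader reader) {
            String entry = entries.get(reader);
            if (entry == null) {
                loads++; // the expensive load happens here, once per reader
                entry = "values-for-" + reader.segmentName;
                entries.put(reader, entry);
            }
            return entry;
        }
    }

    /** Hypothetical writer: runs the warmer before a merged segment becomes searchable. */
    static final class ToyWriter {
        private ToyWarmer warmer;
        final List<ToyReader> searchable = new ArrayList<>();

        void setMergedSegmentWarmer(ToyWarmer warmer) { this.warmer = warmer; }

        void finishMerge(ToyReader merged) throws IOException {
            if (warmer != null) {
                warmer.warm(merged);
            }
            searchable.add(merged);
        }
    }

    /** Returns {loads after the merge, loads after the first search}. */
    static int[] demo() {
        final ToyFieldCache cache = new ToyFieldCache();
        ToyWriter writer = new ToyWriter();
        writer.setMergedSegmentWarmer(new ToyWarmer() {
            @Override
            void warm(ToyReader reader) {
                cache.get(reader); // "load a fieldcache"
            }
        });
        try {
            writer.finishMerge(new ToyReader("_5"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int afterMerge = cache.loads;
        cache.get(writer.searchable.get(0)); // first search finds the entry already cached
        return new int[] { afterMerge, cache.loads };
    }

    public static void main(String[] args) {
        int[] loads = demo();
        System.out.println("loads after merge=" + loads[0] + ", after first search=" + loads[1]);
    }
}
```

The point of the toy is only the ordering: the load is paid inside
finishMerge(), so the first search against the merged segment does not
pay it.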

John Wang wrote:
> Hi Michael:
>
>      Thanks for the pointer!
>
>       Pardon my ignorance, but I am still no seeing the connection
> between this api to per/segment loading of FieldCache. (the api takes
> in an IndexReader instead of maybe SegmentReader[])
>
>       Can you point me to maybe the default impl of IndexReaderWarmer
> to help me understand?
>
> Thanks
>
> -John
>
> On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless
> <lucene@mikemccandless.com <ma...@mikemccandless.com>> wrote:
>
>     This is exactly why we added IndexWriter.setMergedSegmentWarmer -- you
>     can warm the reader w/o blocking ongoing updates.
>
>     Mike
>
>     On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller
>     <markrmiller@gmail.com <ma...@gmail.com>> wrote:
>     > Right - when a large segment is invalidated, you will have a bigger
>     > fieldcache piece to reload - pre 2.9, you'd be reloading the *whole*
>     > field cache every time though. Sounds like you are trying to
>     deal with
>     > those large segments changing anyway :) They are always an issue
>     when
>     > doing RT it seems.
>     >
>     > I don't believe deletes invalidate a field cache - terms from
>     deleted
>     > docs stay in a field cache and segmentreaders use their
>     freqStream as
>     > the fieldcache key. Only when the deletes are merged out would they
>     > invalidate - but because your writing a new segment anyway ...
>     >
>     > - Mark
>     >
>     > John Wang wrote:
>     >> I understand what you are saying. Let me detail what I am
>     trying to say:
>     >>
>     >> When "currently processed segments" are flushed down, merge may
>     >> happen. When merges happen, some of those "stable segments" will be
>     >> invalidated, and so will the fieldcache data keyed by them.
>     >>
>     >> In a high update environment, such scenarios can happen quite
>     often.
>     >>
>     >> The way the default mergePolicy works is that small segments get
>     >> merged into the larger segments. Eventually, what will be
>     invalidated
>     >> would be a large segment, and when that happens, a large chunk
>     of the
>     >> field cache would be invalidated.
>     >>
>     >> Furthermore, in the case where there are high updates, the stable
>     >> segments can be invalidate much sooner when there are deletes
>     in those
>     >> segments, and I would guess the corresponding FieldCache needs
>     to be
>     >> adjusted. Not sure how it is handled right now.
>     >>
>     >> Just my two cents, and of course when I find the time I will
>     need to
>     >> run some tests to see.
>     >>
>     >> -John
>     >>
>     >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de
>     <ma...@thetaphi.de>
>     >> <mailto:uwe@thetaphi.de <ma...@thetaphi.de>>> wrote:
>     >>
>     >>     The NRT reader coming from the IndexWriter.getReader() has only
>     >>     changes in the currently processed segments, the other segments
>     >>     keep stable (and even their IndexReader keys used for the
>     >>     FieldCache). The rest of the segments keep stable. For the
>     >>     consumer it looks like a normal reader (it is in fact a
>     >>     ReadOnlyDirectoryReader) supporting
>     getSequentialSubReaders() and
>     >>     so on.
>     >>
>     >>
>     >>
>     >>     -----
>     >>     Uwe Schindler
>     >>     H.-H.-Meier-Allee 63, D-28213 Bremen
>     >>     http://www.thetaphi.de
>     >>     eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>     <mailto:uwe@thetaphi.de <ma...@thetaphi.de>>
>     >>
>     >>    
>     ------------------------------------------------------------------------
>     >>
>     >>     *From:* John Wang [mailto:john.wang@gmail.com
>     <ma...@gmail.com>
>     >>     <mailto:john.wang@gmail.com <ma...@gmail.com>>]
>     >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
>     >>     *To:* java-dev@lucene.apache.org
>     <ma...@lucene.apache.org>
>     <mailto:java-dev@lucene.apache.org
>     <ma...@lucene.apache.org>>
>     >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
>     >>
>     >>
>     >>
>     >>     Thanks Mark for the pointer!
>     >>
>     >>     I guess my point is with NRT, and when segment files change
>     often,
>     >>     this would be an issue, no?
>     >>
>     >>     Anyway, I can run some tests.
>     >>
>     >>     Thanks
>     >>
>     >>     -John
>     >>
>     >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
>     >>     <markrmiller@gmail.com <ma...@gmail.com>
>     <mailto:markrmiller@gmail.com <ma...@gmail.com>>> wrote:
>     >>
>     >>     1483 - indexsearcher pulls out a readers subreaders
>     >>     (segmentreaders) and sends a collector over them one by one,
>     >>     rather than using the multireader. So only fc for seg
>     readers that
>     >>     change need to be reloaded.
>     >>
>     >>     - Mark
>     >>
>     >>
>     >>
>     >>     http://www.lucidimagination.com (mobile)
>     >>
>     >>
>     >>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com
>     <ma...@gmail.com>
>     >>     <mailto:john.wang@gmail.com <ma...@gmail.com>>>
>     wrote:
>     >>
>     >>>     Hi Yonik:
>     >>>
>     >>>          Actually that is what I am looking for. Can you
>     please point
>     >>>     me to where/how sorting is done per-segment?
>     >>>
>     >>>          When heaving indexing introduces or modifies
>     segments, would
>     >>>     it cause reloading of FieldCache at query time and thus would
>     >>>     impact search performance?
>     >>>
>     >>>     thanks
>     >>>
>     >>>     -John
>     >>>
>     >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
>     >>>     <yonik@lucidimagination.com
>     <ma...@lucidimagination.com>
>     <mailto:yonik@lucidimagination.com
>     <ma...@lucidimagination.com>>>
>     >>>     wrote:
>     >>>
>     >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang
>     <john.wang@gmail.com <ma...@gmail.com>
>     >>>     <mailto:john.wang@gmail.com <ma...@gmail.com>>>
>     wrote:
>     >>>     > Looking at the code, seems there is a disconnect between
>     >>>     how/when field
>     >>>     > cache is loaded when IndexWriter.getReader() is called.
>     >>>
>     >>>     I'm not sure what you mean by "disconnect"
>     >>>
>     >>>     > Is FieldCache updated?
>     >>>
>     >>>     FieldCache entries are populated on demand, as they always
>     have been.
>     >>>
>     >>>
>     >>>     > Otherwise, are we reloading FieldCache for each
>     >>>     > reader instance?
>     >>>
>     >>>     Searching/sorting is now per-segment, and so is the use of the
>     >>>     FieldCache.  Segments that don't change shouldn't have to
>     reload
>     >>>     their
>     >>>     FieldCache entries.
>     >>>
>     >>>     -Yonik
>     >>>     http://www.lucidimagination.com
>     >>>
>     >>>    
>     ---------------------------------------------------------------------
>     >>>     To unsubscribe, e-mail:
>     java-dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>
>     >>>     <mailto:java-dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>>
>     >>>     For additional commands, e-mail:
>     java-dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>
>     >>>     <mailto:java-dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>>
>     >>>
>     >>>
>     >>>
>     >>
>     >>
>     >>
>     >
>     >
>     > --
>     > - Mark
>     >
>     > http://www.lucidimagination.com
>     >
>     >
>     >
>     >
>     >
>     ---------------------------------------------------------------------
>     > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>
>     > For additional commands, e-mail: java-dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>
>     >
>     >
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>
>     For additional commands, e-mail: java-dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>
>
>


-- 
- Mark

http://www.lucidimagination.com





Re: 2.9 NRT w.r.t. sorting and field cache

Posted by John Wang <jo...@gmail.com>.

Hi Michael:

     Thanks for the pointer!

      Pardon my ignorance, but I am still not seeing the connection between
this API and per-segment loading of FieldCache. (The API takes in an
IndexReader instead of maybe SegmentReader[].)

      Can you point me to maybe the default impl of IndexReaderWarmer to
help me understand?

Thanks

-John

On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> This is exactly why we added IndexWriter.setMergedSegmentWarmer -- you
> can warm the reader w/o blocking ongoing updates.
>
> Mike
>
> On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller <ma...@gmail.com>
> wrote:
> > Right - when a large segment is invalidated, you will have a bigger
> > fieldcache piece to reload - pre 2.9, you'd be reloading the *whole*
> > field cache every time though. Sounds like you are trying to deal with
> > those large segments changing anyway :) They are always an issue when
> > doing RT it seems.
> >
> > I don't believe deletes invalidate a field cache - terms from deleted
> > docs stay in a field cache and segmentreaders use their freqStream as
> > the fieldcache key. Only when the deletes are merged out would they
> > invalidate - but because your writing a new segment anyway ...
> >
> > - Mark
> >
> > John Wang wrote:
> >> I understand what you are saying. Let me detail what I am trying to say:
> >>
> >> When "currently processed segments" are flushed down, merge may
> >> happen. When merges happen, some of those "stable segments" will be
> >> invalidated, and so will the fieldcache data keyed by them.
> >>
> >> In a high update environment, such scenarios can happen quite often.
> >>
> >> The way the default mergePolicy works is that small segments get
> >> merged into the larger segments. Eventually, what will be invalidated
> >> would be a large segment, and when that happens, a large chunk of the
> >> field cache would be invalidated.
> >>
> >> Furthermore, in the case where there are high updates, the stable
> >> segments can be invalidate much sooner when there are deletes in those
> >> segments, and I would guess the corresponding FieldCache needs to be
> >> adjusted. Not sure how it is handled right now.
> >>
> >> Just my two cents, and of course when I find the time I will need to
> >> run some tests to see.
> >>
> >> -John
> >>
> >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de
> >> <ma...@thetaphi.de>> wrote:
> >>
> >>     The NRT reader coming from the IndexWriter.getReader() has only
> >>     changes in the currently processed segments, the other segments
> >>     keep stable (and even their IndexReader keys used for the
> >>     FieldCache). The rest of the segments keep stable. For the
> >>     consumer it looks like a normal reader (it is in fact a
> >>     ReadOnlyDirectoryReader) supporting getSequentialSubReaders() and
> >>     so on.
> >>
> >>
> >>
> >>     -----
> >>     Uwe Schindler
> >>     H.-H.-Meier-Allee 63, D-28213 Bremen
> >>     http://www.thetaphi.de
> >>     eMail: uwe@thetaphi.de <ma...@thetaphi.de>
> >>
> >>
> ------------------------------------------------------------------------
> >>
> >>     *From:* John Wang [mailto:john.wang@gmail.com
> >>     <ma...@gmail.com>]
> >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
> >>     *To:* java-dev@lucene.apache.org
> >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
> >>
> >>
> >>
> >>     Thanks Mark for the pointer!
> >>
> >>     I guess my point is with NRT, and when segment files change often,
> >>     this would be an issue, no?
> >>
> >>     Anyway, I can run some tests.
> >>
> >>     Thanks
> >>
> >>     -John
> >>
> >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
> >>     <markrmiller@gmail.com <ma...@gmail.com>> wrote:
> >>
> >>     1483 - indexsearcher pulls out a readers subreaders
> >>     (segmentreaders) and sends a collector over them one by one,
> >>     rather than using the multireader. So only fc for seg readers that
> >>     change need to be reloaded.
> >>
> >>     - Mark
> >>
> >>
> >>
> >>     http://www.lucidimagination.com (mobile)
> >>
> >>
> >>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com
> >>     <ma...@gmail.com>> wrote:
> >>
> >>>     Hi Yonik:
> >>>
> >>>          Actually that is what I am looking for. Can you please point
> >>>     me to where/how sorting is done per-segment?
> >>>
> >>>          When heaving indexing introduces or modifies segments, would
> >>>     it cause reloading of FieldCache at query time and thus would
> >>>     impact search performance?
> >>>
> >>>     thanks
> >>>
> >>>     -John
> >>>
> >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
> >>>     <yonik@lucidimagination.com <ma...@lucidimagination.com>>
> >>>     wrote:
> >>>
> >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com
> >>>     <ma...@gmail.com>> wrote:
> >>>     > Looking at the code, seems there is a disconnect between
> >>>     how/when field
> >>>     > cache is loaded when IndexWriter.getReader() is called.
> >>>
> >>>     I'm not sure what you mean by "disconnect"
> >>>
> >>>     > Is FieldCache updated?
> >>>
> >>>     FieldCache entries are populated on demand, as they always have been.
> >>>
> >>>
> >>>     > Otherwise, are we reloading FieldCache for each
> >>>     > reader instance?
> >>>
> >>>     Searching/sorting is now per-segment, and so is the use of the
> >>>     FieldCache.  Segments that don't change shouldn't have to reload
> >>>     their
> >>>     FieldCache entries.
> >>>
> >>>     -Yonik
> >>>     http://www.lucidimagination.com
> >>>
> >>>
> ---------------------------------------------------------------------
> >>>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >>>     <ma...@lucene.apache.org>
> >>>     For additional commands, e-mail: java-dev-help@lucene.apache.org
> >>>     <ma...@lucene.apache.org>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >
> > --
> > - Mark
> >
> > http://www.lucidimagination.com
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: 2.9 NRT w.r.t. sorting and field cache

Posted by Michael McCandless <lu...@mikemccandless.com>.

This is exactly why we added IndexWriter.setMergedSegmentWarmer -- you
can warm the reader w/o blocking ongoing updates.
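
A rough illustration of the "w/o blocking ongoing updates" part, as a
plain-Java toy rather than anything from Lucene. The assumption here is
that warming runs on a separate merge thread; the class and method names
are invented for the sketch. The fake warm-up refuses to finish until the
updating thread has pushed all its updates through, so the method can
only return if updates were not blocked by the warm-up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model: warm-up on a background thread while updates keep flowing. */
public class BackgroundWarmSketch {

    /** Returns how many updates had been accepted by the time the warm-up finished. */
    static int updatesDuringWarm(int updates) {
        ExecutorService mergeThread = Executors.newSingleThreadExecutor();
        CountDownLatch warmStarted = new CountDownLatch(1);
        CountDownLatch updatesDone = new CountDownLatch(1);
        AtomicInteger accepted = new AtomicInteger();
        try {
            Future<Integer> warm = mergeThread.submit(() -> {
                warmStarted.countDown();
                // Pretend warming is slow: it ends only after the updates went through.
                if (!updatesDone.await(5, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("updates were blocked by warming");
                }
                return accepted.get();
            });
            warmStarted.await();
            for (int i = 0; i < updates; i++) {
                accepted.incrementAndGet(); // stand-in for an add/update call
            }
            updatesDone.countDown();
            return warm.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            mergeThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("updates accepted during warm-up: " + updatesDuringWarm(1000));
    }
}
```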

Mike

On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller <ma...@gmail.com> wrote:
> Right - when a large segment is invalidated, you will have a bigger
> fieldcache piece to reload - pre 2.9, you'd be reloading the *whole*
> field cache every time though. Sounds like you are trying to deal with
> those large segments changing anyway :) They are always an issue when
> doing RT it seems.
>
> I don't believe deletes invalidate a field cache - terms from deleted
> docs stay in a field cache and segmentreaders use their freqStream as
> the fieldcache key. Only when the deletes are merged out would they
> invalidate - but because your writing a new segment anyway ...
>
> - Mark
>
> John Wang wrote:
>> I understand what you are saying. Let me detail what I am trying to say:
>>
>> When "currently processed segments" are flushed down, merge may
>> happen. When merges happen, some of those "stable segments" will be
>> invalidated, and so will the fieldcache data keyed by them.
>>
>> In a high update environment, such scenarios can happen quite often.
>>
>> The way the default mergePolicy works is that small segments get
>> merged into the larger segments. Eventually, what will be invalidated
>> would be a large segment, and when that happens, a large chunk of the
>> field cache would be invalidated.
>>
>> Furthermore, in the case where there are high updates, the stable
>> segments can be invalidate much sooner when there are deletes in those
>> segments, and I would guess the corresponding FieldCache needs to be
>> adjusted. Not sure how it is handled right now.
>>
>> Just my two cents, and of course when I find the time I will need to
>> run some tests to see.
>>
>> -John
>>
>> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de
>> <ma...@thetaphi.de>> wrote:
>>
>>     The NRT reader coming from the IndexWriter.getReader() has only
>>     changes in the currently processed segments, the other segments
>>     keep stable (and even their IndexReader keys used for the
>>     FieldCache). The rest of the segments keep stable. For the
>>     consumer it looks like a normal reader (it is in fact a
>>     ReadOnlyDirectoryReader) supporting getSequentialSubReaders() and
>>     so on.
>>
>>
>>
>>     -----
>>     Uwe Schindler
>>     H.-H.-Meier-Allee 63, D-28213 Bremen
>>     http://www.thetaphi.de
>>     eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>>
>>     ------------------------------------------------------------------------
>>
>>     *From:* John Wang [mailto:john.wang@gmail.com
>>     <ma...@gmail.com>]
>>     *Sent:* Tuesday, September 22, 2009 9:32 AM
>>     *To:* java-dev@lucene.apache.org <ma...@lucene.apache.org>
>>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
>>
>>
>>
>>     Thanks Mark for the pointer!
>>
>>     I guess my point is with NRT, and when segment files change often,
>>     this would be an issue, no?
>>
>>     Anyway, I can run some tests.
>>
>>     Thanks
>>
>>     -John
>>
>>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
>>     <markrmiller@gmail.com <ma...@gmail.com>> wrote:
>>
>>     1483 - indexsearcher pulls out a readers subreaders
>>     (segmentreaders) and sends a collector over them one by one,
>>     rather than using the multireader. So only fc for seg readers that
>>     change need to be reloaded.
>>
>>     - Mark
>>
>>
>>
>>     http://www.lucidimagination.com (mobile)
>>
>>
>>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com
>>     <ma...@gmail.com>> wrote:
>>
>>>     Hi Yonik:
>>>
>>>          Actually that is what I am looking for. Can you please point
>>>     me to where/how sorting is done per-segment?
>>>
>>>          When heaving indexing introduces or modifies segments, would
>>>     it cause reloading of FieldCache at query time and thus would
>>>     impact search performance?
>>>
>>>     thanks
>>>
>>>     -John
>>>
>>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
>>>     <yonik@lucidimagination.com <ma...@lucidimagination.com>>
>>>     wrote:
>>>
>>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com
>>>     <ma...@gmail.com>> wrote:
>>>     > Looking at the code, seems there is a disconnect between
>>>     how/when field
>>>     > cache is loaded when IndexWriter.getReader() is called.
>>>
>>>     I'm not sure what you mean by "disconnect"
>>>
>>>     > Is FieldCache updated?
>>>
>>>     FieldCache entries are populated on demand, as they always have been.
>>>
>>>
>>>     > Otherwise, are we reloading FieldCache for each
>>>     > reader instance?
>>>
>>>     Searching/sorting is now per-segment, and so is the use of the
>>>     FieldCache.  Segments that don't change shouldn't have to reload
>>>     their
>>>     FieldCache entries.
>>>
>>>     -Yonik
>>>     http://www.lucidimagination.com
>>>
>>>     ---------------------------------------------------------------------
>>>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>     <ma...@lucene.apache.org>
>>>     For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>     <ma...@lucene.apache.org>
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


Re: 2.9 NRT w.r.t. sorting and field cache

Posted by Mark Miller <ma...@gmail.com>.

Right - when a large segment is invalidated, you will have a bigger
fieldcache piece to reload - pre 2.9, you'd be reloading the *whole*
field cache every time though. Sounds like you are trying to deal with
those large segments changing anyway :) They are always an issue when
doing RT it seems.

I don't believe deletes invalidate a field cache - terms from deleted
docs stay in a field cache and segmentreaders use their freqStream as
the fieldcache key. Only when the deletes are merged out would they
invalidate - but because you're writing a new segment anyway ...
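
To make the per-segment behaviour concrete, here is a small toy model in
plain Java. It is only a sketch under assumptions: SegmentCore is an
invented stand-in for whatever object the segment reader exposes as its
cache key, and ToySegmentReader and ToyFieldCache are not Lucene types.
It counts cache loads across an initial search, a reopen that adds
deletes plus one new segment, and a merge.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a field cache keyed per segment. Not Lucene code. */
public class PerSegmentCacheSketch {

    /** Hypothetical cache key: stays the same for as long as the segment files do. */
    static final class SegmentCore {
        final String name;
        SegmentCore(String name) { this.name = name; }
    }

    /** Reader over one segment; new deletes give a new reader but the same core. */
    static final class ToySegmentReader {
        final SegmentCore core;
        final int deletedDocs;
        ToySegmentReader(SegmentCore core, int deletedDocs) {
            this.core = core;
            this.deletedDocs = deletedDocs;
        }
        ToySegmentReader withDeletes(int extra) {
            return new ToySegmentReader(core, deletedDocs + extra);
        }
    }

    /** Lazily loaded entries, one per segment core. This toy never evicts. */
    static final class ToyFieldCache {
        private final Map<SegmentCore, String> entries = new IdentityHashMap<>();
        int loads = 0;

        String get(ToySegmentReader reader) {
            return entries.computeIfAbsent(reader.core, core -> {
                loads++; // expensive un-inversion would happen here
                return "values-for-" + core.name;
            });
        }
    }

    /** "Searches" every segment in turn, touching the cache once per segment. */
    static void search(List<ToySegmentReader> segments, ToyFieldCache cache) {
        for (ToySegmentReader segment : segments) {
            cache.get(segment);
        }
    }

    /** Returns total loads after: first search, reopen with deletes + new segment, merge. */
    static int[] demo() {
        ToyFieldCache cache = new ToyFieldCache();
        ToySegmentReader s0 = new ToySegmentReader(new SegmentCore("_0"), 0);
        ToySegmentReader s1 = new ToySegmentReader(new SegmentCore("_1"), 0);
        ToySegmentReader s2 = new ToySegmentReader(new SegmentCore("_2"), 0);

        List<ToySegmentReader> first = new ArrayList<>(List.of(s0, s1, s2));
        search(first, cache);
        int afterFirst = cache.loads;

        // Reopen: _1 gained deletes (same core), and a small new segment _3 was flushed.
        ToySegmentReader s3 = new ToySegmentReader(new SegmentCore("_3"), 0);
        search(List.of(s0, s1.withDeletes(2), s2, s3), cache);
        int afterReopen = cache.loads;

        // Merge: _2 and _3 are replaced by a brand new segment _4.
        ToySegmentReader s4 = new ToySegmentReader(new SegmentCore("_4"), 0);
        search(List.of(s0, s1.withDeletes(2), s4), cache);
        int afterMerge = cache.loads;

        return new int[] { afterFirst, afterReopen, afterMerge };
    }

    public static void main(String[] args) {
        int[] loads = demo();
        System.out.println(loads[0] + " " + loads[1] + " " + loads[2]);
    }
}
```

In the toy, the deletes on _1 cost nothing extra, the flushed segment
costs one load, and the merge costs one more load whose size tracks the
merged segment.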

- Mark

John Wang wrote:
> I understand what you are saying. Let me detail what I am trying to say:
>
> When "currently processed segments" are flushed down, merge may
> happen. When merges happen, some of those "stable segments" will be
> invalidated, and so will the fieldcache data keyed by them.
>
> In a high update environment, such scenarios can happen quite often.
>
> The way the default mergePolicy works is that small segments get
> merged into the larger segments. Eventually, what will be invalidated
> would be a large segment, and when that happens, a large chunk of the
> field cache would be invalidated.
>
> Furthermore, in the case where there are high updates, the stable
> segments can be invalidate much sooner when there are deletes in those
> segments, and I would guess the corresponding FieldCache needs to be
> adjusted. Not sure how it is handled right now.
>
> Just my two cents, and of course when I find the time I will need to
> run some tests to see.
>
> -John
>
> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de
> <ma...@thetaphi.de>> wrote:
>
>     The NRT reader coming from the IndexWriter.getReader() has only
>     changes in the currently processed segments, the other segments
>     keep stable (and even their IndexReader keys used for the
>     FieldCache). The rest of the segments keep stable. For the
>     consumer it looks like a normal reader (it is in fact a
>     ReadOnlyDirectoryReader) supporting getSequentialSubReaders() and
>     so on.
>
>      
>
>     -----
>     Uwe Schindler
>     H.-H.-Meier-Allee 63, D-28213 Bremen
>     http://www.thetaphi.de
>     eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>
>     ------------------------------------------------------------------------
>
>     *From:* John Wang [mailto:john.wang@gmail.com
>     <ma...@gmail.com>]
>     *Sent:* Tuesday, September 22, 2009 9:32 AM
>     *To:* java-dev@lucene.apache.org <ma...@lucene.apache.org>
>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
>
>      
>
>     Thanks Mark for the pointer!
>
>     I guess my point is with NRT, and when segment files change often,
>     this would be an issue, no?
>
>     Anyway, I can run some tests.
>
>     Thanks
>
>     -John
>
>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
>     <markrmiller@gmail.com <ma...@gmail.com>> wrote:
>
>     1483 - indexsearcher pulls out a readers subreaders
>     (segmentreaders) and sends a collector over them one by one,
>     rather than using the multireader. So only fc for seg readers that
>     change need to be reloaded.  
>
>     - Mark
>
>      
>
>     http://www.lucidimagination.com (mobile)
>
>
>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com
>     <ma...@gmail.com>> wrote:
>
>>     Hi Yonik:
>>
>>          Actually that is what I am looking for. Can you please point
>>     me to where/how sorting is done per-segment?
>>
>>          When heavy indexing introduces or modifies segments, would
>>     it cause reloading of FieldCache at query time and thus would
>>     impact search performance?
>>
>>     thanks
>>
>>     -John
>>
>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
>>     <yonik@lucidimagination.com> wrote:
>>
>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com> wrote:
>>     > Looking at the code, seems there is a disconnect between how/when
>>     > field cache is loaded when IndexWriter.getReader() is called.
>>
>>     I'm not sure what you mean by "disconnect"
>>
>>     > Is FieldCache updated?
>>
>>     FieldCache entries are populated on demand, as they always have been.
>>
>>
>>     > Otherwise, are we reloading FieldCache for each
>>     > reader instance?
>>
>>     Searching/sorting is now per-segment, and so is the use of the
>>     FieldCache.  Segments that don't change shouldn't have to reload
>>     their FieldCache entries.
>>
>>     -Yonik
>>     http://www.lucidimagination.com
>>
>>     ---------------------------------------------------------------------
>>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>     For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>      
>>
>      
>
>


-- 
- Mark

http://www.lucidimagination.com





Re: 2.9 NRT w.r.t. sorting and field cache

Posted by John Wang <jo...@gmail.com>.

I understand what you are saying. Let me detail what I am trying to say:

When the "currently processed segments" are flushed, a merge may happen. When
merges happen, some of those "stable segments" will be invalidated, and so
will the FieldCache data keyed by them.

In a high update environment, such scenarios can happen quite often.

The way the default mergePolicy works is that small segments get merged into
larger segments. Eventually, what gets invalidated will be a large
segment, and when that happens, a large chunk of the field cache is
invalidated with it.
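
To get a feel for the scale of this, here is a rough Java sketch of
level-based merging. It is a toy model, not Lucene's merge policy code: the
class MergeInvalidationSketch, its flush method and its counters are made up
for illustration, and it assumes every flushed or merged segment needs its
cache entry for one field built from scratch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of level-based merging; hypothetical, not Lucene's merge policy. */
final class MergeInvalidationSketch {

    final List<Integer> segments = new ArrayList<Integer>(); // segment sizes in docs, oldest first
    final int mergeFactor;
    long docsReloaded = 0;   // total docs whose cached values had to be (re)built
    int largestReload = 0;   // biggest single rebuild seen so far

    MergeInvalidationSketch(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        this.mergeFactor = mergeFactor;
    }

    /** Adds one freshly flushed segment, then merges while the newest mergeFactor segments are equal in size. */
    void flush(int docs) {
        addSegment(docs);
        while (tailIsMergeable()) {
            int merged = 0;
            for (int i = 0; i < mergeFactor; i++) {
                merged += segments.remove(segments.size() - 1).intValue();
            }
            // The merged segment is a new segment, so its cache entry starts out empty.
            addSegment(merged);
        }
    }

    private void addSegment(int docs) {
        segments.add(Integer.valueOf(docs));
        docsReloaded += docs;
        largestReload = Math.max(largestReload, docs);
    }

    private boolean tailIsMergeable() {
        int n = segments.size();
        if (n < mergeFactor) {
            return false;
        }
        int size = segments.get(n - 1).intValue();
        for (int i = n - mergeFactor; i < n; i++) {
            if (segments.get(i).intValue() != size) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        MergeInvalidationSketch index = new MergeInvalidationSketch(10);
        for (int i = 0; i < 100; i++) {
            index.flush(1);
        }
        System.out.println("segments=" + index.segments
            + " docsReloaded=" + index.docsReloaded
            + " largestReload=" + index.largestReload);
    }
}
```

With a merge factor of 10 and 100 one-document flushes, the model ends with
a single 100-document segment and counts 300 document reloads in total, 100
of them caused by the final merge alone.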

Furthermore, when the update rate is high, the stable segments
can be invalidated much sooner if there are deletes in those segments, and
I would guess the corresponding FieldCache needs to be adjusted. I am not sure
how that is handled right now.

Just my two cents, and of course when I find the time I will need to run
some tests to see.

-John

On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

>  The NRT reader returned by IndexWriter.getReader() only has changes in
> the segments currently being processed. The other segments stay stable,
> and so do the IndexReader keys used for their FieldCache entries. To the
> consumer it looks like a normal reader (it is in fact a
> ReadOnlyDirectoryReader) that supports getSequentialSubReaders() and so on.
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>   ------------------------------
>
> *From:* John Wang [mailto:john.wang@gmail.com]
> *Sent:* Tuesday, September 22, 2009 9:32 AM
> *To:* java-dev@lucene.apache.org
> *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
>
>
>
> Thanks Mark for the pointer!
>
> I guess my point is with NRT, and when segment files change often, this
> would be an issue, no?
>
> Anyway, I can run some tests.
>
> Thanks
>
> -John
>
> On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller <ma...@gmail.com>
> wrote:
>
> LUCENE-1483 - IndexSearcher pulls out a reader's subreaders (SegmentReaders)
> and sends a collector over them one by one, rather than using the MultiReader.
> So only the FieldCache entries for segment readers that change need to be reloaded.
>
> - Mark
>
>
>
> http://www.lucidimagination.com (mobile)
>
>
> On Sep 22, 2009, at 1:27 AM, John Wang <jo...@gmail.com> wrote:
>
>  Hi Yonik:
>
>      Actually that is what I am looking for. Can you please point me to
> where/how sorting is done per-segment?
>
>      When heavy indexing introduces or modifies segments, would it cause
> reloading of FieldCache at query time and thus would impact search
> performance?
>
> thanks
>
> -John
>
> On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
>
> On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com> wrote:
> > Looking at the code, seems there is a disconnect between how/when field
> > cache is loaded when IndexWriter.getReader() is called.
>
> I'm not sure what you mean by "disconnect"
>
> > Is FieldCache updated?
>
> FieldCache entries are populated on demand, as they always have been.
>
>
> > Otherwise, are we reloading FieldCache for each
> > reader instance?
>
> Searching/sorting is now per-segment, and so is the use of the
> FieldCache.  Segments that don't change shouldn't have to reload their
> FieldCache entries.
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
>
>
>

RE: 2.9 NRT w.r.t. sorting and field cache

Posted by Uwe Schindler <uw...@thetaphi.de>.

The NRT reader returned by IndexWriter.getReader() only has changes in
the segments currently being processed. The other segments stay stable,
and so do the IndexReader keys used for their FieldCache entries. To the
consumer it looks like a normal reader (it is in fact a
ReadOnlyDirectoryReader) that supports getSequentialSubReaders() and so on.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: John Wang [mailto:john.wang@gmail.com] 
Sent: Tuesday, September 22, 2009 9:32 AM
To: java-dev@lucene.apache.org
Subject: Re: 2.9 NRT w.r.t. sorting and field cache

Thanks Mark for the pointer!

I guess my point is with NRT, and when segment files change often, this
would be an issue, no?

Anyway, I can run some tests.

Thanks

-John

On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller <ma...@gmail.com> wrote:

LUCENE-1483 - IndexSearcher pulls out a reader's subreaders (SegmentReaders)
and sends a collector over them one by one, rather than using the MultiReader.
So only the FieldCache entries for segment readers that change need to be reloaded.

- Mark

http://www.lucidimagination.com (mobile)

On Sep 22, 2009, at 1:27 AM, John Wang <jo...@gmail.com> wrote:

Hi Yonik:

     Actually that is what I am looking for. Can you please point me to
where/how sorting is done per-segment?

     When heavy indexing introduces or modifies segments, would it cause
reloading of FieldCache at query time and thus would impact search
performance?

thanks

-John

On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:

On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com> wrote:
> Looking at the code, seems there is a disconnect between how/when field
> cache is loaded when IndexWriter.getReader() is called.

I'm not sure what you mean by "disconnect"

> Is FieldCache updated?

FieldCache entries are populated on demand, as they always have been.

> Otherwise, are we reloading FieldCache for each
> reader instance?

Searching/sorting is now per-segment, and so is the use of the
FieldCache.  Segments that don't change shouldn't have to reload their
FieldCache entries.

-Yonik
http://www.lucidimagination.com


Re: 2.9 NRT w.r.t. sorting and field cache

Posted by John Wang <jo...@gmail.com>.

Thanks Mark for the pointer!

I guess my point is with NRT, and when segment files change often, this
would be an issue, no?

Anyway, I can run some tests.

Thanks

-John

On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller <ma...@gmail.com> wrote:

> LUCENE-1483 - IndexSearcher pulls out a reader's subreaders (SegmentReaders)
> and sends a collector over them one by one, rather than using the MultiReader.
> So only the FieldCache entries for segment readers that change need to be reloaded.
>
> - Mark
> http://www.lucidimagination.com (mobile)
>
> On Sep 22, 2009, at 1:27 AM, John Wang <jo...@gmail.com> wrote:
>
> Hi Yonik:
>
>      Actually that is what I am looking for. Can you please point me to
> where/how sorting is done per-segment?
>
>      When heavy indexing introduces or modifies segments, would it cause
> reloading of FieldCache at query time and thus would impact search
> performance?
>
> thanks
>
> -John
>
> On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
>
>> On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com> wrote:
>> > Looking at the code, seems there is a disconnect between how/when field
>> > cache is loaded when IndexWriter.getReader() is called.
>>
>> I'm not sure what you mean by "disconnect"
>>
>> > Is FieldCache updated?
>>
>> FieldCache entries are populated on demand, as they always have been.
>>
>> > Otherwise, are we reloading FieldCache for each
>> > reader instance?
>>
>> Searching/sorting is now per-segment, and so is the use of the
>> FieldCache.  Segments that don't change shouldn't have to reload their
>> FieldCache entries.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>>
>

Re: 2.9 NRT w.r.t. sorting and field cache

Posted by Mark Miller <ma...@gmail.com>.

LUCENE-1483 - IndexSearcher pulls out a reader's subreaders (SegmentReaders)
and sends a collector over them one by one, rather than using the
MultiReader. So only the FieldCache entries for segment readers that change
need to be reloaded.
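
A rough Java sketch of that idea, as a toy model rather than the actual
IndexSearcher code: PerSegmentSearchSketch, Segment and
docWithSmallestValue are hypothetical names, and the "cache" is just a map
from segment to its sort values. The search walks the segments one by one,
rebases each segment-local doc id with a running docBase, and only loads
values for segments it has not seen before.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of per-segment searching; hypothetical, not Lucene's IndexSearcher. */
final class PerSegmentSearchSketch {

    /** Stands in for a segment reader holding one sort value per document. */
    static final class Segment {
        final int[] sortValues;
        Segment(int... sortValues) { this.sortValues = sortValues; }
    }

    // One cache entry per segment instance, filled lazily on first use.
    private final Map<Segment, int[]> cache = new IdentityHashMap<Segment, int[]>();
    int loads = 0;

    private int[] cachedValues(Segment segment) {
        int[] values = cache.get(segment);
        if (values == null) {
            loads++;                               // simulated expensive load
            values = segment.sortValues.clone();
            cache.put(segment, values);
        }
        return values;
    }

    /** Returns the global doc id with the smallest sort value, or -1 if there are no docs. */
    int docWithSmallestValue(List<Segment> segments) {
        int bestDoc = -1;
        int bestValue = 0;
        int docBase = 0;                           // global id of the current segment's first doc
        for (Segment segment : segments) {
            int[] values = cachedValues(segment);
            for (int doc = 0; doc < values.length; doc++) {
                if (bestDoc == -1 || values[doc] < bestValue) {
                    bestValue = values[doc];
                    bestDoc = docBase + doc;
                }
            }
            docBase += values.length;
        }
        return bestDoc;
    }

    public static void main(String[] args) {
        PerSegmentSearchSketch searcher = new PerSegmentSearchSketch();
        Segment a = new Segment(7, 4, 9);
        Segment b = new Segment(5, 2);
        System.out.println(searcher.docWithSmallestValue(Arrays.asList(a, b))
            + " loads=" + searcher.loads);

        // A "reopen" that adds one new segment reuses the entries for a and b.
        Segment c = new Segment(1);
        System.out.println(searcher.docWithSmallestValue(Arrays.asList(a, b, c))
            + " loads=" + searcher.loads);
    }
}
```

In this model the second search costs one extra load, for the new segment
only, no matter how large the unchanged segments are.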

- Mark

http://www.lucidimagination.com (mobile)

On Sep 22, 2009, at 1:27 AM, John Wang <jo...@gmail.com> wrote:

> Hi Yonik:
>
>      Actually that is what I am looking for. Can you please point me  
> to where/how sorting is done per-segment?
>
>      When heavy indexing introduces or modifies segments, would it
> cause reloading of FieldCache at query time and thus would impact  
> search performance?
>
> thanks
>
> -John
>
> On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
> On Tue, Sep 22, 2009 at 12:56 AM, John Wang <jo...@gmail.com> wrote:
> > Looking at the code, seems there is a disconnect between how/when
> > field cache is loaded when IndexWriter.getReader() is called.
>
> I'm not sure what you mean by "disconnect"
>
> > Is FieldCache updated?
>
> FieldCache entries are populated on demand, as they always have been.
>
> > Otherwise, are we reloading FieldCache for each
> > reader instance?
>
> Searching/sorting is now per-segment, and so is the use of the
> FieldCache.  Segments that don't change shouldn't have to reload their
> FieldCache entries.
>
> -Yonik
> http://www.lucidimagination.com
>
>
>

Re: 2.9 NRT w.r.t. sorting and field cache

Posted by Jason Rutherglen <ja...@gmail.com>.

> When heavy indexing introduces or modifies segments, would
> it cause reloading of FieldCache at query time and thus would
> impact search performance?

How is this different from previous versions of Lucene? In 2.9
the field caches are loaded incrementally, only for new segments,
instead of over the whole Multi*Reader. It sounds like you're confused
about the key used for the field cache. It is now the
freqStream (see IR.getFieldCacheKey) rather than the IR itself, so a
new field cache entry isn't loaded each time a new instance of
IR is created for an already loaded segment.
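
A small Java sketch of why the choice of key matters. This is a toy model
under stated assumptions, not Lucene code: SegmentCore, ToyReader and
ToyFieldCache are hypothetical stand-ins, and the only thing borrowed from
the real API is the idea of a getFieldCacheKey() method that returns
something shared by every reader instance over the same segment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/** Toy model of a cache keyed per segment; none of these are Lucene classes. */
final class CacheKeySketch {

    /** Stands in for the per-segment state that survives a reopen. */
    static final class SegmentCore {
        final int[] values;                       // one value per document
        SegmentCore(int[] values) { this.values = values; }
    }

    /** Stands in for a reader instance; a reopen creates a new one over the same core. */
    static final class ToyReader {
        private final SegmentCore core;
        ToyReader(SegmentCore core) { this.core = core; }
        Object getFieldCacheKey() { return core; }          // shared key, not `this`
        int[] readValues() { return core.values.clone(); }  // simulated expensive load
    }

    /** Cache keyed by getFieldCacheKey(); counts how often it has to load. */
    static final class ToyFieldCache {
        private final Map<Object, Map<String, int[]>> cache =
            new WeakHashMap<Object, Map<String, int[]>>();
        int loads = 0;

        synchronized int[] getInts(ToyReader reader, String field) {
            Object key = reader.getFieldCacheKey();
            Map<String, int[]> perSegment = cache.get(key);
            if (perSegment == null) {
                perSegment = new HashMap<String, int[]>();
                cache.put(key, perSegment);
            }
            int[] values = perSegment.get(field);
            if (values == null) {
                loads++;
                values = reader.readValues();
                perSegment.put(field, values);
            }
            return values;
        }
    }

    public static void main(String[] args) {
        SegmentCore core = new SegmentCore(new int[] {3, 1, 2});
        ToyFieldCache fieldCache = new ToyFieldCache();
        ToyReader beforeReopen = new ToyReader(core);
        ToyReader afterReopen = new ToyReader(core);  // new instance, same segment
        fieldCache.getInts(beforeReopen, "price");
        fieldCache.getInts(afterReopen, "price");
        System.out.println("loads=" + fieldCache.loads);
    }
}
```

Had the cache been keyed on the reader instance itself, the second lookup
in this model would have triggered a second load for the same segment.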

Feel free to contribute to the wiki about your experiences, I
think others will have similar questions.

On Mon, Sep 21, 2009 at 10:27 PM, John Wang <jo...@gmail.com> wrote:
> Hi Yonik:
>
>      Actually that is what I am looking for. Can you please point me to
> where/how sorting is done per-segment?
>
>      When heavy indexing introduces or modifies segments, would it cause
> reloading of FieldCache at query time and thus would impact search
> performance?
>
> thanks
>
> -John
>
> On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley <yo...@lucidimagination.com>
> wrote:
>>
>> On Tue, Sep 22, 2009 at 12:56 AM, John Wang <jo...@gmail.com> wrote:
>> > Looking at the code, seems there is a disconnect between how/when field
>> > cache is loaded when IndexWriter.getReader() is called.
>>
>> I'm not sure what you mean by "disconnect"
>>
>> > Is FieldCache updated?
>>
>> FieldCache entries are populated on demand, as they always have been.
>>
>> > Otherwise, are we reloading FieldCache for each
>> > reader instance?
>>
>> Searching/sorting is now per-segment, and so is the use of the
>> FieldCache.  Segments that don't change shouldn't have to reload their
>> FieldCache entries.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>
>


Re: 2.9 NRT w.r.t. sorting and field cache

Posted by John Wang <jo...@gmail.com>.

Hi Yonik:

     Actually that is what I am looking for. Can you please point me to
where/how sorting is done per-segment?

     When heavy indexing introduces or modifies segments, would it cause
reloading of FieldCache at query time and thus would impact search
performance?

thanks

-John

On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> On Tue, Sep 22, 2009 at 12:56 AM, John Wang <jo...@gmail.com> wrote:
> > Looking at the code, seems there is a disconnect between how/when field
> > cache is loaded when IndexWriter.getReader() is called.
>
> I'm not sure what you mean by "disconnect"
>
> > Is FieldCache updated?
>
> FieldCache entries are populated on demand, as they always have been.
>
> > Otherwise, are we reloading FieldCache for each
> > reader instance?
>
> Searching/sorting is now per-segment, and so is the use of the
> FieldCache.  Segments that don't change shouldn't have to reload their
> FieldCache entries.
>
> -Yonik
> http://www.lucidimagination.com
>
>
>

Re: 2.9 NRT w.r.t. sorting and field cache

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Tue, Sep 22, 2009 at 12:56 AM, John Wang <jo...@gmail.com> wrote:
> Looking at the code, seems there is a disconnect between how/when field
> cache is loaded when IndexWriter.getReader() is called.

I'm not sure what you mean by "disconnect"

> Is FieldCache updated?

FieldCache entries are populated on demand, as they always have been.

> Otherwise, are we reloading FieldCache for each
> reader instance?

Searching/sorting is now per-segment, and so is the use of the
FieldCache.  Segments that don't change shouldn't have to reload their
FieldCache entries.

-Yonik
http://www.lucidimagination.com
