Posted to java-user@lucene.apache.org by Ravikumar Govindarajan <ra...@gmail.com> on 2013/11/07 07:33:01 UTC

IndexReader close listeners and NRT

I am trying to cache a BitSet keyed on getCoreCacheKey(), and to evict it
from a listener registered via IndexReader.addReaderClosedListener().

But I find that getCoreCacheKey() returns the IndexReader object itself as
the key.

Whenever the IndexReader reopens via NRT because of deletes, does that mean
my cache will be purged, because a new IndexReader is opened?

Are there ways to avoid this purging?

--
Ravi

Re: IndexReader close listeners and NRT

Posted by Ravikumar Govindarajan <ra...@gmail.com>.
Thanks Mike. Explicit type-cast to SegmentReader will do the trick for the
moment.

--
Ravi


On Fri, Nov 8, 2013 at 6:17 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Fri, Nov 8, 2013 at 12:22 AM, Ravikumar Govindarajan
> <ra...@gmail.com> wrote:
> >> So, in your code, "reader" is the top-level reader, not the one
> >> segment you are pulling a scorer on (context.reader()).
> >>
> >> So you are building your cache on the top-level reader, not the
> >> segment's reader?  Is that intentional?  (It's not NRT friendly).
> >
> > Not really. It is an IndexSearcher(AtomicReader) that populates the BitSet
>
> Hmm, I see the code referencing "reader" but it never assigns it?  So
> I assumed this was your toplevel reader (somewhere).  Maybe you are
> missing an AtomicReader reader = context.getReader() in that code?
>
> >> But, yes, your ReaderClosedListener will be called once that top-level
> >> reader is closed, and that will then evict its entries from the cache.
> >
> > This is the current problem I am facing. I actually want to key on
> > CoreClosedListener for this cache, but lucene exposes only a
> > ReaderClosedListener(), which causes frequent purge/build of the cache
> > during NRT life-cycle.
> >
> > Is it possible to hook into a CoreClosedListener somehow, so that I can
> > mimic FieldCacheImpl behavior and become free from NRT logic?
>
> You can cast the AtomicReader to SegmentReader and call
> .addCoreClosedListener?
>
> > Also, when we have a getCoreCacheKey() exposed from IndexReader, should we
> > also not have a addCoreClosedListener() in it? Will it cause too much
> > confusion, as only SegmentReader might have a valid impl for that method?
>
> You really should only use .getCoreCacheKey on SegmentReader; all
> other impls will just "return this" (and then you have full cache
> turnover after every NRT reopen).
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: IndexReader close listeners and NRT

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Fri, Nov 8, 2013 at 12:22 AM, Ravikumar Govindarajan
<ra...@gmail.com> wrote:
>> So, in your code, "reader" is the top-level reader, not the one
>> segment you are pulling a scorer on (context.reader()).
>>
>> So you are building your cache on the top-level reader, not the
>> segment's reader?  Is that intentional?  (It's not NRT friendly).
>
> Not really. It is an IndexSearcher(AtomicReader) that populates the BitSet

Hmm, I see the code referencing "reader" but it never assigns it?  So
I assumed this was your top-level reader (somewhere).  Maybe you are
missing an AtomicReader reader = context.reader() in that code?

>> But, yes, your ReaderClosedListener will be called once that top-level
>> reader is closed, and that will then evict its entries from the cache.
>
> This is the current problem I am facing. I actually want to key on
> CoreClosedListener for this cache, but lucene exposes only a
> ReaderClosedListener(), which causes frequent purge/build of the cache
> during NRT life-cycle.
>
> Is it possible to hook into a CoreClosedListener somehow, so that I can
> mimic FieldCacheImpl behavior and become free from NRT logic?

You can cast the AtomicReader to SegmentReader and call .addCoreClosedListener?

> Also, when we have a getCoreCacheKey() exposed from IndexReader, should we
> also not have a addCoreClosedListener() in it? Will it cause too much
> confusion, as only SegmentReader might have a valid impl for that method?

You really should only use .getCoreCacheKey on SegmentReader; all
other impls will just "return this" (and then you have full cache
turnover after every NRT reopen).

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IndexReader close listeners and NRT

Posted by Ravikumar Govindarajan <ra...@gmail.com>.
> So, in your code, "reader" is the top-level reader, not the one
> segment you are pulling a scorer on (context.reader()).
>
> So you are building your cache on the top-level reader, not the
> segment's reader?  Is that intentional?  (It's not NRT friendly).

Not really. It is an IndexSearcher(AtomicReader) that populates the BitSet

> But, yes, your ReaderClosedListener will be called once that top-level
> reader is closed, and that will then evict its entries from the cache.

This is the problem I am currently facing. I actually want to evict this
cache from a CoreClosedListener, but Lucene exposes only a
ReaderClosedListener, which causes frequent purging and rebuilding of the
cache during the NRT life-cycle.

Is it possible to hook into a CoreClosedListener somehow, so that I can
mimic FieldCacheImpl behavior and become free from NRT logic?

Also, since getCoreCacheKey() is exposed on IndexReader, should there not
also be an addCoreClosedListener() on it? Or would that cause too much
confusion, as only SegmentReader might have a valid impl for that method?

--
Ravi

On Fri, Nov 8, 2013 at 12:04 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:
>
> On Thu, Nov 7, 2013 at 12:18 PM, Ravikumar Govindarajan
> <ra...@gmail.com> wrote:
> > Thanks Mike.
> >
> > If you look at my impl, I am using the getCoreCacheKey() only, but keyed
> > on a ReaderClosedListener and purging it onClose(). When NRT does reopens,
> > will it invoke the onClose() method for the expired-reader?.
>
> OK, I see.
>
> So, in your code, "reader" is the top-level reader, not the one
> segment you are pulling a scorer on (context.reader()).
>
> So you are building your cache on the top-level reader, not the
> segment's reader?  Is that intentional?  (It's not NRT friendly).
>
> But, yes, your ReaderClosedListener will be called once that top-level
> reader is closed, and that will then evict its entries from the cache.
>
> > I saw that
> > FieldCacheImpl is using a CoreClosedListener, whereas I am using a
> > ReaderClosedListener. What is the difference between these two?
>
> A single segment's core readers (that read the postings, stored
> fields, term vectors, etc.) are shared between multiple SegmentReader
> instances; each of those SegmentReader instances only changes in what
> documents are deleted.  The core is closed only once all
> SegmentReaders that share that core have been closed.
>
> > I will surely look at the FBS replacement you have pointed out. BTW, this
> > method actually runs during opening of an index for the first-time. May be
> > for clarity and organisation, I will refactor this code as you have
> > suggested
>
> Cool....
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: IndexReader close listeners and NRT

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Thu, Nov 7, 2013 at 12:18 PM, Ravikumar Govindarajan
<ra...@gmail.com> wrote:
> Thanks Mike.
>
> If you look at my impl, I am using the getCoreCacheKey() only, but keyed
> on a ReaderClosedListener and purging it onClose(). When NRT does reopens,
> will it invoke the onClose() method for the expired-reader?.

OK, I see.

So, in your code, "reader" is the top-level reader, not the one
segment you are pulling a scorer on (context.reader()).

So you are building your cache on the top-level reader, not the
segment's reader?  Is that intentional?  (It's not NRT friendly).

But, yes, your ReaderClosedListener will be called once that top-level
reader is closed, and that will then evict its entries from the cache.

> I saw that
> FieldCacheImpl is using a CoreClosedListener, whereas I am using a
> ReaderClosedListener. What is the difference between these two?

A single segment's core readers (that read the postings, stored
fields, term vectors, etc.) are shared between multiple SegmentReader
instances; each of those SegmentReader instances only changes in what
documents are deleted.  The core is closed only once all
SegmentReaders that share that core have been closed.
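
The difference between the two listener kinds can be sketched with a small
model. This is only an illustration under stated assumptions: Core and
SegReader below are hypothetical stand-ins written for this example, not
Lucene classes, and the reference counting is a simplification of the
sharing described above. The sketch "reopens" a segment by creating a second
reader over the same core, then closes both readers in turn.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
class SharedCoreDemo {

    /** Models the per-segment core that several reader instances share. */
    static final class Core {
        private int refCount;
        private final List<Runnable> coreClosedListeners = new ArrayList<>();

        void addCoreClosedListener(Runnable listener) {
            coreClosedListeners.add(listener);
        }

        void incRef() {
            refCount++;
        }

        void decRef() {
            refCount--;
            if (refCount == 0) {
                // The last reader sharing this core is gone: notify core listeners.
                for (Runnable listener : coreClosedListeners) {
                    listener.run();
                }
            }
        }
    }

    /** Models one reader over a segment; reopened readers share the same Core. */
    static final class SegReader {
        private final Core core;
        private final List<Runnable> readerClosedListeners = new ArrayList<>();

        SegReader(Core core) {
            this.core = core;
            core.incRef();
        }

        void addReaderClosedListener(Runnable listener) {
            readerClosedListeners.add(listener);
        }

        void close() {
            // Reader-level listeners fire on every close of this instance.
            for (Runnable listener : readerClosedListeners) {
                listener.run();
            }
            core.decRef();
        }
    }

    /**
     * Returns {readerClosed, coreClosed} counts after the first close,
     * followed by the same two counts after the second close.
     */
    static int[] simulateReopen() {
        int[] readerClosed = {0};
        int[] coreClosed = {0};
        Core core = new Core();
        core.addCoreClosedListener(() -> coreClosed[0]++);

        SegReader before = new SegReader(core);
        before.addReaderClosedListener(() -> readerClosed[0]++);

        // "Reopen" after deletes: a new reader instance over the same core.
        SegReader after = new SegReader(core);
        after.addReaderClosedListener(() -> readerClosed[0]++);

        before.close();
        int readerAfterFirst = readerClosed[0];
        int coreAfterFirst = coreClosed[0];
        after.close();
        return new int[] {readerAfterFirst, coreAfterFirst, readerClosed[0], coreClosed[0]};
    }

    public static void main(String[] args) {
        int[] r = simulateReopen();
        System.out.println("after closing old reader: readerClosed=" + r[0] + " coreClosed=" + r[1]);
        System.out.println("after closing new reader: readerClosed=" + r[2] + " coreClosed=" + r[3]);
    }
}
```

In this model, a cache evicted from the reader-level listener loses its entry
as soon as the old reader is closed, while one evicted from the core-level
listener keeps it until the last reader sharing the core is closed.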

> I will surely look at the FBS replacement you have pointed out. BTW, this
> method actually runs during opening of an index for the first-time. May be
> for clarity and organisation, I will refactor this code as you have
> suggested

Cool....

Mike McCandless

http://blog.mikemccandless.com



Re: IndexReader close listeners and NRT

Posted by Ravikumar Govindarajan <ra...@gmail.com>.
Thanks Mike.

If you look at my impl, I am using getCoreCacheKey() as the key, but
purging the entry from a ReaderClosedListener's onClose(). When NRT reopens,
will it invoke onClose() for the expired reader? I saw that
FieldCacheImpl uses a CoreClosedListener, whereas I am using a
ReaderClosedListener. What is the difference between the two?

I will surely look at the FixedBitSet (FBS) replacement you have pointed
out. BTW, this method actually runs when an index is opened for the first
time. Maybe, for clarity and organisation, I will refactor this code as you
have suggested.

On Thursday, November 7, 2013, Michael McCandless wrote:

> Hi, a few comments on quickly looking at the code...
>
> It's sort of strange, inside the Weight.scorer() method, to go and
> build an IndexSearcher and run a whole new search, if the cache entry
> is missing.  Could you instead just do a top-level search, which then
> populates the cache per-segment?
>
> Also, FixedBitSet is better here than OpenBitSet: it should be a bit
> faster, because OpenBitSet is "elastic".
>
> When you use the core cache key, it does not change for a
> SegmentReader that was reopened with new deletes; this is the whole
> point of the core cache key. So, if deletes changed for a segment and
> you reopened, your cache entry will reuse whatever was created the
> first time for that segment.  If deletes matter, then it's usually
> best to look at liveDocs "live" and fold them in, instead of
> regenerating the whole cache entry.
>
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: IndexReader close listeners and NRT

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hi, a few comments on quickly looking at the code...

It's sort of strange, inside the Weight.scorer() method, to go and
build an IndexSearcher and run a whole new search, if the cache entry
is missing.  Could you instead just do a top-level search, which then
populates the cache per-segment?

Also, FixedBitSet is better here than OpenBitSet: it should be a bit
faster, because OpenBitSet is "elastic".

When you use the core cache key, it does not change for a
SegmentReader that was reopened with new deletes; this is the whole
point of the core cache key. So, if deletes changed for a segment and
you reopened, your cache entry will reuse whatever was created the
first time for that segment.  If deletes matter, then it's usually
best to look at liveDocs "live" and fold them in, instead of
regenerating the whole cache entry.
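
A minimal sketch of "folding in" deletes at query time, assuming plain
java.util.BitSet in place of Lucene's bit-set and Bits types. The class and
method names (LiveDocsFoldDemo, visibleMatches) are made up for this
example. The cached entry is built once per segment and never modified; the
current live docs are intersected with it on each use. A null liveDocs is
treated as "no deletions", mirroring what getLiveDocs() returns for a
segment without deletes.

```java
import java.util.BitSet;

// Illustration with java.util.BitSet; names here are hypothetical, not Lucene API.
class LiveDocsFoldDemo {

    /**
     * Combines a cached per-segment match set with the current live docs.
     * A null liveDocs means "no deletions".
     */
    static BitSet visibleMatches(BitSet cachedMatches, BitSet liveDocs) {
        BitSet visible = (BitSet) cachedMatches.clone(); // never mutate the cached entry
        if (liveDocs != null) {
            visible.and(liveDocs);
        }
        return visible;
    }

    public static void main(String[] args) {
        BitSet cached = new BitSet(8);
        cached.set(1);
        cached.set(3);
        cached.set(5);

        BitSet liveDocs = new BitSet(8);
        liveDocs.set(0, 8); // all 8 docs start out live
        liveDocs.clear(3);  // doc 3 has since been deleted

        System.out.println(visibleMatches(cached, null));     // {1, 3, 5}
        System.out.println(visibleMatches(cached, liveDocs)); // {1, 5}
        System.out.println(cached);                           // {1, 3, 5}
    }
}
```

The cost is one intersection per use, which is the "extra hit for testing
deletes" mentioned earlier in the thread, in exchange for keeping the cache
entry across reopens.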


Mike McCandless

http://blog.mikemccandless.com


On Thu, Nov 7, 2013 at 8:04 AM, Ravikumar Govindarajan
<ra...@gmail.com> wrote:
> Thanks Mike. Can you help me out with one more question?
>
> I have a sample impl as below, where I am adding a ReaderClosedListener to
> purge the BitSet.
>
> When using NRT with applyAllDeletes, old-reader will get closed and
> new-reader will open. In such a case, will the below impl-cache also be
> purged and re-built?
>
> I also saw that FieldCache uses a CoreClosedListener, instead of
> ReaderClosedListener and I need such a functionality. It will be great to
> maintain the BitSet cache at the cost of taking extra hit for testing
> deletes.
>
> @Override
> public Scorer scorer(AtomicReaderContext context, boolean scoreDocsInOrder,
>     boolean topScorer, Bits acceptDocs) {
>   Object key = context.getReader().getCoreCacheKey();
>   OpenBitSet bitSet = cacheMap.get(key);
>   if (bitSet == null) {
>     reader.addReaderClosedListener(new ReaderClosedListener() {
>       @Override
>       public void onClose(IndexReader reader) {
>         Object key = reader.getCoreCacheKey();
>         cacheMap.remove(key);
>       }
>     });
>     final OpenBitSet bs = new OpenBitSet(reader.maxDoc());
>     //Add the empty bit-set first
>     cacheMap.put(key, bs);
>     IndexSearcher searcher = new IndexSearcher(reader);
>     //Do a search and populate the bitset
>     return bs;
>   }
>   //Proceed with scoring logic
> }
>
> --
>
> Ravi


Re: IndexReader close listeners and NRT

Posted by Ravikumar Govindarajan <ra...@gmail.com>.
Thanks Mike. Can you help me out with one more question?

I have a sample impl below, where I add a ReaderClosedListener to
purge the BitSet.

When using NRT with applyAllDeletes, the old reader gets closed and a
new reader opens. In that case, will the cache in the impl below also be
purged and rebuilt?

I also saw that FieldCache uses a CoreClosedListener instead of a
ReaderClosedListener, and I need that functionality. It would be great to
keep the BitSet cache across reopens, at the cost of an extra hit for
testing deletes.

@Override
public Scorer scorer(AtomicReaderContext context, boolean scoreDocsInOrder,
    boolean topScorer, Bits acceptDocs) {
  Object key = context.getReader().getCoreCacheKey();
  OpenBitSet bitSet = cacheMap.get(key);
  if (bitSet == null) {
    reader.addReaderClosedListener(new ReaderClosedListener() {
      @Override
      public void onClose(IndexReader reader) {
        Object key = reader.getCoreCacheKey();
        cacheMap.remove(key);
      }
    });
    final OpenBitSet bs = new OpenBitSet(reader.maxDoc());
    //Add the empty bit-set first
    cacheMap.put(key, bs);
    IndexSearcher searcher = new IndexSearcher(reader);
    //Do a search and populate the bitset
    return bs;
  }
  //Proceed with scoring logic
}

--

Ravi


On Thu, Nov 7, 2013 at 4:28 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> You need to call .getCoreCacheKey() on each of the sub-readers
> (returned by IndexReader.leaves()), to play well with NRT.
>
> Typically you'd do so in a context that already sees each leaf, like a
> custom Filter or a Collector.
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: IndexReader close listeners and NRT

Posted by Michael McCandless <lu...@mikemccandless.com>.
You need to call .getCoreCacheKey() on each of the sub-readers
(returned by IndexReader.leaves()), to play well with NRT.

Typically you'd do so in a context that already sees each leaf, like a
custom Filter or a Collector.
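
To illustrate why the per-leaf key matters, here is a rough model of a cache
across an NRT-style reopen. It assumes simplified stand-in classes (Leaf,
TopReader, PerLeafCacheDemo are hypothetical and not part of Lucene): each
leaf carries a core key that survives reopens, while every reopen yields a
new top-level reader object. The sketch counts how many cache entries must
be built under each keying strategy.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
class PerLeafCacheDemo {

    /** Models a leaf (per-segment) reader exposing a key shared across reopens. */
    static final class Leaf {
        final Object coreKey;

        Leaf(Object coreKey) {
            this.coreKey = coreKey;
        }
    }

    /** Models a top-level reader; every reopen produces a new instance. */
    static final class TopReader {
        final List<Leaf> leaves;

        TopReader(Leaf... leaves) {
            this.leaves = Arrays.asList(leaves);
        }
    }

    private final Map<Object, String> cache = new HashMap<>();
    int builds; // number of cache entries that had to be built

    /** Keys on each leaf's core key, so unchanged segments hit the cache. */
    void searchPerLeaf(TopReader reader) {
        for (Leaf leaf : reader.leaves) {
            cache.computeIfAbsent(leaf.coreKey, k -> { builds++; return "entry"; });
        }
    }

    /** Keys on the top-level reader itself, so every reopen misses. */
    void searchTopLevel(TopReader reader) {
        cache.computeIfAbsent(reader, k -> { builds++; return "entry"; });
    }

    /**
     * Returns {per-leaf builds after first search, per-leaf builds after
     * reopen, whole-index builds when keyed on the top-level reader}.
     */
    static int[] countBuilds() {
        Object seg0 = new Object();
        Object seg1 = new Object();
        TopReader first = new TopReader(new Leaf(seg0), new Leaf(seg1));
        // Reopen: seg0 and seg1 keep their core keys; one new segment appears.
        TopReader reopened =
            new TopReader(new Leaf(seg0), new Leaf(seg1), new Leaf(new Object()));

        PerLeafCacheDemo perLeaf = new PerLeafCacheDemo();
        perLeaf.searchPerLeaf(first);
        int perLeafInitial = perLeaf.builds;
        perLeaf.searchPerLeaf(reopened);

        PerLeafCacheDemo topLevel = new PerLeafCacheDemo();
        topLevel.searchTopLevel(first);
        topLevel.searchTopLevel(reopened);

        return new int[] {perLeafInitial, perLeaf.builds, topLevel.builds};
    }

    public static void main(String[] args) {
        int[] r = countBuilds();
        System.out.println("per-leaf: " + r[0] + " builds, then " + (r[1] - r[0]) + " more after reopen");
        System.out.println("top-level: " + r[2] + " whole-index builds");
    }
}
```

With per-leaf keys, only the new segment needs an entry after the reopen.
With the top-level reader as the key, the whole index is rebuilt each time.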

Mike McCandless

http://blog.mikemccandless.com


On Thu, Nov 7, 2013 at 1:33 AM, Ravikumar Govindarajan
<ra...@gmail.com> wrote:
> I am trying to cache a BitSet by attaching to IndexReader.addCloseListener,
> using the getCoreCacheKey()
>
> But, I find that getCoreCacheKey() returns the IndexReader object itself as
> the key.
>
> Whenever the IndexReader re-opens via NRT because of deletes, will it mean
> that my cache will be purged, because a new IndexReader is opened?
>
> Are there ways to avoid this purging?
>
> --
> Ravi
