You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Shay Banon <ki...@gmail.com> on 2010/05/17 23:00:36 UTC

NRT and Caching based on IndexReader

Hi,

   I wanted to verify if my understanding is correct. Assuming that I use
NRT, and refresh, say, every 1 second, caching based on IndexReader, such is
what is used in the CachingWrapperFilter is basically useless, since, even
if there is an open sub reader, it gets clones meaning there is a new
instance for it, meaning the cache will explode with readers (it will get
cleaned, but still, not as good as usual open reader, commit writer, reopen
reader).

cheers,
shay.banon

Re: NRT and Caching based on IndexReader

Posted by Shay Banon <ki...@gmail.com>.
Right, make sense.

On Tue, May 18, 2010 at 4:23 AM, Yonik Seeley <yo...@lucidimagination.com>wrote:

> On Mon, May 17, 2010 at 9:14 PM, Shay Banon <ki...@gmail.com> wrote:
> > Oh, and one more thing. Deleted docs is a sub case, with NRT, most people
> > will almost always add docs as well... . So it is still not really usable
> > for field cache, right?
>
> FieldCache should be fine for the general cases - the same entry will
> be used if the segment hasn't changed at all, or if the segment has
> only changed which documents are deleted.  Adding new documents adds
> new segments and does affect (until merge) existing segments, so the
> entries will be reused.
>
> -Yonik
> http://www.lucidimagination.com
>
>
> > On Tue, May 18, 2010 at 4:12 AM, Shay Banon <ki...@gmail.com> wrote:
> >
> >> Just saw that you opened a case for that. I think that its important in
> >> your test case to also test for object identity, not just equals. This
> is
> >> because the IndexReader (or the FieldCacheKey) are used as keys in weak
> hash
> >> maps, which uses identity (==) equality for keys.
> >>
> >> If FieldCacheKey is supposed to represent the key by which index readers
> >> should be tested for "equality" (for example, it will be used in the
> >> CachingWrapperFilter), and not the index reader itself, then I think it
> >> should be renamed. What do you think? I am just looking now at what it
> does,
> >> its new...
> >>
> >> -shay.banon
> >>
> >>
> >> On Tue, May 18, 2010 at 4:04 AM, Yonik Seeley <
> yonik@lucidimagination.com>wrote:
> >>
> >>> On Mon, May 17, 2010 at 9:00 PM, Shay Banon <ki...@gmail.com> wrote:
> >>> > Great, so I am not imagining things this late into the night ... ;),
> not
> >>> so
> >>> > great, since using NRT with field cache (like sorting) or caching
> >>> filters,
> >>> > or anything that caches based on IndexReader not really an option.
> This
> >>> > makes NRT very problematic to use in a real application.
> >>>
> >>> NRT is still pretty new :-)  And I do believe this is a bug, so we'll
> >>> get it fixed.
> >>> It's not actually a problem for FieldCache though - it no longer keys
> >>> on the reader directly (if deleted docs are the only things that have
> >>> changed, the FieldCache entry can still be shared).
> >>>
> >>> -Yonik
> >>> http://www.lucidimagination.com
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: NRT and Caching based on IndexReader

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, May 17, 2010 at 9:14 PM, Shay Banon <ki...@gmail.com> wrote:
> Oh, and one more thing. Deleted docs is a sub case, with NRT, most people
> will almost always add docs as well... . So it is still not really usable
> for field cache, right?

FieldCache should be fine for the general cases - the same entry will
be used if the segment hasn't changed at all, or if the segment has
only changed which documents are deleted.  Adding new documents adds
new segments and does affect (until merge) existing segments, so the
entries will be reused.

-Yonik
http://www.lucidimagination.com


> On Tue, May 18, 2010 at 4:12 AM, Shay Banon <ki...@gmail.com> wrote:
>
>> Just saw that you opened a case for that. I think that its important in
>> your test case to also test for object identity, not just equals. This is
>> because the IndexReader (or the FieldCacheKey) are used as keys in weak hash
>> maps, which uses identity (==) equality for keys.
>>
>> If FieldCacheKey is supposed to represent the key by which index readers
>> should be tested for "equality" (for example, it will be used in the
>> CachingWrapperFilter), and not the index reader itself, then I think it
>> should be renamed. What do you think? I am just looking now at what it does,
>> its new...
>>
>> -shay.banon
>>
>>
>> On Tue, May 18, 2010 at 4:04 AM, Yonik Seeley <yo...@lucidimagination.com>wrote:
>>
>>> On Mon, May 17, 2010 at 9:00 PM, Shay Banon <ki...@gmail.com> wrote:
>>> > Great, so I am not imagining things this late into the night ... ;), not
>>> so
>>> > great, since using NRT with field cache (like sorting) or caching
>>> filters,
>>> > or anything that caches based on IndexReader not really an option. This
>>> > makes NRT very problematic to use in a real application.
>>>
>>> NRT is still pretty new :-)  And I do believe this is a bug, so we'll
>>> get it fixed.
>>> It's not actually a problem for FieldCache though - it no longer keys
>>> on the reader directly (if deleted docs are the only things that have
>>> changed, the FieldCache entry can still be shared).
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: NRT and Caching based on IndexReader

Posted by Shay Banon <ki...@gmail.com>.
Oh, and one more thing. Deleted docs is a sub case, with NRT, most people
will almost always add docs as well... . So it is still not really usable
for field cache, right?

On Tue, May 18, 2010 at 4:12 AM, Shay Banon <ki...@gmail.com> wrote:

> Just saw that you opened a case for that. I think that its important in
> your test case to also test for object identity, not just equals. This is
> because the IndexReader (or the FieldCacheKey) are used as keys in weak hash
> maps, which uses identity (==) equality for keys.
>
> If FieldCacheKey is supposed to represent the key by which index readers
> should be tested for "equality" (for example, it will be used in the
> CachingWrapperFilter), and not the index reader itself, then I think it
> should be renamed. What do you think? I am just looking now at what it does,
> its new...
>
> -shay.banon
>
>
> On Tue, May 18, 2010 at 4:04 AM, Yonik Seeley <yo...@lucidimagination.com>wrote:
>
>> On Mon, May 17, 2010 at 9:00 PM, Shay Banon <ki...@gmail.com> wrote:
>> > Great, so I am not imagining things this late into the night ... ;), not
>> so
>> > great, since using NRT with field cache (like sorting) or caching
>> filters,
>> > or anything that caches based on IndexReader not really an option. This
>> > makes NRT very problematic to use in a real application.
>>
>> NRT is still pretty new :-)  And I do believe this is a bug, so we'll
>> get it fixed.
>> It's not actually a problem for FieldCache though - it no longer keys
>> on the reader directly (if deleted docs are the only things that have
>> changed, the FieldCache entry can still be shared).
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

Re: NRT and Caching based on IndexReader

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, May 17, 2010 at 9:12 PM, Shay Banon <ki...@gmail.com> wrote:
> Just saw that you opened a case for that. I think that its important in your
> test case to also test for object identity, not just equals. This is because
> the IndexReader (or the FieldCacheKey) are used as keys in weak hash maps,
> which uses identity (==) equality for keys.

Yeah, just me being lazy... I just knew that those objects don't
implement equals and hence it ends up the same as ==.  But I agree an
explicit == would be better.

> If FieldCacheKey is supposed to represent the key by which index readers
> should be tested for "equality" (for example, it will be used in the
> CachingWrapperFilter), and not the index reader itself, then I think it
> should be renamed. What do you think? I am just looking now at what it does,
> its new...

I don't think it's general purpose, since it ignores things like a
change in deleted documents.  I think we should use the same reader
when the segment has not been changed.

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: NRT and Caching based on IndexReader

Posted by Shay Banon <ki...@gmail.com>.
Just saw that you opened a case for that. I think that its important in your
test case to also test for object identity, not just equals. This is because
the IndexReader (or the FieldCacheKey) are used as keys in weak hash maps,
which uses identity (==) equality for keys.

If FieldCacheKey is supposed to represent the key by which index readers
should be tested for "equality" (for example, it will be used in the
CachingWrapperFilter), and not the index reader itself, then I think it
should be renamed. What do you think? I am just looking now at what it does,
its new...

-shay.banon

On Tue, May 18, 2010 at 4:04 AM, Yonik Seeley <yo...@lucidimagination.com>wrote:

> On Mon, May 17, 2010 at 9:00 PM, Shay Banon <ki...@gmail.com> wrote:
> > Great, so I am not imagining things this late into the night ... ;), not
> so
> > great, since using NRT with field cache (like sorting) or caching
> filters,
> > or anything that caches based on IndexReader not really an option. This
> > makes NRT very problematic to use in a real application.
>
> NRT is still pretty new :-)  And I do believe this is a bug, so we'll
> get it fixed.
> It's not actually a problem for FieldCache though - it no longer keys
> on the reader directly (if deleted docs are the only things that have
> changed, the FieldCache entry can still be shared).
>
> -Yonik
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: NRT and Caching based on IndexReader

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, May 17, 2010 at 9:00 PM, Shay Banon <ki...@gmail.com> wrote:
> Great, so I am not imagining things this late into the night ... ;), not so
> great, since using NRT with field cache (like sorting) or caching filters,
> or anything that caches based on IndexReader not really an option. This
> makes NRT very problematic to use in a real application.

NRT is still pretty new :-)  And I do believe this is a bug, so we'll
get it fixed.
It's not actually a problem for FieldCache though - it no longer keys
on the reader directly (if deleted docs are the only things that have
changed, the FieldCache entry can still be shared).

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: NRT and Caching based on IndexReader

Posted by Shay Banon <ki...@gmail.com>.
Great, so I am not imagining things this late into the night ... ;), not so
great, since using NRT with field cache (like sorting) or caching filters,
or anything that caches based on IndexReader not really an option. This
makes NRT very problematic to use in a real application.

-shay.banon

On Tue, May 18, 2010 at 3:44 AM, Yonik Seeley <yo...@lucidimagination.com>wrote:

> Yep, confirmed what you are seeing.  I'll check into it and open an issue.
>
> -Yonik
> http://www.lucidimagination.com
>
> On Mon, May 17, 2010 at 5:54 PM, Shay Banon <ki...@gmail.com> wrote:
> > Yea, I noticed that ;). Even so, I think that with NRT, even the lower
> level
> > readers are cloned, meaning that you always get a new instance... . Here
> is
> > a sample program that tests this behavior, am I doing something wrong? By
> > the way, if what I say is correct, it affects field cache as well....
> >
> > public static void main(String[] args) throws Exception {
> >    Directory dir = new RAMDirectory();
> >    IndexWriter indexWriter = new IndexWriter(dir, new
> > StandardAnalyzer(Version.LUCENE_CURRENT), true,
> > IndexWriter.MaxFieldLength.UNLIMITED);
> >    IndexReader reader = indexWriter.getReader();
> >
> >    Set<IndexReader> readers = new HashSet<IndexReader>(); // tracks all
> > readers
> >    for (int i = 0; i < 100; i++) {
> >        readers.add(reader);
> >        Document doc = new Document();
> >        doc.add(new Field("id", Integer.toString(i), Field.Store.YES,
> > Field.Index.NO));
> >        indexWriter.addDocument(doc);
> >
> >        IndexReader newReader = reader.reopen(true);
> >        if (reader == newReader) {
> >            System.err.println("Should not get the same reader...");
> >        } else {
> >            reader.close();
> >            reader = newReader;
> >        }
> >    }
> >
> >    reader.close();
> >
> >    // now, go and check that all are ref == 0
> >    // and, that all readers, even sub readers, are unique instances
> > (sadly...)
> >    Set<IndexReader> allReaders = new HashSet<IndexReader>();
> >    for (IndexReader reader1 : readers) {
> >        if (reader1.getRefCount() != 0) {
> >            System.err.println("A reader is not closed");
> >        }
> >        if (allReaders.contains(reader1)) {
> >            System.err.println("Found an existing reader...");
> >        }
> >        allReaders.add(reader1);
> >        if (reader1.getSequentialSubReaders() != null) {
> >            for (IndexReader reader2 : reader1.getSequentialSubReaders())
> {
> >                if (reader2.getRefCount() != 0) {
> >                    System.err.println("A reader is not closed...");
> >                }
> >                if (allReaders.contains(reader2)) {
> >                    System.err.println("Found an existing reader...");
> >                }
> >                allReaders.add(reader2);
> >
> >                // there should not be additional readers...
> >                if (reader2.getSequentialSubReaders() != null) {
> >                    System.err.println("Should not be more readers...");
> >                }
> >            }
> >        }
> >    }
> >
> >    indexWriter.close();
> > }
> >
> >
> >
> >
> > On Tue, May 18, 2010 at 12:30 AM, Yonik Seeley
> > <yo...@lucidimagination.com>wrote:
> >
> >> On Mon, May 17, 2010 at 5:00 PM, Shay Banon <ki...@gmail.com> wrote:
> >> >   I wanted to verify if my understanding is correct. Assuming that I
> use
> >> > NRT, and refresh, say, every 1 second, caching based on IndexReader,
> such
> >> is
> >> > what is used in the CachingWrapperFilter is basically useless
> >>
> >> No, it's fine.  Searching in Lucene is now done per-segment, and so
> >> the readers that are passed to Filter.getDocIdSet are the segment
> >> readers, not the top-level readers.  Caching is now per-segment.
> >>
> >> -Yonik
> >> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: NRT and Caching based on IndexReader

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Yep, confirmed what you are seeing.  I'll check into it and open an issue.

-Yonik
http://www.lucidimagination.com

On Mon, May 17, 2010 at 5:54 PM, Shay Banon <ki...@gmail.com> wrote:
> Yea, I noticed that ;). Even so, I think that with NRT, even the lower level
> readers are cloned, meaning that you always get a new instance... . Here is
> a sample program that tests this behavior, am I doing something wrong? By
> the way, if what I say is correct, it affects field cache as well....
>
> public static void main(String[] args) throws Exception {
>    Directory dir = new RAMDirectory();
>    IndexWriter indexWriter = new IndexWriter(dir, new
> StandardAnalyzer(Version.LUCENE_CURRENT), true,
> IndexWriter.MaxFieldLength.UNLIMITED);
>    IndexReader reader = indexWriter.getReader();
>
>    Set<IndexReader> readers = new HashSet<IndexReader>(); // tracks all
> readers
>    for (int i = 0; i < 100; i++) {
>        readers.add(reader);
>        Document doc = new Document();
>        doc.add(new Field("id", Integer.toString(i), Field.Store.YES,
> Field.Index.NO));
>        indexWriter.addDocument(doc);
>
>        IndexReader newReader = reader.reopen(true);
>        if (reader == newReader) {
>            System.err.println("Should not get the same reader...");
>        } else {
>            reader.close();
>            reader = newReader;
>        }
>    }
>
>    reader.close();
>
>    // now, go and check that all are ref == 0
>    // and, that all readers, even sub readers, are unique instances
> (sadly...)
>    Set<IndexReader> allReaders = new HashSet<IndexReader>();
>    for (IndexReader reader1 : readers) {
>        if (reader1.getRefCount() != 0) {
>            System.err.println("A reader is not closed");
>        }
>        if (allReaders.contains(reader1)) {
>            System.err.println("Found an existing reader...");
>        }
>        allReaders.add(reader1);
>        if (reader1.getSequentialSubReaders() != null) {
>            for (IndexReader reader2 : reader1.getSequentialSubReaders()) {
>                if (reader2.getRefCount() != 0) {
>                    System.err.println("A reader is not closed...");
>                }
>                if (allReaders.contains(reader2)) {
>                    System.err.println("Found an existing reader...");
>                }
>                allReaders.add(reader2);
>
>                // there should not be additional readers...
>                if (reader2.getSequentialSubReaders() != null) {
>                    System.err.println("Should not be more readers...");
>                }
>            }
>        }
>    }
>
>    indexWriter.close();
> }
>
>
>
>
> On Tue, May 18, 2010 at 12:30 AM, Yonik Seeley
> <yo...@lucidimagination.com>wrote:
>
>> On Mon, May 17, 2010 at 5:00 PM, Shay Banon <ki...@gmail.com> wrote:
>> >   I wanted to verify if my understanding is correct. Assuming that I use
>> > NRT, and refresh, say, every 1 second, caching based on IndexReader, such
>> is
>> > what is used in the CachingWrapperFilter is basically useless
>>
>> No, it's fine.  Searching in Lucene is now done per-segment, and so
>> the readers that are passed to Filter.getDocIdSet are the segment
>> readers, not the top-level readers.  Caching is now per-segment.
>>
>> -Yonik
>> http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: NRT and Caching based on IndexReader

Posted by Shay Banon <ki...@gmail.com>.
Yea, I noticed that ;). Even so, I think that with NRT, even the lower level
readers are cloned, meaning that you always get a new instance... . Here is
a sample program that tests this behavior, am I doing something wrong? By
the way, if what I say is correct, it affects field cache as well....

public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriter indexWriter = new IndexWriter(dir, new
StandardAnalyzer(Version.LUCENE_CURRENT), true,
IndexWriter.MaxFieldLength.UNLIMITED);
    IndexReader reader = indexWriter.getReader();

    Set<IndexReader> readers = new HashSet<IndexReader>(); // tracks all
readers
    for (int i = 0; i < 100; i++) {
        readers.add(reader);
        Document doc = new Document();
        doc.add(new Field("id", Integer.toString(i), Field.Store.YES,
Field.Index.NO));
        indexWriter.addDocument(doc);

        IndexReader newReader = reader.reopen(true);
        if (reader == newReader) {
            System.err.println("Should not get the same reader...");
        } else {
            reader.close();
            reader = newReader;
        }
    }

    reader.close();

    // now, go and check that all are ref == 0
    // and, that all readers, even sub readers, are unique instances
(sadly...)
    Set<IndexReader> allReaders = new HashSet<IndexReader>();
    for (IndexReader reader1 : readers) {
        if (reader1.getRefCount() != 0) {
            System.err.println("A reader is not closed");
        }
        if (allReaders.contains(reader1)) {
            System.err.println("Found an existing reader...");
        }
        allReaders.add(reader1);
        if (reader1.getSequentialSubReaders() != null) {
            for (IndexReader reader2 : reader1.getSequentialSubReaders()) {
                if (reader2.getRefCount() != 0) {
                    System.err.println("A reader is not closed...");
                }
                if (allReaders.contains(reader2)) {
                    System.err.println("Found an existing reader...");
                }
                allReaders.add(reader2);

                // there should not be additional readers...
                if (reader2.getSequentialSubReaders() != null) {
                    System.err.println("Should not be more readers...");
                }
            }
        }
    }

    indexWriter.close();
}




On Tue, May 18, 2010 at 12:30 AM, Yonik Seeley
<yo...@lucidimagination.com>wrote:

> On Mon, May 17, 2010 at 5:00 PM, Shay Banon <ki...@gmail.com> wrote:
> >   I wanted to verify if my understanding is correct. Assuming that I use
> > NRT, and refresh, say, every 1 second, caching based on IndexReader, such
> is
> > what is used in the CachingWrapperFilter is basically useless
>
> No, it's fine.  Searching in Lucene is now done per-segment, and so
> the readers that are passed to Filter.getDocIdSet are the segment
> readers, not the top-level readers.  Caching is now per-segment.
>
> -Yonik
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: NRT and Caching based on IndexReader

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, May 17, 2010 at 5:00 PM, Shay Banon <ki...@gmail.com> wrote:
>   I wanted to verify if my understanding is correct. Assuming that I use
> NRT, and refresh, say, every 1 second, caching based on IndexReader, such is
> what is used in the CachingWrapperFilter is basically useless

No, it's fine.  Searching in Lucene is now done per-segment, and so
the readers that are passed to Filter.getDocIdSet are the segment
readers, not the top-level readers.  Caching is now per-segment.

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org