Posted to dev@lucene.apache.org by Jason Rutherglen <ja...@gmail.com> on 2011/08/22 19:13:02 UTC

Separate issue for appendable field caches / doc values

LUCENE-2312 needs appendable field caches.  I can include this
functionality into LUCENE-2312, or separate it out into a separate
issue / patch.

However it would only be useful for RT / LUCENE-2312.  Also, I'm not
sure how this functionality relates to doc values.  If we used doc
values, then we would not be able to port LUCENE-2312 to Lucene 3.x.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Separate issue for appendable field caches / doc values

Posted by Jason Rutherglen <ja...@gmail.com>.

Also, a FieldCache getCaches(Reader) method is needed.  I'm not sure
exactly what getCaches would return.  The entries, presumably?  If so,
it could be called getEntries(Reader).  That way the DWPT can get the
existing field caches.  It would also need a way to add an event
listener that is notified when a new field cache [entry?] is created
for a given reader / key.  If this makes sense and is workable, it can
be a separate issue.
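
To make the getEntries(Reader) plus listener idea concrete, here is a
rough sketch.  Everything in it is hypothetical: CacheEntryRegistry,
EntryListener, putIfAbsent and the use of a plain Object as the reader
key are placeholders for illustration, not existing FieldCache API, and
int[] stands in for whatever an entry would really hold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch; none of these names exist in Lucene's FieldCache.
final class CacheEntryRegistry {

  /** Callback fired when a new entry is created for a reader key and field. */
  interface EntryListener {
    void entryCreated(Object readerKey, String field, int[] values);
  }

  // readerKey -> (field -> cached values)
  private final Map<Object, Map<String, int[]>> entries = new ConcurrentHashMap<>();
  private final List<EntryListener> listeners = new CopyOnWriteArrayList<>();

  void addListener(EntryListener listener) {
    listeners.add(listener);
  }

  /** Registers a new entry and notifies listeners; returns false if one already existed. */
  boolean putIfAbsent(Object readerKey, String field, int[] values) {
    Map<String, int[]> perReader =
        entries.computeIfAbsent(readerKey, k -> new ConcurrentHashMap<>());
    if (perReader.putIfAbsent(field, values) != null) {
      return false;
    }
    for (EntryListener listener : listeners) {
      listener.entryCreated(readerKey, field, values);
    }
    return true;
  }

  /** Stand-in for getEntries(Reader): a snapshot of the field names cached for a reader. */
  List<String> getEntries(Object readerKey) {
    Map<String, int[]> perReader = entries.get(readerKey);
    return perReader == null ? new ArrayList<>() : new ArrayList<>(perReader.keySet());
  }
}
```

In this sketch the DWPT would register a listener once and call
getEntries for a reader to find out what is already cached.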

On Tue, Aug 23, 2011 at 4:00 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
> I just examined the field cache code.  I don't think replacing FCs
> needs to be difficult.  Let's make the CachedArray values variable
> volatile.  values is already public.
>
> On Tue, Aug 23, 2011 at 3:02 PM, Jason Rutherglen
> <ja...@gmail.com> wrote:
>>> Oh duh right -- you should be able to alloc too-large arrays and only
>>> realloc once you run out.  So amortized cost is low but some reopens
>>> will take a hit to grow....
>>
>> This hit will be minimal, i.e., less than what we're doing now with
>> cloned deleted docs bit vectors, which feasibly happens on each
>> getReader() call.  Growing the field cache will occur far less often.
>>
>> Given there isn't a use case for the appendable field cache outside of
>> RT / LUCENE-2312, I may bake it in: it's hard to extract, and it's hard
>> to maintain two patches.  However, the discussion was good.
>>
>>> DV is also provided by the IR (perDocValues method) so the RT reader
>>
>> Ok, it's not clear when / how DVs are used instead of field caches,
>> and why their access isn't merged together?
>>
>> On Tue, Aug 23, 2011 at 12:30 PM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>> On Tue, Aug 23, 2011 at 12:09 AM, Jason Rutherglen
>>> <ja...@gmail.com> wrote:
>>>>> But, we are trying to move FC under IR control (Martijn has a patch),
>>>>> at which point an RT IR could have its own appending impl...?
>>>>
>>>> LUCENE-3360?  It's placing the field cache into IR / SR.
>>>
>>> Yes!
>>>
>>>> The RAM
>>>> reader could return its own impl where the underlying array can be
>>>> atomically replaced (when a larger sized array is needed).  I agree
>>>> that is logical.
>>>
>>> Good.
>>>
>>>>> But then... FC still returns fixed arrays so you can't append until we fix that?
>>>>
>>>> I don't think anything should depend on the size of the field cache
>>>> array.  If it does, it seems odd.  Are you preferring moving field
>>>> cache access to method calls?  What is the difference between that and
>>>> primitive array access?
>>>
>>> Oh duh right -- you should be able to alloc too-large arrays and only
>>> realloc once you run out.  So amortized cost is low but some reopens
>>> will take a hit to grow....
>>>
>>>> For now I will create an independent field cache implementation that
>>>> is appendable.  It will only be associate-able with the DWPT / RAM
>>>> reader.  Maybe somehow it can work with LUCENE-3360 and / or the
>>>> existing static FC access system.
>>>
>>> Sounds good.
>>>
>>>> Still not sure how doc values fits in.
>>>
>>> DV is also provided by the IR (perDocValues method) so the RT reader
>>> should be able to impl its own.  Each lookup is a method call so it
>>> should be easier to back that w/ a more RT friendly data structure...
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com

Re: Separate issue for appendable field caches / doc values

Posted by Jason Rutherglen <ja...@gmail.com>.

I just examined the field cache code.  I don't think replacing FCs
needs to be difficult.  Let's make the CachedArray values variable
volatile.  values is already public.
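
A minimal sketch of what the volatile swap could look like, assuming a
single writer thread and many reader threads.  AppendableIntValues is a
made-up stand-in, not the real CachedArray; only the public volatile
values field mirrors the proposal above.

```java
import java.util.Arrays;

// Hypothetical single-writer, many-reader holder; not Lucene's CachedArray.
final class AppendableIntValues {
  // volatile so a reader that re-reads the field sees the newest array
  // together with everything written to it before it was published.
  public volatile int[] values;
  private int size; // touched only by the single writer thread

  AppendableIntValues(int initialCapacity) {
    values = new int[Math.max(1, initialCapacity)];
  }

  /** Appends one value; must be called from one writer thread only. */
  void append(int value) {
    int[] current = values;
    if (size == current.length) {
      // Out of room: copy into a larger array before publishing it.
      current = Arrays.copyOf(current, current.length * 2);
    }
    current[size] = value;
    // One volatile write publishes the new element and, if we grew,
    // swaps in the larger array in the same step.
    values = current;
    size++;
  }

  int size() {
    return size;
  }
}
```

Readers that grabbed the old array keep a valid, shorter view.  They
would still need their own doc count (for example from the reader they
were opened with), since size is not published here.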

On Tue, Aug 23, 2011 at 3:02 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
>> Oh duh right -- you should be able to alloc too-large arrays and only
>> realloc once you run out.  So amortized cost is low but some reopens
>> will take a hit to grow....
>
> This hit will be minimal, i.e., less than what we're doing now with
> cloned deleted docs bit vectors, which feasibly happens on each
> getReader() call.  Growing the field cache will occur far less often.
>
> Given there isn't a use case for the appendable field cache outside of
> RT / LUCENE-2312, I may bake it in: it's hard to extract, and it's hard
> to maintain two patches.  However, the discussion was good.
>
>> DV is also provided by the IR (perDocValues method) so the RT reader
>
> Ok, it's not clear when / how DVs are used instead of field caches,
> and why their access isn't merged together?
>
> On Tue, Aug 23, 2011 at 12:30 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> On Tue, Aug 23, 2011 at 12:09 AM, Jason Rutherglen
>> <ja...@gmail.com> wrote:
>>>> But, we are trying to move FC under IR control (Martijn has a patch),
>>>> at which point an RT IR could have its own appending impl...?
>>>
>>> LUCENE-3360?  It's placing the field cache into IR / SR.
>>
>> Yes!
>>
>>> The RAM
>>> reader could return its own impl where the underlying array can be
>>> atomically replaced (when a larger sized array is needed).  I agree
>>> that is logical.
>>
>> Good.
>>
>>>> But then... FC still returns fixed arrays so you can't append until we fix that?
>>>
>>> I don't think anything should depend on the size of the field cache
>>> array.  If it does, it seems odd.  Are you preferring moving field
>>> cache access to method calls?  What is the difference between that and
>>> primitive array access?
>>
>> Oh duh right -- you should be able to alloc too-large arrays and only
>> realloc once you run out.  So amortized cost is low but some reopens
>> will take a hit to grow....
>>
>>> For now I will create an independent field cache implementation that
>>> is appendable.  It will only be associate-able with the DWPT / RAM
>>> reader.  Maybe somehow it can work with LUCENE-3360 and / or the
>>> existing static FC access system.
>>
>> Sounds good.
>>
>>> Still not sure how doc values fits in.
>>
>> DV is also provided by the IR (perDocValues method) so the RT reader
>> should be able to impl its own.  Each lookup is a method call so it
>> should be easier to back that w/ a more RT friendly data structure...
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com

Re: Separate issue for appendable field caches / doc values

Posted by Jason Rutherglen <ja...@gmail.com>.

> Oh duh right -- you should be able to alloc too-large arrays and only
> realloc once you run out.  So amortized cost is low but some reopens
> will take a hit to grow....

This hit will be minimal, i.e., less than what we're doing now with
cloned deleted docs bit vectors, which feasibly happens on each
getReader() call.  Growing the field cache will occur far less often.

Given there isn't a use case for the appendable field cache outside of
RT / LUCENE-2312, I may bake it in: it's hard to extract, and it's hard
to maintain two patches.  However, the discussion was good.

> DV is also provided by the IR (perDocValues method) so the RT reader

Ok, it's not clear when / how DVs are used instead of field caches,
and why their access isn't merged together?

On Tue, Aug 23, 2011 at 12:30 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> On Tue, Aug 23, 2011 at 12:09 AM, Jason Rutherglen
> <ja...@gmail.com> wrote:
>>> But, we are trying to move FC under IR control (Martijn has a patch),
>>> at which point an RT IR could have its own appending impl...?
>>
>> LUCENE-3360?  It's placing the field cache into IR / SR.
>
> Yes!
>
>> The RAM
>> reader could return its own impl where the underlying array can be
>> atomically replaced (when a larger sized array is needed).  I agree
>> that is logical.
>
> Good.
>
>>> But then... FC still returns fixed arrays so you can't append until we fix that?
>>
>> I don't think anything should depend on the size of the field cache
>> array.  If it does, it seems odd.  Are you preferring moving field
>> cache access to method calls?  What is the difference between that and
>> primitive array access?
>
> Oh duh right -- you should be able to alloc too-large arrays and only
> realloc once you run out.  So amortized cost is low but some reopens
> will take a hit to grow....
>
>> For now I will create an independent field cache implementation that
>> is appendable.  It will only be associate-able with the DWPT / RAM
>> reader.  Maybe somehow it can work with LUCENE-3360 and / or the
>> existing static FC access system.
>
> Sounds good.
>
>> Still not sure how doc values fits in.
>
> DV is also provided by the IR (perDocValues method) so the RT reader
> should be able to impl its own.  Each lookup is a method call so it
> should be easier to back that w/ a more RT friendly data structure...
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Separate issue for appendable field caches / doc values

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, Aug 23, 2011 at 12:09 AM, Jason Rutherglen
<ja...@gmail.com> wrote:
>> But, we are trying to move FC under IR control (Martijn has a patch),
>> at which point an RT IR could have its own appending impl...?
>
> LUCENE-3360?  It's placing the field cache into IR / SR.

Yes!

> The RAM
> reader could return its own impl where the underlying array can be
> atomically replaced (when a larger sized array is needed).  I agree
> that is logical.

Good.

>> But then... FC still returns fixed arrays so you can't append until we fix that?
>
> I don't think anything should depend on the size of the field cache
> array.  If it does, it seems odd.  Are you preferring moving field
> cache access to method calls?  What is the difference between that and
> primitive array access?

Oh duh right -- you should be able to alloc too-large arrays and only
realloc once you run out.  So amortized cost is low but some reopens
will take a hit to grow....
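
To put rough numbers on the amortized cost, here is a small counting
sketch.  It assumes a simple doubling policy, which is my assumption
and not something settled in this thread; GrowthCost and simulate are
made-up names.

```java
// Hypothetical helper that only counts work; it allocates no real arrays.
final class GrowthCost {
  /**
   * Simulates n appends into an array that doubles when full.
   * Returns {number of reallocations, total values copied}.
   */
  static long[] simulate(int initialCapacity, int n) {
    int capacity = Math.max(1, initialCapacity);
    long reallocs = 0;
    long copied = 0;
    for (int size = 0; size < n; size++) {
      if (size == capacity) {
        copied += size; // a realloc copies every existing value
        capacity *= 2;
        reallocs++;
      }
    }
    return new long[] {reallocs, copied};
  }
}
```

With an initial capacity of 1024 and one million appends, simulate
reports 10 reallocations and 1,047,552 copied values, so about one
copy per append overall, while each of those 10 appends pays for a
full copy.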

> For now I will create an independent field cache implementation that
> is appendable.  It will only be associate-able with the DWPT / RAM
> reader.  Maybe somehow it can work with LUCENE-3360 and / or the
> existing static FC access system.

Sounds good.

> Still not sure how doc values fits in.

DV is also provided by the IR (perDocValues method) so the RT reader
should be able to impl its own.  Each lookup is a method call so it
should be easier to back that w/ a more RT friendly data structure...
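
As an illustration of what a more RT friendly structure behind a
method call might look like, here is a sketch that stores values in
fixed-size blocks so that growing never copies existing values.
PerDocLongs and BlockedLongs are hypothetical names and this is not
the doc values API; it assumes a single writer thread.

```java
import java.util.Arrays;

// Hypothetical lookup interface; not Lucene's doc values API.
interface PerDocLongs {
  long get(int docID);
}

// Hypothetical block-based store: single writer, many readers.
final class BlockedLongs implements PerDocLongs {
  private static final int BLOCK_SHIFT = 10; // 1024 values per block
  private static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;
  private static final int BLOCK_MASK = BLOCK_SIZE - 1;

  // Only this small table of block references is ever copied;
  // the blocks holding the values are never moved.
  private volatile long[][] blocks = new long[0][];
  private int size; // touched only by the single writer thread

  /** Appends one value; must be called from one writer thread only. */
  void append(long value) {
    long[][] current = blocks;
    int block = size >>> BLOCK_SHIFT;
    if (block == current.length) {
      // Add one fresh block; existing blocks are shared, not copied.
      long[][] grown = Arrays.copyOf(current, current.length + 1);
      grown[block] = new long[BLOCK_SIZE];
      current = grown;
    }
    current[block][size & BLOCK_MASK] = value;
    blocks = current; // volatile write publishes the value and any new block
    size++;
  }

  @Override
  public long get(int docID) {
    return blocks[docID >>> BLOCK_SHIFT][docID & BLOCK_MASK];
  }

  int size() {
    return size;
  }
}
```

Callers only ever see get(docID), so the layout behind it can change
without touching them, which is harder when a raw array is handed out.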

Mike McCandless

http://blog.mikemccandless.com

Re: Separate issue for appendable field caches / doc values

Posted by Jason Rutherglen <ja...@gmail.com>.

> But, we are trying to move FC under IR control (Martijn has a patch),
> at which point an RT IR could have its own appending impl...?

LUCENE-3360?  It's placing the field cache into IR / SR.  The RAM
reader could return its own impl where the underlying array can be
atomically replaced (when a larger sized array is needed).  I agree
that is logical.

> But then... FC still returns fixed arrays so you can't append until we fix that?

I don't think anything should depend on the size of the field cache
array.  If it does, it seems odd.  Are you preferring moving field
cache access to method calls?  What is the difference between that and
primitive array access?

For now I will create an independent field cache implementation that
is appendable.  It will only be associate-able with the DWPT / RAM
reader.  Maybe somehow it can work with LUCENE-3360 and / or the
existing static FC access system.

Still not sure how doc values fits in.

On Mon, Aug 22, 2011 at 6:48 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> Separate issue would make sense I think?
>
> But, we are trying to move FC under IR control (Martijn has a patch),
> at which point an RT IR could have its own appending impl...?
>
> But then... FC still returns fixed arrays so you can't append until we fix that?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Aug 22, 2011 at 1:13 PM, Jason Rutherglen
> <ja...@gmail.com> wrote:
>> LUCENE-2312 needs appendable field caches.  I can include this
>> functionality into LUCENE-2312, or separate it out into a separate
>> issue / patch.
>>
>> However it would only be useful for RT / LUCENE-2312.  Also, I'm not
>> sure how this functionality relates to doc values.  If we used doc
>> values, then we would not be able to port LUCENE-2312 to Lucene 3.x.

Re: Separate issue for appendable field caches / doc values

Posted by Michael McCandless <lu...@mikemccandless.com>.

Separate issue would make sense I think?

But, we are trying to move FC under IR control (Martijn has a patch),
at which point an RT IR could have its own appending impl...?

But then... FC still returns fixed arrays so you can't append until we fix that?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Aug 22, 2011 at 1:13 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
> LUCENE-2312 needs appendable field caches.  I can include this
> functionality into LUCENE-2312, or separate it out into a separate
> issue / patch.
>
> However it would only be useful for RT / LUCENE-2312.  Also, I'm not
> sure how this functionality relates to doc values.  If we used doc
> values, then we would not be able to port LUCENE-2312 to Lucene 3.x.
