Posted to dev@lucene.apache.org by Michael Busch <bu...@gmail.com> on 2008/04/18 11:32:22 UTC

Re: Does LUCENE-831 "Complete overhaul of FieldCache API" provide fieldcache offloading to disk?

Chris Hostetter wrote:
> : But then the FieldCache is just starting to feel a lot like column-stride
> : fields
> : (LUCENE-1231).
> 
> that's what i've been thinking ... my goal with LUCENE-831 was to make it 
> easier to manage FieldCache and hopefully the norms[] as well, particularly 
> in the case of reopen ... but with column-stride fields the need for both 
> of those might go away completely
>

(moved to java-dev, java-user cc'd)

My goal is not to get rid of the FieldCache by adding column-stride 
fields (CSF), but instead to make CSF the default source for the 
FieldCache.

We should introduce an interface, perhaps named FieldValueSource, that 
both the new FieldCache and the CSF API implement. That has some 
advantages:
- Norms can be stored as CSF and accessed through the FieldValueSource 
API. Then we can easily add an option to IndexReader that controls 
whether norms are cached in memory (i.e. in the new FieldCache) or not. 
Users with huge indexes on 32-bit machines, where the norms would 
consume too much memory, can disable caching. Search performance will 
suffer, of course, but that's better than OutOfMemoryErrors.
- Function queries can use the FieldValueSource interface to retrieve 
values (allowing us to get rid of function/ValueSource).
- Consumers of a FieldValueSource do not have to care whether or how 
values are cached. If performance is too slow and memory permits, 
caching can be enabled very easily.
- We will still support loading the FieldCache from the dictionary for 
backwards compatibility, but we should think about deprecating this and 
eventually getting rid of it. We probably shouldn't add an 
implementation of FieldValueSource that reads from the dictionary, 
because performance would be terrible in the non-cached mode.

-Michael
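
As a rough illustration of the proposal above, here is a minimal Java
sketch. FieldValueSource is the name floated in the message; its
getValue method, the ColumnStrideSource and CachedSource classes, and
the open helper are all hypothetical and are not existing Lucene APIs.
An in-memory array stands in for the on-disk column.

```java
// Hypothetical names throughout: none of this is an existing Lucene API.
interface FieldValueSource {
    /** Returns the value of this source's field for the given document id. */
    long getValue(int docId);
}

/** Stands in for a column-stride field: every lookup goes to the backing store. */
final class ColumnStrideSource implements FieldValueSource {
    private final long[] onDiskColumn; // simulates the on-disk column
    int reads = 0;                     // counts simulated disk reads

    ColumnStrideSource(long[] onDiskColumn) {
        this.onDiskColumn = onDiskColumn;
    }

    @Override
    public long getValue(int docId) {
        reads++;
        return onDiskColumn[docId];
    }
}

/** Stands in for the new FieldCache: loads the whole column once, then serves from memory. */
final class CachedSource implements FieldValueSource {
    private final long[] values;

    CachedSource(FieldValueSource source, int maxDoc) {
        values = new long[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            values[doc] = source.getValue(doc);
        }
    }

    @Override
    public long getValue(int docId) {
        return values[docId];
    }
}

final class FieldValueSourceDemo {
    /** A consumer (for example a function query) sees only the interface. */
    static long sum(FieldValueSource source, int maxDoc) {
        long total = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            total += source.getValue(doc);
        }
        return total;
    }

    /** Mirrors the proposed reader option: cache in memory, or read through. */
    static FieldValueSource open(ColumnStrideSource csf, int maxDoc, boolean cache) {
        return cache ? new CachedSource(csf, maxDoc) : csf;
    }

    public static void main(String[] args) {
        ColumnStrideSource csf = new ColumnStrideSource(new long[] {3, 5, 7});
        System.out.println(sum(open(csf, 3, false), 3));
        System.out.println(sum(open(csf, 3, true), 3));
    }
}
```

With cache set to false every lookup reaches the backing column; with
true the column is read once and later lookups are served from memory.
The consumer code is identical in both cases.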

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Does LUCENE-831 "Complete overhaul of FieldCache API" provide fieldcache offloading to disk?

Posted by Michael Busch <bu...@gmail.com>.
Michael McCandless wrote:
> 
> OK so in this approach, a CSF is an "on disk" format, while the 
> FieldCache represents loading all (or maybe eventually subsets as 
> controlled by a cache policy) into a memory cache.  And since they both 
> implement FieldValueSource you can swap either in when you need it.
> 

Yes, exactly.

-Michael


> This sounds good!
> 
> Mike




Re: Does LUCENE-831 "Complete overhaul of FieldCache API" provide fieldcache offloading to disk?

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK so in this approach, a CSF is an "on disk" format, while the  
FieldCache represents loading all (or maybe eventually subsets as  
controlled by a cache policy) into a memory cache.  And since they  
both implement FieldValueSource you can swap either in when you need it.

This sounds good!

Mike
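
The "subsets as controlled by a cache policy" idea could look roughly
like the following. This is only a sketch: it assumes an LRU policy,
which the thread does not specify, and every name in it
(FieldValueSource.getValue, LruCachedSource, maxEntries, misses) is
hypothetical rather than an existing Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface, following the name proposed in the thread.
interface FieldValueSource {
    long getValue(int docId);
}

/** Keeps at most maxEntries values in memory; everything else is read through. */
final class LruCachedSource implements FieldValueSource {
    private final FieldValueSource onDisk;   // e.g. a column-stride field
    private final Map<Integer, Long> cache;
    int misses = 0;                          // lookups that had to go to onDisk

    LruCachedSource(FieldValueSource onDisk, final int maxEntries) {
        this.onDisk = onDisk;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<Integer, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Long> eldest) {
                return size() > maxEntries;  // evict the least recently used entry
            }
        };
    }

    @Override
    public long getValue(int docId) {
        Long cached = cache.get(docId);
        if (cached != null) {
            return cached;
        }
        misses++;
        long value = onDisk.getValue(docId);
        cache.put(docId, value);
        return value;
    }
}
```

Because LruCachedSource is itself a FieldValueSource, it can be swapped
in wherever the uncached source or a fully loaded cache would be used.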


