You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Mark Miller (JIRA)" <ji...@apache.org> on 2009/04/09 14:03:13 UTC

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697485#action_12697485 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

{quote}
I'd like to see the new FieldCache API de-emphasize "get me a single array holding all values for all docs in the index" for a MultiReader. That invocation is exceptionally costly in the context of reopened readers, and providing the illusion that one can simply get this array is dangerous. It's a "leaky API", like how virtual memory API pretends you can use more memory than is physically available.

I think it's OK to return an array-of-arrays (ie, one contiguous array per underlying segment); if the app really wants to make a massive array & concatenate it, they can do so outside of the FieldCache API. 
{quote}

Is there much difference in one massive array or an array of arrays? Its just as much space and just as dangerous, right? Some apps will need random access to the field cache for any given document right? Don't we always have to support that in some way, and won't it always be severely limited by RAM (until IO is as fast)?

I like the idea of an iterator API, but it seems we will still have to provide random access with all its problems, right?

{quote}
We should also set this API up as much as possible for LUCENE-1231. Ie, the current "un-invert the field" approach that FieldCache takes is merely one source of values per doc. Column stride fields in the future will be a different (faster) source of values, that should be able to "just plug in" under the hood somehow to this same exposure API.
{quote}

Definitely.

{quote}
On Uwe's suggestion for some flexibility on how the un-inversion takes place, I think allowing differing degrees of extension makes sense. EG we already allow you to provide a custom parser. We need to allow control on whether a given value replaces the already-seen value (LUCENE-1372), or whether to stop the looping early (Uwe's needs for improving Trie). We should also allow outright entire custom class that creates the value array.
{quote}

Allow a custom field cache loader for each type?


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org