You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2007/03/14 08:26:09 UTC

[jira] Created: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Complete overhaul of FieldCache API/Implementation
--------------------------------------------------

                 Key: LUCENE-831
                 URL: https://issues.apache.org/jira/browse/LUCENE-831
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
            Reporter: Hoss Man



Motivation:
1) Complete overhaul the API/implementation of "FieldCache" type things...
    a) eliminate global static map keyed on IndexReader (thus
        eliminating synch block between completley independent IndexReaders)
    b) allow more customization of cache management (ie: use 
        expiration/replacement strategies, disk backed caches, etc)
    c) allow people to define custom cache data logic (ie: custom
        parsers, complex datatypes, etc... anything tied to a reader)
    d) allow people to inspect what's in a cache (list of CacheKeys) for
        an IndexReader so a new IndexReader can be likewise warmed. 
    e) Lend support for smarter cache management if/when
        IndexReader.reopen is added (merging of cached data from subReaders).
2) Provide backwards compatibility to support existing FieldCache API with
    the new implementation, so there is no redundent caching as client code
    migrades to new API.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674255#action_12674255 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. Are there still things planned for this issue now that LUCENE-1483 has been committed? 

Good question... I think it'd still be nice to 1) have the IndexReader
expose the API for accessing the FieldCache values, 2) allow for
customization of the caching policy.

Though maybe we should hold off on those changes until we do
LUCENE-1231 (column stride fields), which I think would use exactly
the same API with the only difference being whether under-the-hood
there was a more efficient (column-stride storage) representation for
the field values vs the slower uninvert & resort (for StringIndex)
approach that FieldCache does today.

Also, in the new API I'd like to make it not-so-easy to materialize
the full array.  I think it's OK to ask for the full array of a
sub-reader, but if you want to access @ the MultiReader level, we
should encourage either random access getX(int docID), iteration or
get-sub-arrays and append yourself.


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698255#action_12698255 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

I guess we do need to have Uninverters settable per run somehow ... Then if a Parser comes in on SortField, we can downcast to UninversionValueSource and pass the Uninverter wrapping the Parser. I don't think we should pass the Uninverter on SortField though, going forward. Uninverter may not apply to ValueSource, and internally, things will work with ValueSource.

So a user cannot pass an Uninverter for internal sorting per sort, it has to init the UninversionValueSource with the right Uninverters, but for backcompat, FieldComparator would be able to pass an Uninverter that takes precedence?

That sounds somewhat alright to me, I'll roll that way a bit for now.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.03.30.2008.diff

Another push forward:

 - Comparators are cached again (left it out before to think about).
 - Lots of new JavaDoc
 - Naming, usage refinements
 - Still doesn't touch the norms.
 - Cleanup, fixup, finishup, type stuff.
 - Deprecated some *function* package methods that used FieldCache, but still need alternatives.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654417#action_12654417 ] 

thetaphi edited comment on LUCENE-831 at 12/8/08 6:04 AM:
---------------------------------------------------------------

{quote}This is an interesting idea. Say we create IntField, a subclass of
Field. It could directly accept a single int value and not accept
tokenization options. It could assert "not null", if the field wanted
that. FieldInfo could store that it's an int and expose more stronly
typed APIs from IndexReader.document as well. If in the future we
enable Term to be things-other-than-String, we could do the right
thing with typed fields. Etc....{quote}

Maybe this new Document class could also manage the encoding of these fields to the index format. With that it would be possible to extend Document, to automatically use my trie-based encoding for storing the raw term values. On the otrher hand RangeQuery would be aware of the field encoding (from field metadata) and can switch dynamically to the correct search/sort algorithm. Great!


      was (Author: thetaphi):
    {quote}This is an interesting idea. Say we create IntField, a subclass of
Field. It could directly accept a single int value and not accept
tokenization options. It could assert "not null", if the field wanted
that. FieldInfo could store that it's an int and expose more stronly
typed APIs from IndexReader.document as well. If in the future we
enable Term to be things-other-than-String, we could do the right
thing with typed fields. Etc....{quote}

Maybe this document could also manage the encoding of these fields to the index format. With that it would be possible to extend Docuemnt, to automatically use my trie-based encoding for storing the raw term values. On the otrher hand RangeQuery would be aware of the field encoding and can switch dynamically to the correct search/sort algorithm. Great!

  
> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700586#action_12700586 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

bq. but maybe the new ValueSource could provide both ways to access

Yeah, this goes with with what Mike pointed out above - we can return arrays, objects, or anything and your grandmother. My main worry with that idea is the ValueSource API - it could have 10's of accessors, but only 1 or 2 are generally implemented and you have to know the right one to call - it could work of course, but on first thought, its fairly ugly. You could make a fair point that we are already a ways down that path with the design we already have I guess though.

bq. So, maybe there is a possibility to merge both approaches and only provide one ValueSource supplying both access strategies. 

Its a good point. Something makes me think we will still be a bit hindered by back compat with deletes, norms though.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698247#action_12698247 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

{quote}
> Or, we could turn the check off entirely (which I think is better than intermittently catching it).

+1 - agreed that the check is nasty - better to never trip it if your only going to trip it depending...
{quote}
OK -- let's turn this check off entirely.

I like Marvin's approach but we don't quite have a FieldType just yet...

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697830#action_12697830 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq.  A way to say, use this builder (cachekey) for LONG, use this one for INT, etc. When we get CSF, you would set it to use cachekeys that built arrays from that data.

That sounds right, though it'd presumably be field dependent rather than relying on only the native type?  Ie I may have 3 fields that should load long[]'s, but each has its own custom decoding to be done.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654820#action_12654820 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

Marvin, does KS/Lucy have something like FieldCache?  If so, what API do you use?  Is it iterator-only?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700430#action_12700430 ] 

Earwin Burrfoot commented on LUCENE-831:
----------------------------------------

{quote}
Allowing values to change, just like we can call
IndexReader.setNorm/deleteDoc to change norms/deletes. We'd need a
copy-on-write approach, like norms & deleted docs.
{quote}
On the other hand, maybe, we shouldn't?
Deleted docs should definetly be mutable, but that's it.
Anybody is updating norms on a regular basis on a serious project? But still everyone pays for the feature with running ugly synchronization code for norms. Let's dump it!
As for mutable fields, okay, users of Sphinx have them. They use them mostly.. hehehe.. for implementing deletions that Sphinx lacks. I bet there could exist some other usecases, but they can be handled with a custom ValueSource without the need to bring it into API everyone must implement.

{quote}
Deleted docs could also be represented as a ValueSource? Just one
bit per doc. This way one could swap in whatever source for
"deleted docs" one wanted.
{quote}
That's why I think this is a misfeature. Deleted docs have different meaning from field values. They can be updated, and they should be checked against uberfast.
Swapping in another impl is cool, while forcing everyone and his dog under the same usage API is not so cool.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699435#action_12699435 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

bq. Patch is not done, but all tests now pass except for TrieRange (have not run back compat tests yet due to Trie failure - havnt looked into yet either).

The TrieRange tests do not pass, because the FieldCache.StopFillCacheException is not handled in the uninverter code (part of LUCENE-1582, FieldCache.StopFillCacheException). When Trie gets switched to an own ValueSource, StopFillCacheException can be removed (you can do this in this patch, it was just a temporary hack).

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649191#action_12649191 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

You've tried the patch? Awesome!

How long did it take to fill the cache before the patch?

I agree that we should be as friendly to warming as we can be. Keep in 
mind that you can warm the reader yourself by issuing a sorted search 
before putting the reader into duty - of course you don't get to warm 
from RAM like with what you suggest.

Keep that feedback coming. I've been building momentum on coming back to 
this issue, unless a commiter beats me to it.




> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700583#action_12700583 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

bq. >>I was also thinking that some of these issues could force back up to multi-reader support though. 

bq.>Hopefully not...

Yes, I don't know enough yet to know for sure. My thought was things like norms and deletes that are available from multireader now will have to either still be, or straddle multi/segment for a while. I guess that doesnt become much of an issue if we go with the same method of just don't load from both single and multi or you will double your reqs? It just gets ugly trying to prevent multireader use with valuesource, but then have to support it due to all the back compat reqs.

bq. We can (and I think should) do FieldType without forcing a fixed schema.

Fair enough, fair enough. I wasn't really taking this completely from this discussion, but from a variety of ideas about fields that have been spilling out on the list. Of course we can still get a lot better (easier) without hitting fixed.

bq. For tags we'd presumably want multi-valued fields handled in ValueSource, plus updatability, plus NRT.

Well I'm glad its a small order. Yonik did do some multi value faceting work that I never really looked at. I'll go dig it up.

It may just be best if this sits for a while and we see what happens with a couple other issues floating around it. I said I had sweat to pump into this, not intelligence ;) If we hit all this stuff (and yes, your not saying we need to or should, but) this ends up touching most things in IndexReader, than possibly writing and merging and what not in IndexWriter (pluggable norms etc still need to be written, merged, loaded, etc), and ... 



> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

Okay, all fixed up and updated to trunk. All tests pass. Still some rough edges for sure. Still probably going to revert FieldCache to its former self (though internally, the API won't be used).

Back compat test run hasnt finished yet.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655988#action_12655988 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------


{quote}
> At present, KS only caches the docID -> ord map as an array. It builds that
> array by iterating over the terms in the sort field's Lexicon and mapping the
> docIDs from each term's posting list.
{quote}

OK, that corresponds to the "order" array in Lucene's
FieldCache.StringIndex class.

{quote}
> Building the docID -> ord array is straightforward for a single-segment
> SegLexicon. The multi-segment case requires that several SegLexicons be
> collated using a priority queue. In KS, there's a MultiLexicon class which
> handles this; I don't believe that Lucene has an analogous class.
{quote}

Lucene achieves the same functionality by using a MultiReader to read
the terms in order (which uses MultiSegmentReader.MultiTermEnum, which
uses a pqueue under the hood) and building up StringIndex from that.
It's very costly.

{quote}
> Relying on the docID -> ord array alone works quite well until you get to the
> MultiSearcher case. As you know, at that point you need to be able to
> retrieve the actual field values from the ordinal numbers, so that you can
> compare across multiple searchers (since the ordinal values are meaningless).
{quote}

Right, and we are trying to move towards pushing searcher down to the
segment.  Then we can use the per-segment ords for within-segment
collection, and then the real values for merging the separate pqueues
at the end (but, initial results from LUCENE-1483 show that collecting
N queues then merging in the end adds ~20% slowdown for N = 100
segments).

{quote}
> Lex_Seek_By_Num(lexicon, term_num);
> field_val = Lex_Get_Term(lexicon);
> 
> The problem is that seeking by ordinal value on a MultiLexicon iterator
> requires a gnarly implementation and is very expensive. I got it working, but
> I consider it a dead-end design and a failed experiment.
{quote}

OK.

{quote}
> The planned replacement for these iterator-based quasi-FieldCaches involves
> several topics of recent discussion:
> 
> 1) A "keyword" field type, implemented using a format similar to what Nate
> and I came up with for the lexicon index.
> 2) Write per-segment docID -> ord maps at index time for sort fields.
> 3) Memory mapping.
> 4) Segment-centric searching.
> 
> We'd mmap the pre-composed docID -> ord map and use it for intra-segment
> sorting. The keyword field type would be implemented in such a way that we'd
> be able to mmap a few files and get a per-segment field cache, which we'd then
> use to sort hits from multiple segments.
{quote}

OK so your "keyword" field type would expose random-access to field
values by docID, to be used to merge the N segments' pqueues into a
single final pqueue?

The alternative is to use iterator but pull the values into your
pqueues when they are inserted.  The benefit is iterator-only
exposure, but the downside is likely higher net cost of insertion.
And if the "assumption" is these fields can generally be ram resident
(explicitly or via mmap), then the net benefit of iterator-only API is
not high.


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-831:
---------------------------------

      Description: 
Motivation:
1) Complete overhaul the API/implementation of "FieldCache" type things...
    a) eliminate global static map keyed on IndexReader (thus
        eliminating synch block between completley independent IndexReaders)
    b) allow more customization of cache management (ie: use 
        expiration/replacement strategies, disk backed caches, etc)
    c) allow people to define custom cache data logic (ie: custom
        parsers, complex datatypes, etc... anything tied to a reader)
    d) allow people to inspect what's in a cache (list of CacheKeys) for
        an IndexReader so a new IndexReader can be likewise warmed. 
    e) Lend support for smarter cache management if/when
        IndexReader.reopen is added (merging of cached data from subReaders).
2) Provide backwards compatibility to support existing FieldCache API with
    the new implementation, so there is no redundent caching as client code
    migrades to new API.



  was:

Motivation:
1) Complete overhaul the API/implementation of "FieldCache" type things...
    a) eliminate global static map keyed on IndexReader (thus
        eliminating synch block between completley independent IndexReaders)
    b) allow more customization of cache management (ie: use 
        expiration/replacement strategies, disk backed caches, etc)
    c) allow people to define custom cache data logic (ie: custom
        parsers, complex datatypes, etc... anything tied to a reader)
    d) allow people to inspect what's in a cache (list of CacheKeys) for
        an IndexReader so a new IndexReader can be likewise warmed. 
    e) Lend support for smarter cache management if/when
        IndexReader.reopen is added (merging of cached data from subReaders).
2) Provide backwards compatibility to support existing FieldCache API with
    the new implementation, so there is no redundent caching as client code
    migrades to new API.



    Fix Version/s: 2.4

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Tim Smith (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689629#action_12689629 ] 

Tim Smith commented on LUCENE-831:
----------------------------------

One requirement i would like to request is the ability to attach an arbitrary object to each Segment.
This will allow people using lucene to store any arbitrary per segment caches and statistics that their application requires (fully free form)

Would like to see the following:
* add SegmentReader.setCustomCacheManager(CacheManager m) // mabye add a string for a CacheManager id (to allow registration of multiple cache managers)
* add SegmentReader.getCustomCacheManager() // to allow accessing the manager

CacheManager should be a very light interface (just a close() method that is called when the SegmentReader is closed)



> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581307#action_12581307 ] 

Hoss Man commented on LUCENE-831:
---------------------------------

Mark:

I haven't looked at this issue or any of the code in my patch since my last updated, nor have i had a chance to look at your updated patch, but a few comments...

1) you rock.

2) I have no idea what i had in mind for dealing with StringIndex when i said "a bit complicated, but should be possible/efficient".  I do distinctly remember thinking that we should add support for *just* getting the string indexes (ie: the current int[numDocs]) for when you don't care what the strings are, just the sort order or *just* getting a String[numDocs] when you aren't doing sorting and just want an "inverted inverted index" on the field ... but obviously we still need to support the curent StringIndex (it's used in MultiSearcher right?)


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700175#action_12700175 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Yes, I agree - will do.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

Here is the patch - plenty to look over.

The method access might not be as much slower as I thought - might be closer to a couple to 5% than the 10% I first guessed. 

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654417#action_12654417 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

{quote}This is an interesting idea. Say we create IntField, a subclass of
Field. It could directly accept a single int value and not accept
tokenization options. It could assert "not null", if the field wanted
that. FieldInfo could store that it's an int and expose more stronly
typed APIs from IndexReader.document as well. If in the future we
enable Term to be things-other-than-String, we could do the right
thing with typed fields. Etc....{quote}

Maybe this document could also manage the encoding of these fields to the index format. With that it would be possible to extend Docuemnt, to automatically use my trie-based encoding for storing the raw term values. On the otrher hand RangeQuery would be aware of the field encoding and can switch dynamically to the correct search/sort algorithm. Great!


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625098#action_12625098 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

bq. change norm caching to use new caches (if not the same
main cache, then at least the same classes in a private cache)


What benefit do you see to this? Does it offer anything over the simple map used now?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644786#action_12644786 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

This is missing a good way to manage the caching of the field values per cachekey (allowing for a more fine grained  weak or disk strategy for example). That type of stuff would have to be built into the cachkey currently - so to add a new strategy, youd have to implement a lot of new cachekeys and multiply your options to manage...for every int/long/byte/etc array and object array you would have to implement a new cachkey for a weakref solution. Is there an API that can overcome this?

The current cache that you can easily override, cachkey to data, is pretty course...I am not sure how many helpful things you can do easily.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698089#action_12698089 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. Iterate iterate iterate I suppose.

Here here!  Ready, set, GO!

bq. We may need the multireader to do the full array for back compat.

Can't we just create the "make massive array & copy sub arrays in"
inside the old FieldCache?  (And deprecate the old FieldCache
entirely).

bq. I dont think any of the custom type stuff is setup to work yet.

How about we create a ValueSource abstract base class, that defines
abstract byte[] getBytes(IndexReader r, String field),
int[] getInts(IndexReader r, String field), etc.  (Just like
ExtendedFieldCache).

This is subclassed to things like UninversionValueSource (what
FieldCache does today), CSFValueSource (in the future) both of which
take an IndexReader when created.

UninversionValueSource should provide basic ways to customize the
uninversion.  Hopefully, we can share mode code than the current
FieldCacheImpl does (eg, a single "enum terms & terms docs" loop that
switches out to a "handler" to deal with each term & doc, w/
subclasses that handle to byte, int, etc.).

And then I can also make MyFunkyValueSource (for extensibility) that
does whatever to produce the values.

Then we make CachingValueSource, that wraps any other ValueSource.

And finally expose a way in IndexReader to set its ValueSource when
you open it?  It would default to
CachedValueSource(UninversionValueSource()).  I think we should
require that you set this on opening the reader, and you can't later
change it.

This would mean a single CachingValueSource can be used for more than
one reader, which is good because IndexReader.open would send it down
to all SegmentReaders it opens.

This would then replace *CacheKey.

This approach is not that different from what we have today, but I
think there are important differences:

  * Decouple value generation (ValueSource) from caching

  * Tell IndexReader what its ValueSource is, so eg when you do
    sorting the sort pulls from your ValueSource and not a global
    default one.

  * Hopefully don't duplicate so much code (eg uninversion)

Other thoughts:

  * Presumably, at this point, the arrays returned by field cache
    should be considered readonly by the app, right?  So cloning
    a reader should simply make a shallow clone of the cache.  (Longer
    term, with CSF as the source, we think updating fields should be
    possible, so we'd need a copy-on-write solution, like we now do w/
    deleted docs).

  * Looks like some accidental regressions snuck in, eg in
    DirIndexReader:
{code}
-    final String[] files = dir.listAll();
+    final String[] files = dir.list();
{code}
    and in IndexReader:
{code}
protected IndexReader(Directory directory) {
     this();
-    this.directory = directory;
}
{code}

  * Do we even need ComparatorFactory*?  Seems like this patch
    shouldn't be be in the business of creating comparators.

  * You should hit UOE if you try to getXXX() on a MultiReader

  * Shouldn't FieldCache be deprecated entirely?  I would think, going
    forward, I interact only w/ the IndexReader's default ValueSource?


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698239#action_12698239 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

bq. Ugly.

Well no worries yet :) Still in early design mode, so if it can be made better, I'm sure it will. Of course I'd love to get to 'everything works right for the right field automagically' as well - not sure that will fit into the scope of this issue though (though nothing saying this issue can't be further delayed). We will do the best we can regardless though.

I'm kind of worried that any change is going to hurt Apps like Solr - if you end up using the new built in API, but also have code that must stick for a while with the old API (for multireader fieldcache or something), you'll likely increase RAM usage more than before - how much of a concern that ends up being, I'm not sure. I suppose eventually it has to become up to the upgrader to consider and deal with it if we want to move to segment level caching.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698397#action_12698397 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

I'd like to get the cache back into the segment readers somehow ... or somehow get away from the large indexreader cache map.

IndexReader plugin could come in useful there :)

I've got the tests passing in general, but still have to figure out better/right caching (the way I allow you to pass in an Uninverter skips caching completely! plus the above), need to deal with ValueSource as a field on SortField - require it being serializable...ugg.

I'll be away from this for while now prob though, so plenty of time to mull it over.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

Might as well throw another one up with custom comparator test passing. Still a bunch to do beyond tests, but I think any left that are failing will be easily fixed with a resolution to how we are going to cache.

Then javadoc, consider all the back compat issues a bit more, tweak API, consider what new tests make sense ...

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700571#action_12700571 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. I was also thinking that some of these issues could force back up to multi-reader support though. 

Hopefully not...

bq. I want field handling to become easier in Lucene, but I hope we don't lose any of our super on the fly settings. +1 on making field handing easier, but I am much more weary of a fixed schema type thing.

I think "consolidating per-field details" (FieldType) is well decoupled from "forcing every occurrence of a field to be the same" (fixed schema).  We can (and I think should) do FieldType without forcing a fixed schema.

bq. I am very interested in having updatable CSF's (much too easy to mistype that). There are many cool things to use it for, especially in combination with near realtime search (tagging variations).

For tags we'd presumably want multi-valued fields handled in ValueSource, plus updatability, plus NRT.

Updatability is tricky... ValueSource would maybe need a "startChanges()" API, which would copy the array (copy-on-write) if it's not already private.  The problem is... direct array access precludes more efficient data structures that amortize the copy-on-write cost (eg by block), which we are wanting to eventually get to for deleted docs & norms (it's likely a large cost in NRT reader turnaround, though I hven't yet measured just how costly).

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698234#action_12698234 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

That is why I'd love to somehow move to a FieldType class that holds such per-field details.

As things stand now (or, soon) you have to do many separate things to use TrieXXX:

  * Create the right tokenStream for it, and stick that into your Field

  * Make the right query type at search time

  * Make the right sort-field parser

  * Make the right ValueSource

It's crazy.  I should be able to make a TrieFieldType (SingleNumberFieldType, or something, after the rename), make that the type of my field when I add it to my document, and then have all these places that do range search, sorting, value retrieval consult the FieldInfo and see that this is a trie field, and act accordingly.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698423#action_12698423 ] 

Mark Miller edited comment on LUCENE-831 at 4/13/09 9:24 AM:
-------------------------------------------------------------

Okay, good point. I've got to take a closer look at what we are required to support for remote searching.

I think your right though, we probably won't end up needing the factory. For right now, the way I set the DEFAULT, we need it to get remote tests to pass. But that is all very loose right now - when the caching mechanism is fixed up, I think the whole DEFAULT_VALUE_SOURCE will be cleaned right up - especially having to set it there on the SortField.

I think we have to support remote for CustomFieldComparator, but I guess wouldnt have to for ValueSource.

      was (Author: markrmiller@gmail.com):
    Okay, good point. I've got to take a closer look at what we are required to support for remote searching.
  
> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698419#action_12698419 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

bq. It worked until now without serialization and so I think we should remove serialization from SortField. 

I don't think we can right now because of remote searchable - I think?. I agree the factories are silly, but now I know why they exist! It had eluded me before.

I can roll your patch in Uwe - sorry I missed it with that last one - I had meant to, but it slipped my mind.

I've been waiting to pin down how ValueSource handles its cache (either lightning will strike my mind, or more likely, Mike will tell me what to do) - but the main reason for the factory at the moment is to get the remote tests to pass - since SortField is serializable, it allows us to pass the ValueSource without it being seriazable.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698239#action_12698239 ] 

Mark Miller edited comment on LUCENE-831 at 4/12/09 10:00 AM:
--------------------------------------------------------------

bq. Ugly.

Well no worries yet :) Still in early design mode, so if it can be made better, I'm sure it will. Of course I'd love to get to 'everything works right for the right field automagically' as well - not sure that will fit into the scope of this issue though (though nothing saying this issue can't be further delayed). We will do the best we can regardless though.

I'm kind of worried that any change is going to hurt Apps like Solr - if you end up using the new built in API, but also have code that must stick for a while with the old API (for multireader fieldcache or something), you'll likely increase RAM usage more than before - how much of a concern that ends up being, I'm not sure. I suppose eventually it has to become up to the upgrader to consider and deal with it if we want to move to segment level caching.

*edit*

Little behind on that worry I guess - we already pumped that problem out the door with 1483. Half in the clouds over here.

      was (Author: markrmiller@gmail.com):
    bq. Ugly.

Well no worries yet :) Still in early design mode, so if it can be made better, I'm sure it will. Of course I'd love to get to 'everything works right for the right field automagically' as well - not sure that will fit into the scope of this issue though (though nothing saying this issue can't be further delayed). We will do the best we can regardless though.

I'm kind of worried that any change is going to hurt Apps like Solr - if you end up using the new built in API, but also have code that must stick for a while with the old API (for multireader fieldcache or something), you'll likely increase RAM usage more than before - how much of a concern that ends up being, I'm not sure. I suppose eventually it has to become up to the upgrader to consider and deal with it if we want to move to segment level caching.
  
> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699433#action_12699433 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Bah, just wrote a bunch and hit cancel.

Attacking this from the old incarnation of the patch has me trying to hard to back FieldCache with the new API. Looking from closer eyes now, I don't see that being necessary. We just want the SegmentReader level ValueSource and the option for SortField override. FieldCache can use its current implementation. The motivation to back it is to minimize the RAM reqs of straddling the two API's. If we want the new API to work at the SegmentReader level though, we can never really achieve that. Might as well not half *ss it. I'll return the FieldCache to as is at some point and just deprecate.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689863#action_12689863 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

I'd like to see the new FieldCache API de-emphasize "get me a single array holding all values for all docs in the index" for a MultiReader. That invocation is exceptionally costly in the context of reopened readers, and providing the illusion that one can simply get this array is dangerous. It's a "leaky API", like how virtual memory API pretends you can use more memory than is physically available. 

I think it's OK to return an array-of-arrays (ie, one contiguous array per underlying segment); if the app really wants to make a massive array & concatenate it, they can do so outside of the FieldCache API. 

Or an method you call to get a given doc's value (like DocValues in o.a.l.search.function). Or an iterator API to step through all the values. 

We should also set this API up as much as possible for LUCENE-1231. Ie, the current "un-invert the field" approach that FieldCache takes is merely one source of values per doc. Column stride fields in the future will be a different (faster) source of values, that should be able to "just plug in" under the hood somehow to this same exposure API. 

On Uwe's suggestion for some flexibility on how the un-inversion takes place, I think allowing differing degrees of extension makes sense. EG we already allow you to provide a custom parser. We need to allow control on whether a given value replaces the already-seen value (LUCENE-1372), or whether to stop the looping early (Uwe's needs for improving Trie). We should also allow outright entire custom class that creates the value array.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700391#action_12700391 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

I've got a bit of the same feeling. My list was more or less cherry picked from all of the above comments, and my initial feeling was their was not enough motivation as well. But the more I thought about it, the more kind of ugly field cache is. And we would want to lose exposing Parser so that CFS can be a seamless backing. That makes FieldCache even uglier for a while. Clickless thus far here too, but I think we have a good base to work with still.

{quote}Honestly these reasons are not net/net compelling enough to warrant a
whole new API? They are fairly minor. And I agree: LUCENE-1483 has
already achieved the biggest step forward here.{quote}

Not only that, but almost all of those reasons can be handled by allowing a custom FieldCache to be used, rather than just hard coding to the default singleton.

A couple responses:

{quote}We need source pluggability for when CSF arrives (but, admittedly,
we could wait until CSF actually does arrive){quote}
We have it? Just pass the CSFValueSource at IndexReader creation?

{quote}
Allowing values to change, just like we can call
IndexReader.setNorm/deleteDoc to change norms/deletes. We'd need a
copy-on-write approach, like norms & deleted docs.{quote}
Good point. We need a way to update, that can throw USO Exception?

{quote}
How would norms be folded into this? Ideally, each field could
choose to pull its norms from any source. Document level norms
was discussed somewhere, and should easily "fit" as another norms
source. We'd need to relax how per-doc-field boosting is computed
at runtime to pull from such "arbitrary" sources.{quote}
Good point again. Getting norms under this API will add a bit more meat to this issue.

{quote}
Deleted docs could also be represented as a ValueSource? Just one
bit per doc. This way one could swap in whatever source for
"deleted docs" one wanted.{quote}
You've got me here at the moment. I don't know the delete code very well, but I will in time :)

{quote}
      Allowing for docs that have more than one value. (We'd also need
      to extend sorting to be able to compare multiple values).
{quote}
This is an interesting one, because I wonder if we can do it and stick with arrays? A multi dimensional array seems a bit much...

{quote}
An mmap implementation (like Lucy/KS) - should feel just like CSF
or uninversion (ie, "just another impl").{quote}
This is already fairly independent I think...

{quote}
Good impls for the enum case (all strings could be considered
enums), eg if there are only 100 unique strings in that field, you
only need 7 bits per ord derefing into the char[] values.
{quote}
+1. Yes.

{quote}
Possible future when Lucene computes sort cache (for text fields)
and stores in the index{quote}
I'm not familiar with that idea, so not sure what affect this has...

{quote}
Allowing field sort to use an entirely external source of values
{quote}
I think both options allow that now - if you pass the ValueSource from the reader, it can get its values from everywhere. If you override the reader valuesource with the sortfield valuesource, it too can load from anywhere. I am just not sure both options are really needed. I am kind of liking Uwe's idea of assigning ValueSources per field, though that could probably get messy. Perhaps a default, and then per field overrides? 

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Chris Hostetter <ho...@fucit.org>.

: >  Right, if the updates come through IndexWriter or through a different
: >  IndexReader.  But if you do the updates with an IndexReader (which
: >  eventually commits to disk), and also use that IndexReader for
: >  searching, we may need to synchronize?
: 
: IMO, if we want to support something like that, we could make it so
: the changes weren't visible until the next time the user called
: getFieldCache() (which would have knowledge about the change and
: create a new cache object, leaving the old object unaffected).

I wonder if this is a detail the Cache impls could worry about ... the 
Cache API could include hook methods for the IndexReader to notify it that 
various properties of the index have changed -- simple Cache impls used 
by people for "read only" indexes could makethese methods NOOPs and the 
accessors could be unsynchronized ... more complex impls could be written 
for more complex caches/usecases.

(Incidently: i don't remember if it was in the orriginal patch, but one of 
my assumptions was that fieldNorm caching could be managed by this new API 
as well)

-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

Yonik Seeley wrote:
> On Fri, Mar 28, 2008 at 10:43 AM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>>  Yonik Seeley wrote:
>>> On Fri, Mar 28, 2008 at 9:20 AM, Michael McCandless
>>> <lu...@mikemccandless.com> wrote:
>>>>  But eg LUCENE-1231 talks about maybe eventually allowing  
>>>> updates to
>>>>  fields, like how norms can be updated in a reader today.  If we do
>>>>  that, eg as part of flexible indexing, then we might need to worry
>>>>  about synchronizing?
>>>
>>> If updates to fields are versioned like norms, one would not see the
>>> changes until a new reader is opened.
>>
>>  Right, if the updates come through IndexWriter or through a  
>> different
>>  IndexReader.  But if you do the updates with an IndexReader (which
>>  eventually commits to disk), and also use that IndexReader for
>>  searching, we may need to synchronize?
>
> IMO, if we want to support something like that, we could make it so
> the changes weren't visible until the next time the user called
> getFieldCache() (which would have knowledge about the change and
> create a new cache object, leaving the old object unaffected).

I think that makes sense.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Yonik Seeley <yo...@apache.org>.

On Fri, Mar 28, 2008 at 10:43 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
>  Yonik Seeley wrote:
>  > On Fri, Mar 28, 2008 at 9:20 AM, Michael McCandless
>  > <lu...@mikemccandless.com> wrote:
>  >>  But eg LUCENE-1231 talks about maybe eventually allowing updates to
>  >>  fields, like how norms can be updated in a reader today.  If we do
>  >>  that, eg as part of flexible indexing, then we might need to worry
>  >>  about synchronizing?
>  >
>  > If updates to fields are versioned like norms, one would not see the
>  > changes until a new reader is opened.
>
>  Right, if the updates come through IndexWriter or through a different
>  IndexReader.  But if you do the updates with an IndexReader (which
>  eventually commits to disk), and also use that IndexReader for
>  searching, we may need to synchronize?

IMO, if we want to support something like that, we could make it so
the changes weren't visible until the next time the user called
getFieldCache() (which would have knowledge about the change and
create a new cache object, leaving the old object unaffected).

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

Yonik Seeley wrote:
> On Fri, Mar 28, 2008 at 9:20 AM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>>  But eg LUCENE-1231 talks about maybe eventually allowing updates to
>>  fields, like how norms can be updated in a reader today.  If we do
>>  that, eg as part of flexible indexing, then we might need to worry
>>  about synchronizing?
>
> If updates to fields are versioned like norms, one would not see the
> changes until a new reader is opened.

Right, if the updates come through IndexWriter or through a different  
IndexReader.  But if you do the updates with an IndexReader (which  
eventually commits to disk), and also use that IndexReader for  
searching, we may need to synchronize?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Yonik Seeley <yo...@apache.org>.

On Fri, Mar 28, 2008 at 9:20 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
>  But eg LUCENE-1231 talks about maybe eventually allowing updates to
>  fields, like how norms can be updated in a reader today.  If we do
>  that, eg as part of flexible indexing, then we might need to worry
>  about synchronizing?

If updates to fields are versioned like norms, one would not see the
changes until a new reader is opened.

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

Mark Miller wrote:
> Right Michael...of course. Sometimes I cannot see the forest or the  
> trees...
>
> We don't have to worry that much about the IndexReader being read  
> only though do we? There is not much worry now - I believe that a  
> deleted doc field value remains in the field cache until the Reader  
> is reopened. If someone is changing docs/fields, they need to  
> reopen the Reader. Heck...i think that deleted docs field values  
> even currently get loaded into the cache. So regardless of "read- 
> only" this method will duplicate the current behavior right?

Right, if we mimic the current FieldCache, we don't allow writing  
into that cache so we don't have to worry about it.

But eg LUCENE-1231 talks about maybe eventually allowing updates to  
fields, like how norms can be updated in a reader today.  If we do  
that, eg as part of flexible indexing, then we might need to worry  
about synchronizing?

Mike

> - Mark
>
> Michael McCandless wrote:
>>
>> I was picturing that you'd first call an API on IndexReader to  
>> retrieve an object for accessing your stored field, like the  
>> IndexReader.getCachedData() in the current patch on LUCENE-831.   
>> That method must be synchronized so that the underlying cache is  
>> initially loaded by at most one thread.
>>
>> You then use the returned object to ask for the field value(s) for  
>> an individual doc.  I think calling the "getIntValue(int docID)"  
>> on this object would not have to be synchronized, assuming the  
>> original IndexReader was "read only"?
>>
>> Mike
>>
>> Mark Miller wrote:
>>> The reason I am thinking you have to synch on every  
>>> getCachedField call is that the cache needs to be lazily  
>>> loaded...I don't see a way to do with this without sync unless  
>>> you have an ugly "you must call this method before repeatably  
>>> calling getCachedField."
>>>
>>> Maybe I am wrong? Or maybe the cost of synchronization is low  
>>> enough now not to matter as long as we provide the back  
>>> compaitable way?
>>>
>>> Michael McCandless (JIRA) wrote:
>>>>      [ https://issues.apache.org/jira/browse/LUCENE-831? 
>>>> page=com.atlassian.jira.plugin.system.issuetabpanels:comment- 
>>>> tabpanel&focusedCommentId=12582576#action_12582576 ]
>>>>
>>>> Michael McCandless commented on LUCENE-831:
>>>> -------------------------------------------
>>>>
>>>> I think if we can finally move to having read-only IndexReaders  
>>>> then they would not sync on this method?
>>>>
>>>> Also, we should still provide the "give me the full array as of  
>>>> right now" fallback which in a read/write usage would allow you  
>>>> to spend lots of RAM in order to not synchronize.  Of course  
>>>> you'd also have to update your array (or, periodically ask for a  
>>>> new one) if you are altering fields.
>>>>
>>>>
>>>>> Complete overhaul of FieldCache API/Implementation
>>>>> --------------------------------------------------
>>>>>
>>>>>                  Key: LUCENE-831
>>>>>                  URL: https://issues.apache.org/jira/browse/ 
>>>>> LUCENE-831
>>>>>              Project: Lucene - Java
>>>>>           Issue Type: Improvement
>>>>>           Components: Search
>>>>>             Reporter: Hoss Man
>>>>>             Assignee: Michael Busch
>>>>>              Fix For: 2.4
>>>>>
>>>>>          Attachments: fieldcache-overhaul.032208.diff,  
>>>>> fieldcache-overhaul.diff, fieldcache-overhaul.diff
>>>>>
>>>>>
>>>>> Motivation:
>>>>> 1) Complete overhaul the API/implementation of "FieldCache"  
>>>>> type things...
>>>>>      a) eliminate global static map keyed on IndexReader (thus
>>>>>          eliminating synch block between completley independent  
>>>>> IndexReaders)
>>>>>      b) allow more customization of cache management (ie: use
>>>>>          expiration/replacement strategies, disk backed caches,  
>>>>> etc)
>>>>>      c) allow people to define custom cache data logic (ie: custom
>>>>>          parsers, complex datatypes, etc... anything tied to a  
>>>>> reader)
>>>>>      d) allow people to inspect what's in a cache (list of  
>>>>> CacheKeys) for
>>>>>          an IndexReader so a new IndexReader can be likewise  
>>>>> warmed.
>>>>>      e) Lend support for smarter cache management if/when
>>>>>          IndexReader.reopen is added (merging of cached data  
>>>>> from subReaders).
>>>>> 2) Provide backwards compatibility to support existing  
>>>>> FieldCache API with
>>>>>      the new implementation, so there is no redundent caching  
>>>>> as client code
>>>>>      migrades to new API.
>>>>>
>>>>
>>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

Right Michael...of course. Sometimes I cannot see the forest or the trees...

We don't have to worry that much about the IndexReader being read only 
though do we? There is not much worry now - I believe that a deleted doc 
field value remains in the field cache until the Reader is reopened. If 
someone is changing docs/fields, they need to reopen the Reader. 
Heck...i think that deleted docs field values even currently get loaded 
into the cache. So regardless of "read-only" this method will duplicate 
the current behavior right?

- Mark

Michael McCandless wrote:
>
> I was picturing that you'd first call an API on IndexReader to 
> retrieve an object for accessing your stored field, like the 
> IndexReader.getCachedData() in the current patch on LUCENE-831.  That 
> method must be synchronized so that the underlying cache is initially 
> loaded by at most one thread.
>
> You then use the returned object to ask for the field value(s) for an 
> individual doc.  I think calling the "getIntValue(int docID)" on this 
> object would not have to be synchronized, assuming the original 
> IndexReader was "read only"?
>
> Mike
>
> Mark Miller wrote:
>> The reason I am thinking you have to synch on every getCachedField 
>> call is that the cache needs to be lazily loaded...I don't see a way 
>> to do with this without sync unless you have an ugly "you must call 
>> this method before repeatably calling getCachedField."
>>
>> Maybe I am wrong? Or maybe the cost of synchronization is low enough 
>> now not to matter as long as we provide the back compaitable way?
>>
>> Michael McCandless (JIRA) wrote:
>>>      [ 
>>> https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582576#action_12582576 
>>> ]
>>>
>>> Michael McCandless commented on LUCENE-831:
>>> -------------------------------------------
>>>
>>> I think if we can finally move to having read-only IndexReaders then 
>>> they would not sync on this method?
>>>
>>> Also, we should still provide the "give me the full array as of 
>>> right now" fallback which in a read/write usage would allow you to 
>>> spend lots of RAM in order to not synchronize.  Of course you'd also 
>>> have to update your array (or, periodically ask for a new one) if 
>>> you are altering fields.
>>>
>>>
>>>> Complete overhaul of FieldCache API/Implementation
>>>> --------------------------------------------------
>>>>
>>>>                  Key: LUCENE-831
>>>>                  URL: https://issues.apache.org/jira/browse/LUCENE-831
>>>>              Project: Lucene - Java
>>>>           Issue Type: Improvement
>>>>           Components: Search
>>>>             Reporter: Hoss Man
>>>>             Assignee: Michael Busch
>>>>              Fix For: 2.4
>>>>
>>>>          Attachments: fieldcache-overhaul.032208.diff, 
>>>> fieldcache-overhaul.diff, fieldcache-overhaul.diff
>>>>
>>>>
>>>> Motivation:
>>>> 1) Complete overhaul the API/implementation of "FieldCache" type 
>>>> things...
>>>>      a) eliminate global static map keyed on IndexReader (thus
>>>>          eliminating synch block between completley independent 
>>>> IndexReaders)
>>>>      b) allow more customization of cache management (ie: use
>>>>          expiration/replacement strategies, disk backed caches, etc)
>>>>      c) allow people to define custom cache data logic (ie: custom
>>>>          parsers, complex datatypes, etc... anything tied to a reader)
>>>>      d) allow people to inspect what's in a cache (list of 
>>>> CacheKeys) for
>>>>          an IndexReader so a new IndexReader can be likewise warmed.
>>>>      e) Lend support for smarter cache management if/when
>>>>          IndexReader.reopen is added (merging of cached data from 
>>>> subReaders).
>>>> 2) Provide backwards compatibility to support existing FieldCache 
>>>> API with
>>>>      the new implementation, so there is no redundent caching as 
>>>> client code
>>>>      migrades to new API.
>>>>
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

I was picturing that you'd first call an API on IndexReader to  
retrieve an object for accessing your stored field, like the  
IndexReader.getCachedData() in the current patch on LUCENE-831.  That  
method must be synchronized so that the underlying cache is initially  
loaded by at most one thread.

You then use the returned object to ask for the field value(s) for an  
individual doc.  I think calling the "getIntValue(int docID)" on this  
object would not have to be synchronized, assuming the original  
IndexReader was "read only"?

Mike

Mark Miller wrote:
> The reason I am thinking you have to synch on every getCachedField  
> call is that the cache needs to be lazily loaded...I don't see a  
> way to do with this without sync unless you have an ugly "you must  
> call this method before repeatably calling getCachedField."
>
> Maybe I am wrong? Or maybe the cost of synchronization is low  
> enough now not to matter as long as we provide the back compaitable  
> way?
>
> Michael McCandless (JIRA) wrote:
>>      [ https://issues.apache.org/jira/browse/LUCENE-831? 
>> page=com.atlassian.jira.plugin.system.issuetabpanels:comment- 
>> tabpanel&focusedCommentId=12582576#action_12582576 ]
>>
>> Michael McCandless commented on LUCENE-831:
>> -------------------------------------------
>>
>> I think if we can finally move to having read-only IndexReaders  
>> then they would not sync on this method?
>>
>> Also, we should still provide the "give me the full array as of  
>> right now" fallback which in a read/write usage would allow you to  
>> spend lots of RAM in order to not synchronize.  Of course you'd  
>> also have to update your array (or, periodically ask for a new  
>> one) if you are altering fields.
>>
>>
>>> Complete overhaul of FieldCache API/Implementation
>>> --------------------------------------------------
>>>
>>>                  Key: LUCENE-831
>>>                  URL: https://issues.apache.org/jira/browse/ 
>>> LUCENE-831
>>>              Project: Lucene - Java
>>>           Issue Type: Improvement
>>>           Components: Search
>>>             Reporter: Hoss Man
>>>             Assignee: Michael Busch
>>>              Fix For: 2.4
>>>
>>>          Attachments: fieldcache-overhaul.032208.diff, fieldcache- 
>>> overhaul.diff, fieldcache-overhaul.diff
>>>
>>>
>>> Motivation:
>>> 1) Complete overhaul the API/implementation of "FieldCache" type  
>>> things...
>>>      a) eliminate global static map keyed on IndexReader (thus
>>>          eliminating synch block between completley independent  
>>> IndexReaders)
>>>      b) allow more customization of cache management (ie: use
>>>          expiration/replacement strategies, disk backed caches, etc)
>>>      c) allow people to define custom cache data logic (ie: custom
>>>          parsers, complex datatypes, etc... anything tied to a  
>>> reader)
>>>      d) allow people to inspect what's in a cache (list of  
>>> CacheKeys) for
>>>          an IndexReader so a new IndexReader can be likewise warmed.
>>>      e) Lend support for smarter cache management if/when
>>>          IndexReader.reopen is added (merging of cached data from  
>>> subReaders).
>>> 2) Provide backwards compatibility to support existing FieldCache  
>>> API with
>>>      the new implementation, so there is no redundent caching as  
>>> client code
>>>      migrades to new API.
>>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

The reason I am thinking you have to synch on every getCachedField call 
is that the cache needs to be lazily loaded...I don't see a way to do 
with this without sync unless you have an ugly "you must call this 
method before repeatably calling getCachedField."

Maybe I am wrong? Or maybe the cost of synchronization is low enough now 
not to matter as long as we provide the back compaitable way?

Michael McCandless (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582576#action_12582576 ]
>
> Michael McCandless commented on LUCENE-831:
> -------------------------------------------
>
> I think if we can finally move to having read-only IndexReaders then they would not sync on this method?
>
> Also, we should still provide the "give me the full array as of right now" fallback which in a read/write usage would allow you to spend lots of RAM in order to not synchronize.  Of course you'd also have to update your array (or, periodically ask for a new one) if you are altering fields.
>
>    
>> Complete overhaul of FieldCache API/Implementation
>> --------------------------------------------------
>>
>>                  Key: LUCENE-831
>>                  URL: https://issues.apache.org/jira/browse/LUCENE-831
>>              Project: Lucene - Java
>>           Issue Type: Improvement
>>           Components: Search
>>             Reporter: Hoss Man
>>             Assignee: Michael Busch
>>              Fix For: 2.4
>>
>>          Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff
>>
>>
>> Motivation:
>> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>>      a) eliminate global static map keyed on IndexReader (thus
>>          eliminating synch block between completley independent IndexReaders)
>>      b) allow more customization of cache management (ie: use
>>          expiration/replacement strategies, disk backed caches, etc)
>>      c) allow people to define custom cache data logic (ie: custom
>>          parsers, complex datatypes, etc... anything tied to a reader)
>>      d) allow people to inspect what's in a cache (list of CacheKeys) for
>>          an IndexReader so a new IndexReader can be likewise warmed.
>>      e) Lend support for smarter cache management if/when
>>          IndexReader.reopen is added (merging of cached data from subReaders).
>> 2) Provide backwards compatibility to support existing FieldCache API with
>>      the new implementation, so there is no redundent caching as client code
>>      migrades to new API.
>>      
>
>    

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582576#action_12582576 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

I think if we can finally move to having read-only IndexReaders then they would not sync on this method?

Also, we should still provide the "give me the full array as of right now" fallback which in a read/write usage would allow you to spend lots of RAM in order to not synchronize.  Of course you'd also have to update your array (or, periodically ask for a new one) if you are altering fields.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582480#action_12582480 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Hmm...how do we avoid having to pull the cached field values through a sync on every call? The field data has to be cached...and the method to return the single cached field value has to be multi-threaded...

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Jeremy Volkman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657401#action_12657401 ] 

Jeremy Volkman commented on LUCENE-831:
---------------------------------------

A couple things:

# Looking at the getCachedData method for MultiReader and MultiSegmentReader, it doesn't appear that the CacheData objects from merge operations are cached.  Is there any reason for this?
# I've written a merge method for StringIndexCacheKey. The process isn't all that complicated (apart from all of the off-by-ones), but it's expensive.

{code:java}
  public boolean isMergable() {
    return true;
  }

  private static class OrderNode {
      int index;
      OrderNode next;
  }
  
  public CacheData mergeData(int[] starts, CacheData[] data) 
  throws UnsupportedOperationException {
    int[] mergedOrder = new int[starts[starts.length - 1]];
    // Lookup map is 1-based
    String[] mergedLookup = new String[starts[starts.length - 1] + 1];
    
    // Unwrap cache payloads and flip order arrays
    StringIndex[] unwrapped = new StringIndex[data.length];

    /* Flip the order arrays (reverse indices and values)
     * Since the ord map has a many-to-one relationship with the lookup table,
     * the flipped structure must be one-to-many which results in an array of
     * linked lists.
     */
    OrderNode[][] flippedOrders = new OrderNode[data.length][];
    for (int i = 0; i < data.length; i++) {
        StringIndex si = (StringIndex) data[i].getCachePayload();
        unwrapped[i] = si;
        flippedOrders[i] = new OrderNode[si.lookup.length];
        for (int j = 0; j < si.order.length; j++) {
            OrderNode a = new OrderNode();
            a.index = j;
            a.next = flippedOrders[i][si.order[j]];
            flippedOrders[i][si.order[j]] = a;
        }
    }

    // Lookup map is 1-based
    int[] lookupIndices = new int[unwrapped.length];
    Arrays.fill(lookupIndices, 1);

    int lookupIndex = 0;
    String currentVal;
    int currentSeg;
    while (true) {
        currentVal = null;
        currentSeg = -1;
        int remaining = 0;
        // Find the next ordered value from all the segments
        for (int i = 0; i < unwrapped.length; i++) {
            if (lookupIndices[i] < unwrapped[i].lookup.length) {
                remaining++;
                String that = unwrapped[i].lookup[lookupIndices[i]];
                if (currentVal == null || currentVal.compareTo(that) > 0) {
                    currentVal = that;
                    currentSeg = i;
                }
            }
        }
        if (remaining == 1) {
            break;
        } else if (remaining == 0) {
            /* The only way this could happen is if there are 0 segments or if
             * all segments have 0 terms. In either case, we can return
             * early.
             */
            return new CacheData(new StringIndex(
                    new int[starts[starts.length - 1]], new String[1]));
        }
        if (!currentVal.equals(mergedLookup[lookupIndex])) {
            lookupIndex++;
            mergedLookup[lookupIndex] = currentVal;
        }
        OrderNode a = flippedOrders[currentSeg][lookupIndices[currentSeg]];
        while (a != null) {
            mergedOrder[a.index + starts[currentSeg]] = lookupIndex;
            a = a.next;
        }
        lookupIndices[currentSeg]++;
    }
{code}



> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702526#action_12702526 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

{quote}
Grandma! But yeah we need to somehow support probably plain Java
objects rather than every primitive derivative?
{quote}

You mean big arrays (one per doc) of plain-java-objects?  Is Bobo doing that today?  Or do you mean a single Java obect that, internally, deals with lookup by docID?

{quote}
(In reference to Mark's post 2nd to last post) Bobo efficiently
nicely calculates facets for multiple values per doc which is
the same thing as "multi value faceting"?
{quote}

Neat.  How do you compactly represent (in RAM) multiple values per doc?

{quote}
Are norms and deletes implemented? These would just be byte
arrays in the current approach? If not how would they be
represented? It seems like for deleted docs we'd want the
BitVector returned from a ValueSource.get type of method?
{quote}

The current patch doesn't do this -- but we should think about how this change could absorb norms/deleted docs, in the future.  We would add a "bit" variant of getXXX (eg that returns BitVector, BitSet, something).

{quote}
Hmm... Does this mean we'd replace the current IndexReader
method of performing updates on norms and deletes with this more
generic update mechanism?
{quote}

Probably we'd still leave the "sugar" APIs in place, but under the hood their impls would be switched to this.

bq. It would be cool to get CSF going?

Most definitely!!

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649391#action_12649391 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

bq. i haven't had any time to do further work on this issue ... partly because i haven't had a lot of time, but mainly because i'm hoping to get some feedback on the overall approach before any more serious effort investment. 

Wheres that investment Hoss? You've orphaned your baby. There is a fairly decent amount of feedback here.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.03.28.2008.diff

Here is a quick proof-of-concept type patch for using a method call rather than arrays. Speed pertaining to reopen.

In my quick test of 'open 500000 tiny docs index, repeat(3): add couple docs/sort search' the total time taken was:

Orig FieldCache impl: 27 seconds
New impl with arrays: 12 seconds
New impl with method call: 3 seconds

Its kind of a worse case scenerio, but much faster is much faster<g> The bench does not push through the point where method 3 would have to reload all of the segments, so that would affect it some...but method one is reloading all of the segments every single time...

This approach keeps the original approach for those that want to use the arrays. In that case everything still merges except for the StringIndex, so String sorting is slow. Lucene core is rigged to use the new method call though, so String sort is as sped up as the other field types when not using the arrays.

Not sure everything is completely on the level yet, but all core tests pass (core sort tests can miss a lot).

I lied about changing all core to use the new api...I havn't changed the function package yet.

- Mark

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

Updated to trunk.

Took out an optimization last patch - was using ordinals rather than strings to sort when Reader was not Multi - didn't like the isMulti on IndexReader though, so this patch and the last don't have it.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699439#action_12699439 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

I do not get the patch applied to trunk (merging works) but it gots lots of compile failures (because of the changes of LUCENE-1575). Mostly because the comparators got new ctors and so on.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698421#action_12698421 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

I am just wondering why the parsers and locales work, which all are not serializable. But they are NULL per default. So in principle, if I do a remote search with a custom parser or comparator, it would fail, too.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699880#action_12699880 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Okay, now that I half way understand this issue, I think I have to go back to the basic motivations. The original big win was taken away by 1483, so lets see if we really need a new API for the wins we have left.

h3. Advantage of new API (kind of as it is in the patch)
FieldCache is interface and it would be nice to move to abstract class, ExtendedFieldCache is ugly
Avoid global sync by IndexReader to access cache
its easier/cleaner to block caching by multireaders (though I am almost thinking I would prefer warnings/advice about performance and encouragement to move to per segment)
It becomes easier to share a ValueSource instance across readers.

h3. Disadvantages of new API
If we want only SegmentReaders to have a ValueSource, you can't efficiently back the old API with the new, causing RAM reqs jumps if you straddle the two APIs and ask for the same array data from each.

Its probably a higher barrier to a custom Parser to implement and init a Reader with a ValueSource (presumably that works per field) than to simply pass the Parser on a SortField. However, Parser stops making sense if we end up being able to back ValueSource with column stride fields. We could allow ValueSource to be passed on the SortField (the current incarnation of this patch), but then you have to go back to a global cache by reader the ValueSources passed that way (you would also still have the per segment reader, settable ValueSource).

h3. Advantages of staying with old API
Avoid forcing large migration for users, with possible RAM req penalties if they don't switch from deprecated code (we are doing something similar with 1483 even without deprecated code though - if you were using an external multireader FieldCache that matched a sort FieldCache key, youd double your RAM reqs).

h3. Thoughts
If we stayed with the old API, we could still allow a custom FieldCache to be supplied. We could still back FieldCacheImpl with Uninverter to reduce code. We could still have CachingFieldCache. Though CachingValueSource is much better :) FieldCache implies caching, and so the name would be confusing. We could also avoid CachingFieldCache though, as just making a pluggable FieldCache would allow alternate caching implementations (with a bit more effort).

We could deprecate the Parser methods and force supplying a new FieldCache impl for custom uninversion to get to an API suitable to be backed by CSF.

Or:

We could also move to ValueSource, but allow a ValueSource on multi-readers. That would probably make straddling the API's much more possible (and efficient) in the default case. We could advise that its best to work per segment, but leave the option to the user.

h3. Conclusion
I am not sure. I thought I was convinced we might as well not even move from FieldCache at all, but now that I've written a bit out, I'm thinking it would be worth going to ValueSource. I'm just not positive on what we should support. SortField ValueSource override keyed by reader? ValueSources on MultiReaders?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654413#action_12654413 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------


bq. It seems with this field cache approach and the recent FieldCacheRangeFilter on trunk, that Lucene has a robust and coherent answer to performing efficient sorting and range filtering for float, double, short, int and long values, perhaps it's time to enhance Document. That might cut down the size of the API, which in turn makes it easy to test and tune. Document could preclude tokenization for such fields, I suspect I'm not the only one to build a type-safe replacement to Document.

This is an interesting idea.  Say we create IntField, a subclass of
Field.  It could directly accept a single int value and not accept
tokenization options.  It could assert "not null", if the field wanted
that.  FieldInfo could store that it's an int and expose more stronly
typed APIs from IndexReader.document as well.  If in the future we
enable Term to be things-other-than-String, we could do the right
thing with typed fields.  Etc....


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699678#action_12699678 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

So I'm flopping around on this, but I guess my latest take is that:

I want to drop the SortField ValueSource override option. Everything would need to be handled by overriding the segment reader ValueSource.

Drop the current back compat code for FieldCache - its mostly unnecessary I think. Instead, perhaps go back to orig FieldCache impl, except if the Reader is a segment reader, use the new ValueSource API ? Grrr - except if someone has mucked with the ValueSource or used a custom FieldCache Parser, it won't match correctly...thats it - you just can't straddle the two APIs. So I'll revert FieldCache to its former self and just deprecate.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698215#action_12698215 ] 

Earwin Burrfoot commented on LUCENE-831:
----------------------------------------

I'm using a similar approach.

There's a FieldType, that governs conversions from Java type into Lucene strings and declares 'abilities' of that type. Like - conversion is order-preserving (all numerics + some others), converted values can be meaningfully prefix-searched (like TreeId, that is essentially an int[], used to represent things like nested category trees). Some types can also declare themselves as derivatives of others, like DateType being derived from LongType.

Then there's a FieldInfo, that defines field name, FieldType used for it, and actions we're going to take on the field. E.g. if we want to sort on it, build clusters with certain characteristics, load values for this field for each found document, use fast rangefilters, store/filter on field being null/notnull, apply transforms on the field before storing/searching, copy value of the field to another field (with probable transformation) when indexing, etc. From FieldType and desired actions, FieldInfo is able to deduce tokenize/index/store/cache behaviour, and can say that additional lucene fields are required (e.g. for handling null/notnull searches, or trie ranges, or a special sort-form).

Then there's an interface that contains FieldInfo constants and a special constant FieldEnum FIELDS = fieldsOf(ResumeFields.class); that is essentially a navigable list of all FieldInfos defined in this interface and interfaces it extends (allows me to have CommonFields + ResumeFields extends CommonFields, VacancyFields extends CommonFields).

FieldType, and consequently FieldInfo is type-parameterized with the java type associated with the field, so you get the benefit of type-safety when storing/loading/searching the field. All Filters/Queries/Sorters/Loaders/Document accept FieldInfo instead of String for field name, so for example Filters.Range(field, fromValue, fromInclusive, toValue, toInclusive) knows whether to use a simple range filter or a trie one, ensures from/toValues are of a proper type and converts them properly. Filters.IsSet(field) can consult an additional field created during indexation, or access a FieldCache. DocLoader will either get a value for the field from index or from the cache. etc, etc, etc.

While I like resulting schema-style very much, I don't want to see the likes of it within Lucene core. Better to have some contrib/extension/whatever that builds on core-defined primitives. That way if one needs to build his own somewhat divergent schema, they can easily do it, instead of trying to fit theirs over Lucene's. For the very same reason I'd like to see fieldcaches moved away from the core, and depending on the same in-core IndexReader segment creation/deletion/whatever hooks that users will use to build their extensions. 

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698309#action_12698309 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

I guess the caching is not the problem I thought. It's got to be per  
value source anyway. I guess I just have to stick the default  
valuesource in a better place. Seems cachingvaluesource is still legit.

Still, I introduced reader back into the caching , and you wanted to  
avoid that..

- Mark

http://www.lucidimagination.com

On Apr 12, 2009, at 8:45 PM, "Mark Miller (JIRA)" <ji...@apache.org>  



> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698241#action_12698241 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

bq. Or, we could turn the check off entirely (which I think is better than intermittently catching it).

+1 - agreed that the check is nasty - better to never trip it if your only going to trip it depending...

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657409#action_12657409 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

{quote}
>  this will turn more into an API overhaul than an IndexReader reopen time saver.
{quote}
...and given the progress on LUCENE-1483 (copying values into the sort queues), I think this new FieldCache API should probably be primarily an iteration API.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701751#action_12701751 ] 

Jason Rutherglen commented on LUCENE-831:
-----------------------------------------

I'm trying to figure out how to integrate Bobo faceting field
caches with this patch, I applied the patch, browsed the
ValueSource API and yeah, it's not what I expected. "we can
return arrays, objects, or anything and your grandmother" not
Grandma! But yeah we need to somehow support probably plain Java
objects rather than every primitive derivative? 

(In reference to Mark's post 2nd to last post) Bobo efficiently
nicely calculates facets for multiple values per doc which is
the same thing as "multi value faceting"? 

> by back compat with deletes, norms though.

Are norms and deletes implemented? These would just be byte
arrays in the current approach? If not how would they be
represented? It seems like for deleted docs we'd want the
BitVector returned from a ValueSource.get type of method?

M.M.: "Updatability is tricky... ValueSource would maybe need a
"startChanges()" API, which would copy the array (copy-on-write)
if it's not already private"

Hmm... Does this mean we'd replace the current IndexReader
method of performing updates on norms and deletes with this more
generic update mechanism?

It would be cool to get CSF going?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581375#action_12581375 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Right, I think its used in MultiSearcher and Parallel-MultSearcher and I believe its because you cannot compare simple ord arrays across Searchers and so you need the original sort term?

I havn't been able to come up with anything that would be very efficient for merging a StringIndex, but I have not thought to much about it yet. Anyone out there have any ideas? Fastest way to merge two String[], each with an int[] indexing into the String[]? Is there a *really* fast way?

I agree that it would be nice to skip the String[] if a MultiSearcher was not being used.

I'll keep playing with it

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

Updated to trunk.

I've combined all of the dual (primitive array/ObjectArray) CachKeys into one. Each cache key can support both modes or throw UnsupportedException or something.

I've also tried something a bit experimental to allow users to eventually use custom or alternate cachekeys (payload or sparse arrays or something) that work with internal sorting. A cache implementation can now supply a ComparatorFactory (name will prob be tweaked) that handles creating comparators. You can subclass ComparatorFactory and add new or override current supported CacheKeys.

CustomComparators still needs to be twiddled with some.

I've converted some of the sort tests to run with both primitive and object arrays as well.

- Mark
I

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698238#action_12698238 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

Another thing that'd be great to fix about FieldCache is its
intermittent checking of the "some docs had more than one token in the
field" error.  The current check only catches it in limited cases,
which is deadly because you can test like crazy and think you're OK
only in production months later to index slightly different content
and hit the exception. RuntimeException

But I can't think of a cheap way to do it reliably.

At least, we should upgrade the exception from RuntimeException to a
checked exception.  Or, we could turn the check off entirely (which I
think is better than intermittently catching it).

We should also somehow allow turning off the check on a case by case
basis, since there are really times when it's OK.  (Though maybe you
just make your own ValueSource, or maybe subclass Uninverter,
or... something?).


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700523#action_12700523 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. Some of your comments seem to indicate you think we will need to end up with an object rather than raw arrays?

Well, really I threw out all these future items to stir up the pot and
see if some clarity comes out of it ;) This is what I try to do
whenever I'm stuck on how to design something... some sort of defense
mechanism.

That said, what requires object instead of array?  EG for binary
fields (deleted docs) we'd have eg "BitVector getBits(...)".

For multi-valued fields, I'm not sure what's best.  I think Yonik did
something neat with Solr for holding multi-valued fields but I can't
find it now.  But, with ValueSource, we have the freedom to use arrays
for simple cases and something else for interesting ones?  It's not
either/or?

bq. And we would want to lose exposing Parser so that CFS can be a seamless backing. 

I see the CFS/CSF confusion has already struck!

But yes cleaner API would be a nice step forward...

bq. We have it? Just pass the CSFValueSource at IndexReader creation?

Yes I think we have this one.

Though... I feel like ValueSource should represent a single field's
values, and something else (FieldType?) returns the ValueSource for
that field.  Ie, I think we are overloading ValueSource now?

bq. Good point. We need a way to update, that can throw USO Exception?

Maybe... or we can defer for future.  We don't need full answers nor
impls for all of these now...

{quote}
> Possible future when Lucene computes sort cache (for text fields)
> and stores in the index

I'm not familiar with that idea, so not sure what affect this has...
{quote}

Sort cache is just getStringIndex()... all other types just use the
values directly (no need for separate ords).  If it's costly to
compute per-reopen we may want to store it in the index.  But
honestly, since we load the full thing into RAM, I wonder how
different the time'd really be loading it vs recomputing it.

bq. Good point again. Getting norms under this API will add a bit more meat to this issue.

Yeah I'm not sure whether norms/deleted docs "fit"; certainly we'd
need updatability first.  It's just that, from a distance, they are
clearly a "value per doc" for every doc in the index.  If we had norms
& deletions under this API then suddenly, [almost] for free, we'd get
pluggability of deleted docs & norms.

bq. I am kind of liking Uwe's idea of assigning ValueSources per field, though that could probably get messy. Perhaps a default, and then per field overrides?

I'm also more liking "per field" to be somehow handled.  Whether
IndexReader exposes that vs a FieldType (that also holds other
per-field stuff), I'm not sure.

bq. Anybody is updating norms on a regular basis on a serious project?

This is a good question -- I'd love to know too.

But I think updating CSFs would be compelling; having to reindex the
entire doc because only 1 or 2 metadata fields had changed is a common
annoyance.  Of course we'd have to figure out (or rule out) updating
the postings for such changes...


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559008#action_12559008 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

I did some extensive tests with Lucene 2.3 today and was wondering, that IndexReader.reopen() in combination with FieldCache/Sorting does not bring a performance increase.

Until now, even if you reopen an Index using IndexReader.reopen(), you will get a new IndexReader instance which is not available in the default FieldCache. Because of that, all values from the cached fields must be reloaded. As reopen() only opens new/changed segments, a FieldCache implementation directly embedded into IndexReader as propsed by this issue would help here. Each segment would have its own FieldCache and reopening would be quite fast.

As with Lucene 2.3 the reopen is possible, how about this issue? Would this be possible for 2.4?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>         Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

Here is fairly decent base to start from.

Still needs a lot, but a surprising amount of tests pass. Essentially stripped out what isnt needed due to 1483, updated to trunk (new 1483 API), and added a bit of the new stuff we need.

Some of the work to do:

I think cache needs a clone for indexreader cloning.

Need to come up with a good way to specify type to cachekey configuration - I threw some paint on the wall. Once we go filed/type - cachekey, almost seems we are getting into schema territory...

Have to deal with new Parser interface in FieldCache (back compat)

We may need the multireader to do the full array for back compat.

I dont think any of the custom type stuff is setup to work yet.

Iterate iterate iterate I suppose.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699028#action_12699028 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Or maybe we can just keep all 3 options? Backcompat parser stuff goes through a reader keyed cache, built in stuff goes through a segment level ValueSource, and custom stuff using SortField can do whatever to override the ValueSource - they would cache if they wanted, etc.

So you could either override the default ValueSource, and provide your own override on the fly.

I guess that does seem to work out anyway ... ?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

Mark Miller wrote:

>> EG when sorting by field, we could pull say an IntData iterator from
>> the reader, and then access the int values in docID order as we visit
>> the docs.
>>
> We need random access after collecting/visiting though...do we put  
> what we collect into a map? If a lot of docs match?

I guess I was thinking we'd copy the values into the  
FieldSortedHitQueue entry if the doc "made it" into the queue, though,  
I'm no longer liking this idea as much (such slurping can easily  
become very costly).

But, we could still save the binary search during comparisons when  
inserting into FieldSortedHitQueue if each entry maintained a pointer  
to the SegmentReader's array, and docID within that array.  Ie "cache"  
which sub-reader the binary search had chosen.  And, even though we'd  
have random access to the native array, since we are still accessing  
the field values sequentially by docID, we should not have to re-do  
the binary search every time.

If we can implement this approach (not sure exactly how just yet...)  
then I think that'd probably drop the additional cost of not having  
materialized the global field value array to near zero.

>> For norms, which we should eventually switch to FieldCache +
>> column-stride fields, it would be the same story.
>>
>> Accessing via iterator should go a long ways to reducing the overhead
>> of "using a method" instead of accessing the full array directly.
>>
> I agree...dropping the binary search per id access on a multi reader  
> is a must (at least for the common sort use case of field cache).
>
> I'm still attempting to consolidate my ideas on this. I've come at  
> it from a rather lazy approach - Hoss laid out a an API that was  
> 'almost' complete, and I just completed things and pushed in a  
> couple directions (the ArrayObject). I've been thinking from the  
> inside out. Originally I was just playing around and trying to build  
> some interest. I have more interest in getting this finished now  
> though, and it sounds like others do as well. This discussion will  
> hopefully get things moving.

I hear you!

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654057#action_12654057 ] 
>
> Michael McCandless commented on LUCENE-831:
> -------------------------------------------
>
>
> One more thing here... while random-access lookup of a field's value
> via MultiReader dispatch requires the binary search to find the right
> sub-reader, I think in most internal uses, the access could be
> switched to an iterator instead, in which case the lookup should be
> far faster.
>   
This sounds good.
> EG when sorting by field, we could pull say an IntData iterator from
> the reader, and then access the int values in docID order as we visit
> the docs.
>   
We need random access after collecting/visiting though...do we put what 
we collect into a map? If a lot of docs match?
> For norms, which we should eventually switch to FieldCache +
> column-stride fields, it would be the same story.
>
> Accessing via iterator should go a long ways to reducing the overhead
> of "using a method" instead of accessing the full array directly.
>   
I agree...dropping the binary search per id access on a multi reader is 
a must (at least for the common sort use case of field cache).

I'm still attempting to consolidate my ideas on this. I've come at it 
from a rather lazy approach - Hoss laid out a an API that was 'almost' 
complete, and I just completed things and pushed in a couple directions 
(the ArrayObject). I've been thinking from the inside out. Originally I 
was just playing around and trying to build some interest. I have more 
interest in getting this finished now though, and it sounds like others 
do as well. This discussion will hopefully get things moving.

- Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654057#action_12654057 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------


One more thing here... while random-access lookup of a field's value
via MultiReader dispatch requires the binary search to find the right
sub-reader, I think in most internal uses, the access could be
switched to an iterator instead, in which case the lookup should be
far faster.

EG when sorting by field, we could pull say an IntData iterator from
the reader, and then access the int values in docID order as we visit
the docs.

For norms, which we should eventually switch to FieldCache +
column-stride fields, it would be the same story.

Accessing via iterator should go a long ways to reducing the overhead
of "using a method" instead of accessing the full array directly.


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Fix Version/s:     (was: 2.4)
                   3.0

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699166#action_12699166 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Thinking a bit on this this morning:

I think that will work out right. We would have 3 different attacks at ValueSource.

1. A ValueSource per segment reader that handles all default ValueSource needs - you get it with IndexReader.getValueSource. Its an UninversionValueSource wrapped by a CachingValueSource by default.

2. A singleton back compat value source that is wrapped by CacheByReaderValueSource. It has extra methods that takes Uninverters, allowing custom Uninverters and caching by Uninverter.

3. You can override the ValueSource used for Sorting by attaching it to the SortField. Likely, you would weap in a CacheByReaderValueSource and have your own singleton.

I think that gives us back compat and the best of both worlds?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689867#action_12689867 ] 

Earwin Burrfoot commented on LUCENE-831:
----------------------------------------

Adding to Tim, I'd like to see the ability not only to be notified of SegmentReader destruction, but of SegmentReader creation (within reopen) too. And new FieldCache logic should be built on these notifications.
Then it's possible to extend/replace Lucene's native FieldCache, then it's possible to create a cache specialized for trie-fields.

Linking objects to each other with WeakHashmaps is insanely evil, especially in the case when object creation/destruction is clearly visible.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Robert Newson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654109#action_12654109 ] 

Robert Newson commented on LUCENE-831:
--------------------------------------


The conflict was easy to resolve, it was just an FYI, I appreciate trunk changes rapidly. I was just wondering if a feature branch would make synchronization easier.

Some meta-task to combine or track these things would be great. I hit the problem that LUCENE-1478 describes. I'd previously indexed numeric values with NumberTools, which gives String-based sorting, the most memory-intensive one.

It seems with this field cache approach and the recent FieldCacheRangeFilter on trunk, that Lucene has a robust and coherent answer to performing efficient sorting and range filtering for float, double, short, int and long values, perhaps it's time to enhance Document. That might cut down the size of the API, which in turn makes it easy to test and tune. Document could preclude tokenization for such fields, I suspect I'm not the only one to build a type-safe replacement to Document.

For what it's worth, I'm currently indexed longs using String.format("%019d") and treating dates as longs (getTime()) coupled with a long[] version of FieldCacheRangeFilter. It achieves a similar goal to this task, the long[] used for sorting is the same as for range filtering. 


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698300#action_12698300 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

By the way: If this new API goes into 2.9, the SortField ctors with parser and the whole parser support in SortField/Search-code can be removed, if Parser itsself is deprecated (as support for LUCENE-1478 was not released until now). New code could always use ValueSource.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708307#action_12708307 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

I won't likely be getting to this anytime soon if someone else wants to work on it. I'll get back at it at some point if not though.

I believe the latest patch is a nice base to work from.

I'm still not clear to me if its best to start merging using the ValueSource somehow, or do something where the ValueSource has a merge implementation (allowing for a more efficient private merge). It seems the merge code for fields, norms, dels, is fairly specialized now, but could become a bit more generic. Then perhaps you could add any old ValueSource (other than norms, fields, dels)  and easily hook into the merge process. Maybe even in RAM merges of RAM based ValueSources - FieldCache etc. Of course, I guess you could also still do things specialized as now, and just provide access to the files through a ValueSource. That really crimps the pluggability though.

The next step (in terms of the current patch) seems to be to start working ValueSource into norms, dels, possibly stored fields. Eventually they should become pluggable, but I'm not sure how best to plug them in. I was thinking you could set a default ValueSource by field for the FieldCache using the Reader open method with a new param. Perhaps it should take a ValueSourceFactory that can provide a variety of ValueSources based on field, norms, dels, stored fields, with variations for read-only? The proposed componentization of IndexReader could be another approach if it materializes, or worked into this issue.

I don't think I'll understand whats needed for updatability until I'm in deeper. It almost seems like something like setInt(int doc, int n), setByte(int doc, byte b) on the ValueSource might work. They could possibly throw Unsupported. I know there are a lot of little difficulties involved in all of this though, so I'm not very sure of anything at the moment. The backing impl would be free to update in RAM (say synced dels), or do a copy on write, etc. I guess all methods would throw Unsupported by default, but if you override a getXXX you would have the option of  overriding a setXXX. 

ValueSources also need the ability to be sharable across IndexReaders with the ability to do copy on write if they are shared and updatable.



> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649180#action_12649180 ] 

Alex Vigdor commented on LUCENE-831:
------------------------------------

Another useful feature that seems like it would be pretty easy to implement on top of this patch is cache warming; that is, the ability to reopen an IndexReader and repopulate its caches before making it "live".  The main thing missing is a Cache.getKeys() method which could be used to discover what caches are already in use, in order to repopulate them after reopening the IndexReader.  This cache warming could be performed externally to the IndexReader (by calling getCacheData for each of the keys after reopening), or perhaps the reopen method could be overloaded with a boolean "reloadCaches" to perform this in the same method call.

The rationale for this I hope is clear; my application, like many I'm sure, keeps a single IndexReader for handling searches, and in a separate thread from search request handling commits writes and reopens the IndexReader before replacing the IndexReader reference for new searches (reference counting is then used to eventually close the old reader instances).  It would be ideal for that separate thread to also bear the expense of cache warming; even with this patch against our 4 GB indices, along with -Duse.object.array.sort=true,  a search request coming immediately after reopening the index will pause 20-40 seconds while the caches refill.  Preferably that could be done in the background and request handling threads would never be slowed down.



> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698254#action_12698254 ] 

Mark Miller edited comment on LUCENE-831 at 4/12/09 12:09 PM:
--------------------------------------------------------------

I began to make the switch of allowing an Uninverter to be attached to a SortField (like Parser) with the idea of doing backcompat by wrapping the Parser with an Uninverter. I caught myself though, because the reason I wasn't doing that before is that Uninverter may not make sense for a given ValueSource. By forcing you to set the Uninverters per type on the UninversionValueSource instead (current posted patch), this issue is kind of avoided - the possible problem is that it seems somewhat less dynamic in that you set it once on the ValueSource and leave it instead of being able to pass any impl any time on a SortField. Perhaps not so bad. But then how do I set the Uninverter for back compat with an old Parser? It doesn't seem wise to allow arbitrary Uninverter updates to the UninversionValueSource does it? Thread safety issues, and ... but then how to handle back compat with the parser? ... 

      was (Author: markrmiller@gmail.com):
    I began to make the switch of allowing an Uninverter to be attached to a SortField (like Parser) with the idea of doing backcompat by wrapping the Parser with an Inverter. I caught myself though, because the reason I wasn't doing that before is that Inverter may not make sense for a given ValueSource. By forcing you to set the inverters per type on the UninversionValueSource instead (current posted patch), this issue is kind of avoided - the possible problem is that it seems somewhat less dynamic in that you set it once on the ValueSource and leave it instead of being able to pass any impl any time on a SortField. Perhaps not so bad. But then how do I set the Uninverter for back compat with an old Parser? It doesn't seem wise to allow arbitrary Uninverter updates to the UninversionValueSource does it? Thread safety issues, and ... but then how to handle back compat with the parser? ... 
  
> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699016#action_12699016 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

I was going to throw in that constructor Uwe, and I got caught up doing a few things. I'm not so sure we can stick with attaching the ValueSource on the SortField. In the end, we'd really like the cache to be held per segment in the reader, rather than a monster reader cache. So I have moved the standard ValueSource back to the reader. I've got a separate massive reader cache (as in, everything goes in one cache keyed by reader) for handling the backcompat issues with custom parsers in FieldCache (it would be awesome to get this out before having to deprecate the SortField parser stuff).

This doesnt really jive well with passing in ValueSources on the fly with SortField. Which sucks, because that was nice.

What do you think about having to pull off Trie from the ValueSource set on your reader at reader init? I'm not thinking its super pretty - I guess you take which fields to override at init, and then do trie stuff for certain fields, but pass through to the default ValueSource impl for other fields?

I hope that can be improved, but i don't see how at the moment...

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699448#action_12699448 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

By the way: In the patch, the ValueSource set in SortField seems to be never used when building the comparators. If it is used, when applying the patch attached by me some days ago (LUCENE-831-trieimpl.patch), the trie tests should also work.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700529#action_12700529 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

{quote}But, with ValueSource, we have the freedom to use arrays
for simple cases and something else for interesting ones? It's not
either/or?{quote}

Good point. I was also thinking that some of these issues could force back up to multi-reader support though. But I guess that is not such a worry now that we search per segment is it. A lot of that could probably be deprecated (though I really don't know how easily - I hope to spend a lot more time getting more familiar with IndexReader code).

{quote}[almost] for free, we'd get
pluggability of deleted docs & norms.{quote}
I like that idea as well. Plugability is nice.

{quote}I'm also more liking "per field" to be somehow handled. Whether
IndexReader exposes that vs a FieldType (that also holds other
per-field stuff), I'm not sure.{quote}
I want field handling to become easier in Lucene, but I hope we don't lose any of our super on the fly settings. +1 on making field handing easier, but I am much more weary of a fixed schema type thing.

{quote}But I think updating CSFs would be compelling; having to reindex the
entire doc because only 1 or 2 metadata fields had changed is a common
annoyance. {quote}

I am very interested in having updatable CSF's (much too easy to mistype that). There are many cool things to use it for, especially in combination with near realtime search (tagging variations).


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated LUCENE-831:
----------------------------

    Attachment: fieldcache-overhaul.diff

i haven't had any time to do further work on this issue ... partly because i haven't had a lot of time, but mainly because i'm hoping to get some feedback on the overall approach before any more serious effort investment.

updated patch to work against the trunk (r544035)

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>         Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699663#action_12699663 ] 

Mark Miller commented on LUCENE-831:
------------------------------------


Thats somewhat possible now (with the exception that you can't yet set the value source for the segment reader yet - it would likely become an argument to the static open methods): ValueSource gets a field as an argument, so it is also easy enough to set a ValueSource that does trie encoding for arbitrary fields on the SegmentReader, eg FieldTypeValueSource could take arguments to configure it per field and then you set it on the IndexReader when you open it. Thats all still in the patch - its just a bit more of a pain than being able to set it at any time on the SortField as an override.

I guess I almost see things going just to the segment reader valuesource option though - once FieldCache goes back to standard, it might make sense to drop the SortField valuesource support too, and just do the segment ValueSource. Being able to init the SegmentReader with a ValueSource really allows for anything needed - I just wasn't sure if it was too much of a pain in comparison to also having a dynamic SortField override.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697812#action_12697812 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Some random thoughts:

If we are going to allow random access, I like the idea of sticking with the arrays. They are faster than hiding behind a method, and it allows easier movement from the old API. It would be nice if we can still deprecate all of that by backing it with the new impl (as done with the old patch).

The current API (from this patch) still looks fairly good to me - a given cachekey gets your data, and knows how to construct it. You get data something like: return (byte[]) reader.getCachedData(new ByteCacheKey(field, parser)). It could be improved, but it seems a good start to me.

The immediate problem I see is how to handle multireader vs reader. Not being able to treat them the same is a real pain. In the segment case, you just want an array back, in the multi-segment perhaps an array of arrays? Or unsupported? I havn't thought of anything nice.

We have always been able to customize a lot of behavior with our custom sort types - I guess the real issue is making the built in sort types customizable. So I guess we need someway to say, use this "cachekey" for this built in type?

When we load the new caches in FieldComparator, can we count on those being segmentreaders? We can Lucene wise, but not API wise right? Does that matter? I suppose its really tied in with the multireader vs reader API.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653757#action_12653757 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. change norm caching to use new caches (if not the same

I think we could go even further, and [eventually] change norms to use an iterator API, which'd also have the same benefit of not requiring costly materialization of a full byte[] array for every doc in the index (ie, reopen() cost would be in proportion to changed segments not total index size).

Likewise field cache / stored fields / column stride fields could eventually open up an iterator API as well.  This API would be useful if eg in a custom HitCollector you wanted to look at a field's value in order to do custom filtering/scoring.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624529#action_12624529 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Deprecating the function package is problematic. Doing it by method leaves the classes open to issues maintaining two possible states that could be mixed - the FieldCache and parsers are used as method params - you can't really replace the old internal representation with the new one. You really have to support the old method and new method and tell people not to use both or something. Doing it by class means duplicating half a dozen classes with new names. Neither is very satisfying. This stuff is all marked experimental and subject to change though - could it just be done clean sweep style? A little abrupt but...experimental is experimental <g>

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699893#action_12699893 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

We have the problem with the ValueSource-override not only with SortField. Also Functions Queries need the additional ValueSource-override and other places too. So a central place to register a "ValueSource per field" for a IndexReader (MultiReader,... passing down to segments) would really be nice.

For the caching problem: Possibly the ValueSource given to SortField etc. behaves like the current parser. The cache in IndexReader should also be keyed by the ValueSource. So the SortField/FunctionQuery ValueSource override is passed down to IndexReader's cache. If the IndexReader has an entry in its cache for same (field, ValueSource, ...) key, it could use the arrays from there, if not fill cache with an array from the overridden ValueSource. I would really make the ValueSource per-field.

Univerter inner class should be made public and the Univerter should accept a starting term to iterate (overwrite "") and the newTerm() method should be able to return false to stop iterating (see my ValueSource example for trie). With that one could easily create a subclass of univerter with a own parser logic (like trie).

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698286#action_12698286 ] 

Mark Miller edited comment on LUCENE-831 at 4/12/09 4:17 PM:
-------------------------------------------------------------

I really like CachingValueSource, but somehow the cache has to move to the segmentreader or something... (now that there is not a segmentreader -> valuesource instance mapping)

      was (Author: markrmiller@gmail.com):
    I really like CachingValueSource, but somehow the cache has to move to the segmentreader or something... (not that there is not a segmentreader -> valuesource mapping)
  
> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697540#action_12697540 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

{quote}
I guess you meant dangerous as in dangerous to reopen then? I actually thought you meant as in dangerous because it could require too may resources. Dangerous is a tough to pin down word 
{quote}
Dangerous is a dangerous word ;)

I meant: I don't like exposing non-performant APIs; they are sneaky traps.  (EG TermEnum.skipTo is another such API).

bq. So what are the advantages of the iterator API again then?

The big advantage is the possibility of backing it with eg an IndexInput, so that the values need to all be in RAM for one segment.  Though, as Lucy is doing, we could leave things on disk and still have random access via mmap, which perhaps should be an option for Lucene as well.  However, iterator only messes up out-of-order scoring (BooleanScorer for OR queries), so I'm tentatively leaning against iterator only at this point.

bq. but I'd love to put a little sweat into this issue.

That would be AWESOME (if you can somehow make time)!

We should hash out the design a bit before figuring out how/where to start.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700177#action_12700177 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

I've been struggling with the "right" way forward here... despite
following all comments and aggressive ongoing mulling, I still don't
have much clarity.

It feels like one of those features that just hasn't quite "clicked"
yet (to me at least).  In fact, the more I try to think about it, the
less clarity I get!

I think there're some cncrete reasons to create a new API (some
overlap w/ Mark's list above):

  * Make caching "external"/public so you can control when things are
    evicted

  * Cleaner API -- it's just awkward that you now must call a separate
    place (ExtendedFieldCache.EXT_DEFAULT) to getInts.  FieldCache &
    ExtendedFieldCache are awkward, and they are interfaces.  It makes
    more sense to ask the reader directly for ints (or a future
    component of the reader).

  * Better extensibility on uninversion (either via "you make your own
    ValueSource entirely", or "you can subclass Uninverted and tweak
    it").  Trie needs this (though, we have a viable approach in field
    cache).  Fields with more than one value want custom control to
    pick one.

  * Making it not-so-easy to get all field values at the reader level
    (don't set dangerous API traps)

Honestly these reasons are not net/net compelling enough to warrant a
whole new API?  They are fairly minor.  And I agree: LUCENE-1483 has
already achieved the biggest step forward here.

Furthermore, there are other innovations happening that may affect how
we do this. EG LUCENE-1597 introduces type information for fields (at
least at indexing time), and Earwin is working on "componentizing"
SegmentReader.  Normally I don't like letting "big distant future
feature X" prevent progess on "today's feature Y", but since we lack
clarity on Y...

I can imagine a future when the FieldType would be the central place
that records all details for a field:

  * The analyzer to use (so we don't need PerFieldAnalyzerWrapper)

  * The ValueSource

  * It's "native" type (now "switched" in many places, like
    FieldComparator, SortField, FieldCache, etc.)

  * All the index-time configuration

And then instead of having ValueSource dispatch per field, we'd simply
ask the FieldType what it's source is.

Finally, there are a number of future improvements we should take into
account.  We wouldn't try to accomplish these right now, but we ought
to think about them (eg, not preclude them) in whatever approach we
settle on:

  * We need source pluggability for when CSF arrives (but, admittedly,
    we could wait until CSF actually does arrive)

  * Allowing values to change, just like we can call
    IndexReader.setNorm/deleteDoc to change norms/deletes. We'd need a
    copy-on-write approach, like norms & deleted docs.

  * How would norms be folded into this?  Ideally, each field could
    choose to pull its norms from any source.  Document level norms
    was discussed somewhere, and should easily "fit" as another norms
    source.  We'd need to relax how per-doc-field boosting is computed
    at runtime to pull from such "arbitrary" sources.

  * Deleted docs could also be represented as a ValueSource?  Just one
    bit per doc.  This way one could swap in whatever source for
    "deleted docs" one wanted.

  * Allowing for docs that have more than one value.  (We'd also need
    to extend sorting to be able to compare multiple vlaues).

  * An mmap implementation (like Lucy/KS) -- should feel just like CSF
    or uninversion (ie, "just another impl").

  * Impls of getStrings and getStringIndex that are based on offsets
    into char[] (not actual individual String object).

  * Good impls for the enum case (all strings could be considered
    enums), eg if there are only 100 unique strings in that field, you
    only need 7 bits per ord derefing into the char[] values.

  * Possible future when Lucene computes sort cache (for text fields)
    and stores in the index

  * Allowing field sort to use an entirely external source of values

There's alot to think about :)


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582471#action_12582471 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

>If you're going to incrementally update a FieldCache of a MultiReader, it's the same issue... can't merge the ordinals without the original (String) >values.

That is a great point.

>should we switch to a method call, instead of a straight array, to retrieve a cached value for a doc?

Sounds like a great idea to me. Solves the StringIndex merge and eliminates all merge costs at the price of a method call per access.


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698418#action_12698418 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

bq. I've got the tests passing in general, but still have to figure out better/right caching (the way I allow you to pass in an Uninverter skips caching completely! plus the above), need to deal with ValueSource as a field on SortField - require it being serializable...ugg.

Mh, FieldCache.Parser is also not serializable and the others like the Comparators are not serializable, too. Why not simply pass the ValueSource to SortField like the Parser or Locale? It worked until now without serialization and so I think we should remove serialization from SortField. The factory is silly. If we have a factory there, we should also have a factory for parsers...

By the way, can you add the ctor mentioned above to your patch, I need it to sucessfully test the new TrieValueSource I wrote yesterday (see patch). This is a good test case for the extensibility of your new API.
By the way: For TrieRange, I made the ValueSource a static variable. In general the ValueSources should be singletons (maybe this is why you created the factory).

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

This should give us some more concrete to start from. Still early and rough (I started to put nocommits, but too early to worry much yet). I have pushed things in general to the proposed API, but its still fairly rough. Core sort tests pass (no custom comparator, remote), but custom parsers are currently ignored. A bunch of the unfun back compat / little stuff is undone.

bq. How about something like this (NOTE: not compiled/tested):

Thats kind of what I did (though not named as nicely, I'll update), but does that solve easily telling the IndexReader (valuesource) what parsers to use by a user? I suppose what your proposing is no custom parser? Instead a custom inverter - but still the problem of a user simply specifying inverters per field when initing the indexreader? Of course you can do a whole new UninversionVS imp, but it just seemed a little heavy handed compared to giving a parser the SortField... I'll update the code a bit and think some...

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

Brings this patch back up to trunk level.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Marvin Humphrey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655081#action_12655081 ] 

Marvin Humphrey commented on LUCENE-831:
----------------------------------------

> Marvin, does KS/Lucy have something like FieldCache? If so, what API do you
> use? Is it iterator-only? 

At present, KS only caches the docID -> ord map as an array.  It builds that
array by iterating over the terms in the sort field's Lexicon and mapping the
docIDs from each term's posting list.

Building the docID -> ord array is straightforward for a single-segment
SegLexicon.  The multi-segment case requires that several SegLexicons be
collated using a priority queue.  In KS, there's a MultiLexicon class which
handles this; I don't believe that Lucene has an analogous class.

Relying on the docID -> ord array alone works quite well until you get to the
MultiSearcher case.  As you know, at that point you need to be able to
retrieve the actual field values from the ordinal numbers, so that you can
compare across multiple searchers (since the ordinal values are meaningless).

{code}
Lex_Seek_By_Num(lexicon, term_num);
field_val = Lex_Get_Term(lexicon);
{code}

The problem is that seeking by ordinal value on a MultiLexicon iterator
requires a gnarly implementation and is very expensive.  I got it working, but
I consider it a dead-end design and a failed experiment.

The planned replacement for these iterator-based quasi-FieldCaches involves
several topics of recent discussion:

  1) A "keyword" field type, implemented using a format similar to what Nate 
     and I came up with for the lexicon index.
  2) Write per-segment docID -> ord maps at index time for sort fields.
  3) Memory mapping.
  4) Segment-centric searching.

We'd mmap the pre-composed docID -> ord map and use it for intra-segment
sorting.  The keyword field type would be implemented in such a way that we'd
be able to mmap a few files and get a per-segment field cache, which we'd then
use to sort hits from multiple segments.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698135#action_12698135 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Any ideas on where parser fits in with valuesource? Its easy enough to kind of keep it how it is, but then what if CSF can be stored as a byte rep of an int or something? parse(String) won't make any sense. If we move Parser up to an Impl of valuesource, we have to special case things -

Any thoughts? Just stick with allowing the passing of a 'String to type' Parser and worry about possible byte handling later? A different parser object of some kind?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698248#action_12698248 ] 

Earwin Burrfoot commented on LUCENE-831:
----------------------------------------

bq. I'm kind of worried that any change is going to hurt Apps like Solr - if you end up using the new built in API, but also have code that must stick for a while with the old API (for multireader fieldcache or something), you'll likely increase RAM usage more than before - how much of a concern that ends up being, I'm not sure.
My personal stance is that until you have one perfectly thought out API, nothing should restrain you from changing it. It's better to feel pain once or twice, when you adapt to API changes, then to feel it constantly, each time you're using that half-assed thing you're keeping around for back-compat. Look at google-collections. They did some really breaking changes since they released, but most of them eased my life after I made my project compile and run with the new version of their library.

bq. In Lucy, I expect StringType to fill this role. FullTextType is for multi-token fields.
In our case multiplicity is defined on FieldInfo level. Because we can have one Int field that holds some value, and another Int field that holds several ids.
Same goes for the String field - you might want tags on a document that are represented as untokenized strings, but each document can have 0..n of them.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654064#action_12654064 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Ah, the dirty secret of 831 - there is plenty more to do :) I've been pushing it down the path, but I've expected radical changes to be needed before it goes in.

bq. But I'm concerned, because this change continues the "materialize massive array for entire index" approach, which is the major remaining cost when (re)opening readers. EG, isMergable()/mergeData() methods build up the whole array from sub readers.

Originally, 3.0 wasn't so close, so there was more concern with back compatibility than there might be now. I think the method call will be a slight slowdown no matter what as well...even with an iterator approach. Perhaps other "wins" will make up for it though. Its certainly cleaner to support 'one' mode.

bq. What would it take to never require materializing the full array for the index, for Lucene's internal purposes (external users may continue to do so if they want)? Ie, leave the array bound to the "leaf" IndexReader (ie, SegmentReader). 

I'm not sure I fully understand yet. If you use the ObjectArray mode, this is what happens right? Each sub array is bound to the IndexReader and MultiReader will distribute the requests to the right subreader. Only if you use the primitive arrays and merging do you get the full arrays (when not using USE_OA_SORT).

bq. I realize this is a big change, but I think we need to get there eventually.

Sounds good to me.

bq. (why do we have get2?) 

Because a StringIndex needs to access both the array of Strings and a second array indexing into that. None of the other types need to access two arrays.

bq. Couldn't we expose eg an IntData class (and all other types) that has int get(docID) abstract method, that delegate to child readers?

Yeah, I think this would be possible. If casting does indeed cost so much, this may bring things closer to the primitive array speed.

bq. I'm also generally confused by why we have the per-atomic-type switching happening in CacheKey subclasses and not CacheData.

>From Hoss' original design. What are your concerns here? The right key gets you the right data :) I've actually mulled this over some, buts its too early in the morning to remember I suppose. I'll look at it some more.

bq. If we could fix field-sorting like that (and I'm hazy on exactly how to do so), I think Lucene internally would then never need the full array?

That would be cool. Again, I'll try to explore in this direction. It doesn't need the full array when using the ObjectArray stuff now though (well, it kind of does, just split up over the readers).
 
bq. This change also adds USE_OA_SORT, which is scary to me because Object overhead per doc can be exceptionally costly. Why do we need to even offer that?

All this does at the moment (and I hate system properties, but for the moment, thats whats working) is switch between using the primitive arrays and merging or using the distributed ObjectArray for internal sorting. It defaults to using the primitive arrays and merging because its 5-10% faster than using the ObjectArrays. The ObjectArray approach is just an ObjectArray backed by an array for each Reader - a MultiReader distributes a requests for a doc field to the right Readers ObjectArray.

To your second comment...I'm gong to have to spend some more time :)

No worries though, this is very much a work in progress. I'd love to have it in by 3.0 though. Glad to see someone else taking more of an interest - very hard for me to find the time to dig into it all that often. I'll work with the code some as I can, thinking more about your comments, and perhaps I can come up with some better responses/ideas. 



> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484518 ] 

Otis Gospodnetic commented on LUCENE-831:
-----------------------------------------

I haven't looked at the patch yet.  However, I do know that a colleague of mine is about to start porting some FieldCache-based Filter stuff to Lucene's trunk.  Because that work may conflict with Hoss' changes here, we should see if this get applied first.


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>         Attachments: fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Assigned: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch reassigned LUCENE-831:
------------------------------------

    Assignee: Michael Busch

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>         Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697822#action_12697822 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

{quote}
bq. We have always been able to customize a lot of behavior with our custom sort types - I guess the real issue is making the built in sort types customizable. So I guess we need someway to say, use this "cachekey" for this built in type?

I don't quite follow that last sentence.

We'll have alot of customizability here, ie, if you want to change how
String is parsed to int, if you want to fully override how uninversion
works, etc.  At first the core will only support uninversion as a
source of values, but once CSF is online that should be an alternate
pluggable source, presumably plugging in the same way that
customization would allow you to override uninversion.
{quote}

Right - since a custom cachekey builds the array from a reader, you can pretty much do anything. What I meant was that you could do anything before with a custom sort type as well - the problem was that you could not say use this custom sort type when sorting on a built in type (eg INT, BYTE, STRING). So thats all we need, right? A way to say, use this builder (cachekey) for LONG, use this one for INT, etc. When we get CSF, you would set it to use cachekeys that built arrays from that data.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698235#action_12698235 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. Note: I think we should add the option to sort nulls first or last with this.

You mean for getStringIndex()?  I agree!

Actually, it'd be nice to disallow nulls entirely, somehow, since this forces us to sprinkle null checks all over the place in StringOrdVarlComparator.  Maybe we would allow you to pass in a "null equivalent", eg you could use "", "UNDEFINED", whatever, as long as it's a valid string.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated LUCENE-831:
------------------------------------

    Comment: was deleted

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>         Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Robert Newson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654418#action_12654418 ] 

Robert Newson commented on LUCENE-831:
--------------------------------------


Yes, something like that. I made a Document class with an add method for each primitive type which allowed only the sensible choices for Store and Index. Field subclasses would achieve the same thing. A subclass per primitive type might be excessive, they'd be 99% identical to each other. A NumericField that could hold a single short, int, long, float, double or Date might be enough (new NumericField(name, 99.99F, true), the final boolean toggling YES/NO for Store, since Index is always UNANALYZED_NO_NORMS).

Adding this to FieldInfo would change the on-disk format such that it remembers that a particular field is of a special type?  That way all the places that Lucene currently has a multiplicity of classes or constants (SortField.INT, etc) could be eliminated, replaced by first class support in Document/Field.

A remaining question would be whether field name is sufficient for uniqueness, I suggest it becomes fieldname+type. This also implies changes to the Query and Filter hierarchy. 

If it helps, I can post my Document class, which had helper methods for RangeFilter and TermQuery's for each type. It's not a complicated class, you can probably already picture it.


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Assigned: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller reassigned LUCENE-831:
----------------------------------

    Assignee: Mark Miller

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582422#action_12582422 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------


One question here: should we switch to a method call, instead of a
straight array, to retrieve a cached value for a doc?

If we did that, then MultiSearchers would forward the request to the
right IndexReader.

The benefit then is that reopen() of a reader would not have to
allocate & bulk copy massive arrays when updating the caches.  It
would keep the cost of reopen closer to the size of the new segments.
And this way the old reader & the new one would not double-allocate
the RAM required to hold the common parts of the cache.

We could always still provide a "give me the full array" fallback if
people really wanted that (and were willing to accept the cost).



> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625470#action_12625470 ] 

Hoss Man commented on LUCENE-831:
---------------------------------

bq. What benefit do you see to this? Does it offer anything over the simple map used now?

I *think* what i ment there was that if the norm caching used the same functionality then it could simplify the code, and norms would be optimized in the reopen case as well ... plus custom Cache Impls could report stats on norm usage (just like anything else) so people could  see when norms were getting used for fields they didn't expect them to be.

But as i've said before ... most of my early comments were pie in the sky theoretical brainstorming comments -- don't read too much into them if they don't really make sense.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698258#action_12698258 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

This issue is fast moving!  Here're some thoughts on the last patch:

  * Need to pass in ValueSource impl to IndexReader.open, defaulting
    to Cached(Uninverted)

  * Maybe ValueSource should provide default "throws UOE" exceptions
    methods (instead of abstract) since it's rather cumbersome for a
    subclass that only intends to provide eg getInts().

  * Should CachingValueSource really include IndexReader in its key?
    I think it shouldn't?  The cache is already "scoped" to a reader
    because each SegmentReader will have its own instance.  Also, one
    might cache ValueSources that don't "have" a reader (eg pull from
    a DB or custom file format or something).  Cloning an SR for now
    should just copy over the same cache.

  * I think we should back deprecated FieldCache with ValueSource for
    the reader; one simple way to not ignore the Parser passed in is
    to anonymously subclass Uninverter and invoke the passed in
    ByteParser from its "newTerm" method?

  * Why do we need to define StringInverter, IntInverter, etc in
    UninversionValueSource?  Couldn't someone instead subclass
    UninversionValueSource, and override whichever getXXX's they want
    using their own anonymous uninverter subclass?

  * Should we deprecate function.FieldCacheSource, and make a new
    function.ReaderValueSource (ie pulls its values from the
    IndexReader's ValueSource)?


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Robert Newson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654069#action_12654069 ] 

Robert Newson commented on LUCENE-831:
--------------------------------------

This enhancement is particularly interesting to me (and the application my team is building). I'm not sure how much time I can donate but since I'd likely have to enhance this area of Lucene for our app anyway and it would be better to have it in core, I'd like to help out where I can.

The patch applies more or less cleanly against 2.4.0, but not trunk, btw. Is it possible to get the patch committed to a feature branch off of trunk perhaps?

Finally, I'm most interested in the ability to make disk-backed caches. A very quick attempt to put the cache into JDBM failed as the CacheKey classes are not Comparable, which seems necessary for most kinds of disk lookup structures. SimpleMapCache uses a HashMap, which just needs equals/hashcode methods.

The other benefit to this approach is that it allows the data structures needed for sorting to be reused for range filtering. My application needs both, though on numeric field types (dates predominantly).

Finally, this might also be a good time to add first class support for non-String field types. The formatting that NumberTools supplies is incompatible with SortField (the former outputs in base 36, the latter parses with parseLong), so there's clearly been several approaches to the general problem. In my case, I wrote a new Document class with addLong(name, value), etc.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581758#action_12581758 ] 

Yonik Seeley commented on LUCENE-831:
-------------------------------------

> I agree that it would be nice to skip the String[] if a MultiSearcher was not being used.

If you're going to incrementally update a FieldCache of a MultiReader, it's the same issue... can't merge the ordinals without the original (String) values.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698135#action_12698135 ] 

Mark Miller edited comment on LUCENE-831 at 4/11/09 2:28 PM:
-------------------------------------------------------------

Any ideas on where parser fits in with valuesource? Its easy enough to kind of keep it how it is, but then what if CSF can be stored as a byte rep of an int or something? parse(String) won't make any sense. If we move Parser up to an Impl of valuesource, we have to special case things -

Any thoughts? Just stick with allowing the passing of a 'String to type' Parser and worry about possible byte handling later? A different parser object of some kind?

*edit*

I guess its not so bad if parser is still first class and something that read bytes would just ignore the given parser? The parsing would just be built into something specialized like that, and it would be free to ignore a given String parser?

      was (Author: markrmiller@gmail.com):
    Any ideas on where parser fits in with valuesource? Its easy enough to kind of keep it how it is, but then what if CSF can be stored as a byte rep of an int or something? parse(String) won't make any sense. If we move Parser up to an Impl of valuesource, we have to special case things -

Any thoughts? Just stick with allowing the passing of a 'String to type' Parser and worry about possible byte handling later? A different parser object of some kind?
  
> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698104#action_12698104 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. E.g.: One have a CSF and a TrieField and several normal int/float fields. For each of these fields he needs another ValueSource.

Couldn't we make a "PerFieldValueSource" impl to handle this?  (And leave the switching logic out of IndexReader).

I think another useful ValueSource would be one that first consults CSF and uses that, if present, else falls back to the uninversion source.

bq.  The sort code would get the type and the arrays from the underlying IndexReaders.

I'm not sure this'll work -- IndexReader still won't know what type to ask for, for a given field?

bq. are the functions query value sources then obsolete and can be merged with the "new" ValueSource

Well, the API is a little different (this API returns int[], but the function query's ValueSource has int intVal(int docID)), so I think we'd wrap the new API to match function query's  (ie, cut over function query's use of FieldCache to this new FieldCache).

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700154#action_12700154 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

I think, it would be a good idea to have the Univerter static class not package private, but "protected" (if possible) or "public". Then it would be simple, to subclass the UniverterValueSource (e.g. for the TriePackage) and then just use own univerter subclasses to fill the cache. The idea is to make this more simple.
As you may have seen in my TrieValueSource I had to redeclare the Univerter.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697519#action_12697519 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

bq. But yes both of them consume RAM, but I don't consider that "dangerous".

I guess you meant dangerous as in dangerous to reopen then? I actually thought you meant as in dangerous because it could require too may resources. Dangerous is a tough to pin down word ;)

So what are the advantages of the iterator API again then? It not likely you are going to stream the values, and random access will likely still have a use as mentioned.

Just trying to get a clearer picture in my head - I doubt I'll have time, but I'd love to put a little sweat into this issue.

It probably makes sense to start from one of Hoss's original patches or even from scratch.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625084#action_12625084 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

I've got the function package happily deprecated with replacement methods.

I am going to ballpark the sort speed with method calls to be about 10% slower, but there are a lot of variables and I am still benchmarking.

I've added a check for a system property so that by default, sorting will still use primitive arrays, but if you want to pay the small sort cost you can turn on the method calls and get faster reopen without cache merging.

So that wraps up everything I plan to do unless comments, criticisms bring up anything else.

The one piece that is missing and should be addressed is an efficient merge for the StringIndex.

Patch to follow.

- mark

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

Marvin Humphrey wrote:
> On Sat, Dec 06, 2008 at 04:21:04PM -0500, Mark Miller wrote:
>   
>> And to get 30 docs over 3 indexes, you ask for 90. And sort twice.
>>     
>
> However, this scales with the number of segments, not the number of documents.
>
> Marvin Humphre
Right. They are all minor gripes to be sure. All together, they cause me 
to prefer a single Index solution though - if not just for the odd piece 
of mind. Its kind of the path that solr is on now - a single Index (you 
cant search across multi cores), no MultiSearcher.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Sat, Dec 06, 2008 at 04:21:04PM -0500, Mark Miller wrote:
> And to get 30 docs over 3 indexes, you ask for 90. And sort twice.

However, this scales with the number of segments, not the number of documents.

Marvin Humphrey


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

RE: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Uwe Schindler <uw...@thetaphi.de>.

> >>> MultiSearcher has a few aspects I don't like.
> >>
> >> Do you mean the score differences vs IndexSearcher(MultiReader), or
> >> is there something else?
> > And rewrite does not work properly. And to get 30 docs over 3
> > indexes, you ask for 90. And sort twice.
> 
> I'm thinking we stick with MultiReader, but improve it so that when
> sorting by fields can use a [new FieldCache like] API such that gives
> it the benefits that MultiSearcher has.  Ie, best of both worlds.

That would be very nice! In my code, I generally have an index organization
layer using Readers and combining them to MultiReaders. The Search API is
mostly decoupled from that, so when I do a search, I instantiate a
IndexSearcher using any "abstract" Reader without knowing if it is a
MultiReader, Directory/SegmentReader or whatever. If you are working with
this code design, your suggestion is very important!

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

RE: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Uwe Schindler <uw...@thetaphi.de>.

Hallo Robert,

> This is why I think for many users the field cache is not the best
> solution. If you have lots of documents but searchers that return
> relatively few, then using filters and sorting the results using
> stored fields is far more efficient.
> 
> It seems to me that the field cache is only appropriate when the
> documents have very few fields in play (1-3 ?), otherwise cached
> range filters will work better.  If we also have partitioned (trie
> query) and compressed filters, then the cache is only useful for
> sorting.

I was all the time thinking about it. Maybe we should write a wiki page on
the different RangeQueries and Caches. It is very hard to find the right
solution. Currently you have three possibilities:

A) use TrieRangeQuery without any cache, this works fast and affects only a
almost constant number (with upper limit) of terms (even when the range is
very large, half open, etc.) and may returns many documents. This is my
preferred solution for floating point ranges and ranges with lot's of
different values. TrieRangeQuery can be tuned by using the CachingFilter
(that caches the BitSet). The FieldCache is of no use for this query type. I
recommend this type also for often changing indexes, because no FieldCache
needs to be refilled on index change.

B) use FieldCacheRangeFilter. This needs the initial filling of the filter
and needs more memory. The pros of this Filter is that it may completely
work in memory and reuses the cache used for searching, but its query time
still largely depends largely on the number of documents in index (you have
to iterate over the field cache during the search). This may be optimized by
binary searches etc, but this would only be good, if the field cache would
be sorted by values and not by document numbers.

> The most important use for the field cache seems to be the case where
> a query returns lots of documents, say by date range, AND you want
> the most recent ones to score higher (needing the sort) - basically
> using the cache for the selection and the sort.

Correct.

I think we should create a wiki page comparing the different Range
approaches. There are many combinations of FieldCache, Trie structures,
RangeFilters and caches.

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by robert engels <re...@ix.netcom.com>.

One thing to keep in mind about using the field cache for filter  
caching.

The filter bitset cache at worst holds 8 documents per byte (and with  
bitset compression this can be even more efficient).

Using the field cache is going to rather be bytes per document, most  
likely at least an order of magnitude greater, and maybe 2 (if using  
strings fields, due to object overhead, length of string, plus the  
management fields...).

This is why I think for many users the field cache is not the best  
solution. If you have lots of documents but searchers that return  
relatively few, then using filters and sorting the results using  
stored fields is far more efficient.

It seems to me that the field cache is only appropriate when the  
documents have very few fields in play (1-3 ?), otherwise cached  
range filters will work better.  If we also have partitioned (trie  
query) and compressed filters, then the cache is only useful for  
sorting.

The most important use for the field cache seems to be the case where  
a query returns lots of documents, say by date range, AND you want  
the most recent ones to score higher (needing the sort) - basically  
using the cache for the selection and the sort.

On Dec 7, 2008, at 3:42 AM, Michael McCandless wrote:

>
> Mark Miller wrote:
>
>>
>>>> MultiSearcher has a few aspects I don't like.
>>>
>>> Do you mean the score differences vs IndexSearcher(MultiReader),  
>>> or is there something else?
>> And rewrite does not work properly. And to get 30 docs over 3  
>> indexes, you ask for 90. And sort twice.
>
> I'm thinking we stick with MultiReader, but improve it so that when  
> sorting by fields can use a [new FieldCache like] API such that  
> gives it the benefits that MultiSearcher has.  Ie, best of both  
> worlds.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

Mark Miller wrote:

>
>>> MultiSearcher has a few aspects I don't like.
>>
>> Do you mean the score differences vs IndexSearcher(MultiReader), or  
>> is there something else?
> And rewrite does not work properly. And to get 30 docs over 3  
> indexes, you ask for 90. And sort twice.

I'm thinking we stick with MultiReader, but improve it so that when  
sorting by fields can use a [new FieldCache like] API such that gives  
it the benefits that MultiSearcher has.  Ie, best of both worlds.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

>> MultiSearcher has a few aspects I don't like.
>
> Do you mean the score differences vs IndexSearcher(MultiReader), or is 
> there something else?
And rewrite does not work properly. And to get 30 docs over 3 indexes, 
you ask for 90. And sort twice.

Minor gripes, but bugs me non the less.

- Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

RE: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Uwe Schindler <uw...@thetaphi.de>.

Hi Mark,

> I'm going to dig in more tonight I hope. The main issue is that using
> SortType.AUTO blows up because the MultiSearcher code expects it already
> to have been resolved to a sort type, but my hack kept that from
> happening so it hits a switch statement for AUTO that throws an
> exception. That shouldn't be a problem. The other issue is a weird
> situation where it hits an INT switch statement when the sort type is
> Byte and throws a class cast exception. I didn't have time to look into
> anything, but I'm guessing its AUTO related as well. Both probably non
> issues once I get time to do more.

Haha, the same bug I encountered when doing my LUCENE-1478 developments. The
bug is in the comparator, it returns INT instead of BYTE in
FieldSortedHitQueue. It is fixed in my patch for LUCENE-1478, which is
currently under discussion, maybe we fix this before LUCENE-1478 in a
separate patch/issue.

But you can fix it simply, just correct it here in FieldSortedHitQueue.java
(this is a copy from my patch):

@@ -241,7 +257,7 @@
       }
 
       public int sortType() {
-        return SortField.INT;
+        return SortField.BYTE;
       }
     };
   }

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless wrote:
>
> I think it does make sense (it's well defined).  This is what the 
> SubsearcherTopDocs.convertTopDoc method is doing (in the 
> multisearcher.take2.patch on LUCENE-1471).
>
> In fact, returning by document order is a particularly trivial sort, 
> since you'd just have to concatenate the results coming out of the 
> pqueues (ie you wouldn't need a 2nd pqueue).  In fact, any SortField[] 
> that contains a SortField.FIELD_DOC could be truncated since that sort 
> order is "total".  But these are minor optimizations which we 
> shouldn't worry about for now...
>
> Mike
Yeah, right again. Just trying to get out of what wasn't working and 
seemed like it should without work from me.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

Mark Miller wrote:

> Michael McCandless wrote:
>>
>> Mark Miller wrote:
>>
>>> Mark Miller wrote:
>>>>>
>>>>> Which new sort stuff are you referring to?  Is it LUCENE-1471?
>>>>
>>>> Yes. First thing I did was try and patch this in, but the sort  
>>>> tests failed. It would be the right order, but like the two  
>>>> center docs would be reversed or something. No time to dig in, so  
>>>> I just switch to the trunk MultiSearcher and all tests passed  
>>>> except for the two with the above issues.
>>> Spoke too soon. Wasnt LUCENE-1471's fault, it was just hitting  
>>> different aspects of an issue thats messed up with the old  
>>> MultiSearcher as well.
>>
>> OK.  If you're building on LUCENE-1471, make sure you start from  
>> the first patch.  It'd be good to factor that logic (2nd pqueue for  
>> merging) out so it can be reused b/w IndexSearcher & MultiSearcher.
> I actually worked with the second. I'll take a look at the first  
> instead. I'm sticking with using the MultiSearcher for the first  
> patch - it can be worked out later if it speed things up.

OK.  And, the first now has a 2nd iteration (factors  
ParallelMultiSearcher to do the merge sort too).

> Does returning by document id order even make sense with this  
> though? Did it make sense with MultiSearcher? They are pseudo ids  
> (mapped), so it almost seems I can't support that right...it would  
> depend on the order of the readers.

I think it does make sense (it's well defined).  This is what the  
SubsearcherTopDocs.convertTopDoc method is doing (in the  
multisearcher.take2.patch on LUCENE-1471).

In fact, returning by document order is a particularly trivial sort,  
since you'd just have to concatenate the results coming out of the  
pqueues (ie you wouldn't need a 2nd pqueue).  In fact, any SortField[]  
that contains a SortField.FIELD_DOC could be truncated since that sort  
order is "total".  But these are minor optimizations which we  
shouldn't worry about for now...

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless wrote:
>
> Mark Miller wrote:
>
>> Mark Miller wrote:
>>>>
>>>> Which new sort stuff are you referring to?  Is it LUCENE-1471?
>>>
>>> Yes. First thing I did was try and patch this in, but the sort tests 
>>> failed. It would be the right order, but like the two center docs 
>>> would be reversed or something. No time to dig in, so I just switch 
>>> to the trunk MultiSearcher and all tests passed except for the two 
>>> with the above issues.
>> Spoke too soon. Wasnt LUCENE-1471's fault, it was just hitting 
>> different aspects of an issue thats messed up with the old 
>> MultiSearcher as well.
>
> OK.  If you're building on LUCENE-1471, make sure you start from the 
> first patch.  It'd be good to factor that logic (2nd pqueue for 
> merging) out so it can be reused b/w IndexSearcher & MultiSearcher.
I actually worked with the second. I'll take a look at the first 
instead. I'm sticking with using the MultiSearcher for the first patch - 
it can be worked out later if it speed things up.

Does returning by document id order even make sense with this though? 
Did it make sense with MultiSearcher? They are pseudo ids (mapped), so 
it almost seems I can't support that right...it would depend on the 
order of the readers.

- Mark



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

Mark Miller wrote:

> Mark Miller wrote:
>>>
>>> Which new sort stuff are you referring to?  Is it LUCENE-1471?
>>
>> Yes. First thing I did was try and patch this in, but the sort  
>> tests failed. It would be the right order, but like the two center  
>> docs would be reversed or something. No time to dig in, so I just  
>> switch to the trunk MultiSearcher and all tests passed except for  
>> the two with the above issues.
> Spoke too soon. Wasnt LUCENE-1471's fault, it was just hitting  
> different aspects of an issue thats messed up with the old  
> MultiSearcher as well.

OK.  If you're building on LUCENE-1471, make sure you start from the  
first patch.  It'd be good to factor that logic (2nd pqueue for  
merging) out so it can be reused b/w IndexSearcher & MultiSearcher.

> I think I have everything working, except there is a problem with  
> sort by doc id (it won't reverse). Got the auto detection working  
> though.

Nice!!  Looking forward to the patch :)

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless wrote:
>
> Mark Miller wrote:
>
>> Mark Miller wrote:
>>> Mark Miller wrote:
>>>>>
>>>>> Which new sort stuff are you referring to?  Is it LUCENE-1471?
>>>>
>>>> Yes. First thing I did was try and patch this in, but the sort 
>>>> tests failed. It would be the right order, but like the two center 
>>>> docs would be reversed or something. No time to dig in, so I just 
>>>> switch to the trunk MultiSearcher and all tests passed except for 
>>>> the two with the above issues.
>>> Got the auto detection working though.
>> Bah, I didn't. Brought up an old bug I've seen before - if you use 
>> multisearcher and an index doesn't have the field, AUTO won't work. 
>> Advice I always got was don't use AUTO, but even Lucene uses it 
>> internally. Thought I had a workarount, but didn't quite work. Not 
>> sure what to do about this one - I'll have to mull it and the ids 
>> issue over a bit I suppose.
>
>
> Hmm... I think we have to keep the AUTO -> true type resolution that 
> MultiReader would do?  Ie, ask MultiReader for the TermEnum, not the 
> first sub-reader, for resolving.
>
> In fact we should factor out an explicit method to do this; it's 
> currently in ExtendedFieldCache.autoCache.createValue.
>
> As long as you do that resolving up front w/ the MultiReader, and pass 
> only resolved SortField[] to each sub-reader, that should fix it?
>
> Mike
Your right. I get caught up in the mode of trying to hack it to work 
quick before I do it right.

- Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

Mark Miller wrote:

> Mark Miller wrote:
>> Mark Miller wrote:
>>>>
>>>> Which new sort stuff are you referring to?  Is it LUCENE-1471?
>>>
>>> Yes. First thing I did was try and patch this in, but the sort  
>>> tests failed. It would be the right order, but like the two center  
>>> docs would be reversed or something. No time to dig in, so I just  
>>> switch to the trunk MultiSearcher and all tests passed except for  
>>> the two with the above issues.
>> Got the auto detection working though.
> Bah, I didn't. Brought up an old bug I've seen before - if you use  
> multisearcher and an index doesn't have the field, AUTO won't work.  
> Advice I always got was don't use AUTO, but even Lucene uses it  
> internally. Thought I had a workarount, but didn't quite work. Not  
> sure what to do about this one - I'll have to mull it and the ids  
> issue over a bit I suppose.


Hmm... I think we have to keep the AUTO -> true type resolution that  
MultiReader would do?  Ie, ask MultiReader for the TermEnum, not the  
first sub-reader, for resolving.

In fact we should factor out an explicit method to do this; it's  
currently in ExtendedFieldCache.autoCache.createValue.

As long as you do that resolving up front w/ the MultiReader, and pass  
only resolved SortField[] to each sub-reader, that should fix it?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

Mark Miller wrote:
> Mark Miller wrote:
>>>
>>> Which new sort stuff are you referring to?  Is it LUCENE-1471?
>>
>> Yes. First thing I did was try and patch this in, but the sort tests 
>> failed. It would be the right order, but like the two center docs 
>> would be reversed or something. No time to dig in, so I just switch 
>> to the trunk MultiSearcher and all tests passed except for the two 
>> with the above issues.
>  Got the auto detection working though.
Bah, I didn't. Brought up an old bug I've seen before - if you use 
multisearcher and an index doesn't have the field, AUTO won't work. 
Advice I always got was don't use AUTO, but even Lucene uses it 
internally. Thought I had a workarount, but didn't quite work. Not sure 
what to do about this one - I'll have to mull it and the ids issue over 
a bit I suppose.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

RE: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Uwe Schindler <uw...@thetaphi.de>.

Hi Mark,

> Thanks for the ref to that bug Uwe, was indeed the problem.

This is now committed: updates in FieldSortedHitQueue, new super-interface
for FieldCache.Parsers and SortField changes (see Mikes commit as I have no
committer status yet).

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

Mark Miller wrote:
>>
>> Which new sort stuff are you referring to?  Is it LUCENE-1471?
>
> Yes. First thing I did was try and patch this in, but the sort tests 
> failed. It would be the right order, but like the two center docs 
> would be reversed or something. No time to dig in, so I just switch to 
> the trunk MultiSearcher and all tests passed except for the two with 
> the above issues.
Spoke too soon. Wasnt LUCENE-1471's fault, it was just hitting different 
aspects of an issue thats messed up with the old MultiSearcher as well.

I think I have everything working, except there is a problem with sort 
by doc id (it won't reverse). Got the auto detection working though.

Thanks for the ref to that bug Uwe, was indeed the problem.

- Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless wrote:
>
> Mark Miller wrote:
>
>> I tried a quick poor mans version using a MultiSearcher and wrapping 
>> the sub readers as searchers. Other than some AUTO sort field 
>> detection problems, all tests do appear to pass.
>
> Excellent, that sounds like a tentatively positive result, though we 
> do need to get to the bottom of those differences.

I'm going to dig in more tonight I hope. The main issue is that using 
SortType.AUTO blows up because the MultiSearcher code expects it already 
to have been resolved to a sort type, but my hack kept that from 
happening so it hits a switch statement for AUTO that throws an 
exception. That shouldn't be a problem. The other issue is a weird 
situation where it hits an INT switch statement when the sort type is 
Byte and throws a class cast exception. I didn't have time to look into 
anything, but I'm guessing its AUTO related as well. Both probably non 
issues once I get time to do more.

>
>> The new sort stuff for MultiSearcher may be a tiny bit off...sort 
>> tests fail, though are only slightly off, with that patch. Havn't 
>> looked further yet - just hacked it up real quick. Seems to work, but 
>> needs work.
>
> Which new sort stuff are you referring to?  Is it LUCENE-1471?

Yes. First thing I did was try and patch this in, but the sort tests 
failed. It would be the right order, but like the two center docs would 
be reversed or something. No time to dig in, so I just switch to the 
trunk MultiSearcher and all tests passed except for the two with the 
above issues.

- Mark

>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

Mark Miller wrote:

> I tried a quick poor mans version using a MultiSearcher and wrapping  
> the sub readers as searchers. Other than some AUTO sort field  
> detection problems, all tests do appear to pass.

Excellent, that sounds like a tentatively positive result, though we  
do need to get to the bottom of those differences.

> The new sort stuff for MultiSearcher may be a tiny bit off...sort  
> tests fail, though are only slightly off, with that patch. Havn't  
> looked further yet - just hacked it up real quick. Seems to work,  
> but needs work.

Which new sort stuff are you referring to?  Is it LUCENE-1471?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

I tried a quick poor mans version using a MultiSearcher and wrapping the 
sub readers as searchers. Other than some AUTO sort field detection 
problems, all tests do appear to pass. The new sort stuff for 
MultiSearcher may be a tiny bit off...sort tests fail, though are only 
slightly off, with that patch. Havn't looked further yet - just hacked 
it up real quick. Seems to work, but needs work.

- Mark


Michael McCandless wrote:
>
> Mark Miller wrote:
>
>> Michael McCandless wrote:
>>>
>>> Mark Miller wrote:
>>>
>>>> What do we get from this though? A MultiSearcher (with the  scoring 
>>>> issues) that can properly do rewrite? Won't we have to take 
>>>> MultiSearchers scoring baggage into this as well?
>>>
>>> If this can work, what we'd get is far better reopen() performance
>>> when you sort-by-field, with no change to the returned results
>>> (rewrite, scores, sort order are identical).
>>>
>>> Say you have 1MM doc index, and then you add 100 docs & commit.
>>> Today, when you reopen() and then do a search, FieldCache recomputes
>>> from scratch (iterating through all Terms in entire index) the global
>>> arrays for the fields you're sorting on.  The cost is in proportion to
>>> total index size.
>>>
>>> With this change, only the new segment's terms will be iterated on, so
>>> the cost is in proportion to what new segments appeared.
>>>
>>> This is the same benefit we are seeking with LUCENE-831, for all uses
>>> of FieldCache (not just sort-by-field), it's just that I think we can
>>> achieve this speedup to sort-by-field without LUCENE-831.
>>
>> Yup, I'm with you on all that. Except the without LUCENE-831 part - 
>> we need some FieldCache meddling right? The current FieldCache 
>> approach doesn't allow us to meddle much. Isn't it more like, we want 
>> the LUCENE-831 API (or something similar), but we won't need the 
>> objectarray or merge stuff?
>
> We wouldn't need any change to FieldCache, because we only ask 
> FieldCache for int[] (eg) on the SegmentReader instances.  Because 
> reopen() shares SegmentReader instances, only the new segments would 
> have a cache miss in FieldCache.  I think?
>
> Once we do LUCENE-831, minus objectarray and merging, this change 
> would be basically the same, ie, accessing per-segment int values, 
> just with a new API.  Ie, by doing this change first I don't think 
> we're going to waste much in then cutting over in the future to 
> LUCENE-831's API (vs waiting for LUCENE-831 api).
>
>>> I think there would be no change to the scoring: we would still create
>>> a Weight based on the toplevel IndexReader, but then search each
>>> sub-reader separately, using that Weight.
>>>
>>> Though... that is unusual (to create a Weight with the parent
>>> IndexSearcher and then use it in the sub-searchers) -- will something
>>> break if we do that?  (This is new territory for me).
>>
>> Okay, right. That does change things. Would love to hear more 
>> opinions, but that certainly seems reasonable to me. You score each 
>> segment using tf/idf stats from all of the segments.
>
> That's my expectation (hope).  So the results are identical but 
> performance is much better.
>
>>> If something will break, I think we can still achieve this, but it
>>> will be a more invasive change and probably will have to be re-coupled
>>> to the new API we will introduce with LUCENE-831.  Marvin actually
>>> referred to how to do this, here:
>>>
>>>  https://issues.apache.org/jira/browse/LUCENE-1458?focusedCommentId=12650854#action_12650854 
>>>
>>>
>>> in the paragraph starting with "If our goal is minimal impact...".
>>> Basically during collection, the FieldSortedHitQueue would have to
>>> keep track of subReaderIndex/subReaderDocID (mapping, through
>>> iteration, from the primary docID w/o doing a wasteful new binary
>>> search for each) and enroll into different pqueues indexed by
>>> subReaderIndex, then do the merge sort in the end.
>>>
>>> Mike
>>>
>>>>
>>>> Michael McCandless wrote:
>>>>>
>>>>> On thinking more about this... I think with a few small changes we
>>>>> could achieve Sort by field without materializing a full array.  We
>>>>> can decouple this change from LUCENE-831.
>>>>>
>>>>> I think all that's needed is:
>>>>>
>>>>> * Expose sub-readers (LUCENE-1475) by adding IndexReader[]
>>>>>   IndexReader.getSubReaders.  Default impl could just return
>>>>>   length-1 array of itself.
>>>>>
>>>>> * Change IndexSearcher.sort that takes a Sort, to first call
>>>>>   IndexReader.getSubReaders, and then do the same logic that
>>>>>   MultiSearcher does, with improvements from LUCENE-1471 (run
>>>>>   separate search per-reader, then merge-sort the top hits from
>>>>>   each).
>>>>>
>>>>> The results should be functionally identical to what we have today,
>>>>> but, searching after doing a reopen() should be much faster since 
>>>>> we'd
>>>>> no longer re-build the global FieldCache array.
>>>>>
>>>>> Does this make sense?  It's a small change for a big win, I think.
>>>>> Does anyone want to take a crack at this patch?
>>>>>
>>>>> Mike
>>>>>
>>>>> Mark Miller wrote:
>>>>>
>>>>>> Michael McCandless wrote:
>>>>>>>
>>>>>>> I'd like to decouple "upgraded to Object" vs "materialize full 
>>>>>>> array", ie, so we can access native values w/o materializing the 
>>>>>>> full array.  I also think "upgrade to Object" is dangerous to 
>>>>>>> even offer since it's so costly.
>>>>>>>
>>>>>>>
>>>>>> I'm right with you. I didn't think the Object approach was really 
>>>>>> an upgrade (beyond losing the merge, which is especially 
>>>>>> important for StringIndex - it has no merge option at the moment) 
>>>>>> which is why I left both options for now. So I def agree we need 
>>>>>> to move to iterator, drop object, etc.
>>>>>>
>>>>>> Its the doin' that aint so easy. The iterator approach seems 
>>>>>> somewhat straightforward (though its complicated by needing to 
>>>>>> provide a random access object as well), but I'm still working 
>>>>>> through how we control so many iterator types (I dont see how you 
>>>>>> can use polymorphism yet ).
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>>> --------------------------------------------------------------------- 
>>>>>>
>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

Mark Miller wrote:

> Michael McCandless wrote:
>>
>> Mark Miller wrote:
>>
>>> What do we get from this though? A MultiSearcher (with the   
>>> scoring issues) that can properly do rewrite? Won't we have to  
>>> take MultiSearchers scoring baggage into this as well?
>>
>> If this can work, what we'd get is far better reopen() performance
>> when you sort-by-field, with no change to the returned results
>> (rewrite, scores, sort order are identical).
>>
>> Say you have 1MM doc index, and then you add 100 docs & commit.
>> Today, when you reopen() and then do a search, FieldCache recomputes
>> from scratch (iterating through all Terms in entire index) the global
>> arrays for the fields you're sorting on.  The cost is in proportion  
>> to
>> total index size.
>>
>> With this change, only the new segment's terms will be iterated on,  
>> so
>> the cost is in proportion to what new segments appeared.
>>
>> This is the same benefit we are seeking with LUCENE-831, for all uses
>> of FieldCache (not just sort-by-field), it's just that I think we can
>> achieve this speedup to sort-by-field without LUCENE-831.
>
> Yup, I'm with you on all that. Except the without LUCENE-831 part -  
> we need some FieldCache meddling right? The current FieldCache  
> approach doesn't allow us to meddle much. Isn't it more like, we  
> want the LUCENE-831 API (or something similar), but we won't need  
> the objectarray or merge stuff?

We wouldn't need any change to FieldCache, because we only ask  
FieldCache for int[] (eg) on the SegmentReader instances.  Because  
reopen() shares SegmentReader instances, only the new segments would  
have a cache miss in FieldCache.  I think?

Once we do LUCENE-831, minus objectarray and merging, this change  
would be basically the same, ie, accessing per-segment int values,  
just with a new API.  Ie, by doing this change first I don't think  
we're going to waste much in then cutting over in the future to  
LUCENE-831's API (vs waiting for LUCENE-831 api).

>> I think there would be no change to the scoring: we would still  
>> create
>> a Weight based on the toplevel IndexReader, but then search each
>> sub-reader separately, using that Weight.
>>
>> Though... that is unusual (to create a Weight with the parent
>> IndexSearcher and then use it in the sub-searchers) -- will something
>> break if we do that?  (This is new territory for me).
>
> Okay, right. That does change things. Would love to hear more  
> opinions, but that certainly seems reasonable to me. You score each  
> segment using tf/idf stats from all of the segments.

That's my expectation (hope).  So the results are identical but  
performance is much better.

>> If something will break, I think we can still achieve this, but it
>> will be a more invasive change and probably will have to be re- 
>> coupled
>> to the new API we will introduce with LUCENE-831.  Marvin actually
>> referred to how to do this, here:
>>
>>  https://issues.apache.org/jira/browse/LUCENE-1458?focusedCommentId=12650854 
>> #action_12650854
>>
>> in the paragraph starting with "If our goal is minimal impact...".
>> Basically during collection, the FieldSortedHitQueue would have to
>> keep track of subReaderIndex/subReaderDocID (mapping, through
>> iteration, from the primary docID w/o doing a wasteful new binary
>> search for each) and enroll into different pqueues indexed by
>> subReaderIndex, then do the merge sort in the end.
>>
>> Mike
>>
>>>
>>> Michael McCandless wrote:
>>>>
>>>> On thinking more about this... I think with a few small changes we
>>>> could achieve Sort by field without materializing a full array.  We
>>>> can decouple this change from LUCENE-831.
>>>>
>>>> I think all that's needed is:
>>>>
>>>> * Expose sub-readers (LUCENE-1475) by adding IndexReader[]
>>>>   IndexReader.getSubReaders.  Default impl could just return
>>>>   length-1 array of itself.
>>>>
>>>> * Change IndexSearcher.sort that takes a Sort, to first call
>>>>   IndexReader.getSubReaders, and then do the same logic that
>>>>   MultiSearcher does, with improvements from LUCENE-1471 (run
>>>>   separate search per-reader, then merge-sort the top hits from
>>>>   each).
>>>>
>>>> The results should be functionally identical to what we have today,
>>>> but, searching after doing a reopen() should be much faster since  
>>>> we'd
>>>> no longer re-build the global FieldCache array.
>>>>
>>>> Does this make sense?  It's a small change for a big win, I think.
>>>> Does anyone want to take a crack at this patch?
>>>>
>>>> Mike
>>>>
>>>> Mark Miller wrote:
>>>>
>>>>> Michael McCandless wrote:
>>>>>>
>>>>>> I'd like to decouple "upgraded to Object" vs "materialize full  
>>>>>> array", ie, so we can access native values w/o materializing  
>>>>>> the full array.  I also think "upgrade to Object" is dangerous  
>>>>>> to even offer since it's so costly.
>>>>>>
>>>>>>
>>>>> I'm right with you. I didn't think the Object approach was  
>>>>> really an upgrade (beyond losing the merge, which is especially  
>>>>> important for StringIndex - it has no merge option at the  
>>>>> moment) which is why I left both options for now. So I def agree  
>>>>> we need to move to iterator, drop object, etc.
>>>>>
>>>>> Its the doin' that aint so easy. The iterator approach seems  
>>>>> somewhat straightforward (though its complicated by needing to  
>>>>> provide a random access object as well), but I'm still working  
>>>>> through how we control so many iterator types (I dont see how  
>>>>> you can use polymorphism yet ).
>>>>>
>>>>> - Mark
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless wrote:
>
> Mark Miller wrote:
>
>> What do we get from this though? A MultiSearcher (with the  scoring 
>> issues) that can properly do rewrite? Won't we have to take 
>> MultiSearchers scoring baggage into this as well?
>
> If this can work, what we'd get is far better reopen() performance
> when you sort-by-field, with no change to the returned results
> (rewrite, scores, sort order are identical).
>
> Say you have 1MM doc index, and then you add 100 docs & commit.
> Today, when you reopen() and then do a search, FieldCache recomputes
> from scratch (iterating through all Terms in entire index) the global
> arrays for the fields you're sorting on.  The cost is in proportion to
> total index size.
>
> With this change, only the new segment's terms will be iterated on, so
> the cost is in proportion to what new segments appeared.
>
> This is the same benefit we are seeking with LUCENE-831, for all uses
> of FieldCache (not just sort-by-field), it's just that I think we can
> achieve this speedup to sort-by-field without LUCENE-831.

Yup, I'm with you on all that. Except the without LUCENE-831 part - we 
need some FieldCache meddling right? The current FieldCache approach 
doesn't allow us to meddle much. Isn't it more like, we want the 
LUCENE-831 API (or something similar), but we won't need the objectarray 
or merge stuff?

>
> I think there would be no change to the scoring: we would still create
> a Weight based on the toplevel IndexReader, but then search each
> sub-reader separately, using that Weight.
>
> Though... that is unusual (to create a Weight with the parent
> IndexSearcher and then use it in the sub-searchers) -- will something
> break if we do that?  (This is new territory for me).

Okay, right. That does change things. Would love to hear more opinions, 
but that certainly seems reasonable to me. You score each segment using 
tf/idf stats from all of the segments.

>
> If something will break, I think we can still achieve this, but it
> will be a more invasive change and probably will have to be re-coupled
> to the new API we will introduce with LUCENE-831.  Marvin actually
> referred to how to do this, here:
>
>   
> https://issues.apache.org/jira/browse/LUCENE-1458?focusedCommentId=12650854#action_12650854 
>
>
> in the paragraph starting with "If our goal is minimal impact...".
> Basically during collection, the FieldSortedHitQueue would have to
> keep track of subReaderIndex/subReaderDocID (mapping, through
> iteration, from the primary docID w/o doing a wasteful new binary
> search for each) and enroll into different pqueues indexed by
> subReaderIndex, then do the merge sort in the end.
>
> Mike
>
>>
>> Michael McCandless wrote:
>>>
>>> On thinking more about this... I think with a few small changes we
>>> could achieve Sort by field without materializing a full array.  We
>>> can decouple this change from LUCENE-831.
>>>
>>> I think all that's needed is:
>>>
>>>  * Expose sub-readers (LUCENE-1475) by adding IndexReader[]
>>>    IndexReader.getSubReaders.  Default impl could just return
>>>    length-1 array of itself.
>>>
>>>  * Change IndexSearcher.sort that takes a Sort, to first call
>>>    IndexReader.getSubReaders, and then do the same logic that
>>>    MultiSearcher does, with improvements from LUCENE-1471 (run
>>>    separate search per-reader, then merge-sort the top hits from
>>>    each).
>>>
>>> The results should be functionally identical to what we have today,
>>> but, searching after doing a reopen() should be much faster since we'd
>>> no longer re-build the global FieldCache array.
>>>
>>> Does this make sense?  It's a small change for a big win, I think.
>>> Does anyone want to take a crack at this patch?
>>>
>>> Mike
>>>
>>> Mark Miller wrote:
>>>
>>>> Michael McCandless wrote:
>>>>>
>>>>> I'd like to decouple "upgraded to Object" vs "materialize full 
>>>>> array", ie, so we can access native values w/o materializing the 
>>>>> full array.  I also think "upgrade to Object" is dangerous to even 
>>>>> offer since it's so costly.
>>>>>
>>>>>
>>>> I'm right with you. I didn't think the Object approach was really 
>>>> an upgrade (beyond losing the merge, which is especially important 
>>>> for StringIndex - it has no merge option at the moment) which is 
>>>> why I left both options for now. So I def agree we need to move to 
>>>> iterator, drop object, etc.
>>>>
>>>> Its the doin' that aint so easy. The iterator approach seems 
>>>> somewhat straightforward (though its complicated by needing to 
>>>> provide a random access object as well), but I'm still working 
>>>> through how we control so many iterator types (I dont see how you 
>>>> can use polymorphism yet ).
>>>>
>>>> - Mark
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

Mark Miller wrote:

> What do we get from this though? A MultiSearcher (with the  scoring  
> issues) that can properly do rewrite? Won't we have to take  
> MultiSearchers scoring baggage into this as well?

If this can work, what we'd get is far better reopen() performance
when you sort-by-field, with no change to the returned results
(rewrite, scores, sort order are identical).

Say you have 1MM doc index, and then you add 100 docs & commit.
Today, when you reopen() and then do a search, FieldCache recomputes
from scratch (iterating through all Terms in entire index) the global
arrays for the fields you're sorting on.  The cost is in proportion to
total index size.

With this change, only the new segment's terms will be iterated on, so
the cost is in proportion to what new segments appeared.

This is the same benefit we are seeking with LUCENE-831, for all uses
of FieldCache (not just sort-by-field), it's just that I think we can
achieve this speedup to sort-by-field without LUCENE-831.

I think there would be no change to the scoring: we would still create
a Weight based on the toplevel IndexReader, but then search each
sub-reader separately, using that Weight.

Though... that is unusual (to create a Weight with the parent
IndexSearcher and then use it in the sub-searchers) -- will something
break if we do that?  (This is new territory for me).

If something will break, I think we can still achieve this, but it
will be a more invasive change and probably will have to be re-coupled
to the new API we will introduce with LUCENE-831.  Marvin actually
referred to how to do this, here:

   https://issues.apache.org/jira/browse/LUCENE-1458?focusedCommentId=12650854 
#action_12650854

in the paragraph starting with "If our goal is minimal impact...".
Basically during collection, the FieldSortedHitQueue would have to
keep track of subReaderIndex/subReaderDocID (mapping, through
iteration, from the primary docID w/o doing a wasteful new binary
search for each) and enroll into different pqueues indexed by
subReaderIndex, then do the merge sort in the end.

Mike

>
> Michael McCandless wrote:
>>
>> On thinking more about this... I think with a few small changes we
>> could achieve Sort by field without materializing a full array.  We
>> can decouple this change from LUCENE-831.
>>
>> I think all that's needed is:
>>
>>  * Expose sub-readers (LUCENE-1475) by adding IndexReader[]
>>    IndexReader.getSubReaders.  Default impl could just return
>>    length-1 array of itself.
>>
>>  * Change IndexSearcher.sort that takes a Sort, to first call
>>    IndexReader.getSubReaders, and then do the same logic that
>>    MultiSearcher does, with improvements from LUCENE-1471 (run
>>    separate search per-reader, then merge-sort the top hits from
>>    each).
>>
>> The results should be functionally identical to what we have today,
>> but, searching after doing a reopen() should be much faster since  
>> we'd
>> no longer re-build the global FieldCache array.
>>
>> Does this make sense?  It's a small change for a big win, I think.
>> Does anyone want to take a crack at this patch?
>>
>> Mike
>>
>> Mark Miller wrote:
>>
>>> Michael McCandless wrote:
>>>>
>>>> I'd like to decouple "upgraded to Object" vs "materialize full  
>>>> array", ie, so we can access native values w/o materializing the  
>>>> full array.  I also think "upgrade to Object" is dangerous to  
>>>> even offer since it's so costly.
>>>>
>>>>
>>> I'm right with you. I didn't think the Object approach was really  
>>> an upgrade (beyond losing the merge, which is especially important  
>>> for StringIndex - it has no merge option at the moment) which is  
>>> why I left both options for now. So I def agree we need to move to  
>>> iterator, drop object, etc.
>>>
>>> Its the doin' that aint so easy. The iterator approach seems  
>>> somewhat straightforward (though its complicated by needing to  
>>> provide a random access object as well), but I'm still working  
>>> through how we control so many iterator types (I dont see how you  
>>> can use polymorphism yet ).
>>>
>>> - Mark
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

What do we get from this though? A MultiSearcher (with the  scoring 
issues) that can properly do rewrite? Won't we have to take 
MultiSearchers scoring baggage into this as well?

Michael McCandless wrote:
>
> On thinking more about this... I think with a few small changes we
> could achieve Sort by field without materializing a full array.  We
> can decouple this change from LUCENE-831.
>
> I think all that's needed is:
>
>   * Expose sub-readers (LUCENE-1475) by adding IndexReader[]
>     IndexReader.getSubReaders.  Default impl could just return
>     length-1 array of itself.
>
>   * Change IndexSearcher.sort that takes a Sort, to first call
>     IndexReader.getSubReaders, and then do the same logic that
>     MultiSearcher does, with improvements from LUCENE-1471 (run
>     separate search per-reader, then merge-sort the top hits from
>     each).
>
> The results should be functionally identical to what we have today,
> but, searching after doing a reopen() should be much faster since we'd
> no longer re-build the global FieldCache array.
>
> Does this make sense?  It's a small change for a big win, I think.
> Does anyone want to take a crack at this patch?
>
> Mike
>
> Mark Miller wrote:
>
>> Michael McCandless wrote:
>>>
>>> I'd like to decouple "upgraded to Object" vs "materialize full 
>>> array", ie, so we can access native values w/o materializing the 
>>> full array.  I also think "upgrade to Object" is dangerous to even 
>>> offer since it's so costly.
>>>
>>>
>> I'm right with you. I didn't think the Object approach was really an 
>> upgrade (beyond losing the merge, which is especially important for 
>> StringIndex - it has no merge option at the moment) which is why I 
>> left both options for now. So I def agree we need to move to 
>> iterator, drop object, etc.
>>
>> Its the doin' that aint so easy. The iterator approach seems somewhat 
>> straightforward (though its complicated by needing to provide a 
>> random access object as well), but I'm still working through how we 
>> control so many iterator types (I dont see how you can use 
>> polymorphism yet ).
>>
>> - Mark
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

On thinking more about this... I think with a few small changes we
could achieve Sort by field without materializing a full array.  We
can decouple this change from LUCENE-831.

I think all that's needed is:

   * Expose sub-readers (LUCENE-1475) by adding IndexReader[]
     IndexReader.getSubReaders.  Default impl could just return
     length-1 array of itself.

   * Change IndexSearcher.sort that takes a Sort, to first call
     IndexReader.getSubReaders, and then do the same logic that
     MultiSearcher does, with improvements from LUCENE-1471 (run
     separate search per-reader, then merge-sort the top hits from
     each).

The results should be functionally identical to what we have today,
but, searching after doing a reopen() should be much faster since we'd
no longer re-build the global FieldCache array.

Does this make sense?  It's a small change for a big win, I think.
Does anyone want to take a crack at this patch?

Mike

Mark Miller wrote:

> Michael McCandless wrote:
>>
>> I'd like to decouple "upgraded to Object" vs "materialize full  
>> array", ie, so we can access native values w/o materializing the  
>> full array.  I also think "upgrade to Object" is dangerous to even  
>> offer since it's so costly.
>>
>>
> I'm right with you. I didn't think the Object approach was really an  
> upgrade (beyond losing the merge, which is especially important for  
> StringIndex - it has no merge option at the moment) which is why I  
> left both options for now. So I def agree we need to move to  
> iterator, drop object, etc.
>
> Its the doin' that aint so easy. The iterator approach seems  
> somewhat straightforward (though its complicated by needing to  
> provide a random access object as well), but I'm still working  
> through how we control so many iterator types (I dont see how you  
> can use polymorphism yet ).
>
> - Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless wrote:
>
> I'd like to decouple "upgraded to Object" vs "materialize full array", 
> ie, so we can access native values w/o materializing the full array.  
> I also think "upgrade to Object" is dangerous to even offer since it's 
> so costly.
>
>
I'm right with you. I didn't think the Object approach was really an 
upgrade (beyond losing the merge, which is especially important for 
StringIndex - it has no merge option at the moment) which is why I left 
both options for now. So I def agree we need to move to iterator, drop 
object, etc.

Its the doin' that aint so easy. The iterator approach seems somewhat 
straightforward (though its complicated by needing to provide a random 
access object as well), but I'm still working through how we control so 
many iterator types (I dont see how you can use polymorphism yet ).

- Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

Mark Miller wrote:

> Michael McCandless wrote:
>>
>> Today, with IndexSearcher(MultiReader), the FieldSortedHitQueue asks
>> FieldCache to materialize the full array for each field.  Whereas
>> MultiSearcher only asks each child reader to materialize its array  
>> for
>> the field, which is better because on reopen we only need to init the
>> array for the new segments.
> Thats what the ObjectArray does.

OK.

> There are currently two approaches in the patch.
>
> The primitive array approach is the one that uses the merge to  
> create a larger array. The win here is only on loading from disk  
> with reopen.
>
> The other approach is the ObjectArray. Its win is no merge and each  
> sub reader keeps its own array with no super array being created.  
> The loses right now are the binarysearch of subreaders, the method  
> call, and the cast. Seems to be about a 5-10% slowdown (if I  
> remember right).

I'd like to decouple "upgraded to Object" vs "materialize full array",  
ie, so we can access native values w/o materializing the full array.   
I also think "upgrade to Object" is dangerous to even offer since it's  
so costly.

> You have to choose one of the two 'mode' of operation with the  
> system property. This is not the type of thing that I want to end up  
> in the patch. However, it was even uglier a few iterations ago, so  
> eventually...I just keep pushing a little every few months when it  
> catches my eye. There is too much involved here for me to handle  
> myself (in terms of decisions, design), and prior to now, the main  
> interest has been by watchers.

OK.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless wrote:
>
> Today, with IndexSearcher(MultiReader), the FieldSortedHitQueue asks
> FieldCache to materialize the full array for each field.  Whereas
> MultiSearcher only asks each child reader to materialize its array for
> the field, which is better because on reopen we only need to init the
> array for the new segments.
Thats what the ObjectArray does.

There are currently two approaches in the patch.

The primitive array approach is the one that uses the merge to create a 
larger array. The win here is only on loading from disk with reopen.

The other approach is the ObjectArray. Its win is no merge and each sub 
reader keeps its own array with no super array being created. The loses 
right now are the binarysearch of subreaders, the method call, and the 
cast. Seems to be about a 5-10% slowdown (if I remember right).

You have to choose one of the two 'mode' of operation with the system 
property. This is not the type of thing that I want to end up in the 
patch. However, it was even uglier a few iterations ago, so 
eventually...I just keep pushing a little every few months when it 
catches my eye. There is too much involved here for me to handle myself 
(in terms of decisions, design), and prior to now, the main interest has 
been by watchers.

- Mark

- Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.

Mark Miller wrote:

> Michael McCandless (JIRA) wrote:
>>  However, stepping back, this is poor approach.  We should instead be
>> doing what MultiSearcher does, which is gather top results
>> per-sub-reader, and then merge-sort the results.  At that point, to  
>> do
>> the merge, we only need actual field values for those docs in the top
>> N.
>>
> What if one segment has the 20 best docs and a second segment has  
> only pretty poor docs - and then various mixes in between. This is  
> how it works with MultiSearcher right?

I was thinking only about sorting by field(s), which I think is  
functionally
identical through MultiSearcher vs IndexSearcher(MultiSegmentReader).

Today, with IndexSearcher(MultiReader), the FieldSortedHitQueue asks
FieldCache to materialize the full array for each field.  Whereas
MultiSearcher only asks each child reader to materialize its array for
the field, which is better because on reopen we only need to init the
array for the new segments.

If somehow we could make IndexSearcher(MultiReader) implement it's
field sorting searchers the same way MultiSearcher does, I think
that'd be a big win.

> MultiSearcher has a few aspects I don't like.

Do you mean the score differences vs IndexSearcher(MultiReader), or is  
there something else?

> I may not be fully understanding yet though. I have a slow  
> understanding time and a fast response time. Brilliant  
> combination :) Its like a fish with feet.

I too don't fully understand it!  Trying to think it through still...

> I hope I'll be playing around more with some of these ideas though  
> (free time is not as plentiful as I would like at the moment). I  
> should have some better comments to make about it then.

It'd be great to get closure on LUCENE-831.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless (JIRA) wrote:
>   
> However, stepping back, this is poor approach.  We should instead be
> doing what MultiSearcher does, which is gather top results
> per-sub-reader, and then merge-sort the results.  At that point, to do
> the merge, we only need actual field values for those docs in the top
> N.
>   
What if one segment has the 20 best docs and a second segment has only 
pretty poor docs - and then various mixes in between. This is how it 
works with MultiSearcher right? MultiSearcher has a few aspects I don't 
like.

I may not be fully understanding yet though. I have a slow understanding 
time and a fast response time. Brilliant combination :) Its like a fish 
with feet.

I hope I'll be playing around more with some of these ideas though (free 
time is not as plentiful as I would like at the moment). I should have 
some better comments to make about it then.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654055#action_12654055 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------


[Note: my understanding of this area in general, and this patch in
particular, is still rather spotty... so please correct my
misconceptions in what follows...]

This change is a great improvement, since the cache management would
be per-IndexReader, and more public so that you could see what's
cached, access the cache via the reader, swap in your own cache
management, etc.

But I'm concerned, because this change continues the "materialize
massive array for entire index" approach, which is the major remaining
cost when (re)opening readers.  EG, isMergable()/mergeData() methods
build up the whole array from sub readers.

What would it take to never require materializing the full array for
the index, for Lucene's internal purposes (external users may continue
to do so if they want)?  Ie, leave the array bound to the "leaf"
IndexReader (ie, SegmentReader).  It was briefly touched on here:

  https://issues.apache.org/jira/browse/LUCENE-1458?focusedCommentId=12650964#action_12650964

I realize this is a big change, but I think we need to get there
eventually.

EG I can see in this patch that MultiReader & MultiSegmentReader do
expose a CacheData that has get and get2 (why do we have get2?) that
delegate to child readers, which is good, but it's not good that they
return Object (requires casting for every lookup).  We don't have
per-atomic-type variants?  Couldn't we expose eg an IntData class (and
all other types) that has int get(docID) abstract method, that
delegate to child readers?  (I'm also generally confused by why we
have the per-atomic-type switching happening in CacheKey subclasses
and not CacheData.)

Then... and probably the hardest thing to fix here: for all the
comparators we now materialize the full array.  I realize we use the
full array when sorting during a search of an
IndexSearcher(MultiReader(...)), because FieldSortedHitQueue is called
for every doc visited and must be able to quickly make its comparison.

However, stepping back, this is poor approach.  We should instead be
doing what MultiSearcher does, which is gather top results
per-sub-reader, and then merge-sort the results.  At that point, to do
the merge, we only need actual field values for those docs in the top
N.

If we could fix field-sorting like that (and I'm hazy on exactly how
to do so), I think Lucene internally would then never need the full
array?

This change also adds USE_OA_SORT, which is scary to me because Object
overhead per doc can be exceptionally costly.  Why do we need to even
offer that?


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Fuad Efendi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624835#action_12624835 ] 

Fuad Efendi commented on LUCENE-831:
------------------------------------

Would be nice to have TermVectorCache (if term vectors are stored in the index)

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Marvin Humphrey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698219#action_12698219 ] 

Marvin Humphrey commented on LUCENE-831:
----------------------------------------

"FieldType" is probably a better name than "FieldSpec", as it implies
subclasses with "Type" as a suffix: FullTextType, StringType, BlobType,
Float32Type, etc.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697492#action_12697492 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. Is there much difference in one massive array or an array of arrays? Its just as much space and just as dangerous, right?

One massive array is far more dangerous during reopen() (ie that's why we did LUCENE-1483) since it's non-incremental.

Array-per-segment I think is OK.

But yes both of them consume RAM, but I don't consider that "dangerous".

bq. I like the idea of an iterator API, but it seems we will still have to provide random access with all its problems, right?

Right.

bq. Some apps will need random access to the field cache for any given document right?

Yes but I think such apps should move to the per-segment model (eg a Filter's getDocIdSet is called per segment reader).

If an app really wants to make a single massive array, they can certainly do so, outside of Lucene.

bq. Allow a custom field cache loader for each type?

Yes, possibly w/ different degrees of extension (much like Collector).  EG maybe you just want to override how you parse an int, or maybe you want to take control over the entire uninversion.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509958 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

I think this patch is great.  Not only does it make all of the sort caching logic much easier to decipher, being able to throw in a sophisticated cache manager like ehcache (which can let caches overflow to disk) could end up being pretty powerful. I am also very interested in the possibility of pre-warming a Reader.

I will spend some time playing with what is here so far.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>         Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657404#action_12657404 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Thanks Jeremey! Thats great news. 

I think the issues is going to heavily affected by LUCENE-1483, so once thats done, I hope I'll be able to come right back to this.

bq. Looking at the getCachedData method for MultiReader and MultiSegmentReader, it doesn't appear that the CacheData objects from merge operations are cached. Is there any reason for this?

I think its just not done yet - I havn't looked at the merging since I started playing with the ObjectArrays - I think most of this stuff becomes moot with 1483 though - this will turn more into an API overhaul than an IndexReader reopen time saver.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699993#action_12699993 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

I think we don't want to expose Uninverter though? The API should be neutral enough to naturally support loading from CSF, in which case Uninverter doesnt make sense...so we were going to go with having to override the value source to handle uninverter type stuff.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698272#action_12698272 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

bq. Should CachingValueSource really include IndexReader in its key?

I see now - I wasnt always holding the ValueSource in the Reader - I was using getXXX(reader, field). Now I am back to that - and since the CachingValueSource holds the map, I am back to needing that. Being able to change the ValueSource on the fly now with SortField has implications for CachingValueSource. There has to be some kind of default ValueSource, which would likely do caching. Once you started using ValueSources from SortField, they won't share the cache. Get enough of them going and the RAM reqs jump.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649393#action_12649393 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

I think this would actually be better if all cachekey types had to implement both ObjectArray access as well as primitive Array access. Makes the code cleaner and cuts down on the cachekey explosion. Should have done it this way to start, but couldnt see the forest through the trees back then i suppose.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698286#action_12698286 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

I really like CachingValueSource, but somehow the cache has to move to the segmentreader or something... (not that there is not a segmentreader -> valuesource mapping)

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689624#action_12689624 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

I will attach my comments regarding the problem with the TrieRangeFilter and sorting (stop collecting terms into cache when lower precisions begin or only collect terms using a specific range (like a range filter). So you could fill a FieldCache and specify a starting term and ending term, all terms inbetween could be put into the cache, others outside left out. In this way, it would be possible to just use TrieUtils.prefixCodeLong() to specify the upper and lower integer bound encoded in the highest precision.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698232#action_12698232 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Note: I think we should add the option to sort nulls first or last with this.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700585#action_12700585 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

I am still thinking about the difference between function query's ValueSource and the new ValueSource and I would really like to combine both.
I know for sorting, the array approach is faster, but maybe the new ValueSource could provide both ways to access. In the array approach, one would only get arrays for single segments, but the method-like access could still map the document ids to the correct segment, to have a uniform access even to multi readers.
So, maybe there is a possibility to merge both approaches and only provide one ValueSource supplying both access strategies.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649188#action_12649188 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Also - your reopen time will vary greatly depending on how many segments need to be reopened and what size they are...so I think a lower merge factor would help, while hurting overall search speed of course.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649385#action_12649385 ] 

Alex Vigdor commented on LUCENE-831:
------------------------------------

To be honest, the cache never successfully refilled before the patch - or at least I gave up after waiting 10 minutes.  I was about to give up on sorting.  It could have to do with the fact that we're running with a relatively modest amount of RAM (768M) given our index size. But with the patch at least sorting is a realistic option!

I will look at adding the warming to my own code as you suggest; it is another peculiarity of this project that I can't know in the code what fields will be used for sorting, but I'll just track the searches coming through and aggregate any sorts they perform into a warming query.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559341#action_12559341 ] 

Michael Busch commented on LUCENE-831:
--------------------------------------

{quote}
As with Lucene 2.3 the reopen is possible, how about this issue? Would this be possible for 2.4?
{quote}

Yeah, this is a limitation currently of reopen(). I'm planning to work on it after the 2.3 release is
out and LUCENE-584 is committed!

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>         Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698264#action_12698264 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

bq. How about SortField taking ValueSource? 

Right - theres the right thought I think. I'll play with that.

bq. Need to pass in ValueSource impl to IndexReader.open, defaulting to Cached(Uninverted)

Yes - only havn't because that code is kind of unfirm - it already seems like it will prob be moved out to SortField :) So I was just short-cut initing it.

bq. Should CachingValueSource really include IndexReader in its key?

Probably not then :) I'll be going over that tighter - I was in speed mode and havn't considered the CachingValueSource much yet - I kind of just banged out a quick impl (gotto love eclipses generate hashcode/equals), put it in and tested it.  To handle all of this 'do it for Long, repeat for Int, repeat for Byte, etc' I go somewhat into robot mode. Also, this early, I know you'll have me changing directions fast enough that I'm still in pretty rough hammer out mode.

bq. Why do we need to define StringInverter, IntInverter, etc in UninversionValueSource? 

Yes true. I was using anon classes in the prev patch, but upon switch to Uninverter, I just did mostly what came to mind quickest looking at your example code and what I had.
Indeed, overriding getXXX is simple and effective. As I think about it - now I  am thinking I did it for the getArray call to return the right type (easy with anon class, but custom passed Uninverter ?). Could return object and cast though ...

Do we need Uninverter if overriding getXXX is easy enough and we pass the ValueSource on SortField?

bq. Should we deprecate function.FieldCacheSource,

Yeah for sure. I'll take a closer look at the function package soon. Thus far, I've just got it compiling and the main sort tests passing. As soon as I feel the API is firming (which, sweet, it already is a bit I think), I'll start polishing and filling in the missing pieces, more thorough nocommits, comments.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698254#action_12698254 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

I began to make the switch of allowing an Inverter to be attached to a SortField like parser with the idea of doing backcompat by wrapping the parser with an Inverter. I caught myself though, because the reason I wasn't doing that before is that Inverter may not make sense for a given ValueSource. By forcing you to set the inverters per type on the UninversionValueSource instead (current posted patch), this issue is kind of avoided - the possible problem is that it seems somewhat less dynamic in that you set it once on the ValueSource and leave it instead of being able to pass any impl any time on a SortField. Perhaps not so bad. But then how do I set the Uninverter for back compat with an old Parser? It doesn't seem wise to allow arbitrary Uninverter updates to the UninversionValueSource does it? Thread safety issues, and ... but then how to handle back compat with the parser? ... 

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698254#action_12698254 ] 

Mark Miller edited comment on LUCENE-831 at 4/12/09 12:01 PM:
--------------------------------------------------------------

I began to make the switch of allowing an Uninverter to be attached to a SortField (like Parser) with the idea of doing backcompat by wrapping the Parser with an Inverter. I caught myself though, because the reason I wasn't doing that before is that Inverter may not make sense for a given ValueSource. By forcing you to set the inverters per type on the UninversionValueSource instead (current posted patch), this issue is kind of avoided - the possible problem is that it seems somewhat less dynamic in that you set it once on the ValueSource and leave it instead of being able to pass any impl any time on a SortField. Perhaps not so bad. But then how do I set the Uninverter for back compat with an old Parser? It doesn't seem wise to allow arbitrary Uninverter updates to the UninversionValueSource does it? Thread safety issues, and ... but then how to handle back compat with the parser? ... 

      was (Author: markrmiller@gmail.com):
    I began to make the switch of allowing an Inverter to be attached to a SortField like parser with the idea of doing backcompat by wrapping the parser with an Inverter. I caught myself though, because the reason I wasn't doing that before is that Inverter may not make sense for a given ValueSource. By forcing you to set the inverters per type on the UninversionValueSource instead (current posted patch), this issue is kind of avoided - the possible problem is that it seems somewhat less dynamic in that you set it once on the ValueSource and leave it instead of being able to pass any impl any time on a SortField. Perhaps not so bad. But then how do I set the Uninverter for back compat with an old Parser? It doesn't seem wise to allow arbitrary Uninverter updates to the UninversionValueSource does it? Thread safety issues, and ... but then how to handle back compat with the parser? ... 
  
> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698257#action_12698257 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

How about SortField taking ValueSource?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Robert Newson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657151#action_12657151 ] 

Robert Newson commented on LUCENE-831:
--------------------------------------

I was wondering if the next version of the patch could include a sample disk-based cache? It seems that CacheKey classes are fine for an in-memory HashMap (since SimpleMapCache works just fine) but I wonder if equals/hashCode is sufficient when the data is on disk?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698230#action_12698230 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

bq. I like this idea, but i am a little bit concerned about only one ValueSource for the Reader. This makes plugging in different sources for different field types hard.

I guess you would just have to set a TrieEnabledValueSource when creating your IndexReaders? I suppose it could extend UninversionValueSource and for given fields do Trie unencoding, uninverting, and all other fields, yeld to the super impl?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Jeremy Volkman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674023#action_12674023 ] 

Jeremy Volkman commented on LUCENE-831:
---------------------------------------

Are there still things planned for this issue now that LUCENE-1483 has been committed? 

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated LUCENE-831:
----------------------------

    Attachment: fieldcache-overhaul.diff


This is based on an idea i had a few months ago and was recently reminded of because of several mail threads about FieldCache .. so i started fleshing it out on the plane last week.

I'm not entirely happy with it in it's current state, but I wanted to post it and see what people think of the overall approach.

if people like the direction this is going, I would definitely appreciate some help with API critique and good unit tests (so far i've been relying solely on the existing Unit Tests to validate that i'm not breaking anything -- but that doesn't really prove that the new APIs work the way they are intended to)


TODO List

 * lots of little :TODO:s in code, mainly about javadocs
 * add merge methods to StringIndexCacheKey 
   (a bit complicated, but should be possible/efficient)
 * figure out if there is any better way of dealing with 
   SortComparatorCacheKey and the visibility issues of 
   SortComparator.getComparable 
 * change norm caching to use new caches (if not the same 
   main cache, then at least the same classes in a private cache)
 * write an ass load more tests
 * is there a better way to deal with merging then to pass starts[] ?  
   (pass a new datastructure encapsulating starts/subReaders?)
 * CacheFactory seemed like a good idea initially so that MultiReader 
   on a multi-segment index could cascade down, but what if people 
   only want caching at the outermost level (regardless of wether 
   the key is mergable) ... factory can't assuming anything if reader 
   is not an instance of MultiReader
 * audit/change all core code using FieldCache to use new API
 * performance test that this doesn't hurt things in some way.



> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>         Attachments: fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698101#action_12698101 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

{quote}
How about we create a ValueSource abstract base class, that defines
abstract byte[] getBytes(IndexReader r, String field),
int[] getInts(IndexReader r, String field), etc. (Just like
ExtendedFieldCache).

This is subclassed to things like UninversionValueSource (what
FieldCache does today), CSFValueSource (in the future) both of which
take an IndexReader when created.

UninversionValueSource should provide basic ways to customize the
uninversion. Hopefully, we can share mode code than the current
FieldCacheImpl does (eg, a single "enum terms & terms docs" loop that
switches out to a "handler" to deal with each term & doc, w/
subclasses that handle to byte, int, etc.).

And then I can also make MyFunkyValueSource (for extensibility) that
does whatever to produce the values.

Then we make CachingValueSource, that wraps any other ValueSource.

And finally expose a way in IndexReader to set its ValueSource when
you open it? It would default to
CachedValueSource(UninversionValueSource()). I think we should
require that you set this on opening the reader, and you can't later
change it.
{quote}

I like this idea, but i am a little bit concerned about only one ValueSource for the Reader. This makes plugging in different sources for different field types hard.

E.g.: One have a CSF and a TrieField and several normal int/float fields. For each of these fields he needs another ValueSource. The CSF field can be loaded from Payloads, the TrieField by decoding the prefix encoded values and the others like it is now.

So the IndexReaders ValueSource should be a Map of FieldValueSources, so the user could register FieldValueSources for different field types.

The idea of setting the ValueSource for the IndexReader is nice, we then could simply remove the extra SortField constructors, I added in LUCENE-1478, as it would be possible to specify the type for Sorting when creating the IndexReader. Search code then would simply say, sort by fields a, b, c without knowing what type of field it is. The sort code would get the type and the arrays from the underlying IndexReaders.

The same with the current function query value sources (just a question: are the functions query value sources then obsolete and can be merged with the "new" ValueSource)?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: fieldcache-overhaul.032208.diff

Patch roughly moves things forward.

I've added stuff for Long, Short, and Byte parsing, changed getAuto to a static cachkey method, switched core Lucene from using the FieldCache API to the new API, added some javadoc, and roughly have things going with the reopen method.

In short, still a lot to do, but this advances things enough so that you can apply it to trunk and check out the goodness that this can bring to sorting and reopen.

- Mark

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654488#action_12654488 ] 

Jason Rutherglen commented on LUCENE-831:
-----------------------------------------

M. McCandless:

"This is an interesting idea. Say we create IntField, a subclass of
Field. It could directly accept a single int value and not accept
tokenization options. It could assert "not null", if the field wanted
that. FieldInfo could store that it's an int and expose more stronly
typed APIs from IndexReader.document as well. If in the future we
enable Term to be things-other-than-String, we could do the right
thing with typed fields. Etc...."

+1 For 3.0 this will be of great benefit in the effort to remove the excessive string creation
that happens right now with Lucene.  Term should also be more generic such that it
can also accept primitive or user defined types (and index format encodings).  

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698224#action_12698224 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. "FieldType" is probably a better name than "FieldSpec"

+1

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699649#action_12699649 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

This was the idea behin the "FieldType": You register at the top-level IndexReader/MultiReader/whatever the parsers/valuesources (e.g. in a map coded by field), all subreaders would also get this map (passed through) and if one asks for cache values for a specific field, he would get the correctly decoded fields (from CSF, Univerter, TrieUniverter, Stored Fields [not really, but would be possible]). This was the original approach of this issue: attach caching to the single index/segmentreaders (with possibility to "register" valuesources for specific fields).
In this case the SortField ctors taking ValueSource or Parser can be cancelled (and we can do this for 2.9, as the Parser ctor of SortField was not yet released!).

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699644#action_12699644 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Right, you really want to use CacheByReaderValueSource. Better would probably be to get that cache on the segment reader as well. But I think that would mean bringing back some sort of general cache to IndexReader. You would have to be able to attach arbitrary ValueSources to the reader. We will see what ends up materializing. I am agonizingly slow at understanding anything, but quick to move anyway ;)

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698151#action_12698151 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

or parsing is just done by the FieldValue implementation, with overrides or something? To change parsers you override UnivertedValuedSource returning your parsers in the callbacks (or something similiar?)

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699545#action_12699545 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

Hi, looks good:

I am only not sure, what would be the right caching ValueSource. If you use a caching value source externally from IndexReader, what should I use? The original trie patch used the CachingValueSource (as when the patch was done, there only existed CacingValueSource):

{code}
+  public static final ValueSource TRIE_VALUE_SOURCE = new CachingValueSource(new TrieValueSource());
{code}

But correct would be CacheByReaderValueSource as a per-JVM singleton? For the tests is its not a problem, because there is only one index with one segment. If I use CachingValueSurce as a singleton, it would cache all values from all index readers mixed together?



> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

Rolling forward with Uninverter ...

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698243#action_12698243 ] 

Earwin Burrfoot commented on LUCENE-831:
----------------------------------------

bq. At least, we should upgrade the exception from RuntimeException to a checked exception.
Exceptions are for expected conditions that can be adequately handled by the caller. RuntimeExceptions are for possible, but unexpected conditions that can theoretically be handled by the caller, but most of the times caller will terminate anyway.
Having several values in place where only one should be at all times is obviously an unexpected indexer's fault. So by using checked exception here you'll only provoke some ugly rethrowing/wrapping code, or propagation of said checked exception up the method hierarchy, without gaining any benefit at all.

bq. Or, we could turn the check off entirely.
Yes!

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697485#action_12697485 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

{quote}
I'd like to see the new FieldCache API de-emphasize "get me a single array holding all values for all docs in the index" for a MultiReader. That invocation is exceptionally costly in the context of reopened readers, and providing the illusion that one can simply get this array is dangerous. It's a "leaky API", like how virtual memory API pretends you can use more memory than is physically available.

I think it's OK to return an array-of-arrays (ie, one contiguous array per underlying segment); if the app really wants to make a massive array & concatenate it, they can do so outside of the FieldCache API. 
{quote}

Is there much difference in one massive array or an array of arrays? Its just as much space and just as dangerous, right? Some apps will need random access to the field cache for any given document right? Don't we always have to support that in some way, and won't it always be severely limited by RAM (until IO is as fast)?

I like the idea of an iterator API, but it seems we will still have to provide random access with all its problems, right?

{quote}
We should also set this API up as much as possible for LUCENE-1231. Ie, the current "un-invert the field" approach that FieldCache takes is merely one source of values per doc. Column stride fields in the future will be a different (faster) source of values, that should be able to "just plug in" under the hood somehow to this same exposure API.
{quote}

Definitely.

{quote}
On Uwe's suggestion for some flexibility on how the un-inversion takes place, I think allowing differing degrees of extension makes sense. EG we already allow you to provide a custom parser. We need to allow control on whether a given value replaces the already-seen value (LUCENE-1372), or whether to stop the looping early (Uwe's needs for improving Trie). We should also allow outright entire custom class that creates the value array.
{quote}

Allow a custom field cache loader for each type?


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698423#action_12698423 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Okay, good point. I've got to take a closer look at what we are required to support for remote searching.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Robert Newson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Newson updated LUCENE-831:
---------------------------------

    Attachment: ExtendedDocument.java


Type-safe Document-style object. Doesn't extend Document as it is final.



> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Assigned: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch reassigned LUCENE-831:
------------------------------------

    Assignee:     (was: Michael Busch)

I'm not sure how soon I'll have time to work on this; I don't want to block progress here, so I'm unassigning it for now in case someone else has time.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513777 ] 

Hoss Man commented on LUCENE-831:
---------------------------------

thanks for the feedback mark ... i honestly haven't looked at this patch since the last time i updated the issue ... (i'm not sure if i've even thought about it once since then).  it's the kind of things that seemed really cool important at the time, but then ... you know, other things come up.

by all means, feel free to update it.

as i recall, the biggest thing about this patch that was really just pie in the sky and may not make any sense is the whole concept of merging and letting subreaders of MultiReader do their own caching which could then percolate up.  I did it on the assumption that it would come in handy when reopening an IndexReader that contains several segments -- many of which may not have changed since the last time you opened the index.  but i really didn't have any idea how the whole reopening things would work.  i see now there is some reopen code in LUCENE-743, but frankly i'm still not sure wether the API makes sense, or is total overkill.

it might be better to gut the merging logic from the patch and add it later if/when there becomes a more real use case for it (the existing mergeData and isMergable methods could always be re-added to the abstract base classes if it turns out they do make sense)


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>         Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

I've added your SortField constructor Uwe.

I've also switched things to what I proposed.

You can access a ValueSource from SegmentReaders with getValueSource. You get a USO Exception on non SegmentReaders with the hint to use getSequentialSubReaders.
The built in FieldCache usage uses this new API instead, allowing caching per segment without global caching keyed by IndexReader. Its hardwired at the moment, but you would be able to set it on IndexReader open.

There is a back compat CacheByReaderUninverterValueSource that caches in a WeakHashMap by IndexReader and Uninverter (Uninverter is actually keyed by the Parser it wraps). This
works as a Singleton. Deprecated code, like the FieldCache uses this. You wouldn't want to straddle both API's, because for identical types / uninversion you would duplicate data (eg if you asked both APIs for the same data type with
the same parser/uninverter and field, you would get twice the data in RAM). This method still works with MultiReaders, and so nicely solves back compat with regards to only allowing access to ValueSource from the SegmentReader.

You can also set a ValueSource on a SortField. This would override the ValueSource used at sort time to one provided. There is a convenience class called CacheByReaderValueSource that caches
by Reader, field, type, key.

I think that solves all three needs (and my caching concerns) in a pretty nice way. Patch is not done, but all tests now pass except for TrieRange (have not run back compat tests yet due to Trie failure - havnt looked into yet either).

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698394#action_12698394 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. Also, this early, I know you'll have me changing directions fast enough that I'm still in pretty rough hammer out mode.

Yeah I hear you ;) Things're fast moving... I'm glad you're good with
The Eclipse.

bq. Do we need Uninverter if overriding getXXX is easy enough and we pass the ValueSource on SortField?

Good question... though it is nice/important to not have to implement
your own TermEnum/TermDocs iteration logic.

bq. Being able to change the ValueSource on the fly now with SortField has implications for CachingValueSource.

I think an app would need to pay attention here, ie, if caching is
needed the app should always pass the same CachingValueSource to all
sort fields.  It's up to the app to scope their ValueSources
correctly.  You're right that if you have a bug and don't always use a
consistent CachingValueSource, you can use too much RAM since a given
field's int[] can be double cached; but I think that's a bug in the
app?  It's analogous to using 2 IndexReaders instead of sharing?

Though... I'm not sure.  Something's not yet right about our approach
but I can't think of how to fix it... will mull it over.

I wonder if we can somehow fit norms handling into this?  Norms ought
to be some sort of XXX.getBytes() somewhere, but I'm not sure how
yet.  It's tricky because norms accept changes and thus must implement
copy-on-write.  So maybe we levae them out for now...


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Marvin Humphrey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698242#action_12698242 ] 

Marvin Humphrey commented on LUCENE-831:
----------------------------------------

> Another thing that'd be great to fix about FieldCache is its
> intermittent checking of the "some docs had more than one token in the
> field" error.

Add a FieldType that only allows one value per document. At index-time, 
verify when the doc is added that indeed, only one value was supplied.

In Lucy, I expect StringType to fill this role. FullTextType is for multi-token
fields.

Optionally, add a "NOT NULL" check to verify that each doc supplies a
value, or allow the FieldType object to specify a default value that should
be inserted.


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581189#action_12581189 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

I spent a little time getting this patch somewhat updated to the trunk and running various benchmarks with reopen. As expected, the sequence of searching a large index with a sort, adding a few docs and then reopening a Reader to perform a sorted search, can be 10 of times faster.

I think the new API is great too - I really like being able to experiment with other caching strategies. I find the code easier to follow as well.

Did you have any ideas for merging a String index? That is something that still needs to be done...

- Mark

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698226#action_12698226 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

Looks good, one addition: newTerm() could return false to stop iterating (for trie).

But I do not know how performat this is with the class-global variable currentVal...

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.03.31.2008.diff

-Further refinements and code changes.
-Fixed ord comparisons across IndexReaders - as yonik pointed out. Standard sort tests didn't catch and I missed even with the reminder.
-Added a bunch of CacheKey unit tests.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698196#action_12698196 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

> I like this idea, but i am a little bit concerned about only one ValueSource for the Reader. 

Thinking more about this...

Over in KS/Lucy, the approach Marvin is taking is something called a
FieldSpec, to define the "extended type" for a field.  The idea is to
strongly decouple a field's type from its value, allowing that type to
be shared across different fields & instances of the same field.

So in KS/Lucy, presumably IndexReader would simply consult the
FieldSpec for a given field, to determine which ValueSource impl is
responsible for producing values for this field.

Right now details for a field are scattered about (PerFieldValueSource
and PerFieldAnalyzerWrapper and Field.Index/Store/TermVector.*,
FieldInfo, etc.). This then requires alot of app-level code to
properly use Trie* fields -- you have to use Trie* to analyze the
field, use Trie* to construct the query, use PerFieldValueSource to
populate the FieldCache, etc.

Maybe, as part of the cleanup of our three *Field classes, and index
vs search time documents, we should make steps towards having a
consolidated class that represents the "extended type" of a field.
Then in theory one could make a Field, attach a NumericFieldType() to
it (after renaming Trie* -> Numeric*), and then everything would
default properly.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-831:
---------------------------------

    Attachment: LUCENE-831-trieimpl.patch

I did some tests with the new API and TrieRange. Without patch it did not work because the missing StopFillCacheException (but this was intended, as it was only a hack).

I created a TrieValueSource, that is similar like the UninversionValueSource, but iterates only over shift=0 prefix encoded values.

The original patch only missed the following ctor of SortField:
{code}
  public SortField (String field, int type, ValueSource valueSource, boolean reverse) {
    initFieldType(field, type);
    this.reverse = reverse;
    this.valueSource = valueSource;
  }
{code}

After adding this, it worked (this ctor creates a "normal" sortfield but without custom comparator, but with a specific value-source). This ctor would replace the deprecated parser ctor.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697495#action_12697495 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

{quote}
One massive array is far more dangerous during reopen() (ie that's why we did LUCENE-1483) since it's non-incremental.

Array-per-segment I think is OK.

But yes both of them consume RAM, but I don't consider that "dangerous".
{quote}

Okay, I got you now. We will force everyone to migrate to use fieldcache at the segment level rather than MR (or create their own array from the subarrays).

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624844#action_12624844 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

That patch may have a goof , I'll peel off another soon. Unfortunately, it looks like this is slower than the orig implementation. I have to run more benchmarks, but it might even be in the 10-20% mark. My guess is that its the method calls - you may gain much faster reopen, but you appear to lose quite a bit on standard sort speed...

Could give the choice of going with the arrays, if they prove as fast as orig, but then back to needing an efficient StringIndex merge...

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689620#action_12689620 ] 

Michael Busch commented on LUCENE-831:
--------------------------------------

Shall we try to gather all requirements we have for this feature here?
I think recently new requirements were mentioned and they are now scattered accross different issues and email threads.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515914 ] 

Otis Gospodnetic commented on LUCENE-831:
-----------------------------------------

Catching up with java-dev (just 300-400 more emails to go! ;)), I spotted this in "Lucene 2.2 soon"  thread (Doron's email):

The part I am not sure of in this regard are my changes
to FieldCache and FieldcacheImpl, while LUCENE-831 is
ongoing too. (though btw 831 doesn't apply cleanly on
current trunk). (It is my plan to get into LUCENE-831
but I haven't got to it yet.)


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>         Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582443#action_12582443 ] 

Michael Busch commented on LUCENE-831:
--------------------------------------

{quote}
The benefit then is that reopen() of a reader would not have to
allocate & bulk copy massive arrays when updating the caches. It
would keep the cost of reopen closer to the size of the new segments.
{quote}

I agree, Mike. Currently during reopen() the MultiSegmentReader 
allocates a new norms array with size maxDoc(), which is, as you said,
inefficient if only some (maybe even small) segments changed.

The method call might be a little slower than the array lookup, but
I doubt that this would be very significant. We can make this change for
the norms and run performance tests to measure the slowdown.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Michael Busch
>             Fix For: 2.4
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698225#action_12698225 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------


How about something like this (NOTE: not compiled/tested):

{code}
abstract class Uninverter {
  abstract void newTerm(String text);
  abstract void handleDoc(int docID);
  void go(IndexReader r) {
    ... TermEnum/TermDocs uninvert code...
  }
}
{code}

and then:

{code}
class IntUninverter extends Uninverter {
  final int[] values;
  IntUninverter(IndexReader r) {
    values = new int[r.maxDoc()];
  }

  int currentVal;
  void newTerm(String text) {
    currentVal = Intger.parseInt(text);
  }

  void handleDoc(int docID) {
    values[docID] = currentVal;
  }
}
{code}


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698214#action_12698214 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

{quote}
I think the UninversionValueSource would accept a custom parser (String -> native type), like what's done today.

Maybe it should also allow stopping the loop early (which Trie* uses), or perhaps outright overriding of the inversion loop itself (which if we make that class subclass-able should be simple).
{quote}

Its the accepting that seems tricky though. If the getInts() calls take the parser, you have to use instanceof code to work with ValueSource.  Thats why I was thinking maybe callbacks - if a new type is added you just add a new one returning a default parser. Then you can just extend and replace the parsers you want to. I wasn't a big fan of that idea, but I am not sure of a nice, clean, extensible way to specify a bunch of parsers to UninversionValueSource that allows the API to cleanly be used from ValueSource. It already kind of seemed annoying that you would have to set a new ValueSource on the reader just to specify different parsers. I guess at least that has to be accepted though.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

Might as well post as far as I got in case anyone else ends up wanting to take a few swings.

I think all tests pass except for one remote in testsort (probably related to caching?), and clone checks in an IndexReader test that try and ensure the fieldcache contents are the same instance (they are not because of the caching - havnt gotten that far).

So the major issue remaining I think (other than natural cleanup and back test thinking) is getting the caching right. There will probably be other changes too, but I think getting the caching right might drive them.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693464#action_12693464 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

Let's make sure the new API fixes LUCENE-1579.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

Moves ValueSource to SortField, giving us a caching problem. I've got a DEFAULT_VALUESOURCE in IndexReader that needs movement to some refinement.

No custom field comparator support yet, but parser back compat test passes.

I really am liking how its feeling now - have to think about the caching though.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698096#action_12698096 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Thanks Mike! Everything makes sense on first read, so I'll work in that direction.

In regards to FieldCache - yes it def will be deprecated (pretty rough patch to start so I might have missed some of that - I know plenty is undone).

As far as back compat with it, I was trying to make it so that if you happened to have code that still used it, and you used the new cache, you woudn't double your mem needs (the original point of backing FieldCache with the new API). Thats pretty restrictive though - indeed, things look much nicer if we don't attempt that (and in the case of a MultiReader cache array, we wouldn't be able to avoid it anyway, so I guess it does make sense we don't worry about it so much).

bq. Do we even need ComparatorFactory*?

Probably not then - I didn't really touch any of the custom type stuff yet. I mainly just got all of the sort tests except remote and custom to pass (though tests elsewhere are still failing).

I'll make another push with your suggestions.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697820#action_12697820 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

{quote}
If we are going to allow random access, I like the idea of sticking
with the arrays. They are faster than hiding behind a method, and it
allows easier movement from the old API.
{quote}

I agree.

{quote}
It would be nice if we can
still deprecate all of that by backing it with the new impl (as done
with the old patch).
{quote}

That seems fine?

bq. The current API (from this patch) still looks fairly good to me - a given cachekey gets your data, and knows how to construct it. You get data something like: return (byte[]) reader.getCachedData(new ByteCacheKey(field, parser)). It could be improved, but it seems a good start to me.

Agreed.

bq. The immediate problem I see is how to handle multireader vs reader. Not being able to treat them the same is a real pain. In the segment case, you just want an array back, in the multi-segment perhaps an array of arrays? Or unsupported? I havn't thought of anything nice.

I would lean towards throwing UOE, and suggesting that you call
getSequentialReaders instead.

Eg with the new getUniqueTermCount() we do that.

bq. We have always been able to customize a lot of behavior with our custom sort types - I guess the real issue is making the built in sort types customizable. So I guess we need someway to say, use this "cachekey" for this built in type?

I don't quite follow that last sentence.

We'll have alot of customizability here, ie, if you want to change how
String is parsed to int, if you want to fully override how uninversion
works, etc.  At first the core will only support uninversion as a
source of values, but once CSF is online that should be an alternate
pluggable source, presumably plugging in the same way that
customization would allow you to override uninversion.

bq. When we load the new caches in FieldComparator, can we count on those being segmentreaders? We can Lucene wise, but not API wise right? Does that matter? I suppose its really tied in with the multireader vs reader API.

Once getSequentialSubReaders() is called (and, recursively if needed),
then those "atomic" readers should be able to provide values.  I guess
that's the contract we require of a given IndexReader impl?


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Marvin Humphrey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656150#action_12656150 ] 

Marvin Humphrey commented on LUCENE-831:
----------------------------------------

>> Building the docID -> ord array is straightforward for a single-segment
>> SegLexicon. The multi-segment case requires that several SegLexicons be
>> collated using a priority queue. In KS, there's a MultiLexicon class which
>> handles this; I don't believe that Lucene has an analogous class.
> 
> Lucene achieves the same functionality by using a MultiReader to read
> the terms in order (which uses MultiSegmentReader.MultiTermEnum, which
> uses a pqueue under the hood) and building up StringIndex from that.
> It's very costly.

Ah, you're right, that class is analogous.  The difference is that
MultiTermEnum doesn't implement seek(), let alone seekByNum().  I was pretty
sure you wouldn't have bothered, since by loading the actual term values into
an array you eliminate the need for seeking the iterator.

> OK so your "keyword" field type would expose random-access to field
> values by docID, 

Yes.  There would be three files for each keyword field in a segment.

  * docID -> ord map.  A stack of i32_t, one per doc.
  * Character data.  Each unique field value would be stored as uncompressed
    UTF-8, sorted lexically (by default).
  * Term offsets.  A stack of i64_t, one per term plus one, demarcating the 
    term text boundaries in the character data file.

Assuming that we've mmap'd those files -- or slurped them -- here's the
function to find the keyword value associated with a doc num:

{code}
void
KWField_Look_Up(KeyWordField *self, i32_t doc_num, ViewCharBuf *target)
{
    if (doc_num > self->max_doc) {
        CONFESS("Doc num out of range: %u32 %u32", 
    }
    else {
        i64_t offset      = self->offsets[doc_num];
        i64_t next_offset = self->offsets[doc_num + 1];
        i64_t len         = next_offset - offset;
        ViewCB_Assign_Str(target, self->chardata + offset, len);
    }
}
{code}

I'm not sure whether IndexReader.fetchDoc() should retrieve the values for
keyword fields by default, but I lean towards yes.  The locality isn't ideal,
but I don't think it'll be bad enough to contemplate storing keyword values
redundantly alongside the other stored field values.

> to be used to merge the N segments' pqueues into a
> single final pqueue?

Yes, although I think you only need one two priority queues total: one
dedicated to iterating intra-segment, which gets emptied out after each
seg into the other, final queue.

> The alternative is to use iterator but pull the values into your
> pqueues when they are inserted. The benefit is iterator-only
> exposure, but the downside is likely higher net cost of insertion.
> And if the "assumption" is these fields can generally be ram resident
> (explicitly or via mmap), then the net benefit of iterator-only API is
> not high.

If I understand where you're going, you'd like to apply the design of the
deletions iterator to this problem?

For that to work, we'd need to store values for each document, rather than
only unique values... right?  And they couldn't be stored in sorted order,
because we aren't pre-sorting the docs in the segment according to the value
of a keyword field -- which means string diffs don't help.  You'd have a
single file, with each doc's values encoded as a vbyte byte-count followed by
UTF-8 character data.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699166#action_12699166 ] 

Mark Miller edited comment on LUCENE-831 at 4/15/09 5:16 AM:
-------------------------------------------------------------

Thinking a bit on this this morning:

I think that will work out right. We would have 3 different attacks at ValueSource.

1. A ValueSource per segment reader that handles all default ValueSource needs - you get it with IndexReader.getValueSource. Its an UninversionValueSource wrapped by a CachingValueSource by default.

2. A singleton back compat value source that is wrapped by CacheByReaderValueSource. It has extra methods that takes Uninverters, allowing custom Uninverters and caching by Uninverter.

3. You can override the ValueSource used for Sorting by attaching it to the SortField. Likely, you would wrap in a CacheByReaderValueSource and have your own singleton.

I think that gives us back compat and the best of both worlds?

      was (Author: markrmiller@gmail.com):
    Thinking a bit on this this morning:

I think that will work out right. We would have 3 different attacks at ValueSource.

1. A ValueSource per segment reader that handles all default ValueSource needs - you get it with IndexReader.getValueSource. Its an UninversionValueSource wrapped by a CachingValueSource by default.

2. A singleton back compat value source that is wrapped by CacheByReaderValueSource. It has extra methods that takes Uninverters, allowing custom Uninverters and caching by Uninverter.

3. You can override the ValueSource used for Sorting by attaching it to the SortField. Likely, you would weap in a CacheByReaderValueSource and have your own singleton.

I think that gives us back compat and the best of both worlds?
  
> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700479#action_12700479 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

{quote}Deleted docs could also be represented as a ValueSource? Just one
bit per doc. This way one could swap in whatever source for
"deleted docs" one wanted.{quote}

Some of your comments seem to indicate you think we will need to end up with an object rather than raw arrays?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697832#action_12697832 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Yes, good point. Okay, I think I have a much clearer picture of what needs to be done - this may be less work than I thought - a lot of what has been done is probably still helpful.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

Hmmm - not sure what is up. There is already one small conflict for me (trunk is a rapidly changing target :) ), but its a pretty simple conflict.

There are revision number issues (I was connected to an older revision apparently). If thats the problem, try this patch (which also resolves the new simple conflict).

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699440#action_12699440 ] 

Mark Miller commented on LUCENE-831:
------------------------------------

Thanks Uwe. I had it in my mind to update to trunk about an hour ago and...

I'll repost in a moment.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698233#action_12698233 ] 

Earwin Burrfoot commented on LUCENE-831:
----------------------------------------

bq. I guess you would just have to set a TrieEnabledValueSource when creating your IndexReaders? I suppose it could extend UninversionValueSource and for given fields do Trie unencoding, uninverting, and all other fields, yeld to the super impl?
And then if you get some other XXX encoding, you'll end up with XXXVS extends UVS, TrieVS extends UVS, XXXAndTrieVS extends XXXVS or TrieVS + duplicate code from the other one. Ugly.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Assigned: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller reassigned LUCENE-831:
----------------------------------

    Assignee:     (was: Mark Miller)

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654106#action_12654106 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

Maybe we need two trunks or branches or whatever :-). One for 2.9 and one for 3.0. This is a typical example for that in my opinion.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-831:
-------------------------------

    Attachment: LUCENE-831.patch

Couple of needed tweaks and a test for a custom ComparatorFactory.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698195#action_12698195 ] 

Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. Any ideas on where parser fits in with valuesource?

I think the UninversionValueSource would accept a custom parser (String -> native type), like what's done today.

Maybe it should also allow stopping the loop early (which Trie* uses), or perhaps outright overriding of the inversion loop itself (which if we make that class subclass-able should be simple).

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654105#action_12654105 ] 

Uwe Schindler commented on LUCENE-831:
--------------------------------------

Maybe every asignee should tag his issues that are related to sorting to be related (or similar) to this one. I am thinking about the latest developments in LUCENE-1478 and LUCENE-1481

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>             Fix For: 3.0
>
>         Attachments: fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff, fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completley independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundent caching as client code
>     migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org