Posted to dev@lucene.apache.org by "Tim Sturge (JIRA)" <ji...@apache.org> on 2008/12/10 22:18:44 UTC

[jira] Created: (LUCENE-1487) FieldCacheTermsFilter

FieldCacheTermsFilter
---------------------

                 Key: LUCENE-1487
                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
             Project: Lucene - Java
          Issue Type: New Feature
            Reporter: Tim Sturge


This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655517#action_12655517 ] 

Otis Gospodnetic commented on LUCENE-1487:
------------------------------------------

Would it be possible to reformat this to use the Lucene code style and add some javadoc and a unit test?  Eclipse and IDEA styles are at the bottom of http://wiki.apache.org/lucene-java/HowToContribute


> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Closed: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler closed LUCENE-1487.
---------------------------------

    Resolution: Fixed

Sorry, I reopened the wrong issue; the correct class is FieldCacheRangeFilter.

Closing again.

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668466#action_12668466 ] 

Michael McCandless commented on LUCENE-1487:
--------------------------------------------

How about this:

{code}
/**
 * A {@link Filter} that only accepts documents whose single
 * term value in the specified field is contained in the
 * provided set of allowed terms.
 * 
 * <p/>
 * 
 * This is the same functionality as TermsFilter (from
 * contrib/queries), except this filter requires that the
 * field contains only a single term for all documents.
 * Because of drastically different implementations, they
 * also have different performance characteristics, as
 * described below.
 * 
 * <p/>
 * 
 * The first invocation of this filter on a given field will
 * be slower, since a {@link FieldCache.StringIndex} must be
 * created.  Subsequent invocations using the same field
 * will re-use this cache.  However, as with all
 * functionality based on {@link FieldCache}, persistent RAM
 * is consumed to hold the cache, and is not freed until the
 * {@link IndexReader} is closed.  In contrast, TermsFilter
 * has no persistent RAM consumption.
 * 
 * 
 * <p/>
 * 
 * With each search, this filter translates the specified
 * set of Terms into a private {@link OpenBitSet} keyed by
 * term number per unique {@link IndexReader} (normally one
 * reader per segment).  Then, during matching, the term
 * number for each docID is retrieved from the cache and
 * then checked for inclusion using the {@link OpenBitSet}.
 * Since all testing is done using RAM resident data
 * structures, performance should be very fast, most likely
 * fast enough to not require further caching of the
 * DocIdSet for each possible combination of terms.
 * However, because docIDs are simply scanned linearly, an
 * index with a great many small documents may find this
 * linear scan too costly.
 * 
 * <p/>
 * 
 * In contrast, TermsFilter builds up an {@link OpenBitSet},
 * keyed by docID, every time it's created, by enumerating
 * through all matching docs using {@link TermDocs} to seek
 * and scan through each term's docID list.  While there is
 * no linear scan of all docIDs, besides the allocation of
 * the underlying array in the {@link OpenBitSet}, this
 * approach requires a number of "disk seeks" in proportion
 * to the number of terms, which can be exceptionally costly
 * when there are cache misses in the OS's IO cache.
 * 
 * <p/>
 * 
 * Generally, this filter will be slower on the first
 * invocation for a given field, but subsequent invocations,
 * even if you change the allowed set of Terms, should be
 * faster than TermsFilter, especially as the number of
 * Terms being matched increases.  If you are matching only
 * a very small number of terms, and those terms in turn
 * match a very small number of documents, TermsFilter may
 * perform faster.
 *
 * <p/>
 *
 * Which filter is best is very application dependent.
 */
{code}
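
As a rough illustration of the mechanism this javadoc describes, here is a minimal, self-contained sketch in plain Java. It does not use Lucene and is not the committed implementation: StringIndexModel, buildIndex and filter are hypothetical names, and java.util.BitSet stands in for OpenBitSet. The allowed terms are first translated into a bit set keyed by term number, and then every docID is scanned linearly and its cached term number is tested against that bit set.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Simplified model of filtering through a per-document term-number cache. */
final class OrdinalTermsFilterSketch {

    /** Hypothetical stand-in for FieldCache.StringIndex. */
    static final class StringIndexModel {
        final String[] lookup; // sorted unique terms; slot 0 is reserved for "no value"
        final int[] order;     // order[docID] = position of that document's term in lookup

        StringIndexModel(String[] lookup, int[] order) {
            this.lookup = lookup;
            this.order = order;
        }
    }

    /** Built once per field (the slow first invocation), then reused. */
    static StringIndexModel buildIndex(String[] docValues) {
        String[] unique = Arrays.stream(docValues)
                .filter(v -> v != null)
                .distinct()
                .sorted()
                .toArray(String[]::new);
        String[] lookup = new String[unique.length + 1];
        System.arraycopy(unique, 0, lookup, 1, unique.length);
        int[] order = new int[docValues.length];
        for (int doc = 0; doc < docValues.length; doc++) {
            if (docValues[doc] != null) {
                order[doc] = Arrays.binarySearch(lookup, 1, lookup.length, docValues[doc]);
            }
        }
        return new StringIndexModel(lookup, order);
    }

    /** Per search: map terms to term numbers, then scan all docIDs linearly. */
    static BitSet filter(StringIndexModel index, String... allowedTerms) {
        BitSet allowedOrds = new BitSet(index.lookup.length);
        for (String term : allowedTerms) {
            int ord = Arrays.binarySearch(index.lookup, 1, index.lookup.length, term);
            if (ord > 0) {            // a negative result means the term is not in the field
                allowedOrds.set(ord);
            }
        }
        BitSet matches = new BitSet(index.order.length);
        for (int doc = 0; doc < index.order.length; doc++) {
            if (allowedOrds.get(index.order[doc])) {
                matches.set(doc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] colors = {"red", "blue", null, "green", "blue"};
        StringIndexModel index = buildIndex(colors);
        System.out.println(filter(index, "blue", "green")); // prints {1, 3, 4}
        System.out.println(filter(index, "red", "purple")); // prints {0}
    }
}
```

In this model, changing the allowed set only rebuilds the small bit set over term numbers; the per-document array is untouched, which is why later invocations on the same field are cheap.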

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Assigned: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reassigned LUCENE-1487:
-------------------------------------

    Assignee:     (was: Uwe Schindler)

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656406#action_12656406 ] 

Yonik Seeley commented on LUCENE-1487:
--------------------------------------

FieldCacheStringFilter?
FieldCacheValueFilter?
FieldCacheMatchFilter?

Not sure if any of those are better though.  Perhaps it's enough that "FieldCache" is in the name to indicate that it only works on single-valued indexed fields that are able to be cached by the FieldCache.


> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Updated: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated LUCENE-1487:
------------------------------------------

    Attachment: LUCENE-1487.patch

Attached a patch on trunk

# Adds Javadocs per the comments here and my understanding
# TestFieldCacheTermsFilter is a simple unit test

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Updated: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1487:
----------------------------------

    Comment: was deleted

(was: This is why I closed LUCENE-1701 to get a good start to fold this in. It's not so complicated: it just needs some restructuring and a retrofit of the inner classes to support all FieldCache types (only the next() loop in the DocIdSetIterator is different).)

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Resolved: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1487.
----------------------------------------

       Resolution: Fixed
    Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

Committed revision 738622.  Thanks Tim & Shalin!

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667996#action_12667996 ] 

Michael McCandless commented on LUCENE-1487:
--------------------------------------------

Tim, are you still looking into this?  Or if you don't have the itch/time, does anyone else want to add javadocs & unit test for FieldCacheTermsFilter to move this forwards?

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668398#action_12668398 ] 

Michael McCandless commented on LUCENE-1487:
--------------------------------------------

I agree: the wording can be improved.  I'll take a stab at it.

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Updated: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Tim Sturge (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Sturge updated LUCENE-1487:
-------------------------------

    Attachment: FieldCacheTermsFilter.java

FieldCacheTermsFilter using OpenBitSet.fastGet()

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>         Attachments: FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723167#action_12723167 ] 

Uwe Schindler commented on LUCENE-1487:
---------------------------------------

This is why I closed LUCENE-1701 to get a good start to fold this in. It's not so complicated: it just needs some restructuring and a retrofit of the inner classes to support all FieldCache types (only the next() loop in the DocIdSetIterator is different).

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Tim Sturge (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655757#action_12655757 ] 

Tim Sturge commented on LUCENE-1487:
------------------------------------

No problem at all. Should I assume this means that the idea is generally considered sound; the only question is getting something with sufficient tests/docs/level of finish?

I was expecting to get comments about the implementation first; last time what ended up going in was very different (in good ways) from my initial submission. 



> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Updated: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated LUCENE-1487:
-------------------------------------

          Component/s: Search
        Lucene Fields: [New, Patch Available]  (was: [New])
    Affects Version/s: 2.4
        Fix Version/s: 2.9

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Issue Comment Edited: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723098#action_12723098 ] 

Uwe Schindler edited comment on LUCENE-1487 at 6/23/09 8:28 AM:
----------------------------------------------------------------

Sorry *EDIT*:
This Filter is really cool at iterating over the FieldCache for StringIndex, and it could be even faster for ranges that are int/float/double/... - so why not retrofit it to our new naming convention and extend it:

- FieldCacheRangeFilter.newTermRange()
- FieldCacheRangeFilter.newByteRange()
- FieldCacheRangeFilter.newShortRange()
- FieldCacheRangeFilter.newIntRange()
- ...

Because of that, it could also be used on "old" int/long date fields, if a good parser is given (a parser that does SimpleDateFormat -> long -> FieldCache -> direct comparison on these raw numbers). I would try to extend this to all types, and it can be faster than TrieRange if the range is already in the FieldCache!
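
To make the date idea concrete, here is a minimal sketch in plain Java, under the assumption that each document carries one date string in yyyy-MM-dd form. It does not use Lucene, and it is not how FieldCacheRangeFilter is implemented: parseDay, buildCache and range are hypothetical names. The dates are parsed once into a long[] (playing the role of the cached values), and a range filter is then a direct comparison on those raw numbers during a linear scan.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.BitSet;
import java.util.TimeZone;

/** Simplified model of a range filter over cached long values. */
final class CachedLongRangeSketch {

    /** Parses a yyyy-MM-dd date into epoch milliseconds (UTC). */
    static long parseDay(String day) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return format.parse(day).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a yyyy-MM-dd date: " + day, e);
        }
    }

    /** Parsing happens once per field; the result is reused by every range. */
    static long[] buildCache(String[] docDates) {
        long[] cache = new long[docDates.length];
        for (int doc = 0; doc < docDates.length; doc++) {
            cache[doc] = parseDay(docDates[doc]);
        }
        return cache;
    }

    /** Linear scan with a direct numeric comparison; both bounds are inclusive. */
    static BitSet range(long[] cache, long lower, long upper) {
        BitSet matches = new BitSet(cache.length);
        for (int doc = 0; doc < cache.length; doc++) {
            if (cache[doc] >= lower && cache[doc] <= upper) {
                matches.set(doc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        long[] cache = buildCache(new String[] {"2008-12-10", "2009-01-29", "2009-06-23"});
        BitSet in2009 = range(cache, parseDay("2009-01-01"), parseDay("2009-12-31"));
        System.out.println(in2009); // prints {1, 2}
    }
}
```

The same loop works for any primitive type; only the array type and the comparison change.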

      was (Author: thetaphi):
    Is this Filter really needed anymore?

If yes, why not create a FieldCacheRangeFilter that can handle any data type from FieldCache:

- FieldCacheRangeFilter.newTermRange()
- FieldCacheRangeFilter.newByteRange()
- FieldCacheRangeFilter.newShortRange()
- FieldCacheRangeFilter.newIntRange()
- ...
  
> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668474#action_12668474 ] 

Mark Miller commented on LUCENE-1487:
-------------------------------------

+1

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656397#action_12656397 ] 

Michael McCandless commented on LUCENE-1487:
--------------------------------------------

Yonik, do you have any suggestions for a new name? (I agree a new name would be better but can't think of one offhand.)

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668173#action_12668173 ] 

Mark Miller commented on LUCENE-1487:
-------------------------------------

So the advantage appears to be that you can cache the field values and so calculate the filter faster for arbitrary terms, rather than having to calculate and cache a bitset for each set of terms if you used TermsFilter - right? I think it should be easier to extract that info from the javadoc. It should also be clearer about exactly what the tradeoffs are, and when I should choose which.

* The FieldCacheTermsFilter is faster than building a TermsFilter each time.

While I did figure it out eventually (if I figured it out right), I'm thinking it could be clearer. It could just be me though. I'm often a bit hazy.
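
For the TermsFilter side of that tradeoff, a minimal sketch in plain Java may help. It is only a model and does not use Lucene: invert and filter are hypothetical names, and an in-memory map of term to docID list stands in for the on-disk postings that the real filter walks. Each requested term is looked up and its docID list is OR-ed into a bit set keyed by docID, so the work grows with the number of terms and matching documents rather than with the total number of documents.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simplified model of building a docID bit set from per-term postings. */
final class PostingsTermsFilterSketch {

    /** Builds term -> docID list; stands in for the index's postings. */
    static Map<String, List<Integer>> invert(String[] docValues) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < docValues.length; doc++) {
            if (docValues[doc] != null) {
                postings.computeIfAbsent(docValues[doc], k -> new ArrayList<>()).add(doc);
            }
        }
        return postings;
    }

    /** One lookup per term (a "seek" in the real index), then a scan of that term's docs. */
    static BitSet filter(Map<String, List<Integer>> postings, int maxDoc, String... terms) {
        BitSet matches = new BitSet(maxDoc);
        for (String term : terms) {
            List<Integer> docs = postings.get(term);
            if (docs == null) {
                continue; // term does not occur in the field
            }
            for (int doc : docs) {
                matches.set(doc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] colors = {"red", "blue", null, "green", "blue"};
        Map<String, List<Integer>> postings = invert(colors);
        System.out.println(filter(postings, colors.length, "blue", "green")); // prints {1, 3, 4}
    }
}
```

The resulting bit set is specific to one set of terms, so reusing it means caching one bit set per combination of terms.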



> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723158#action_12723158 ] 

Michael McCandless commented on LUCENE-1487:
--------------------------------------------

bq.  why not retrofit to our new naming-convention and extend

+1!

Do you want to take a stab at this?


> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Updated: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Tim Sturge (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Sturge updated LUCENE-1487:
-------------------------------

    Attachment: FieldCacheTermsFilter.java

Reformatted version. I'm happy to change the name if that's the consensus but I can't think of any better alternatives right now.

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Assigned: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1487:
------------------------------------------

    Assignee: Michael McCandless

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Tim Sturge (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655854#action_12655854 ] 

Tim Sturge commented on LUCENE-1487:
------------------------------------

Mark, Otis, looking back over the bug history I totally see where you are coming from; it does look like I've just dumped this here without explanation, which wasn't my intention.

Honestly I don't really know how useful this is; I think there's a set of cases where it works very well but how comparatively large that set is I am unsure. You can think of it as adding a level of indirection (from documents to terms) to filtering. 

The alternative (at least as far as I can see) is to do a union by term of sorted docid lists (which is fundamentally what a DisjunctionQuery does I think). There may well be other options.
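For comparison, the alternative described above can be sketched roughly as follows. This is only an illustration in plain Java, not Lucene code: the `PostingsUnionSketch` class, the `postings` map (term to sorted docid array) and the sample data are all hypothetical stand-ins for walking the per-term docid lists in a real index.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an index: each term maps to its sorted docid list.
final class PostingsUnionSketch {

    // Builds a bitset by docID by walking the docid list of every wanted term.
    // Cost grows with the total number of postings visited.
    static BitSet union(Map<String, int[]> postings, int maxDoc, String... wanted) {
        BitSet docs = new BitSet(maxDoc);
        for (String term : wanted) {
            int[] docIds = postings.get(term);
            if (docIds == null) {
                continue; // term does not occur in the index
            }
            for (int doc : docIds) {
                docs.set(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<String, int[]>();
        postings.put("red", new int[] {0, 3});
        postings.put("blue", new int[] {1, 3, 4});
        postings.put("green", new int[] {2});
        System.out.println(union(postings, 5, "red", "blue", "pink")); // prints {0, 1, 3, 4}
    }
}
```

The work here is proportional to the number of postings for the chosen terms, which is why this approach tends to do well for a few rare terms and less well for many or very common ones.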



> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656043#action_12656043 ] 

Yonik Seeley commented on LUCENE-1487:
--------------------------------------

I think the name should be different since it only works with single-valued fields, unlike other TermFilters and TermQueries.

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Reopened: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reopened LUCENE-1487:
-----------------------------------


Is this Filter really needed anymore?

If yes, why not create a FieldCacheRangeFilter that can handle any data type from FieldCache:

- FieldCacheRangeFilter.newTermRange()
- FieldCacheRangeFilter.newByteRange()
- FieldCacheRangeFilter.newShortRange()
- FieldCacheRangeFilter.newIntRange()
- ...
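To make the idea concrete, here is a minimal sketch of what a range check over cached int values could look like. It is plain Java and does not use the Lucene API: the `FieldCacheRangeSketch` class, the `values` array (standing in for one cached int per document) and the sample data are assumptions for illustration, and the factory names listed above are only the proposal from this comment.

```java
import java.util.BitSet;

// Hypothetical sketch: values[doc] plays the role of the cached int for each document.
final class FieldCacheRangeSketch {

    // Accepts every document whose cached value lies in [lower, upper].
    // Two comparisons per document, no term lookups.
    static BitSet intRange(int[] values, int lower, int upper) {
        BitSet docs = new BitSet(values.length);
        for (int doc = 0; doc < values.length; doc++) {
            int v = values[doc];
            if (v >= lower && v <= upper) {
                docs.set(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] year = {1999, 2008, 2004, 2011};
        System.out.println(intRange(year, 2000, 2008)); // prints {1, 2}
    }
}
```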

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Updated: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1487:
----------------------------------

    Comment: was deleted

(was: bq.  why not retrofit to our new naming-convention and extend

+1!

Do you want to take a stab at this?
)

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Updated: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1487:
----------------------------------

    Comment: was deleted

(was: Sorry *EDIT*:
This Filter is really cool on iterating on the FieldCache for StringIndex and can be even faster for ranges, that are int/float/double/... - so why not retrofit to our new naming-convention and extend:

- FieldCacheRangeFilter.newTermRange()
- FieldCacheRangeFilter.newByteRange()
- FieldCacheRangeFilter.newShortRange()
- FieldCacheRangeFilter.newIntRange()
- ...

Because of that, it could also be used on "old" int/long date fields, if a good parser is given (a parser that does SimpleDateFormat -> long -> FieldCache -> direct comparison on these raw numbers). I would try to extend this to all types, and it can be faster than TrieRange if the range is already in FieldCache!)

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655942#action_12655942 ] 

Michael McCandless commented on LUCENE-1487:
--------------------------------------------

I think this is a useful filter impl, and a nice companion to FCRF.
I'd like to see it committed; formatting & test case are good next
steps.

TermsFilter (in contrib/queries) does the same thing, but creates a
bitset by docID up front by walking the TermDocs for each term.  An OR
query, wrapped in QueryWrapperFilter, is another way.

This impl uses FieldCache to create a bitset by term number and then
does a scan by docID, so it has different performance tradeoffs: for
"enum" fields (far more docs than unique terms -- like country, state,
etc.) it's fast to create this filter, and then applying the filter is
O(maxDocs) with a small constant factor.

I think for many apps it means you do not have to cache the filter
because creating & using it "on the fly" is plenty fast.
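The approach described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the attached implementation: the `FieldCacheTermsSketch` class and its method names are hypothetical, `lookup` stands in for the sorted unique terms of a single-valued field, and `order[doc]` stands in for the cached term number of each document.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch of the two steps: bitset by term number, then a scan by docID.
final class FieldCacheTermsSketch {

    // Step 1: mark the term numbers of the wanted terms.
    // lookup must be sorted; terms that do not occur are ignored.
    static BitSet termOrdinals(String[] lookup, String... wanted) {
        BitSet ords = new BitSet(lookup.length);
        for (String term : wanted) {
            int ord = Arrays.binarySearch(lookup, term);
            if (ord >= 0) {
                ords.set(ord);
            }
        }
        return ords;
    }

    // Step 2: scan all documents once and test each document's term number.
    // This is O(maxDoc) with one bit test per document.
    static BitSet matchingDocs(int[] order, BitSet ords) {
        BitSet docs = new BitSet(order.length);
        for (int doc = 0; doc < order.length; doc++) {
            if (ords.get(order[doc])) {
                docs.set(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        String[] lookup = {"DE", "FR", "US"};   // an "enum"-like country field
        int[] order = {2, 0, 2, 1, 0};          // doc -> index into lookup
        BitSet ords = termOrdinals(lookup, "US", "FR", "JP");
        System.out.println(matchingDocs(order, ords)); // prints {0, 2, 3}
    }
}
```

Step 1 costs one lookup per requested term, so for a field with few unique terms it is cheap to build on every query; the per-document scan in step 2 is where the O(maxDocs) cost mentioned above comes from.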



> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Assigned: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1487:
------------------------------------------

    Assignee: Uwe Schindler  (was: Michael McCandless)

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668157#action_12668157 ] 

Michael McCandless commented on LUCENE-1487:
--------------------------------------------

Fabulous, thanks Shalin!  I changed UN_TOKENIZED --> NOT_ANALYZED in the javadoc,
and switched to MockRAMDirectory in the test.  I'll commit shortly.

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657007#action_12657007 ] 

Michael McCandless commented on LUCENE-1487:
--------------------------------------------

{quote}
> Perhaps it's enough that "FieldCache" is in the name to indicate that it only works on single-valued indexed fields that are able to be cached by the FieldCache.
{quote}
This'd be my vote (keep the name FieldCacheTermsFilter).

Tim, the new patch looks great!  Could you add some javadocs describing the tradeoffs with this filter, and maybe a unit test?  Thanks.

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655760#action_12655760 ] 

Mark Miller commented on LUCENE-1487:
-------------------------------------

Hold out, Tim; you're likely to get further comments before it goes in. I think Otis was just suggesting we start with those changes. Once your code is in the right format, you're more likely to get a committer to spend some time with it. Sometimes we just reformat and add the tests ourselves depending on a host of factors, but in general, you're more likely to get good comments faster if that work has already been done.

It's a fair question to ask if the idea is sound, but just posting the work doesn't necessarily imply that you are looking for that advice before putting more work into what you have done. And many times questions do go unanswered: they are missed, or people don't have the time at the moment. So it's best to supply all of this stuff, unless you are prepared for a wait if there is no current interest in going over the patch.

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Tim Sturge (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655851#action_12655851 ] 

Tim Sturge commented on LUCENE-1487:
------------------------------------

I'm running a bit behind this week and I'm out most of next week so it may be a while before I get to this.

One thing I hope will be helpful in the interim is to repost here the java-dev exchange that led to me posting this here; I suspect that many people who watch JIRA don't necessarily read java-dev as well and I hope the postings are informative.

Here's the exchange:

On 12/10/08 1:13 PM, "Tim Sturge" <ts...@hi5.com> wrote:

> Yes (mostly). It turns those terms into an OpenBitSet on the term array.
> Then it does a fastGet() in the next() and skipTo() loops to see if the term
> for that document is in the set.
> 
> The issue is that fastGet() is not as fast as the two inequalities in FCRF.
> I didn't directly benchmark FCTF against FCRF because I had a different
> application in mind for FCTF (location boxes). However it wasn't as
> efficient in that case as directly realizing the bit sets. This was mostly
> because in the application I had in mind there were a lot (>100K) of terms
> with relatively low frequency and queries that needed only a few hundred
> terms in the set.
> 
> I tried a sorted list of terms and Arrays.binarySearch() but that is way
> slower as is Set<Integer> (no surprise there). I was thinking about a custom
> hash table implementation but I'm not hopeful; it increases cycle cost and
> means 
> 
> So it is efficient but for a more limited set of cases than FCRF. My gut
> feeling is that FCRF is a better solution for "most" range filters, whereas
> FCTF is a better solution for "some" term set filters (versus creating
> TermsFilter objects on the fly each time). It all depends on how common the
> terms are and how large the sets of terms are. With lots of terms (or a few
> very common terms) it wins. With a few less common terms it loses.
> 
> I'll open a JIRA issue for it.
> 
> Tim
> 
> On 12/10/08 12:45 PM, "Michael McCandless" <lu...@mikemccandless.com>
> wrote:
> 
>> 
>> It'd be great to get this into Lucene.
>> 
>> Does FieldCacheTermsFilter let you specify a set of arbitrary terms to
>> filter for, like TermsFilter in contrib/queries?  And it's space/time
>> efficient once FieldCache is populated?
>> 
>> Mike
>> 
>> Tim Sturge wrote:
>> 
>>> Mike, Mike,
>>> 
>>> I have an implementation of FieldCacheTermsFilter (which uses field
>>> cache to
>>> filter for a predefined set of terms) around if either of you are
>>> interested. It is faster than materializing the filter roughly when
>>> the
>>> filter matches more than 1% of the documents.
>>> 
>>> So it's not better for a large set of small filters (which you can
>>> materialize on the spot) but it is better for a small set (but more
>>> than 32) of
>>> large filters.
>>> 
>>> Let me know if you're interested and I'll send it in.
>>> 
>>> Tim
>>> 
>>> On 12/10/08 3:34 AM, "Michael McCandless"
>>> <lu...@mikemccandless.com> wrote:
>>> 
>>>> 
>>>> In your approach, roughly how many filters do you have cached?  It
>>>> seems like it could be quite a few (one for each color, one for each
>>>> type, etc)?
>>>> 
>>>> You might be able to modify the new (on Lucene trunk)
>>>> FieldCacheRangeFilter to achieve this same filtering without actually
>>>> having to materialize the full bitset for each.
>>>> 
>>>> Mike
>>>> 
>>>> Michael Stoppelman wrote:
>>>> 
>>>>> Yeah looks similar to what we've implemented for ourselves
>>>>> (although I
>>>>> haven't looked at the implementation). We've got quite a custom
>>>>> version of
>>>>> lucene at this point. Using Solr at this point really isn't a viable
>>>>> option,
>>>>> but thanks for pointing this out.
>>>>> 
>>>>> M
>>>>> 
>>>>> On Tue, Dec 9, 2008 at 1:47 AM, Michael McCandless <
>>>>> lucene@mikemccandless.com> wrote:
>>>>> 
>>>>>> 
>>>>>> This use case sounds a lot like faceted navigation, which Solr
>>>>>> provides.
>>>>>> 
>>>>>> Mike
>>>>>> 
>>>>>> 
>>>>>> Michael Stoppelman wrote:
>>>>>> 
>>>>>> Hi all,
>>>>>>> 
>>>>>>> I'm working on upgrading to Lucene 2.4.0 from 2.3.2 and was trying
>>>>>>> to
>>>>>>> integrate the new DocIdSet changes since the
>>>>>>> o.a.l.search.Filter#bits() method
>>>>>>> is now deprecated. For our app we actually heavily rely on bits
>>>>>>> from the
>>>>>>> Filter to do post-query filtering (I explain why below).
>>>>>>> 
>>>>>>> For example, if someone searches for product: "ipod" and then
>>>>>>> filters a
>>>>>>> type: "nano" (e.g. mini/nano/regular) AND color: "red" (e.g.
>>>>>>> red/yellow/blue). In our current model the results are gathered in
>>>>>>> the
>>>>>>> following way:
>>>>>>> 
>>>>>>> 1) "ipod" w/o attributes is run and the results are stored in a
>>>>>>> hitcollector
>>>>>>> 2) "ipod" results are now filtered for color="red" AND type="nano"
>>>>>>> using
>>>>>>> the
>>>>>>> lucene Filters
>>>>>>> 3) The filtered results are returned to the user.
>>>>>>> 
>>>>>>> The reason that the attributes are filtered post-query is so that
>>>>>>> we can
>>>>>>> return the other types and colors the user can filter by in the
>>>>>>> future.
>>>>>>> Meaning the UI would be able to show "blue", "green", "pink",
>>>>>>> etc. If we
>>>>>>> pre-filtered results by color and type beforehand, we wouldn't
>>>>>>> know what
>>>>>>> the
>>>>>>> other filter options would be for a broader result set.
>>>>>>> 
>>>>>>> Does anyone else have this use case? I'd imagine other folks are
>>>>>>> probably
>>>>>>> doing similar things to accomplish this.
>>>>>>> 
>>>>>>> M
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>> 
>>>>>> 


> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.



[jira] Commented: (LUCENE-1487) FieldCacheTermsFilter

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668501#action_12668501 ] 

Shalin Shekhar Mangar commented on LUCENE-1487:
-----------------------------------------------

+1

This is much more clear. Thanks Michael.

> FieldCacheTermsFilter
> ---------------------
>
>                 Key: LUCENE-1487
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1487
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Tim Sturge
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: FieldCacheTermsFilter.java, FieldCacheTermsFilter.java, LUCENE-1487.patch
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of terms rather than a range. It works best when the set is comparatively large or the terms are comparatively common.
