Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2009/04/18 12:15:14 UTC

[jira] Issue Comment Edited: (LUCENE-1518) Merge Query and Filter classes

    [ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700437#action_12700437 ] 

Uwe Schindler edited comment on LUCENE-1518 at 4/18/09 3:14 AM:
----------------------------------------------------------------

My opinion is that the attached patch offers the most backwards compatibility (apart from some small toString() issues), while making Filter a subclass of Query with the default constant-score logic. Further work, such as removing the extra Filter classes for MultiTermQueries (RangeFilter, PrefixFilter, ...), can be done later or as part of this patch (I didn't want to start on this before being sure that the API looks good).
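A minimal sketch of the proposed hierarchy, under stated assumptions: ToyQuery, ToyFilter and ToyPrefixFilter are hypothetical, heavily simplified stand-ins (not Lucene's real Query/Filter/Weight API), an "index" is just a list of strings, and the default constant-score logic is reduced to giving every match a score of 1.0. A filter author implements only getDocIdSet(), and the filter then also works as a query.

```java
import java.util.BitSet;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy classes, NOT Lucene's real API. An "index" is a list of
// document strings and a doc id is a position in that list.
abstract class ToyQuery {
    // Returns doc id -> score for every matching document.
    abstract SortedMap<Integer, Float> execute(List<String> docs);
}

// Filter extends Query: subclasses implement only getDocIdSet(), and the
// query behaviour (a constant score of 1.0 per match) is inherited.
abstract class ToyFilter extends ToyQuery {
    abstract BitSet getDocIdSet(List<String> docs);

    @Override
    SortedMap<Integer, Float> execute(List<String> docs) {
        SortedMap<Integer, Float> hits = new TreeMap<>();
        BitSet bits = getDocIdSet(docs);
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            hits.put(doc, 1.0f); // every match gets the same constant score
        }
        return hits;
    }
}

// Hypothetical filter: matches documents that start with the given prefix.
class ToyPrefixFilter extends ToyFilter {
    private final String prefix;

    ToyPrefixFilter(String prefix) {
        this.prefix = prefix;
    }

    @Override
    BitSet getDocIdSet(List<String> docs) {
        BitSet bits = new BitSet(docs.size());
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).startsWith(prefix)) {
                bits.set(i);
            }
        }
        return bits;
    }
}
```

For example, new ToyPrefixFilter("luc").execute(List.of("lucene", "solr", "lucid")) returns {0=1.0, 2=1.0}, although the filter itself contains no scoring code.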

This will not make anything faster, but it makes the API simpler for users: they no longer need to differentiate between Query and Filter.

The optimization can be part of LUCENE-1345. In that case, the BooleanQuery API does not need to change (no extra Filter clauses, because Filters can be added directly as normal Query clauses). The optimizations from LUCENE-1345 could then use instanceof to test directly whether a clause is a Filter.
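The following sketch illustrates how an instanceof check inside BooleanQuery could route Filter clauses around scoring. All class names are hypothetical toys rather than the LUCENE-1345 implementation, and the sketch assumes every clause is required (pure AND); the real BooleanQuery also has SHOULD and MUST_NOT clauses.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy classes, NOT Lucene's API. Documents are space-separated
// words; a doc id is the position in the list.
abstract class ToyQuery {
    abstract SortedMap<Integer, Float> execute(List<String> docs);
}

abstract class ToyFilter extends ToyQuery {
    abstract BitSet getDocIdSet(List<String> docs);

    @Override // constant-score fallback when the filter is used as a query
    SortedMap<Integer, Float> execute(List<String> docs) {
        return constantScore(getDocIdSet(docs));
    }

    static SortedMap<Integer, Float> constantScore(BitSet bits) {
        SortedMap<Integer, Float> hits = new TreeMap<>();
        for (int d = bits.nextSetBit(0); d >= 0; d = bits.nextSetBit(d + 1)) {
            hits.put(d, 1.0f);
        }
        return hits;
    }
}

// Scoring query: score = how often the word occurs in the document.
class ToyTermQuery extends ToyQuery {
    private final String word;
    ToyTermQuery(String word) { this.word = word; }

    @Override
    SortedMap<Integer, Float> execute(List<String> docs) {
        SortedMap<Integer, Float> hits = new TreeMap<>();
        for (int d = 0; d < docs.size(); d++) {
            int tf = 0;
            for (String w : docs.get(d).split(" ")) {
                if (w.equals(word)) tf++;
            }
            if (tf > 0) hits.put(d, (float) tf);
        }
        return hits;
    }
}

// Non-scoring filter: documents containing the word.
class ToyContainsFilter extends ToyFilter {
    private final String word;
    ToyContainsFilter(String word) { this.word = word; }

    @Override
    BitSet getDocIdSet(List<String> docs) {
        BitSet bits = new BitSet(docs.size());
        for (int d = 0; d < docs.size(); d++) {
            if (List.of(docs.get(d).split(" ")).contains(word)) bits.set(d);
        }
        return bits;
    }
}

// Simplification: every clause is required (AND). Filters are added with the
// same add() method as queries; there is no separate filter-clause API.
class ToyBooleanQuery extends ToyQuery {
    private final List<ToyQuery> clauses = new ArrayList<>();

    ToyBooleanQuery add(ToyQuery clause) {
        clauses.add(clause);
        return this;
    }

    @Override
    SortedMap<Integer, Float> execute(List<String> docs) {
        if (clauses.isEmpty()) return new TreeMap<>();
        BitSet allowed = new BitSet(docs.size());
        allowed.set(0, docs.size());
        List<SortedMap<Integer, Float>> scored = new ArrayList<>();
        for (ToyQuery q : clauses) {
            if (q instanceof ToyFilter) {
                // Optimization: use the doc id set directly and skip scoring.
                allowed.and(((ToyFilter) q).getDocIdSet(docs));
            } else {
                scored.add(q.execute(docs));
            }
        }
        if (scored.isEmpty()) return ToyFilter.constantScore(allowed); // filters only
        SortedMap<Integer, Float> result = new TreeMap<>();
        for (Integer doc : scored.get(0).keySet()) {
            if (!allowed.get(doc)) continue; // rejected by a filter clause
            float sum = 0f;
            boolean matchesAll = true;
            for (SortedMap<Integer, Float> hits : scored) {
                Float s = hits.get(doc);
                if (s == null) { matchesAll = false; break; }
                sum += s;
            }
            if (matchesAll) result.put(doc, sum);
        }
        return result;
    }
}
```

With the documents "lucene lucene java", "lucene python" and "solr java", a ToyBooleanQuery combining ToyTermQuery("lucene") and ToyContainsFilter("java") yields {0=2.0}: the filter clause only restricts the result set and contributes nothing to the score.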

WAS: One problem is FuzzyQuery, which can also be used as a Filter (as a subclass of MultiTermQuery), but where scoring is important.

EDIT:
This is not a problem: MultiTermQuery will also be a Filter and vice versa. It depends on what rewrite() does. If constant-score mode is on, rewrite() returns the query itself (through super.rewrite()); if the query is set up to rewrite to a BooleanQuery, it should do so as before. In that case the result of rewrite() is no longer a Filter, and the filter optimizations do not apply. So with this new API it is important that rewrite() is called first (even for filters, but since filters are now queries, this happens automatically). The optimizations in LUCENE-1345 would then only see the result of the rewrite, and the instanceof checks would be applied to the rewritten query, which may or may not be a Filter.
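To illustrate why rewrite() has to run before any instanceof check, here is a sketch with hypothetical toy classes: ToyPrefixQuery stands in for a MultiTermQuery, and none of these are Lucene's real classes. The index is reduced to a sorted term dictionary. In constant-score mode rewrite() returns the query itself, which is still a Filter; otherwise it expands into a BooleanQuery of term queries, which is not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical toy classes, NOT Lucene's API. The "index" is reduced to its
// sorted term dictionary, which is all rewrite() needs here.
abstract class ToyQuery {
    // Default: a primitive query rewrites to itself.
    ToyQuery rewrite(SortedSet<String> termDict) {
        return this;
    }
}

abstract class ToyFilter extends ToyQuery { }

class ToyTermQuery extends ToyQuery {
    final String term;
    ToyTermQuery(String term) { this.term = term; }
    @Override public String toString() { return "term:" + term; }
}

class ToyBooleanQuery extends ToyQuery {
    final List<ToyQuery> shouldClauses = new ArrayList<>();
    @Override public String toString() { return "bool" + shouldClauses; }
}

// Stand-in for a MultiTermQuery: the query itself IS a filter, but whether
// its rewritten form is still a filter depends on the mode.
class ToyPrefixQuery extends ToyFilter {
    final String prefix;
    final boolean constantScoreMode;

    ToyPrefixQuery(String prefix, boolean constantScoreMode) {
        this.prefix = prefix;
        this.constantScoreMode = constantScoreMode;
    }

    @Override
    ToyQuery rewrite(SortedSet<String> termDict) {
        if (constantScoreMode) {
            return super.rewrite(termDict); // returns this: still a filter
        }
        // Scoring mode: expand to a BooleanQuery of term queries (not a filter).
        ToyBooleanQuery bq = new ToyBooleanQuery();
        for (String t : termDict.tailSet(prefix)) {
            if (!t.startsWith(prefix)) break; // sorted, so matches are contiguous
            bq.shouldClauses.add(new ToyTermQuery(t));
        }
        return bq;
    }
}

class ToyOptimizer {
    // Rewrite first, then decide on the rewritten query.
    static boolean canUseFilterPath(ToyQuery q, SortedSet<String> termDict) {
        return q.rewrite(termDict) instanceof ToyFilter;
    }
}
```

Checking instanceof on the original query would report both ToyPrefixQuery variants as filters; only the check on the rewritten query distinguishes the scoring variant, which is the case where scoring matters (as for FuzzyQuery).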

> Merge Query and Filter classes
> ------------------------------
>
>                 Key: LUCENE-1518
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1518
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1518.patch
>
>
> This issue presents a patch that merges Queries and Filters so that the new Filter class extends Query. This would make it possible to use every filter as a query.
> The new abstract Filter class would contain all methods of ConstantScoreQuery, and ConstantScoreQuery would be deprecated. Anyone who implements the Filter's getDocIdSet()/bits() methods has nothing more to do: the filter can simply be used as a normal query.
> I do not want to completely convert Filters to ConstantScoreQueries. The idea is to combine Queries and Filters in such a way that every Filter can automatically be used wherever a Query can be used (e.g. also alone as a search query without any other constraint). For that, the abstract Query methods must be implemented and return a "default" Weight for Filters, which is the current ConstantScore logic. If the filter is used as a real filter (where the API expects a Filter), the getDocIdSet() part can be used directly and the Weight is unused (as it is today). The constant-score default implementation is only used when the Filter is used as a Query (e.g. as a direct parameter to Searcher.search()). For the special case of BooleanQueries combining Filters and Queries, the idea is to optimize the BooleanQuery logic so that it detects whether a BooleanClause is a Filter (using instanceof) and then uses the Filter API directly, instead of taking on the overhead of the ConstantScoreQuery (see LUCENE-1345).
> Here are some ideas for how to implement Searcher.search() with Query and Filter:
> - User runs Searcher.search() with a Filter as the only parameter: as every Filter is also a ConstantScoreQuery, the query can be executed and returns score 1.0 for all matching documents.
> - User runs Searcher.search() with a Query as the only parameter: no change, everything works as before.
> - User runs Searcher.search() with a BooleanQuery as the parameter: if the BooleanQuery does not contain any clause that is a subclass of (the new) Filter, everything works as usual. If the BooleanQuery contains exactly one Filter and nothing else, the Filter is used as a constant-score query. If the BooleanQuery contains both Query and Filter clauses, the new algorithm could be used: the queries are executed and the results are filtered with the filters.
> The main advantage for the user is that he can construct his query with a simplified API, without thinking about Filters or Queries; he can simply combine clauses. The scorer/weight logic then identifies the cases in which to use the Filter API or the Query Weight API, just like the query optimizer of an RDBMS.
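The issue proposes that the new Filter absorb ConstantScoreQuery's methods and deprecate ConstantScoreQuery. One possible backwards-compatibility shim, sketched with hypothetical toy classes (not Lucene's API): once every filter already scores like a constant-score query, the old wrapper only needs to delegate getDocIdSet() to the wrapped filter.

```java
import java.util.BitSet;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy classes, NOT Lucene's API. Doc id = position in the list.
abstract class ToyQuery {
    abstract SortedMap<Integer, Float> execute(List<String> docs);
}

abstract class ToyFilter extends ToyQuery {
    abstract BitSet getDocIdSet(List<String> docs);

    @Override // default constant-score behaviour inherited by every filter
    SortedMap<Integer, Float> execute(List<String> docs) {
        SortedMap<Integer, Float> hits = new TreeMap<>();
        BitSet bits = getDocIdSet(docs);
        for (int d = bits.nextSetBit(0); d >= 0; d = bits.nextSetBit(d + 1)) {
            hits.put(d, 1.0f);
        }
        return hits;
    }
}

// Old code wrapped filters in a ConstantScoreQuery. Since Filter now extends
// Query, the wrapper only delegates and can be deprecated.
@Deprecated
class ToyConstantScoreQuery extends ToyFilter {
    private final ToyFilter filter;

    ToyConstantScoreQuery(ToyFilter filter) {
        this.filter = filter;
    }

    @Override
    BitSet getDocIdSet(List<String> docs) {
        return filter.getDocIdSet(docs);
    }
}

// Hypothetical filter: documents with at most maxLen characters.
class ToyMaxLengthFilter extends ToyFilter {
    private final int maxLen;

    ToyMaxLengthFilter(int maxLen) {
        this.maxLen = maxLen;
    }

    @Override
    BitSet getDocIdSet(List<String> docs) {
        BitSet bits = new BitSet(docs.size());
        for (int d = 0; d < docs.size(); d++) {
            if (docs.get(d).length() <= maxLen) bits.set(d);
        }
        return bits;
    }
}
```

Wrapping a filter then produces exactly the same hits as using it directly, so existing code that wraps filters keeps working while new code can pass the filter itself.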

-- 
This message is automatically generated by JIRA.