Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2014/01/27 12:54:37 UTC

[jira] [Created] (LUCENE-5418) Don't use .advance on costly (e.g. distance range facets) filters

Michael McCandless created LUCENE-5418:
------------------------------------------

             Summary: Don't use .advance on costly (e.g. distance range facets) filters
                 Key: LUCENE-5418
                 URL: https://issues.apache.org/jira/browse/LUCENE-5418
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/facet
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 5.0, 4.7


If you use a distance filter today (see http://blog.mikemccandless.com/2014/01/geospatial-distance-faceting-using.html ) and then drill down on one of those ranges, Lucene uses .advance on the Filter under the hood. This is very costly, because we end up computing the distance for (possibly many) documents that don't match the query.

It performs better to find the hits matching the Query first, and then check the filter only on those hits.
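
To make the cost difference concrete, here is a toy model in plain Java. It is only a sketch and does not use Lucene's classes: QueryFirstSketch, CountingFilter, leapfrog, queryFirst and the demo data are all hypothetical names, and the "distance" is just a 1-D offset from a made-up origin. It assumes the costly filter has no precomputed doc set, so its advance has to test documents one by one.

```java
import java.util.function.IntPredicate;

// Toy model (NOT Lucene code): counts how often a costly per-document check
// runs when the filter is advanced vs. when the query's hits are found first.
final class QueryFirstSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical costly filter; counts every evaluation of its check. */
    static final class CountingFilter {
        private final IntPredicate costlyCheck;
        private final int maxDoc;
        int evaluations = 0;

        CountingFilter(IntPredicate costlyCheck, int maxDoc) {
            this.costlyCheck = costlyCheck;
            this.maxDoc = maxDoc;
        }

        boolean matches(int doc) {
            evaluations++;
            return costlyCheck.test(doc);
        }

        /** Scans forward from target, paying for the check on every doc visited. */
        int advance(int target) {
            for (int d = target; d < maxDoc; d++) {
                if (matches(d)) {
                    return d;
                }
            }
            return NO_MORE_DOCS;
        }
    }

    /** Intersects by alternately advancing the filter and the (sorted) query docs. */
    static int leapfrog(int[] queryDocs, CountingFilter filter) {
        int hits = 0;
        int qi = 0;
        int filterDoc = -1;
        while (qi < queryDocs.length) {
            int queryDoc = queryDocs[qi];
            if (filterDoc < queryDoc) {
                filterDoc = filter.advance(queryDoc);
            }
            if (filterDoc == NO_MORE_DOCS) {
                break;
            }
            if (filterDoc == queryDoc) {
                hits++;
                qi++;
            } else {
                // The filter landed past the query's doc: move the query forward.
                while (qi < queryDocs.length && queryDocs[qi] < filterDoc) {
                    qi++;
                }
            }
        }
        return hits;
    }

    /** Visits only the query's hits and checks the filter on each of them. */
    static int queryFirst(int[] queryDocs, CountingFilter filter) {
        int hits = 0;
        for (int doc : queryDocs) {
            if (filter.matches(doc)) {
                hits++;
            }
        }
        return hits;
    }

    /** Made-up query result: docs 0, 100, ..., 900. */
    static int[] demoQueryDocs() {
        int[] docs = new int[10];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i * 100;
        }
        return docs;
    }

    /** Made-up "distance" filter: docs within 10 of doc 510, out of 1000 docs. */
    static CountingFilter demoFilter() {
        return new CountingFilter(doc -> Math.abs(doc - 510) <= 10, 1000);
    }

    public static void main(String[] args) {
        CountingFilter a = demoFilter();
        int leapHits = leapfrog(demoQueryDocs(), a);
        CountingFilter b = demoFilter();
        int firstHits = queryFirst(demoQueryDocs(), b);
        System.out.println("leapfrog:    hits=" + leapHits + " checks=" + a.evaluations);
        System.out.println("query first: hits=" + firstHits + " checks=" + b.evaluations);
    }
}
```

With this made-up data both strategies find the same single hit, but the advance-driven loop runs the costly check 901 times while the query-first loop runs it 10 times, once per query hit.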

FilteredQuery can already do this today, when you use its QUERY_FIRST_FILTER_STRATEGY.  This essentially accomplishes the same thing as Solr's "post filters" (I think?) but with a far simpler/better/less code approach.

For example, I believe ElasticSearch uses this API when it applies costly filters.

Longish term, I think a Query/Filter ought to know for itself that it's expensive, and in cases where such a Query/Filter is MUST'd onto a BooleanQuery (e.g. ConstantScoreQuery), or the Filter is a clause in a BooleanFilter, or it's passed to IndexSearcher.search, we should also be "smart" here and not call .advance on such clauses.  But that'd be a biggish change ... so for today the "workaround" is that the user must carefully construct the FilteredQuery themselves.
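
As a rough illustration of the "clause knows it's expensive" idea, here is a plain-Java sketch. It is not a proposed Lucene API: CostAwareConjunctionSketch, Clause and its costly flag are hypothetical, and the conjunction simply walks every document, which a real implementation would not do. The only point is that a self-reported flag lets the conjunction check cheap clauses first and run the costly check only on documents that survive them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model (NOT Lucene code): clauses self-report whether they are costly,
// and the conjunction uses that to order its per-document checks.
final class CostAwareConjunctionSketch {

    /** Hypothetical clause carrying a self-reported "costly" flag. */
    static final class Clause {
        final IntPredicate check;
        final boolean costly;
        int evaluations = 0;

        Clause(IntPredicate check, boolean costly) {
            this.check = check;
            this.costly = costly;
        }

        boolean matches(int doc) {
            evaluations++;
            return check.test(doc);
        }
    }

    /** Counts docs in [0, maxDoc) matching all clauses, testing cheap clauses first. */
    static int countMatches(int maxDoc, List<Clause> clauses) {
        List<Clause> ordered = new ArrayList<>(clauses);
        // Boolean.FALSE sorts before Boolean.TRUE, so cheap clauses come first.
        ordered.sort(Comparator.comparing((Clause c) -> c.costly));
        int hits = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            boolean all = true;
            for (Clause c : ordered) {
                if (!c.matches(doc)) {
                    all = false; // short-circuit: later (costlier) clauses are skipped
                    break;
                }
            }
            if (all) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up clauses: a costly "distance" check and a cheap term-like check.
        Clause distance = new Clause(doc -> Math.abs(doc - 510) <= 10, true);
        Clause term = new Clause(doc -> doc % 100 == 0, false);
        int hits = countMatches(1000, Arrays.asList(distance, term));
        System.out.println("hits=" + hits + " costly checks=" + distance.evaluations);
    }
}
```

Even though the costly clause is listed first by the caller, it is only evaluated on the documents that pass the cheap clause.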

In the meantime, as another workaround, I want to fix DrillSideways so that when you drill down on such filters it doesn't use .advance; this should give a good speedup for the "normal path" API usage with a costly filter.

I'm iterating on the lucene server branch (LUCENE-5376) but once it's working I plan to merge this back to trunk / 4.7.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
