Posted to dev@lucene.apache.org by "Keith Laban (JIRA)" <ji...@apache.org> on 2016/04/25 21:05:13 UTC

[jira] [Comment Edited] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

    [ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256797#comment-15256797 ] 

Keith Laban edited comment on SOLR-8988 at 4/25/16 7:04 PM:
------------------------------------------------------------

Added a second version of the patch, which has this feature disabled by default; it can be enabled with {{facet.distrib.mco=true}}.
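
For anyone who wants to try the flag, here is a minimal Python sketch of building such a request. Only {{facet.distrib.mco}} and the facet parameters from the issue description come from this ticket; the function name, base URL, field name, and the {{q}}/{{rows}} values are hypothetical placeholders.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, field, enable_mco=True):
    """Build a Solr select URL for an fcs facet request.

    base_url and field are placeholders supplied by the caller;
    facet.distrib.mco is the opt-in flag added by the second patch.
    """
    params = [
        ("q", "*:*"),            # assumption: match-all query for illustration
        ("rows", 0),             # assumption: only facet counts are wanted
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", "fcs"),
        ("facet.sort", "count"),
        ("facet.limit", 500),
        ("facet.mincount", 1),
    ]
    if enable_mco:
        # Disabled by default in the patch, so it must be sent explicitly.
        params.append(("facet.distrib.mco", "true"))
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical collection and field names.
    print(build_facet_query("http://localhost:8983/solr/mycollection/select", "my_int_field"))
```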

I also did some benchmarking: in every scenario tested, the new approach is either the same speed or much faster. The test used 12 shards, with everything evenly distributed.

Two things to note about this test:
- All terms have the same count, which is the worst case for refinement, as is evident in the shape of each graph. Overrequesting is far more efficient.
- All segments are evenly distributed; in the real world, however, the best performance gains from this patch would be seen when many segments contain no terms relevant to the query.

More details about the test.
- 2-node cloud running locally, each node with 4g
- 12 shards without replication (only 12 total cores)
- terms were integers with doc values enabled
- instances were restarted after each test to avoid lingering GC issues; each test ran some warmup queries before measurement began
- The Y-axis is average QTime (ms) over 100 test runs
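
The measurement described above can be sketched roughly as follows. This is not the actual benchmark harness: the function name is hypothetical, and the warmup count is an assumption, since the number of warmup queries is not stated. It relies only on the fact that a Solr JSON response reports the elapsed time under {{responseHeader.QTime}}.

```python
from statistics import mean


def average_qtime(run_query, runs=100, warmup=10):
    """Return the mean QTime (ms) over `runs` measured queries.

    run_query is any zero-argument callable that executes one request and
    returns the parsed JSON response as a dict. The warmup default of 10 is
    an assumption for illustration only.
    """
    # Warmup queries are executed but their timings are discarded.
    for _ in range(warmup):
        run_query()
    # Measured queries: collect the server-reported QTime of each response.
    qtimes = [run_query()["responseHeader"]["QTime"] for _ in range(runs)]
    return mean(qtimes)
```

Passing the query as a callable keeps the sketch independent of any particular HTTP client.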


> Improve facet.method=fcs performance in SolrCloud
> -------------------------------------------------
>
>                 Key: SOLR-8988
>                 URL: https://issues.apache.org/jira/browse/SOLR-8988
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8988.patch, SOLR-8988.patch, Screen Shot 2016-04-25 at 2.54.47 PM.png, Screen Shot 2016-04-25 at 2.55.00 PM.png
>
>
> This relates to SOLR-8559, which improves the algorithm used by fcs faceting when {{facet.mincount=1}}.
> This patch allows {{facet.mincount}} to be sent as 1 for distributed queries. As far as I can tell, there is no reason to set {{facet.mincount=0}} for refinement purposes. After trying to make sense of all the refinement logic, I can't see how the difference between _no value_ and _value=0_ would have a negative effect.
> *Test perf:*
> - ~15 million unique terms
> - query matches ~3 million documents
> *Params:*
> {code}
> facet.mincount=1
> facet.limit=500
> facet.method=fcs
> facet.sort=count
> {code}
> *Average Time Per Request:*
> - Before patch: ~20 seconds
> - After patch: <1 second
> *Note*: all tests pass, and in my test the output was identical before and after the patch.
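
The reasoning in the description (a shard reporting a term with count 0 adds nothing that omitting the term would not) can be illustrated with a toy merge in Python. This is a deliberately simplified model, not Solr's refinement logic: it ignores over-requesting and refinement requests entirely, and all names in it are hypothetical.

```python
from collections import Counter


def shard_response(counts, mincount):
    """Simulate one shard's facet response: keep terms with count >= mincount."""
    return {term: c for term, c in counts.items() if c >= mincount}


def merge(shard_responses, limit):
    """Sum per-shard counts and return the top `limit` terms with a non-zero total."""
    total = Counter()
    for response in shard_responses:
        total.update(response)  # adds counts; a 0 entry leaves the sum unchanged
    # Sort by count descending, breaking ties by term for a stable order.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, c) for term, c in ranked if c > 0][:limit]


if __name__ == "__main__":
    shards = [
        {"a": 3, "b": 0, "c": 1},
        {"a": 0, "b": 2, "c": 1},
    ]
    with_zeroes = merge([shard_response(s, 0) for s in shards], limit=500)
    without_zeroes = merge([shard_response(s, 1) for s in shards], limit=500)
    print(with_zeroes == without_zeroes)
```

In this toy model the merged result is the same whether shards report zero-count terms or omit them, which matches the identical before/after output noted above.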


