Posted to dev@lucene.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/30 21:31:00 UTC

[jira] [Commented] (SOLR-11711) Improve memory usage of pivot facets

    [ https://issues.apache.org/jira/browse/SOLR-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273492#comment-16273492 ] 

ASF GitHub Bot commented on SOLR-11711:
---------------------------------------

GitHub user HoustonPutman opened a pull request:

    https://github.com/apache/lucene-solr/pull/279

    SOLR-11711: Improved memory usage for distributed field and pivot facets.

    Removed the FACET_DISTRIB_MCO option, since the behavior is now built in.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HoustonPutman/lucene-solr pivot_facet_memory_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/279.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #279
    
----
commit 8b7ef286100730e26a9bdc8875fce31a5b47b59a
Author: Houston Putman <hp...@bloomberg.net>
Date:   2017-11-30T21:10:50Z

    Removed FACET_DISTRIB_MCO option, improved memory usage for distributed field and pivot facets.

----


> Improve memory usage of pivot facets
> ------------------------------------
>
>                 Key: SOLR-11711
>                 URL: https://issues.apache.org/jira/browse/SOLR-11711
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: faceting
>    Affects Versions: master (8.0)
>            Reporter: Houston Putman
>              Labels: pull-request-available
>             Fix For: 5.6, 6.7, 7.2
>
>
> Currently, when pivot facet requests are sent to each shard, {{facet.pivot.mincount}} is set to {{0}} if the facet is sorted by count with a specified limit > 0. However, with a mincount of 0, the memory wasted by the pivot facet grows exponentially with every pivot field added. This is because a total of {{limit^(# of pivots)}} pivot values will be created in memory, even though the vast majority of them will have counts of 0 and are therefore useless.
> Imagine a pivot facet with 3 levels and {{facet.limit=1000}}. A billion pivot values will be created, and there will almost certainly be nowhere near a billion pivot values with counts > 0.
> This is likely due to the reasoning mentioned in [this comment in the original distributed pivot facet ticket|https://issues.apache.org/jira/browse/SOLR-2894?focusedCommentId=13979898]. Basically, it was thought that the refinement code would need to know that a count was 0 for a shard so that a refinement request wasn't sent to that shard. However, this is checked in the code, [in this part of the refinement candidate checking|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/core/src/java/org/apache/solr/handler/component/PivotFacetField.java#L275]. Therefore, if the {{pivot.mincount}} was set to 1, the non-existent values would either:
> * Not be known, because the {{facet.limit}} was smaller than the number of facet values with positive counts. This isn't an issue, because they wouldn't have been returned with {{pivot.mincount}} set to 0.
> * Be known, because the {{facet.limit}} would be larger than the number of facet values returned. Therefore, this conditional would return false (since we are only talking about pivot facets sorted by count).
> The solution is to use the same pivot mincount as would be used if no limit was specified.
> This also relates to a similar problem in field faceting that was "fixed" in [SOLR-8988|https://issues.apache.org/jira/browse/SOLR-8988#13324]. The solution was to add a flag, {{facet.distrib.mco}}, which would enable not choosing a mincount of 0 when unnecessary. Since this flag can only increase performance and doesn't break any queries, I have removed it as an option and replaced the code to use the feature always.
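
The {{limit^(# of pivots)}} growth described in the issue can be checked with quick arithmetic. The sketch below is only an illustration of that formula, not Solr code; the function name worst_case_pivot_values is hypothetical, and it counts only the values at the deepest pivot level, as the issue text does.

```python
def worst_case_pivot_values(limit: int, num_pivots: int) -> int:
    """Upper bound on pivot values built in memory when the shard mincount is 0.

    Hypothetical helper illustrating the limit^(# of pivots) formula from the
    issue description. It assumes every pivot level is filled up to `limit`.
    """
    if limit < 1 or num_pivots < 1:
        raise ValueError("limit and num_pivots must both be positive")
    return limit ** num_pivots


if __name__ == "__main__":
    # The scenario from the issue: 3 pivot levels with facet.limit=1000.
    for depth in (1, 2, 3):
        print(depth, worst_case_pivot_values(1000, depth))
```

For the scenario in the issue (3 levels, limit of 1000) this gives 1,000,000,000 values, and each additional pivot field multiplies the bound by the limit again.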



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org