Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2015/02/18 01:40:12 UTC

[jira] [Updated] (SOLR-6349) LocalParams for enabling/disabling individual stats

     [ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-6349:
---------------------------
    Attachment: SOLR-6349.patch


Starting to get back into this; here's a quick checkpoint of some small progress.


Step #1: This new patch brings Xu's latest patch up to date with trunk using the minimal changes that seemed to work -- in particular: I haven't started really digging into the code changes other than getting things to compile & tests to pass.

Step #2...

My main focus for now is making sure the tests are rock solid & all inclusive so we can then iterate on the code changes (see early comments about my concerns with spreading the logic around).  Only 2 noticeable changes in this patch...

* Fixed FacetPivotSmallTest.testPivotFacetStatsUnsortedTagged
** was prematurely specifying 'mean=true' but then trying to assert that all stats were returned
** beefed this up to also assert that it got an expected number of stats - if we add more stats in the future, this will be a canary that the test needs to be updated to assert the correct values for these new stats.

* StatsComponentTest
** added more asserts to the 3 testFieldStatisticsResults_TYPE_FieldAlwaysMissing tests to ensure expected values for all stats (when there is nothing to compute stats on)...{noformat}
// numerics & strings & dates
min=null
max=null
// just numerics
sum=0.0
sumOfSquares=0.0
stddev=0.0
mean=NaN
{noformat}
*** these are based on the current behavior of the code ... my initial gut reaction was that they should all be null, but a quick bit of research says that in maths the "empty sum" is defined as "0" -- if you start with that premise, then the values for the rest seem correct to me, but I'm definitely interested in knowing if there are contrary opinions (is NaN better?)
** included "expected number of stats" asserts in these tests as well - more canaries if/when future stats are added.
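
To make the "empty sum" reasoning above concrete, here is a minimal standalone sketch of how those six values could fall out of a plain accumulator when there is nothing to accumulate. It is not the StatsComponent code: the class and method names are hypothetical, the sample-stddev formula and its "0.0 for fewer than two values" guard are assumptions, and other stats (such as count or missing) are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from Solr's StatsComponent.
public class EmptyStatsSketch {

    /** Accumulates the six stats discussed above over the given values. */
    public static Map<String, Object> compute(double[] values) {
        double sum = 0.0;          // the "empty sum" premise: start from 0
        double sumOfSquares = 0.0; // same premise applies here
        Double min = null;         // no values seen yet, so no min/max
        Double max = null;
        for (double v : values) {
            sum += v;
            sumOfSquares += v * v;
            min = (min == null) ? v : Math.min(min, v);
            max = (max == null) ? v : Math.max(max, v);
        }
        long count = values.length;

        // 0.0 / 0 is NaN under IEEE 754 floating point, hence mean=NaN.
        double mean = sum / count;

        // Assumption: sample standard deviation, defined as 0.0 when there
        // are fewer than two values (avoids dividing by zero).
        double stddev = (count <= 1)
            ? 0.0
            : Math.sqrt((count * sumOfSquares - sum * sum) / (count * (count - 1.0)));

        Map<String, Object> stats = new LinkedHashMap<>();
        stats.put("min", min);
        stats.put("max", max);
        stats.put("sum", sum);
        stats.put("sumOfSquares", sumOfSquares);
        stats.put("stddev", stddev);
        stats.put("mean", mean);
        return stats;
    }

    public static void main(String[] args) {
        Map<String, Object> stats = compute(new double[0]);
        // "Canary" check: if a stat is added later, this count must be updated
        // along with an assertion on the new stat's expected value.
        if (stats.size() != 6) {
            throw new AssertionError("unexpected number of stats: " + stats.size());
        }
        System.out.println(stats);
    }
}
```

Under these assumptions only the mean ends up as NaN, because it is the one value that divides by the (zero) count without a guard; whether stddev should follow it to NaN is exactly the open question raised above.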


> LocalParams for enabling/disabling individual stats
> ---------------------------------------------------
>
>                 Key: SOLR-6349
>                 URL: https://issues.apache.org/jira/browse/SOLR-6349
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Hoss Man
>         Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349___bad_idea_broken.patch
>
>
> Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on each other for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the "sum" (and it will definitely become excessively verbose in the responses).
> The plan here is to use local params to make this configurable.  All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value...
> Example:
> {noformat}
> stats.field={!min=true max=true percentiles='99,99.999'}price
> stats.field={!mean=true}weight
> {noformat}
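
As a rough illustration of the proposed syntax in the issue description, the sketch below splits a stats.field value into its local params and the field name. It is a simplified stand-in and not Solr's local-params parser: the class and method names are hypothetical, and it assumes values are either bare tokens or single-quoted strings with no '}' or escaped quotes inside them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; Solr has its own local-params parsing.
public class StatsLocalParamsSketch {

    // key=value pairs where the value is 'single quoted' or a bare token.
    private static final Pattern PAIR =
        Pattern.compile("(\\w+)=(?:'([^']*)'|(\\S+))");

    /** Returns the local params of a stats.field value, in order of appearance. */
    public static Map<String, String> params(String statsField) {
        Map<String, String> out = new LinkedHashMap<>();
        int end = closingBrace(statsField);
        if (end < 0) {
            return out; // no {!...} prefix, so no local params
        }
        Matcher m = PAIR.matcher(statsField.substring(2, end));
        while (m.find()) {
            String quoted = m.group(2);
            out.put(m.group(1), quoted != null ? quoted : m.group(3));
        }
        return out;
    }

    /** Returns the field name that follows the optional {!...} prefix. */
    public static String fieldName(String statsField) {
        int end = closingBrace(statsField);
        return end < 0 ? statsField : statsField.substring(end + 1);
    }

    // Index of the '}' closing a leading "{!" prefix, or -1 if there is no prefix.
    private static int closingBrace(String statsField) {
        if (!statsField.startsWith("{!")) {
            return -1;
        }
        int end = statsField.indexOf('}');
        if (end < 0) {
            throw new IllegalArgumentException("unterminated local params: " + statsField);
        }
        return end;
    }

    public static void main(String[] args) {
        String value = "{!min=true max=true percentiles='99,99.999'}price";
        System.out.println(fieldName(value) + " -> " + params(value));
    }
}
```

With a map like this, a boolean stat such as min can be switched on by checking for "true", while a richer param such as percentiles keeps its raw string value for later interpretation.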


