Posted to issues@solr.apache.org by "Gerald Bonfiglio (Jira)" <ji...@apache.org> on 2022/06/27 17:31:00 UTC

[jira] [Updated] (SOLR-16267) JSON Facet Stats methods include docs with no field value when using nested function

     [ https://issues.apache.org/jira/browse/SOLR-16267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gerald Bonfiglio updated SOLR-16267:
------------------------------------
    Description: 
I’m noticing some unexpected and undesirable behavior when using stats functions with nested functions in the JSON Facet API. Below is an example that illustrates the behavior I’m seeing.

I have the following JSON Facet string:
{code:java}
json.facet={
   "grp_0": {
      "field": "ssnm",
      "limit": -1,
      "type": "terms",
      "mincount": 1,
      "refine": true,
      "sort": {"index": "asc"},
      "facet": {
         "avg_TotalCpuUsec": "avg(TotalCpuUsec)",
         "avg_sqrt_TotalCpuUsec": "avg(sqrt(TotalCpuUsec))",
         "count_TotalCpuUsec": "countvals(TotalCpuUsec)",
         "count_sqrt_TotalCpuUsec": "countvals(sqrt(TotalCpuUsec))",
         "sum_TotalCpuUsec": "sum(TotalCpuUsec)",
         "sum_sqrt_TotalCpuUsec": "sum(sqrt(TotalCpuUsec))"
      }
   }
}
{code}
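
For reference, the same facet can be sent as a URL-encoded json.facet parameter on a /select request. The following is a minimal sketch that also compiles on Java 8 (the reported runtime). The host, port, collection name (mycollection), and the q=*:* and rows=0 parameters are assumptions for illustration and do not come from the report.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FacetRequest {
    // Hypothetical endpoint: host, port and collection name are placeholders.
    static final String SELECT_URL = "http://localhost:8983/solr/mycollection/select";

    // The facet definition from the report, written as one compact JSON string.
    static final String JSON_FACET = "{"
        + "\"grp_0\":{"
        + "\"field\":\"ssnm\",\"limit\":-1,\"type\":\"terms\",\"mincount\":1,"
        + "\"refine\":true,\"sort\":{\"index\":\"asc\"},"
        + "\"facet\":{"
        + "\"avg_TotalCpuUsec\":\"avg(TotalCpuUsec)\","
        + "\"avg_sqrt_TotalCpuUsec\":\"avg(sqrt(TotalCpuUsec))\","
        + "\"count_TotalCpuUsec\":\"countvals(TotalCpuUsec)\","
        + "\"count_sqrt_TotalCpuUsec\":\"countvals(sqrt(TotalCpuUsec))\","
        + "\"sum_TotalCpuUsec\":\"sum(TotalCpuUsec)\","
        + "\"sum_sqrt_TotalCpuUsec\":\"sum(sqrt(TotalCpuUsec))\""
        + "}}}";

    // Builds the GET URL; the JSON is percent-encoded so braces and quotes are URL-safe.
    static String buildUrl() {
        try {
            // encode(String, String) is used instead of the Java 10+ Charset overload.
            return SELECT_URL + "?q=*:*&rows=0&json.facet="
                + URLEncoder.encode(JSON_FACET, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl());
    }
}
```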
 

One of the buckets returned is:
{code:java}
  "facets":{
    "count":32,
    "grp_0":{
      "buckets":[{
          "val":"Activity",
          "count":6,
          "count_sqrt_TotalCpuUsec":6,
          "sum_sqrt_TotalCpuUsec":495.29246931322893,
          "count_TotalCpuUsec":4,
          "sum_TotalCpuUsec":61464.399999999994,
          "avg_TotalCpuUsec":15366.099999999999,
          "avg_sqrt_TotalCpuUsec":82.54874488553816},
.
.
.
} ]}}}
{code}
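
The averages in this bucket can be cross-checked against the reported sums and counts. The short Java sketch below (class and constant names are illustrative) copies the values from the "Activity" bucket and tests which divisor reproduces each reported average, using a small relative tolerance to absorb floating-point rounding.

```java
public class BucketCheck {
    // Values copied from the "Activity" bucket in the response above.
    static final long COUNT = 6;                          // "count"
    static final long COUNT_TOTAL = 4;                    // "count_TotalCpuUsec"
    static final double SUM_TOTAL = 61464.399999999994;   // "sum_TotalCpuUsec"
    static final double SUM_SQRT = 495.29246931322893;    // "sum_sqrt_TotalCpuUsec"
    static final double AVG_TOTAL = 15366.099999999999;   // "avg_TotalCpuUsec"
    static final double AVG_SQRT = 82.54874488553816;     // "avg_sqrt_TotalCpuUsec"

    // Relative comparison so rounding in the last digits does not matter.
    static boolean close(double a, double b) {
        return Math.abs(a - b) <= 1e-9 * Math.max(1.0, Math.abs(b));
    }

    public static void main(String[] args) {
        // avg(TotalCpuUsec) matches sum / countvals, i.e. a divisor of 4.
        System.out.println(close(SUM_TOTAL / COUNT_TOTAL, AVG_TOTAL)); // prints true
        // avg(sqrt(TotalCpuUsec)) matches sum(sqrt) / bucket count, i.e. a divisor of 6.
        System.out.println(close(SUM_SQRT / COUNT, AVG_SQRT));         // prints true
        // Dividing sum(sqrt) by 4 does not reproduce the reported value.
        System.out.println(close(SUM_SQRT / COUNT_TOTAL, AVG_SQRT));   // prints false
    }
}
```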
 

Notice that the bucket contains 6 documents, but only 4 of them have the field “TotalCpuUsec”, as reflected in the value of countvals(TotalCpuUsec). My issue is with the calculation of avg(sqrt(TotalCpuUsec)). The value of avg(TotalCpuUsec) is correct: it equals sum(TotalCpuUsec) / 4 (61464.4 / 4 = 15366.1). However, avg(sqrt(TotalCpuUsec)) equals sum(sqrt(TotalCpuUsec)) / 6 (495.29 / 6 ≈ 82.55). I think it should have been divided by 4, since only 4 documents have a value for this field.

It appears that sqrt(TotalCpuUsec) returns 0.0 for documents that don’t have the field, so the 0.0 values from those 2 documents are factored into the avg calculation. This is consistent with countvals(sqrt(TotalCpuUsec)) returning 6 rather than 4.
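
The hypothesis above can be illustrated with a simplified model. This is not Solr's facet implementation; it only contrasts skipping documents that lack a value with substituting 0.0 for them and counting them anyway. The document values (4.0, 16.0, 36.0, 64.0, plus two missing) are hypothetical, chosen so the square roots are exact.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

public class MissingValueModel {
    // Plain field: documents without a value (null) are skipped entirely.
    static double avgOfField(List<Double> docs) {
        double sum = 0;
        int n = 0;
        for (Double v : docs) {
            if (v == null) continue;
            sum += v;
            n++;
        }
        return n == 0 ? 0.0 : sum / n;
    }

    // Behavior as observed in the report: a missing value yields 0.0
    // and the document still counts toward the divisor.
    static double avgOfFunctionAsObserved(List<Double> docs, DoubleUnaryOperator f) {
        double sum = 0;
        for (Double v : docs) {
            sum += (v == null) ? 0.0 : f.applyAsDouble(v);
        }
        return docs.isEmpty() ? 0.0 : sum / docs.size();
    }

    // Behavior the report expects: skip documents without the field.
    static double avgOfFunctionSkippingMissing(List<Double> docs, DoubleUnaryOperator f) {
        double sum = 0;
        int n = 0;
        for (Double v : docs) {
            if (v == null) continue;
            sum += f.applyAsDouble(v);
            n++;
        }
        return n == 0 ? 0.0 : sum / n;
    }

    public static void main(String[] args) {
        // Hypothetical bucket: 6 documents, 2 of them without TotalCpuUsec.
        List<Double> docs = Arrays.asList(4.0, 16.0, null, 36.0, null, 64.0);
        System.out.println(avgOfField(docs));                                // 30.0
        System.out.println(avgOfFunctionAsObserved(docs, Math::sqrt));       // 3.3333333333333335 (= 20 / 6)
        System.out.println(avgOfFunctionSkippingMissing(docs, Math::sqrt));  // 5.0 (= 20 / 4)
    }
}
```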

This seems like a bug, but I wanted to reach out to see whether this is “working as expected” and whether there are any facet attributes that can be set to work around it.
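
One possible client-side workaround, offered as an assumption rather than a documented Solr feature, is to divide sum(sqrt(TotalCpuUsec)) by countvals(TotalCpuUsec), both of which the facet above already returns. This relies on the documents without the field contributing 0.0 to the sum, which is what the bucket's numbers suggest; for a nested function where a missing value would contribute something other than 0.0, the sum itself would be affected and this division would not correct it. The class and method names below are hypothetical.

```java
public class ClientSideAverage {
    // Hypothetical helper: average of f(field) over only the documents that
    // have the field, from sum(f(field)) and countvals(field) in a bucket.
    static double averageOverDocsWithValue(double sumOfFunction, long countvalsOfField) {
        if (countvalsOfField <= 0) {
            return Double.NaN; // no document in the bucket has the field
        }
        return sumOfFunction / countvalsOfField;
    }

    public static void main(String[] args) {
        // "sum_sqrt_TotalCpuUsec" and "count_TotalCpuUsec" from the "Activity" bucket.
        double avg = averageOverDocsWithValue(495.29246931322893, 4);
        System.out.println(avg); // roughly 123.823, versus the reported 82.549
    }
}
```

Returning NaN for a bucket where no document has the field keeps an empty bucket distinguishable from a genuine average of 0.0.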


> JSON Facet Stats methods include docs with no field value when using nested function
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-16267
>                 URL: https://issues.apache.org/jira/browse/SOLR-16267
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Facet Module
>    Affects Versions: 8.11.1
>         Environment: Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-184-generic x86_64)
> Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
>            Reporter: Gerald Bonfiglio
>            Priority: Major
>              Labels: Facet, JSON



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
