Posted to issues@solr.apache.org by "Gerald Bonfiglio (Jira)" <ji...@apache.org> on 2022/06/27 17:31:00 UTC
[jira] [Updated] (SOLR-16267) JSON Facet Stats methods include docs with no field value when using nested function
[ https://issues.apache.org/jira/browse/SOLR-16267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gerald Bonfiglio updated SOLR-16267:
------------------------------------
Description:
I’m noticing unexpected and undesirable behavior in the JSON Facet API when a Stats function is applied to a nested function. The example below illustrates the behavior I’m seeing.
My JSON Facet request is the following:
{code:java}
json.facet={
  "grp_0": {
    "field": "ssnm",
    "limit": -1,
    "type": "terms",
    "mincount": 1,
    "refine": true,
    "sort": {"index": "asc"},
    "facet": {
      "avg_TotalCpuUsec": "avg(TotalCpuUsec)",
      "avg_sqrt_TotalCpuUsec": "avg(sqrt(TotalCpuUsec))",
      "count_TotalCpuUsec": "countvals(TotalCpuUsec)",
      "count_sqrt_TotalCpuUsec": "countvals(sqrt(TotalCpuUsec))",
      "sum_TotalCpuUsec": "sum(TotalCpuUsec)",
      "sum_sqrt_TotalCpuUsec": "sum(sqrt(TotalCpuUsec))"
    }
  }
}
{code}
And an example of one of the buckets returned is:
{code:java}
"facets":{
  "count":32,
  "grp_0":{
    "buckets":[{
        "val":"Activity",
        "count":6,
        "count_sqrt_TotalCpuUsec":6,
        "sum_sqrt_TotalCpuUsec":495.29246931322893,
        "count_TotalCpuUsec":4,
        "sum_TotalCpuUsec":61464.399999999994,
        "avg_TotalCpuUsec":15366.099999999999,
        "avg_sqrt_TotalCpuUsec":82.54874488553816},
      .
      .
      .
      } ]}}}
{code}
Notice that there are 6 documents in the bucket, but only 4 of them have the field “TotalCpuUsec”, which is reflected in the value of countvals(TotalCpuUsec). My issue is with the calculation of avg(sqrt(TotalCpuUsec)). The calculation of avg(TotalCpuUsec) is correct: it equals sum(TotalCpuUsec) / 4. However, the value of avg(sqrt(TotalCpuUsec)) equals sum(sqrt(TotalCpuUsec)) / 6. I think it should have been divided by 4, since only 4 documents have a value for this field. It appears that sqrt(TotalCpuUsec) returns 0.0 for documents that don’t have the field, so the 0.0 for those 2 documents is counted in the avg calculation. The value of countvals(sqrt(TotalCpuUsec)), which is 6, seems to confirm this.
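To make the arithmetic concrete, here is a small Python check using only the numbers from the “Activity” bucket shown in this report. The variable names are mine and purely illustrative. It reproduces the reported average by dividing by the bucket count (6), and shows what the average would be if the divisor were the number of documents that actually have the field (4).

```python
# Values copied from the "Activity" bucket in this report.
bucket_count = 6                      # "count"
count_total = 4                       # "count_TotalCpuUsec"
sum_total = 61464.399999999994        # "sum_TotalCpuUsec"
sum_sqrt = 495.29246931322893         # "sum_sqrt_TotalCpuUsec"

# avg(TotalCpuUsec) as reported: the sum divided by the 4 docs that have the field.
avg_total = sum_total / count_total

# avg(sqrt(TotalCpuUsec)) as reported: the sum divided by all 6 docs in the bucket.
avg_sqrt_reported = sum_sqrt / bucket_count

# What I expected: the same sum divided by the 4 docs that have the field.
avg_sqrt_expected = sum_sqrt / count_total

print("avg_TotalCpuUsec:", avg_total)
print("avg_sqrt_TotalCpuUsec (divided by 6):", avg_sqrt_reported)
print("avg_sqrt_TotalCpuUsec (divided by 4):", avg_sqrt_expected)
```

Dividing by 6 matches the returned value of about 82.55, while dividing by 4 would give about 123.82.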
This seems like a bug, but I wanted to ask whether this is “working as expected” and whether there are facet attributes that can be set to work around it.
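One possible workaround, which I have not verified against 8.11.1, is to compute the nested stats inside a query sub-facet that keeps only documents with a value for the field. The sketch below builds such a json.facet parameter in Python. The sub-facet label "has_TotalCpuUsec" is a hypothetical name, and the existence query TotalCpuUsec:[* TO *] assumes the field supports range queries in this schema.

```python
import json

# Hypothetical label for the nested query facet; it does not come from the original request.
HAS_VALUE = "has_TotalCpuUsec"

facet = {
    "grp_0": {
        "type": "terms",
        "field": "ssnm",
        "limit": -1,
        "mincount": 1,
        "refine": True,
        "sort": {"index": "asc"},
        "facet": {
            # Query sub-facet: narrows each bucket to docs that have a TotalCpuUsec
            # value (assumption: a [* TO *] range query works for this field type).
            HAS_VALUE: {
                "type": "query",
                "q": "TotalCpuUsec:[* TO *]",
                "facet": {
                    "avg_sqrt_TotalCpuUsec": "avg(sqrt(TotalCpuUsec))",
                    "count_sqrt_TotalCpuUsec": "countvals(sqrt(TotalCpuUsec))",
                    "sum_sqrt_TotalCpuUsec": "sum(sqrt(TotalCpuUsec))",
                },
            },
        },
    }
}

# Serialized value to send as the json.facet request parameter.
json_facet_param = json.dumps(facet)
print(json_facet_param)
```

If this works as intended, the stats would appear under the "has_TotalCpuUsec" key of each bucket rather than directly in the bucket, so client code reading the response would need to look one level deeper.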
> JSON Facet Stats methods include docs with no field value when using nested function
> ------------------------------------------------------------------------------------
>
> Key: SOLR-16267
> URL: https://issues.apache.org/jira/browse/SOLR-16267
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Facet Module
> Affects Versions: 8.11.1
> Environment: Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-184-generic x86_64)
> Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
> Reporter: Gerald Bonfiglio
> Priority: Major
> Labels: Facet, JSON
--
This message was sent by Atlassian Jira
(v8.20.7#820007)