You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/07 21:36:12 UTC
[GitHub] [druid] benkrug opened a new issue #11219: doubleMean ignores useDefaultValueForNull=false, treats nulls as zeros
benkrug opened a new issue #11219:
URL: https://github.com/apache/druid/issues/11219
### Affected Version
Seen in 0.20.0
### Description
doubleMean ignores the setting useDefaultValueForNull=false, and treats nulls as zeros. Taking a mean, it possibly ignores the nulls when summing, but apparently divides by a count that includes the rows with nulls. It should exclude those from the count when dividing.
The attached ingestion spec and data file load 10 rows, with myMetric including values from 1 to 10, excluding 5 and 10 (which are null), so the data ends up looking like this:
myDim myMetric
1 1
2 2
3 3
4 4
5 null
6 6
7 7
8 8
9 9
10 null
Here's a very basic doubleMean query:
{
"queryType": "timeseries",
"dataSource": {
"type": "table",
"name": "numbersAndNulls"
},
"intervals": {
"type": "intervals",
"intervals": [
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
]
},
"granularity": {
"type": "all"
},
"aggregations": [
{
"type": "doubleMean",
"name": "doubleMean",
"fieldName": "myMetric",
"expression": null
}
]
}
The mean should be (sum myMetric)/(count myMetric) - 40/8, or 5. However, 4 is returned - 40/10. So it's treating the "5" and "10" rows as if they included 0's.
If we filter for not null, we get the right result.
{
"queryType": "timeseries",
"dataSource": {
"type": "table",
"name": "numbersAndNulls"
},
"intervals": {
"type": "intervals",
"intervals": [
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
]
},
"granularity": {
"type": "all"
},
"filter": {
"type": "not",
"field": {
"type": "selector",
"dimension": "myMetric",
"value": null,
"extractionFn": null
}
},
"aggregations": [
{
"type": "doubleMean",
"name": "doubleMean",
"fieldName": "myMetric",
"expression": null
}
]
}
But we shouldn't have to add this filter.
[data.txt](https://github.com/apache/druid/files/6444381/data.txt)
[ingest.txt](https://github.com/apache/druid/files/6444382/ingest.txt)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] Hikamare commented on issue #11219: doubleMean ignores useDefaultValueForNull=false, treats nulls as zeros
Posted by GitBox <gi...@apache.org>.
Hikamare commented on issue #11219:
URL: https://github.com/apache/druid/issues/11219#issuecomment-844053799
Having the same issue. Is there an additional step that has to be done for such queries to work or this is a feature that should only be used with filtering?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org