You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/07 21:36:12 UTC

[GitHub] [druid] benkrug opened a new issue #11219: doubleMean ignores useDefaultValueForNull=false, treats nulls as zeros

benkrug opened a new issue #11219:
URL: https://github.com/apache/druid/issues/11219


   
   
   ### Affected Version
   
   Seen in 0.20.0
   
   ### Description
   
   doubleMean ignores the setting useDefaultValueForNull=false, and treats nulls as zeros.  Taking a mean, it possibly ignores the nulls when summing, but apparently divides by a count that includes the rows with nulls.  It should exclude those from the count when dividing.
   
   The attached ingestion spec and data file load 10 rows, with myMetric including values from 1 to 10, excluding 5 and 10 (which are null), so the data ends up looking like this:
   
   myDim myMetric
   1           1
   2           2
   3           3
   4           4
   5           null
   6           6
   7           7
   8           8
   9           9
   10          null
   
   Here's a very basic doubleMean query:
   
   {
     "queryType": "timeseries",
     "dataSource": {
       "type": "table",
       "name": "numbersAndNulls"
     },
     "intervals": {
       "type": "intervals",
       "intervals": [
         "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
       ]
     },
     "granularity": {
       "type": "all"
     },
     "aggregations": [
       {
         "type": "doubleMean",
         "name": "doubleMean",
         "fieldName": "myMetric",
         "expression": null
       }
     ]
   }
   
   The mean should be (sum myMetric)/(count myMetric) - 40/8, or 5.  However, 4 is returned - 40/10.  So it's treating the "5" and "10" rows as if they included 0's.
   
   If we filter for not null, we get the right result. 
   
   {
     "queryType": "timeseries",
     "dataSource": {
       "type": "table",
       "name": "numbersAndNulls"
     },
     "intervals": {
       "type": "intervals",
       "intervals": [
         "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
       ]
     },
     "granularity": {
       "type": "all"
     },
       "filter": {
       "type": "not",
       "field": {
         "type": "selector",
         "dimension": "myMetric",
         "value": null,
         "extractionFn": null
       }
     },
     "aggregations": [
       {
         "type": "doubleMean",
         "name": "doubleMean",
         "fieldName": "myMetric",
         "expression": null
       }
     ]
   }
   
   But we shouldn't have to add this filter.
   
   [data.txt](https://github.com/apache/druid/files/6444381/data.txt)
   [ingest.txt](https://github.com/apache/druid/files/6444382/ingest.txt)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] Hikamare commented on issue #11219: doubleMean ignores useDefaultValueForNull=false, treats nulls as zeros

Posted by GitBox <gi...@apache.org>.
Hikamare commented on issue #11219:
URL: https://github.com/apache/druid/issues/11219#issuecomment-844053799


   Having the same issue. Is there an additional step that has to be done for such queries to work or this is a feature that should only be used with filtering?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org