Posted to dev@druid.apache.org by GitBox <gi...@apache.org> on 2018/07/09 16:43:41 UTC

[GitHub] sf-mk opened a new issue #5982: "buckets" post aggregation of "approxHistogram" returns exception on Nan histograms

URL: https://github.com/apache/incubator-druid/issues/5982
 
 
   Following the documentation at:
   http://druid.io/docs/latest/development/extensions-core/approximate-histograms.html
   
   I make a request in which an approxHistogramFold aggregation is followed by a buckets post aggregation. Whenever one of the per-minute results returned by approxHistogramFold is a histogram made up entirely of NaN values, the buckets post aggregation fails with an "Unknown exception" error of one of two kinds: "java.lang.ArrayIndexOutOfBoundsException" or "com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input: expected close marker for ARRAY".
   
   This only happens when querying time ranges near the start of the data. I suspect the NaN values from the aggregation stage come from aggregating a time bucket that is believed to contain data but actually does not.
   
   Examples of requests and responses:
   
   Request1 (no post aggregation):
   ```
   {
     "queryType": "timeseries",
     "dataSource": "DS",
     "granularity": "minute",
     "aggregations": [
       {
         "fieldName": "histogram",
         "name": "h",
         "type": "approxHistogramFold"
       }
     ],
     "postAggregations": [],
     "intervals": [
       "2018-07-09T14:26:00.000Z/2018-07-09T15:25:00.00Z"
     ],
     "filter": {
       "type": "and",
       "fields": [
         {
           "dimension": "field1",
           "type": "selector",
           "value": "a"
         }
       ]
     }
   }
   
   ```
   
   Response1:
   ```
   [
     {
       "timestamp": "2018-07-09T15:24:00.000Z", 
       "result": {
         "h": {
           "breaks": [
             "Infinity", 
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN", 
             "-Infinity"
           ], 
           "counts": [
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN"
           ]
         }
       }
     }
   ]
   
   ```
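   Response1's breaks are exactly what an equal-width split into 7 buckets (8 breaks) produces when the folded histogram is empty and its minimum and maximum still hold sentinel values of +Infinity and -Infinity. That empty-state behaviour is an assumption, not something confirmed from the Druid source; the helper `equalWidthBreaks` is hypothetical and exists only to check the floating-point arithmetic.

   ```java
   import java.util.Arrays;

   public class EmptyHistogramBreaks {
       // Hypothetical equal-width split (NOT taken from Druid's source):
       // `size` buckets need size + 1 breaks; the first break sits one delta
       // below min, each interior break adds delta, and the last break is max.
       static float[] equalWidthBreaks(float min, float max, int size) {
           float[] breaks = new float[size + 1];
           float delta = (max - min) / (size - 1);
           breaks[0] = min - delta;
           for (int i = 1; i < breaks.length - 1; i++) {
               breaks[i] = breaks[i - 1] + delta;
           }
           breaks[breaks.length - 1] = max;
           return breaks;
       }

       public static void main(String[] args) {
           // Assumed "empty" sentinels: min starts at +Infinity and max at
           // -Infinity, and neither is updated because no values were folded in.
           float[] breaks = equalWidthBreaks(Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, 7);
           // prints [Infinity, NaN, NaN, NaN, NaN, NaN, NaN, -Infinity]
           System.out.println(Arrays.toString(breaks));
       }
   }
   ```

   Under IEEE 754 rules, `max - min` is `-Infinity - Infinity = -Infinity`, so `delta` is `-Infinity`. The first break becomes `Infinity - (-Infinity) = Infinity`, the next is `Infinity + (-Infinity) = NaN`, and every later interior break inherits the NaN. Counts derived from such breaks would plausibly be NaN as well.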
   
   Request2: identical to Request1, but with:
   ```
     "postAggregations": [
       {
         "type": "buckets",
         "name": "r",
         "fieldName": "h",
         "bucketSize": "10000000"
       }
     ],
   ```
   
   Response2:
   ```
   {
     "error": "Unknown exception",
     "errorMessage": null,
     "errorClass": "java.lang.ArrayIndexOutOfBoundsException",
     "host": "druid-server:8100",
     "message": "Unknown exception"
   }
   ```
   OR (the exception seems to change if the query is made much later)
   ```
   {
     "errorClass": "java.lang.RuntimeException",
     "host": null,
     "errorMessage": "com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input: expected close marker for ARRAY (from [Source: java.io.SequenceInputStream@2e1f6a8; line: -1, column: -1])\n at [Source: java.io.SequenceInputStream@2e1f6a8; line: -1, column: 58]",
     "error": "Unknown exception"
   }
   ```
   
   Interestingly, the equalBuckets post aggregator does not seem to suffer from this problem; it simply returns a histogram of NaNs similar to the original.
   My guess is that the toHistogram(bucketSize, offset) function performs its calculations assuming that the histogram contains sensible values.
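   If that guess is right, one possible fix is to check the histogram's bounds before computing fixed-width breaks. The following is a minimal sketch of such a guard, assuming breaks are aligned to `offset + k * bucketSize` and that an empty histogram shows up as non-finite or inverted min/max. `fixedWidthBreaks` is a hypothetical helper, not Druid's actual `toHistogram(bucketSize, offset)`.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class SafeFixedBuckets {
       // Hypothetical fixed-width bucketing with a guard for empty histograms.
       // Breaks are aligned to offset + k * bucketSize and cover [min, max].
       static List<Float> fixedWidthBreaks(float min, float max, float bucketSize, float offset) {
           // Rejects zero, negative, NaN and infinite bucket sizes.
           if (!(bucketSize > 0) || Float.isInfinite(bucketSize)) {
               throw new IllegalArgumentException("bucketSize must be positive and finite");
           }
           List<Float> breaks = new ArrayList<>();
           // Guard: an empty (or corrupted) histogram has non-finite or inverted
           // bounds, so return no breaks instead of doing arithmetic on them.
           if (!Float.isFinite(min) || !Float.isFinite(max) || min > max) {
               return breaks;
           }
           double first = Math.floor((min - offset) / (double) bucketSize) * bucketSize + offset;
           double last = Math.ceil((max - offset) / (double) bucketSize) * bucketSize + offset;
           // Both ends are grid-aligned, so rounding absorbs floating-point error;
           // at least one bucket is emitted when min == max on a grid point.
           long n = Math.max(1, Math.round((last - first) / bucketSize));
           for (long i = 0; i <= n; i++) {
               breaks.add((float) (first + i * (double) bucketSize));
           }
           return breaks;
       }

       public static void main(String[] args) {
           // Empty histogram (assumed sentinels): prints []
           System.out.println(fixedWidthBreaks(Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, 1.0e7f, 0f));
           // Values spanning 3..27 with bucketSize 10: prints [0.0, 10.0, 20.0, 30.0]
           System.out.println(fixedWidthBreaks(3f, 27f, 10f, 0f));
       }
   }
   ```

   Returning no breaks is only one option; returning NaN-filled output, as equalBuckets does in this report, would keep the two post aggregators consistent. The sketch also does not cap the bucket count, so a tiny bucketSize over a wide range would still produce a very large list.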
