Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/11/21 18:40:12 UTC

[GitHub] [incubator-druid] sascha-coenen opened a new issue #8922: ArrayIndexOutOfBoundsException on historical in class GroupByMergingQueryRunnerV2 when grouping on high-cardinality dimension

URL: https://github.com/apache/incubator-druid/issues/8922
 
 
   
   
   ### Affected Version
   Druid 0.16.0
   
   ### Description
   
   An ArrayIndexOutOfBoundsException is thrown by a historical when grouping on a high-cardinality dimension. 
   
   This happens reproducibly on a test cluster with two historicals (r4.4xl) against a dataset that was ingested with the new native index_parallel task running on the Druid Indexer process (i.e. the new Indexer, not the MiddleManager).
   The dataset is 12 GB in size as displayed in Druid's legacy coordinator console and consists of 500 segments of roughly 25 MB each, with about 100k records per segment; rollup was disabled during ingestion (83 dimensions, 1 metric).
   
   The data covers a single hour, ingested with no query granularity and hourly segment granularity.
   The dataset was created just for testing, so it is not a battle-hardened data model. The data itself is battle-tested insofar as it is authentic production data that we normally ingest with a Hadoop indexing job into our production cluster, where it is rolled up. For this test I took the same data and ingested it without rollup using the new experimental ingestion pipeline introduced in Druid 0.16.
   
   I was executing the following query
   
   ```sql
   SELECT
     "deviceId",
     COUNT(*) AS "Count"
   FROM "hackathon"
   WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' YEAR
   GROUP BY 1
   ORDER BY "Count" DESC
   LIMIT 100
   ```
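
   For anyone trying to reproduce this, the query can be submitted over HTTP to the broker's `/druid/v2/sql` endpoint. A minimal sketch in Python; the broker address (`localhost:8082`) is an assumption and must be adjusted for your cluster:

   ```python
   import json
   import urllib.request

   # Druid SQL query from this report (datasource "hackathon" as in the issue).
   sql = """
   SELECT
     "deviceId",
     COUNT(*) AS "Count"
   FROM "hackathon"
   WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' YEAR
   GROUP BY 1
   ORDER BY "Count" DESC
   LIMIT 100
   """

   # Payload for Druid's SQL endpoint; "resultFormat": "object" returns JSON rows.
   payload = json.dumps({"query": sql, "resultFormat": "object"}).encode("utf-8")

   # Broker host/port is an assumption; point this at your own broker.
   req = urllib.request.Request(
       "http://localhost:8082/druid/v2/sql",
       data=payload,
       headers={"Content-Type": "application/json"},
   )

   # Uncomment to run against a live cluster:
   # with urllib.request.urlopen(req) as resp:
   #     rows = json.loads(resp.read())
   #     print(rows[:5])
   ```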
   
   This always raises the following exception within one of the historicals:
   
   ```
        [java] 2019-11-21T18:17:57,132 ERROR [processing-3] org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2 - Exception with one of the sequences!
        [java] java.lang.ArrayIndexOutOfBoundsException
        [java] 2019-11-21T18:17:57,132 ERROR [processing-3] com.google.common.util.concurrent.Futures$CombinedFuture - input future failed.
        [java] java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
        [java] 	at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:253) ~[druid-processing-0.16.0-incubating.jar:0.16.0-incubating]
        [java] 	at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:233) ~[druid-processing-0.16.0-incubating.jar:0.16.0-incubating]
        [java] 	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_212]
        [java] 	at org.apache.druid.query.PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:247) [druid-processing-0.16.0-incubating.jar:0.16.0-incubating]
        [java] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
        [java] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
        [java] 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
        [java] Caused by: java.lang.ArrayIndexOutOfBoundsException
   ```
   
   
   If I group on other dimensions instead, no exception is raised, so this happens specifically with a dimension that has high cardinality because it contains device IDs.
   However, when I grouped on another high-cardinality dimension, a session ID containing a GUID, this only resulted in a ResourceLimitExceededException, which is expected.
   
   I couldn't provoke another ArrayIndexOutOfBoundsException so far with any of the other columns.
   
   I then relaxed the above query by removing the ORDER BY clause, and a result set was returned.
   I was also able to keep the ORDER BY clause by adding the filter condition "deviceId IS NOT NULL"; this no longer raised the ArrayIndexOutOfBoundsException but instead ran into the ResourceLimitExceededException.
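
   Written out as full queries, the two workarounds look like this (a sketch against the same test datasource):

   ```sql
   -- Workaround 1: drop the ORDER BY clause; this returned a result set.
   SELECT "deviceId", COUNT(*) AS "Count"
   FROM "hackathon"
   WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' YEAR
   GROUP BY 1
   LIMIT 100;

   -- Workaround 2: keep ORDER BY but filter out null device IDs; this no longer
   -- threw ArrayIndexOutOfBoundsException but hit ResourceLimitExceededException.
   SELECT "deviceId", COUNT(*) AS "Count"
   FROM "hackathon"
   WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' YEAR
     AND "deviceId" IS NOT NULL
   GROUP BY 1
   ORDER BY "Count" DESC
   LIMIT 100;
   ```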
   
   In summary, it looks to me as if the error might be related to high-cardinality dimensions that can contain null entries.
   
   
