Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/22 10:15:44 UTC

[GitHub] [druid] dtheodor opened a new issue #9743: Crash when initializing the partial segment merge phase of index_parallel

dtheodor opened a new issue #9743:
URL: https://github.com/apache/druid/issues/9743


   I'm running Druid 0.17.0 and doing a parallel re-index from a Druid data source with `single_dim` partitioning and 12 concurrent tasks. There are two middle managers, each on a node with 8 cores and 32GB of RAM. Each middle manager is configured with a worker capacity of 8, so while the job is running they sit at 6/8 or 7/8 capacity. I lowered `maxRowsInMemory` to 100000 and `maxRowsPerSegment` to 4000000 from their defaults to avoid a couple<sup>[1](#crash1)</sup> of other<sup>[2](#crash2)</sup> crashes.
   
   The `partial dimension distribution` and `partial segment generation` phases finish successfully, but initializing the `partial segment merge` phase then crashes with this:
   
   ```
   2020-04-21T23:26:36,887 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask - Number of merge tasks is set to [10] based on totalNumMergeTasks[10] and number of partitions[26]
   2020-04-21T23:26:36,889 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Exception while running task[AbstractTask{id='index_parallel_events13_klmjpjll_2020-04-21T19:52:19.481Z', groupId='index_parallel_events13_klmjpjll_2020-04-21T19:52:19.481Z', taskResource=TaskResource{availabilityGroup='index_parallel_events13_klmjpjll_2020-04-21T19:52:19.481Z', requiredCapacity=1}, dataSource='events13', context={forceTimeChunkLock=true}}]
   java.lang.IndexOutOfBoundsException: toIndex = 27
   	at java.util.ArrayList.subListRangeCheck(ArrayList.java:1012) ~[?:1.8.0_242]
   	at java.util.ArrayList.subList(ArrayList.java:1004) ~[?:1.8.0_242]
   	at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.createMergeIOConfigs(ParallelIndexSupervisorTask.java:768) ~[druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.createGenericMergeIOConfigs(ParallelIndexSupervisorTask.java:738) ~[druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runRangePartitionMultiPhaseParallel(ParallelIndexSupervisorTask.java:614) ~[druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runMultiPhaseParallel(ParallelIndexSupervisorTask.java:540) ~[druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runTask(ParallelIndexSupervisorTask.java:448) ~[druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:138) ~[druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.17.0.jar:0.17.0]
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_242]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
   2020-04-21T23:26:36,901 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
     "id" : "index_parallel_events13_klmjpjll_2020-04-21T19:52:19.481Z",
     "status" : "FAILED",
     "duration" : 12851572,
     "errorMsg" : "java.lang.IndexOutOfBoundsException: toIndex = 27",
     "location" : {
       "host" : null,
       "port" : -1,
       "tlsPort" : -1
     }
   }
   ```
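
The numbers in the log (26 partitions, 10 merge tasks, `toIndex = 27`) are consistent with a split that rounds the per-task chunk size and then calls `subList` without clamping to the list size. The sketch below illustrates that failure mode and one way to avoid it. It is an assumption for illustration, not a reading of the Druid source: the class `MergeSplit` and both methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSplit {
    // Hypothetical unclamped split: the chunk size is rounded, so a later
    // chunk can extend past the end of the list.
    static <T> List<List<T>> splitUnclamped(List<T> items, int numTasks) {
        int perTask = (int) Math.round(items.size() / (double) numTasks);
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            // With 26 items and 10 tasks, perTask is 3 and the chunk
            // starting at index 24 asks for toIndex 27, which is > 26.
            chunks.add(items.subList(i * perTask, (i + 1) * perTask));
        }
        return chunks;
    }

    // Safe alternative: spread the remainder over the first chunks so the
    // running end index can never exceed items.size().
    static <T> List<List<T>> splitEvenly(List<T> items, int numTasks) {
        int base = items.size() / numTasks;
        int extra = items.size() % numTasks;
        List<List<T>> chunks = new ArrayList<>();
        int from = 0;
        for (int i = 0; i < numTasks; i++) {
            int to = from + base + (i < extra ? 1 : 0);
            chunks.add(items.subList(from, to));
            from = to;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> partitions = new ArrayList<>();
        for (int i = 0; i < 26; i++) {
            partitions.add(i);
        }
        try {
            splitUnclamped(partitions, 10);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unclamped split failed: " + e.getMessage());
        }
        for (List<Integer> chunk : splitEvenly(partitions, 10)) {
            System.out.println("chunk size: " + chunk.size());
        }
    }
}
```

With 26 partitions and 10 tasks, `splitEvenly` yields six chunks of 3 and four chunks of 2, so every partition is assigned exactly once.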
   
   This is my spec, in case it helps:
   
   ```json
   {
     "type": "index_parallel",
     "spec": {
       "ioConfig": {
         "type": "index_parallel",
         "inputSource": {
           "type": "druid",
           "dataSource": "events12",
           "interval": "2018-09-22/2018-10-01"
         }
       },
       "tuningConfig": {
         "type": "index_parallel",
         "indexSpec": {
           "bitmap": {
             "type": "roaring"
           }
         },
         "maxRowsInMemory": 100000,
         "maxNumConcurrentSubTasks": 12,
         "partitionsSpec": {
           "type": "single_dim",
           "partitionDimension": "event_type",
           "maxRowsPerSegment": 4000000
         },
         "forceGuaranteedRollup": true
       },
       "dataSchema": {
         "dataSource": "events13",
         "granularitySpec": {
           "type": "uniform",
           "queryGranularity": "NONE",
           "segmentGranularity": "DAY",
           "rollup": false,
           "intervals": ["2018-09-22/2018-10-01"]
         }
       }
     }
   }
   ```
   
   #### Other crashes
   <a name="crash1">1</a>: I'm getting `java.lang.OutOfMemoryError: Java heap space` in the `partial segment generation` tasks. My dataset has about 40 string dimensions, and the Kafka ingestion tasks for the same dataset work fine with the default `maxRowsInMemory` of 1M. I have to reduce it to 100K to avoid heap errors in parallel ingestion.
   
   <a name="crash2">2</a>: Apparently there's a hardcoded 2GB limit on "file smooshing", which is what happens to files in the merge phase, so segments cannot be that large. However, the `maxRowsPerSegment` limit of 5000000 should make it impossible to reach these sizes: I know that for my dataset, 5M rows is about 700MB. So I am clueless as to how these 2GB are accumulated.
   
   ```
   2020-04-21T16:06:04,658 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Exception while running task[AbstractTask{id='partial_index_generic_merge_events13_fggkflak_2020-04-21T14:44:12.773Z', groupId='index_parallel_events13_papogljl_2020-04-21T09:28:40.014Z', taskResource=TaskResource{availabilityGroup='partial_index_generic_merge_events13_fggkflak_2020-04-21T14:44:12.773Z', requiredCapacity=1}, dataSource='events13', context={forceTimeChunkLock=true}}]
   org.apache.druid.java.util.common.IAE: Asked to add buffers[2,982,201,910] larger than configured max[2,147,483,647]
   	at org.apache.druid.java.util.common.io.smoosh.FileSmoosher.addWithSmooshedWriter(FileSmoosher.java:160) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.segment.IndexMergerV9.makeColumn(IndexMergerV9.java:454) ~[druid-processing-0.17.0.jar:0.17.0]
   	at org.apache.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:223) ~[druid-processing-0.17.0.jar:0.17.0]
   	at org.apache.druid.segment.IndexMergerV9.merge(IndexMergerV9.java:915) ~[druid-processing-0.17.0.jar:0.17.0]
   	at org.apache.druid.segment.IndexMergerV9.mergeQueryableIndex(IndexMergerV9.java:833) ~[druid-processing-0.17.0.jar:0.17.0]
   	at org.apache.druid.segment.IndexMergerV9.mergeQueryableIndex(IndexMergerV9.java:811) ~[druid-processing-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.mergeSegmentsInSamePartition(PartialSegmentMergeTask.java:373) ~[druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.mergeAndPushSegments(PartialSegmentMergeTask.java:302) ~[druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.runTask(PartialSegmentMergeTask.java:198) ~[druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.common.task.batch.parallel.PartialGenericSegmentMergeTask.runTask(PartialGenericSegmentMergeTask.java:44) ~[druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:138) ~[druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.17.0.jar:0.17.0]
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_242]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
   2020-04-21T16:06:04,667 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
     "id" : "partial_index_generic_merge_events13_fggkflak_2020-04-21T14:44:12.773Z",
     "status" : "FAILED",
     "duration" : 4899796,
     "errorMsg" : "org.apache.druid.java.util.common.IAE: Asked to add buffers[2,982,201,910] larger than configured ma...",
     "location" : {
       "host" : null,
       "port" : -1,
       "tlsPort" : -1
     }
   }
   ```
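
The "configured max" in the error, 2,147,483,647, is exactly `Integer.MAX_VALUE`, which is what a size limit looks like when it is held in a Java `int`. The minimal sketch below only redoes the arithmetic from the log line. The class `SmooshLimit` and its methods are hypothetical and are not part of Druid.

```java
public class SmooshLimit {
    // Largest size that fits in a Java int: 2,147,483,647 bytes.
    static final long MAX_BYTES = Integer.MAX_VALUE;

    // True when a buffer of the requested size stays within the limit.
    static boolean fits(long requestedBytes) {
        return requestedBytes <= MAX_BYTES;
    }

    // How many bytes the request exceeds the limit by (negative if it fits).
    static long overshoot(long requestedBytes) {
        return requestedBytes - MAX_BYTES;
    }

    public static void main(String[] args) {
        long requested = 2_982_201_910L; // size reported in the log above
        System.out.println("fits: " + fits(requested));
        System.out.println("overshoot in bytes: " + overshoot(requested));
    }
}
```

For the request in the log, the overshoot is 834,718,263 bytes, so the buffer would need to shrink by roughly 28% to fit.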
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] quangvn1 commented on issue #9743: Crash when initializing the partial segment merge phase of index_parallel

Posted by GitBox <gi...@apache.org>.
quangvn1 commented on issue #9743:
URL: https://github.com/apache/druid/issues/9743#issuecomment-736418845


   > Thanks. I'm avoiding this bug for now by targeting a single day interval in every task. I'm stuck with `org.apache.druid.java.util.common.IAE: Asked to add buffers[2,891,159,858] larger than configured max[2,147,483,647]`. I've lowered `maxRowsPerSegment` to only 500k to see if I can get past it
   
    I'm stuck with `org.apache.druid.java.util.common.IAE: Asked to add buffers[2,891,159,858] larger than configured max[2,147,483,647]` too. Could you please help me resolve this problem?




[GitHub] [druid] dtheodor commented on issue #9743: Crash when initializing the partial segment merge phase of index_parallel

Posted by GitBox <gi...@apache.org>.
dtheodor commented on issue #9743:
URL: https://github.com/apache/druid/issues/9743#issuecomment-619258321


   Closing this; the original issue is fixed.




[GitHub] [druid] quangvn1 commented on issue #9743: Crash when initializing the partial segment merge phase of index_parallel

Posted by GitBox <gi...@apache.org>.
quangvn1 commented on issue #9743:
URL: https://github.com/apache/druid/issues/9743#issuecomment-736668854


   @dtheodor please help me!




[GitHub] [druid] dtheodor commented on issue #9743: Crash when initializing the partial segment merge phase of index_parallel

Posted by GitBox <gi...@apache.org>.
dtheodor commented on issue #9743:
URL: https://github.com/apache/druid/issues/9743#issuecomment-617954300


   Thanks. I'm avoiding this bug for now by targeting a single-day interval in every task. I'm now stuck with `org.apache.druid.java.util.common.IAE: Asked to add buffers[2,891,159,858] larger than configured max[2,147,483,647]`. I've lowered `maxRowsPerSegment` to only 500k to see if I can get past it.
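
The single-day workaround amounts to splitting the original interval into one interval per day and submitting a separate task for each. A minimal sketch of that split follows, assuming day-aligned `start/end` date strings like the ones in the spec from the issue. The class `DailyIntervals` and the method `splitByDay` are hypothetical helpers, not Druid APIs.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class DailyIntervals {
    // Splits "start/end" (ISO dates, end exclusive) into one-day intervals.
    static List<String> splitByDay(String interval) {
        String[] parts = interval.split("/");
        LocalDate start = LocalDate.parse(parts[0]);
        LocalDate end = LocalDate.parse(parts[1]);
        List<String> days = new ArrayList<>();
        for (LocalDate d = start; d.isBefore(end); d = d.plusDays(1)) {
            // LocalDate.toString() produces the ISO yyyy-MM-dd form.
            days.add(d + "/" + d.plusDays(1));
        }
        return days;
    }

    public static void main(String[] args) {
        // Each printed interval would replace the "interval" and
        // "intervals" values in its own copy of the ingestion spec.
        for (String day : splitByDay("2018-09-22/2018-10-01")) {
            System.out.println(day);
        }
    }
}
```

For the interval in the original spec this gives nine single-day intervals, from `2018-09-22/2018-09-23` through `2018-09-30/2018-10-01`.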




[GitHub] [druid] ccaominh commented on issue #9743: Crash when initializing the partial segment merge phase of index_parallel

Posted by GitBox <gi...@apache.org>.
ccaominh commented on issue #9743:
URL: https://github.com/apache/druid/issues/9743#issuecomment-617915452


   The `IndexOutOfBoundsException` during the `partial segment merge` phase of native batch ingestion is fixed by https://github.com/apache/druid/pull/9448, and the fix is included in 0.18.0.




[GitHub] [druid] bamcdx commented on issue #9743: Crash when initializing the partial segment merge phase of index_parallel

Posted by GitBox <gi...@apache.org>.
bamcdx commented on issue #9743:
URL: https://github.com/apache/druid/issues/9743#issuecomment-1019289331


   I also have this problem:
   org.apache.druid.java.util.common.IAE: Asked to add buffers[2,891,159,858] larger than configured max[2,147,483,647]
   
   How did you resolve it?

