You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/08/22 09:45:37 UTC

[GitHub] [druid] saifat29 commented on issue #9842: Intermittent NullPointerExceptions causing Task Failure

saifat29 commented on issue #9842:
URL: https://github.com/apache/druid/issues/9842#issuecomment-678620228


   I too faced the same issue on Druid v0.17.0 -
   Data is ingested into Druid through Kafka realtime ingestion with rollups enabled.
   Suddenly the task failed and segments got lost.
   
   From the below logs it is evident that there was a null exception in the group by query.
   
   Attaching the surrounding relevant logs-
   ```
   2020-08-21T03:14:31,451 INFO [[index_kafka_aggregations_95b07d728be26d3_iacokiha]-publish] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Saved sequence metadata to disk: [SequenceMetadata{sequenceId=2, sequenceName='index_kafka_aggregations_95b07d728be26d3_2', assignments=[], startOffsets={0=2073149116, 1=2073148653}, exclusiveStartPartitions=[], endOffsets={0=2077740467, 1=2077737653}, sentinel=false, checkpointed=true}]
   2020-08-21T03:14:31,451 INFO [[index_kafka_aggregations_95b07d728be26d3_iacokiha]-publish] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Adding SegmentHandoffCallback for dataSource[aggregations] Segment[SegmentDescriptor{interval=2020-08-21T02:00:00.000Z/2020-08-21T03:00:00.000Z, version='2020-08-21T02:00:10.526Z', partitionNumber=1}]
   2020-08-21T03:15:00,801 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2020-08-21T02:00:00.000Z/2020-08-21T03:00:00.000Z, version='2020-08-21T02:00:10.526Z', partitionNumber=1}]]
   2020-08-21T03:16:00,784 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Segment Handoff complete for dataSource[aggregations] Segment[SegmentDescriptor{interval=2020-08-21T02:00:00.000Z/2020-08-21T03:00:00.000Z, version='2020-08-21T02:00:10.526Z', partitionNumber=1}]
   2020-08-21T03:16:00,787 INFO [[index_kafka_aggregations_95b07d728be26d3_iacokiha]-appenderator-persist] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[aggregations_2020-08-21T02:00:00.000Z_2020-08-21T03:00:00.000Z_2020-08-21T02:00:10.526Z_1] at path[/druid/segments/middlemanager:8102/middlemanager:8102_indexer-executor__default_tier_2020-08-21T01:46:10.992Z_72a1a95bc07543efaa1fe266685d72e40]
   2020-08-21T03:16:00,832 ERROR [processing-0] org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2 - Exception with one of the sequences!
   java.lang.NullPointerException: null
   	at org.apache.druid.segment.realtime.FireHydrant.getIncrementedSegment(FireHydrant.java:78) ~[druid-server-0.17.0.jar:0.17.0]
   	at org.apache.druid.segment.realtime.FireHydrant.getAndIncrementSegment(FireHydrant.java:138) ~[druid-server-0.17.0.jar:0.17.0]
   	at org.apache.druid.segment.realtime.appenderator.SinkQuerySegmentWalker.lambda$null$0(SinkQuerySegmentWalker.java:212) ~[druid-server-0.17.0.jar:0.17.0]
   	at com.google.common.collect.Iterators$8.transform(Iterators.java:794) ~[guava-16.0.1.jar:?]
   	at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48) ~[guava-16.0.1.jar:?]
   	at org.apache.druid.query.SinkQueryRunners$1.next(SinkQueryRunners.java:56) ~[druid-processing-0.17.0.jar:0.17.0]
   	at org.apache.druid.query.SinkQueryRunners$1.next(SinkQueryRunners.java:46) ~[druid-processing-0.17.0.jar:0.17.0]
   	at com.google.common.collect.Iterators$7.computeNext(Iterators.java:646) ~[guava-16.0.1.jar:?]
   	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-16.0.1.jar:?]
   	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-16.0.1.jar:?]
   	at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:43) ~[guava-16.0.1.jar:?]
   	at com.google.common.collect.Iterators.addAll(Iterators.java:356) ~[guava-16.0.1.jar:?]
   	at com.google.common.collect.Lists.newArrayList(Lists.java:147) ~[guava-16.0.1.jar:?]
   	at com.google.common.collect.Lists.newArrayList(Lists.java:129) ~[guava-16.0.1.jar:?]
   	at org.apache.druid.query.ChainedExecutionQueryRunner$1.make(ChainedExecutionQueryRunner.java:104) ~[druid-processing-0.17.0.jar:0.17.0]
   	at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:39) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.query.CPUTimeMetricQueryRunner$1.wrap(CPUTimeMetricQueryRunner.java:74) ~[druid-processing-0.17.0.jar:0.17.0]
   	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.query.spec.SpecificSegmentQueryRunner$1.accumulate(SpecificSegmentQueryRunner.java:79) ~[druid-processing-0.17.0.jar:0.17.0]
   	at org.apache.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.query.spec.SpecificSegmentQueryRunner.doNamed(SpecificSegmentQueryRunner.java:163) ~[druid-processing-0.17.0.jar:0.17.0]
   	at org.apache.druid.query.spec.SpecificSegmentQueryRunner.access$100(SpecificSegmentQueryRunner.java:42) ~[druid-processing-0.17.0.jar:0.17.0]
   	at org.apache.druid.query.spec.SpecificSegmentQueryRunner$2.wrap(SpecificSegmentQueryRunner.java:145) ~[druid-processing-0.17.0.jar:0.17.0]
   	at org.apache.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[druid-core-0.17.0.jar:0.17.0]
   	at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:246) [druid-processing-0.17.0.jar:0.17.0]
   	at org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunnerV2$1$1$1.call(GroupByMergingQueryRunnerV2.java:233) [druid-processing-0.17.0.jar:0.17.0]
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_232]
   	at org.apache.druid.query.PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:247) [druid-processing-0.17.0.jar:0.17.0]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
   2020-08-21T03:16:00,873 ERROR [qtp1136890657-129[groupBy_[aggregations]_17bc90cf-a1d9-4380-bcc5-a1b565b5c91a]] org.apache.druid.server.QueryResource - Exception handling request: {class=org.apache.druid.server.QueryResource, exceptionType=class java.lang.RuntimeException, exceptionMessage=java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException, query={"queryType":"groupBy","dataSource":{"type":"table","name":"aggregations"},"intervals":{"type":"segments","segments":[{"itvl":"2020-08-21T02:00:00.000Z/2020-08-21T03:00:00.000Z","ver":"2020-08-21T02:00:10.526Z","part":1},{"itvl":"2020-08-21T02:00:00.000Z/2020-08-21T03:00:00.000Z","ver":"2020-08-21T02:00:10.526Z","part":2}]},"virtualColumns":[],"filter":null,"granularity":{"type":"all"},"dimensions":[{"type":"default","dimension":"d_id","outputName":"d0","outputType":"LONG"}],"aggregations":[{"type":"doubleSum","name":"a0","fieldName":"s_dr","expression":null}],"postAggregations":[],"having":
 null,"limitSpec":{"type":"NoopLimitSpec"},"context":{"applyLimitPushDown":false,"defaultTimeout":300000,"finalize":false,"fudgeTimestamp":"1597968000000","groupByOutermost":false,"groupByStrategy":"v2","maxQueuedBytes":3333333,"maxScatterGatherBytes":9223372036854775807,"queryFailTime":1597980040147,"queryId":"17bc90cf-a1d9-4380-bcc5-a1b565b5c91a","resultAsArray":true,"sqlQueryId":"a371d117-bffa-4a5f-9115-d8812eb489b0","timeout":299998},"descending":false}, peer=10.0.7.4} (java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException)
   2020-08-21T03:16:01,000 INFO [[index_kafka_aggregations_95b07d728be26d3_iacokiha]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Dropped segment[aggregations_2020-08-21T02:00:00.000Z_2020-08-21T03:00:00.000Z_2020-08-21T02:00:10.526Z_1].
   2020-08-21T03:16:16,707 INFO [parent-monitor-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Triggering JVM shutdown.
   2020-08-21T03:16:16,708 INFO [Thread-70] org.apache.druid.cli.CliPeon - Running shutdown hook
   2020-08-21T03:16:16,708 INFO [Thread-70] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
   2020-08-21T03:16:16,709 INFO [Thread-70] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/announcements/middlemanager:8102]
   2020-08-21T03:16:16,722 INFO [Thread-70] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/listeners/lookups/__default/http:middlemanager:8102]
   2020-08-21T03:16:16,723 INFO [Thread-70] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/segments/middlemanager:8102/middlemanager:8102_indexer-executor__default_tier_2020-08-21T01:46:10.992Z_72a1a95bc07543efaa1fe266685d72e40]
   2020-08-21T03:16:16,724 INFO [Thread-70] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/internal-discovery/PEON/middlemanager:8102]
   2020-08-21T03:16:16,725 INFO [Thread-70] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
   2020-08-21T03:16:16,733 INFO [Thread-70] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@b2da3a5{HTTP/1.1,[http/1.1]}{0.0.0.0:8102}
   2020-08-21T03:16:16,734 INFO [Thread-70] org.eclipse.jetty.server.session - node0 Stopped scavenging
   2020-08-21T03:16:16,736 INFO [Thread-70] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@1b485c8b{/,null,UNAVAILABLE}
   2020-08-21T03:16:16,740 INFO [Thread-70] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL]
   2020-08-21T03:16:16,741 INFO [Thread-70] org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer - Unannouncing start time on [/druid/listeners/lookups/__default/http:middlemanager:8102]
   2020-08-21T03:16:16,743 INFO [Thread-70] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[index_kafka_aggregations_95b07d728be26d3_iacokiha].
   2020-08-21T03:16:16,743 INFO [Thread-70] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Stopping forcefully (status: [PUBLISHING])
   2020-08-21T03:16:16,750 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Error while publishing segments for sequenceNumber[SequenceMetadata{sequenceId=2, sequenceName='index_kafka_aggregations_95b07d728be26d3_2', assignments=[], startOffsets={0=2073149116, 1=2073148653}, exclusiveStartPartitions=[], endOffsets={0=2077740467, 1=2077737653}, sentinel=false, checkpointed=true}]
   java.util.concurrent.CancellationException: Task was cancelled.
   	at com.google.common.util.concurrent.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:392) ~[guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) ~[guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) ~[guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.Futures$4.run(Futures.java:1170) [guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:134) [guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.Futures$ChainingListenableFuture.cancel(Futures.java:826) [guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.Futures$CombinedFuture$1.run(Futures.java:1505) [guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:134) [guava-16.0.1.jar:?]
   	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:891) [druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:278) [druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:164) [druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.17.0.jar:0.17.0]
   	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.17.0.jar:0.17.0]
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_232]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
   Caused by: java.util.concurrent.CancellationException: Future.cancel() was called.
   	at com.google.common.util.concurrent.AbstractFuture$Sync.complete(AbstractFuture.java:378) ~[guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.AbstractFuture$Sync.cancel(AbstractFuture.java:355) ~[guava-16.0.1.jar:?]
   	at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:131) ~[guava-16.0.1.jar:?]
   	... 15 more
   2020-08-21T03:16:16,752 INFO [Thread-70] org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager - CoordinatorPollingBasicAuthenticatorCacheManager is stopping.
   2020-08-21T03:16:16,752 INFO [Thread-70] org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager - CoordinatorPollingBasicAuthenticatorCacheManager is stopped.
   2020-08-21T03:16:16,752 INFO [Thread-70] org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager - CoordinatorPollingBasicAuthorizerCacheManager is stopping.
   2020-08-21T03:16:16,752 INFO [task-runner-0-priority-0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[aggregations_2020-08-21T02:00:00.000Z_2020-08-21T03:00:00.000Z_2020-08-21T02:00:10.526Z_2] at path[/druid/segments/middlemanager:8102/middlemanager:8102_indexer-executor__default_tier_2020-08-21T01:46:10.992Z_72a1a95bc07543efaa1fe266685d72e40]
   2020-08-21T03:16:16,752 INFO [Thread-70] org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager - CoordinatorPollingBasicAuthorizerCacheManager is stopped.
   2020-08-21T03:16:16,752 INFO [LookupExtractorFactoryContainerProvider-MainThread] org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop exited. Lookup notices are not handled anymore.
   2020-08-21T03:16:16,753 INFO [task-runner-0-priority-0] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced self [{"druidNode":{"service":"druid/middleManager","host":"middlemanager","bindOnHost":false,"plaintextPort":8102,"port":-1,"tlsPort":-1,"enablePlaintextPort":true,"enableTlsPort":false},"nodeType":"peon","services":{"dataNodeService":{"type":"dataNodeService","tier":"_default_tier","maxSize":0,"type":"indexer-executor","priority":0},"lookupNodeService":{"type":"lookupNodeService","lookupTier":"__default"}}}].
   2020-08-21T03:16:16,771 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
     "id" : "index_kafka_aggregations_95b07d728be26d3_iacokiha",
     "status" : "SUCCESS",
     "duration" : 5416234,
     "errorMsg" : null,
     "location" : {
       "host" : null,
       "port" : -1,
       "tlsPort" : -1
     }
   }
   2020-08-21T03:16:16,774 INFO [main] org.apache.druid.cli.CliPeon - Thread [Thread[Thread-70,5,main]] is non daemon.
   2020-08-21T03:16:16,774 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Lifecycle [module] already stopped and stop was called. Silently skipping
   Cannot remove shutdown hook, already shutting down!
   Finished peon task
   2020-08-21T03:16:16,801 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
   2020-08-21T03:16:16,804 INFO [Thread-70] org.apache.zookeeper.ZooKeeper - Session: 0x1013a00d03800a8 closed
   2020-08-21T03:16:16,804 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x1013a00d03800a8
   2020-08-21T03:16:16,804 INFO [Thread-70] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [INIT]
   ```
   
   This issue is very critical as ingestion tasks have become gamble.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org