Posted to dev@storm.apache.org by "James Xu (JIRA)" <ji...@apache.org> on 2013/12/14 08:35:07 UTC

[jira] [Created] (STORM-80) NPE caused by TridentBoltExecutor reusing TrackedBatches between batch groups

James Xu created STORM-80:
-----------------------------

             Summary: NPE caused by TridentBoltExecutor reusing TrackedBatches between batch groups
                 Key: STORM-80
                 URL: https://issues.apache.org/jira/browse/STORM-80
             Project: Apache Storm (Incubating)
          Issue Type: Bug
            Reporter: James Xu


https://github.com/nathanmarz/storm/issues/421

I'm seeing intermittent errors caused by SubtopologyBolt.execute being called with a BatchInfo whose ProcessorContext is set up for a different batch group. In particular, I'm seeing NullPointerExceptions from PartitionPersistProcessor because its state fields were never set up correctly.
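A minimal, hypothetical sketch of this failure mode: a processor creates its per-batch state in a start-of-batch hook and assumes that state exists in `execute`. `Context`, `PersistLikeProcessor` and `executeThrowsNpe` are illustrative names, not Storm classes, and the real PartitionPersistProcessor may set up its state differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the failure mode; names are illustrative and
// do not correspond to Storm's PartitionPersistProcessor/ProcessorContext.
public class UninitializedStateDemo {

    // Stand-in for a per-batch context: one state slot per processor index.
    static final class Context {
        final Object[] state;
        Context(int processors) { state = new Object[processors]; }
    }

    // Processor that sets up its state in startBatch() and assumes it in execute().
    static final class PersistLikeProcessor {
        final int index;
        PersistLikeProcessor(int index) { this.index = index; }

        void startBatch(Context ctx) { ctx.state[index] = new ArrayList<Object>(); }

        @SuppressWarnings("unchecked")
        void execute(Context ctx, Object tuple) {
            // Throws NullPointerException if startBatch() never ran on this context.
            ((List<Object>) ctx.state[index]).add(tuple);
        }
    }

    // Returns true if execute() fails with a NullPointerException on this context.
    static boolean executeThrowsNpe(PersistLikeProcessor p, Context ctx) {
        try {
            p.execute(ctx, "tuple");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        PersistLikeProcessor p = new PersistLikeProcessor(0);

        Context ownGroup = new Context(1);
        p.startBatch(ownGroup);                              // prepared for this processor
        System.out.println(executeThrowsNpe(p, ownGroup));   // false

        Context otherGroup = new Context(1);                 // set up by another batch group
        System.out.println(executeThrowsNpe(p, otherGroup)); // true: startBatch never ran here
    }
}
```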

As best I can tell, the ID key (IBatchID) used for the _batches map in TridentBoltExecutor is not unique across batch groups. As a result, the tracked batch may already have been initialized for a different batch group and set of processors.
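The following is a simplified, hypothetical Java model of the collision described above, not Storm's actual TridentBoltExecutor code; `TrackedBatch`, `BatchKey`, the lookup helpers and the group names are illustrative. It assumes two batch groups can produce equal batch IDs (for example, two spouts whose transaction counters reach the same value) and contrasts a map keyed by batch ID alone with one keyed by the pair (batch group, batch ID).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of the _batches collision; not Storm's implementation.
public class BatchKeyCollision {

    // Stand-in for a tracked batch: remembers which batch group initialized it.
    static final class TrackedBatch {
        final String group;
        TrackedBatch(String group) { this.group = group; }
    }

    // Composite key: equal batch IDs from different groups map to different entries.
    static final class BatchKey {
        final String group;
        final Object batchId;
        BatchKey(String group, Object batchId) { this.group = group; this.batchId = batchId; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof BatchKey)) return false;
            BatchKey k = (BatchKey) o;
            return group.equals(k.group) && batchId.equals(k.batchId);
        }

        @Override public int hashCode() { return Objects.hash(group, batchId); }
    }

    // Pattern suspected in the report: keyed by batch ID alone.
    static TrackedBatch lookupById(Map<Object, TrackedBatch> batches, String group, Object id) {
        return batches.computeIfAbsent(id, k -> new TrackedBatch(group));
    }

    // Alternative: keyed by (group, batch ID).
    static TrackedBatch lookupByGroupAndId(Map<BatchKey, TrackedBatch> batches, String group, Object id) {
        return batches.computeIfAbsent(new BatchKey(group, id), k -> new TrackedBatch(group));
    }

    public static void main(String[] args) {
        Map<Object, TrackedBatch> byId = new HashMap<>();
        lookupById(byId, "groupA", 1L);
        TrackedBatch reused = lookupById(byId, "groupB", 1L);
        System.out.println("keyed by id: groupB got a batch initialized for " + reused.group);

        Map<BatchKey, TrackedBatch> byKey = new HashMap<>();
        lookupByGroupAndId(byKey, "groupA", 1L);
        TrackedBatch own = lookupByGroupAndId(byKey, "groupB", 1L);
        System.out.println("keyed by group+id: groupB got a batch initialized for " + own.group);
    }
}
```

Under these assumptions the ID-only map hands groupB the batch that groupA initialized, while the composite key keeps them apart. Whether this matches how TridentBoltExecutor should actually be fixed depends on code not shown in this report.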

I hoped to track down the source of this issue, but I can't determine where the BatchIDs are added to the tuples.

If it matters, my topology has two streams, each reading from its own OpaqueTransactionalKafka spout with a different topic.

Backtrace:

65108 [Thread-25] ERROR backtype.storm.daemon.executor - 
java.lang.RuntimeException: java.lang.NullPointerException
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:87) ~[storm-0.9.0-wip4.jar:na]
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:58) ~[storm-0.9.0-wip4.jar:na]
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62) ~[storm-0.9.0-wip4.jar:na]
        at backtype.storm.daemon.executor$fn__3551$fn__3563$fn__3610.invoke(executor.clj:712) ~[storm-0.9.0-wip4.jar:na]
        at backtype.storm.util$async_loop$fn__436.invoke(util.clj:377) ~[storm-0.9.0-wip4.jar:na]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
        at java.lang.Thread.run(Thread.java:722) [na:1.7.0_09]
Caused by: java.lang.NullPointerException: null
        at storm.trident.planner.processor.PartitionPersistProcessor.execute(PartitionPersistProcessor.java:59) ~[storm-0.9.0-wip4.jar:na]
        at storm.trident.planner.SubtopologyBolt$InitialReceiver.receive(SubtopologyBolt.java:189) ~[storm-0.9.0-wip4.jar:na]
        at storm.trident.planner.SubtopologyBolt.execute(SubtopologyBolt.java:129) ~[storm-0.9.0-wip4.jar:na]
        at storm.trident.topology.TridentBoltExecutor.execute(TridentBoltExecutor.java:352) ~[storm-0.9.0-wip4.jar:na]
        at backtype.storm.daemon.executor$fn__3551$tuple_action_fn__3553.invoke(executor.clj:607) ~[storm-0.9.0-wip4.jar:na]
        at backtype.storm.daemon.executor$mk_task_receiver$fn__3474.invoke(executor.clj:379) ~[storm-0.9.0-wip4.jar:na]
        at backtype.storm.disruptor$clojure_handler$reify__3011.onEvent(disruptor.clj:43) ~[storm-0.9.0-wip4.jar:na]
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:84) ~[storm-0.9.0-wip4.jar:na]
        ... 6 common frames omitted

Also, I'm only seeing this in LocalCluster mode, not in production.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)