Posted to issues@storm.apache.org by "Stig Rohde Døssing (JIRA)" <ji...@apache.org> on 2017/07/01 07:04:00 UTC

[jira] [Commented] (STORM-2426) First tuples fail after worker is respawn

    [ https://issues.apache.org/jira/browse/STORM-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071062#comment-16071062 ] 

Stig Rohde Døssing commented on STORM-2426:
-------------------------------------------

Yes, I believe so. This looks a lot like how the subscribe API behaves when there are multiple KafkaConsumers in one thread, as described here https://issues.apache.org/jira/browse/STORM-2514?focusedCommentId=16014195&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16014195. I'm not sure why the rebalance takes 3.5 minutes, since I would expect it to take 2.5 minutes (the session timeout). In the linked demo code, the long rebalance happens because the rebalance can't finish until every KafkaConsumer in the group has called (and is blocked in) poll. That can't happen when multiple consumers share a thread: while one consumer is blocked in poll, the thread can't call poll on the others. The rebalance therefore only completes when it times out at the session timeout.
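
A minimal Python model of the blocking described above. It rests on simplifying assumptions that are not Kafka's exact protocol: a consumer can only join a rebalance while it is blocked in poll, poll does not return until the rebalance completes, and members that never join are evicted when their session times out. The function simulate_rebalance and the consumer names are hypothetical.

```python
def simulate_rebalance(threads, session_timeout_s):
    """Return (duration_s, evicted) for a rebalance that starts at t=0.

    threads: a list of threads, each a list of consumer ids that the thread
    calls poll() on one after another. Simplified, hypothetical model: a
    consumer joins the rebalance only while blocked inside poll(), and
    poll() does not return until the rebalance has completed.
    """
    members = {c for thread in threads for c in thread}
    # Each thread can only be inside poll() for its first consumer; it stays
    # blocked there, so the thread's other consumers never get to poll.
    joined = {thread[0] for thread in threads if thread}
    missing = members - joined
    if not missing:
        return 0.0, set()
    # The coordinator waits for the missing members until their sessions
    # expire, evicts them, and completes the rebalance without them.
    return float(session_timeout_s), missing


print(simulate_rebalance([["spout-a"], ["spout-b"]], 150))  # (0.0, set())
print(simulate_rebalance([["spout-a", "spout-b"]], 150))    # (150.0, {'spout-b'})
```

With one consumer per thread the rebalance completes immediately; with two consumers sharing a thread it only completes at the session timeout (150 s here, the 2.5 minutes mentioned above, used purely as an example value).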

Was the number of executors lower than the number of tasks when you had this problem [~EitZei]?

STORM-2542 gets rid of long rebalances, so this should be fixed in any case, even if the cause turns out not to be a task count that differs from the executor count.
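
The executor/task question matters because each KafkaSpout task creates its own KafkaConsumer, while all tasks of an executor run on that executor's single thread. The Python sketch below lists the executors whose threads would host more than one consumer. It assumes tasks are split into contiguous, near-equal groups; Storm's actual assignment may differ in which groups get the extra task, and both function names are hypothetical.

```python
def executor_task_groups(num_tasks, num_executors):
    """Split task ids 0..num_tasks-1 into num_executors contiguous groups
    whose sizes differ by at most one (assumed assignment, for illustration)."""
    base, extra = divmod(num_tasks, num_executors)
    groups, start = [], 0
    for i in range(num_executors):
        # The first `extra` executors each take one additional task.
        size = base + (1 if i < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def shared_consumer_threads(num_tasks, num_executors):
    """Task groups whose executor thread runs more than one KafkaConsumer."""
    return [g for g in executor_task_groups(num_tasks, num_executors) if len(g) > 1]


print(shared_consumer_threads(4, 4))  # []
print(shared_consumer_threads(4, 2))  # [[0, 1], [2, 3]]
```

When the task count equals the executor count the list is empty, so each consumer has its own thread and the blocking modeled above cannot occur.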

> First tuples fail after worker is respawn
> -----------------------------------------
>
>                 Key: STORM-2426
>                 URL: https://issues.apache.org/jira/browse/STORM-2426
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-kafka-client
>    Affects Versions: 1.0.2
>            Reporter: Antti Järvinen
>         Attachments: 2017-03-20-Kafka-spout-issue.txt, 2017-03-21-Timeout-ticks.txt
>
>
> A topology with two Kafka spouts (org.apache.storm.kafka.spout.KafkaSpout) reading from two different topics with the same consumer group ID:
> 1. Kill the only worker process for the topology
> 2. Storm creates a new worker
> 3. Kafka starts rebalancing (log lines 15-16)
> 4. Kafka rebalancing is done (log lines 18-19)
> 5. Kafka topics are read and tuples emitted (log lines 28-29)
> 6. Tuples immediately fail (log lines 30-33)
> The delay between the tuples being emitted and failing is only about 10 ms. No bolts in the topology received the tuples.
> What could cause this? The assumption is that there are uncommitted messages in the spout when it is killed, and those are retried.
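
The reporter's assumption that uncommitted messages are retried can be sketched in Python as follows. The sketch assumes the commit point only advances past offsets acked with no gaps before them (typical for at-least-once commit tracking), and that a restarted consumer in the same group resumes at the last committed offset (in Kafka, the committed offset is the next offset to read). The function names are hypothetical, and the sketch only illustrates the replay.

```python
def next_commit_offset(committed, acked):
    """Advance the commit point past every contiguously acked offset."""
    nxt = committed
    # An offset after a gap (an unacked offset) cannot be committed yet.
    while nxt in acked:
        nxt += 1
    return nxt


def replayed_after_restart(last_commit, last_emitted):
    """Offsets a fresh consumer re-emits after resuming at last_commit."""
    # Everything emitted at or after the commit point is read again,
    # whether or not it had already been acked.
    return list(range(last_commit, last_emitted + 1))


commit = next_commit_offset(0, {0, 1, 2, 4})
print(commit)                             # 3
print(replayed_after_restart(commit, 5))  # [3, 4, 5]
```

Offset 4 was acked but is replayed anyway, because the unacked offset 3 kept the commit point from moving past it.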



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)