Posted to dev@storm.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/10/05 04:03:28 UTC

[jira] [Updated] (STORM-406) Trident topologies getting stuck when using Netty transport (reproducible)

     [ https://issues.apache.org/jira/browse/STORM-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Kellogg updated STORM-406:
-------------------------------
    Component/s: storm-core

> Trident topologies getting stuck when using Netty transport (reproducible)
> --------------------------------------------------------------------------
>
>                 Key: STORM-406
>                 URL: https://issues.apache.org/jira/browse/STORM-406
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.9.2-incubating, 0.9.1-incubating, 0.9.0.1
>         Environment: Linux, OpenJDK 7
>            Reporter: Danijel Schiavuzzi
>            Assignee: Kishor Patil
>            Priority: Blocker
>              Labels: b
>             Fix For: 0.9.3
>
>
> When using the new, default Netty transport, Trident topologies sometimes get stuck, whereas under ZeroMQ everything works fine.
> I can reliably reproduce this issue by killing a Storm worker on a running Trident topology. If the worker gets re-spawned on the same slot (port), the topology stops processing. But if the worker re-spawns on a different port, topology processing continues normally.
> The Storm cluster configuration is pretty standard: there are two Supervisor nodes, one of which also runs Nimbus, UI and DRPC. I have four slots per Supervisor, and run my test topology with setNumWorkers set to 8 so that it occupies all eight slots across the cluster. Killing a worker in this configuration will always re-spawn the worker on the same node and slot (port), thus causing the topology to stop processing. This is 100% reproducible on a few Storm clusters of mine, across multiple Storm versions (0.9.0.1, 0.9.1, 0.9.2).
> I have reproduced this with multiple Trident topologies, the simplest of which is the TridentWordCount topology from storm-starter. I have only modified it slightly, adding a Trident filter that logs the tuple throughput: https://github.com/dschiavu/storm-trident-stuck-topology
> Non-transactional Trident topologies just silently stop processing. In transactional topologies the batches are continuously retried and re-emitted by the spout, but they never get processed by the next bolts in the chain, so they time out.
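
For readers who want to watch for the stall described above, here is a minimal sketch of the kind of counting that a throughput-logging filter might do. It is not the code from the linked repository: the class name ThroughputCounter, its methods, and the one-second window are all assumptions made for illustration. It deliberately has no Storm dependency, so it can be run on its own.

```java
/**
 * Hypothetical helper (not taken from the linked repository): counts tuples
 * and reports a rate once per fixed time window.
 */
public class ThroughputCounter {
    private final long intervalMillis;
    private long windowStart;
    private long count;

    public ThroughputCounter(long intervalMillis, long startMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        this.intervalMillis = intervalMillis;
        this.windowStart = startMillis;
    }

    /**
     * Records one tuple seen at the given time. Returns the rate in tuples
     * per second when this call closes the current window, or -1.0 while the
     * window is still open.
     */
    public synchronized double record(long nowMillis) {
        count++;
        long elapsed = nowMillis - windowStart;
        if (elapsed < intervalMillis) {
            return -1.0;
        }
        double rate = count * 1000.0 / elapsed;
        // Start a fresh window at the time of this tuple.
        windowStart = nowMillis;
        count = 0;
        return rate;
    }

    public static void main(String[] args) {
        ThroughputCounter counter = new ThroughputCounter(1000, System.currentTimeMillis());
        for (int i = 0; i < 5; i++) {
            double rate = counter.record(System.currentTimeMillis());
            if (rate >= 0) {
                System.out.printf("throughput: %.1f tuples/s%n", rate);
            }
        }
    }
}
```

In a Trident topology, one plausible use is to call record(System.currentTimeMillis()) from a filter's isKeep method, log any non-negative result, and always return true so that no tuples are dropped. Because the counter only reports when a tuple arrives, a stuck topology shows up as the log lines stopping altogether rather than as a reported rate of zero.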



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)