Posted to dev@storm.apache.org by "Srinath (JIRA)" <ji...@apache.org> on 2014/05/17 04:11:14 UTC

[jira] [Created] (STORM-323) Unacknowledged __tick and __metrics_tick tuples hang worker processes

Srinath created STORM-323:
-----------------------------

             Summary: Unacknowledged __tick and __metrics_tick tuples hang worker processes
                 Key: STORM-323
                 URL: https://issues.apache.org/jira/browse/STORM-323
             Project: Apache Storm (Incubating)
          Issue Type: Bug
    Affects Versions: 0.9.1-incubating
         Environment: Storm:
Nimbus, Supervisor and ZooKeeper running on CentOS 6.2 on m1.small instances (1.7 GB memory, 1 CPU, 1 core)
Netty as the transport

Topology:
2 worker processes on the same supervisor instance, each allocated 512 MB of heap
Each worker process has around 30 executors running around 112 tasks.
            Reporter: Srinath
            Priority: Critical


Symptoms observed:
1. One of the bolts stops being executed after about 5 days of running.
2. The spout gradually slows down and eventually stops calling nextTuple().
3. The topology becomes non-functional because no tuples are exchanged between worker processes.

Notes from troubleshooting:
1. Data is still being transferred between worker processes, but the bolt does not receive the tuples.
2. backtype.storm.messaging.netty.Server#message_queue is not being consumed.
3. Later, I found that a growing number of __tick and __metrics_tick tuples pile up in memory over time. The pile-up is gradual, which is probably why it takes days before it causes any visible problem.
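To illustrate why a gradual pile-up can take days to surface, here is a toy model in Java. It is NOT Storm's Netty server code and does not claim to reproduce the actual bug. Only the stream and component IDs ("__system", "__tick", "__metrics_tick") are Storm's real constant values; the Msg class, the simulate() method, the one-tick-per-second rate, the metrics period and the "faulty consumer" that drains data tuples but never removes system ticks are all hypothetical assumptions made for the sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of system tick tuples accumulating in a queue.
// Hypothetical sketch only; this is not how Storm's Netty Server is implemented.
class TickPileUpModel {
    // Component/stream IDs Storm uses for system tuples
    // (values of backtype.storm.Constants in 0.9.x).
    static final String SYSTEM_COMPONENT_ID = "__system";
    static final String TICK_STREAM = "__tick";
    static final String METRICS_TICK_STREAM = "__metrics_tick";

    // Hypothetical stand-in for a queued message: only source and stream matter here.
    static final class Msg {
        final String component;
        final String stream;

        Msg(String component, String stream) {
            this.component = component;
            this.stream = stream;
        }
    }

    // True for the two kinds of system tick tuples named in this report.
    static boolean isSystemTick(Msg m) {
        return SYSTEM_COMPONENT_ID.equals(m.component)
                && (TICK_STREAM.equals(m.stream) || METRICS_TICK_STREAM.equals(m.stream));
    }

    // Simulates `seconds` of operation under hypothetical assumptions: every second each
    // executor receives one data tuple and one __tick tuple, plus one __metrics_tick every
    // `metricsPeriod` seconds. The faulty consumer drains data tuples but never removes
    // system ticks. Returns how many tuples are left stranded in the queue.
    static long simulate(int executors, long seconds, long metricsPeriod) {
        Deque<Msg> queue = new ArrayDeque<>();
        for (long t = 1; t <= seconds; t++) {
            for (int e = 0; e < executors; e++) {
                queue.addLast(new Msg("spout", "default"));
                queue.addLast(new Msg(SYSTEM_COMPONENT_ID, TICK_STREAM));
                if (t % metricsPeriod == 0) {
                    queue.addLast(new Msg(SYSTEM_COMPONENT_ID, METRICS_TICK_STREAM));
                }
            }
            // Faulty consumer: processes ordinary tuples, leaves system ticks behind.
            queue.removeIf(m -> !isSystemTick(m));
        }
        return queue.size();
    }

    public static void main(String[] args) {
        // 30 executors for 120 s with a 60 s metrics period:
        // 30 * 120 ticks + 30 * 2 metrics ticks = 3660 stranded tuples.
        System.out.println(simulate(30, 120, 60));
    }
}
```

In this model the stranded count grows linearly with uptime, so nothing looks wrong at first. If, hypothetically, 30 executors each received one tick per second, five days would strand 30 x 86,400 x 5 = 12,960,000 tuples, which is enough to matter in a 512 MB heap.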

I have shared the thread dumps and the topology layout at https://drive.google.com/folderview?id=0B2F_3UACQZNESXpwZlA4MFlqSVU&usp=drive_web