Posted to user@storm.apache.org by Michael Chang <mi...@tellapart.com> on 2014/04/15 10:41:09 UTC

Storm Topology Halts

Hey all,

Issue:

We are having issues with stuck topologies.  When submitted and started,
our topology will process for a while, then completely halt for roughly
topology.message.timeout.secs seconds, after which it seems that all of the
in-flight tuples are failed.  This cycle repeats continuously.  Has anybody
seen this issue, or do you have suggestions on how to debug it?
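For reference, a minimal sketch of how the two settings involved here relate,
using the 0.9.x backtype.storm.Config API; the class name and the values are
illustrative, not the actual cluster configuration:

    import backtype.storm.Config;

    public class TimeoutSettings {
        // Sketch of how the two settings interact: the spout stops emitting once
        // max-spout-pending tuples are un-acked, and anything still pending after
        // the message timeout is failed back to the spout, which is roughly the
        // length of the stall described above.
        public static Config baseConfig() {
            Config conf = new Config();
            conf.setMaxSpoutPending(1000);   // cap on un-acked tuples per spout task
            conf.setMessageTimeoutSecs(30);  // tuples not fully acked within this are failed
            return conf;
        }
    }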

Environment:

We are running a Storm cluster in AWS (non-VPC).  We're running 0.9.1 but
with Guava 16.0.1 and HttpClient 4.3.1 on the lib path.  We originally
tried this with the regular Netty transport; reverting to the ZeroMQ (zmq)
transport seemed to help at first, but now we're seeing the same behavior
there as well, so it seems like a deeper-rooted problem than just the
transport.
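For anyone reproducing this: the transport is selected in storm.yaml. A sketch
of the relevant 0.9.x key, with the documented transport class names (treat the
choice of which line to uncomment as illustrative):

    # storm.yaml - transport selection (0.9.x)
    # Netty transport:
    storm.messaging.transport: "backtype.storm.messaging.netty.Context"
    # ZeroMQ transport (comment out the line above and uncomment this to revert):
    # storm.messaging.transport: "backtype.storm.messaging.zmq"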

Any help would be appreciated.

Thanks,

Michael

Re: Reply: Storm Topology Halts

Posted by Michael Chang <mi...@tellapart.com>.
Hey Oliver,

We're not seeing any errors in our logs after switching to the zmq
transport.  The reason we tried the zmq transport was that several users on
the mailing list seemed to have had some success switching to it after
seeing intermittent issues with Netty.  We were seeing timeouts in
worker-to-worker communication.  The reconnect errors looked like the ones
referenced here: http://pastebin.com/XXZBsEj1 (from
http://mail-archives.apache.org/mod_mbox/incubator-storm-user/201403.mbox/%3CCA%2BB%2BUZujQ7m_Y%2BZOv61hc7FK4EL3%3D%2B_6bpnnOK7quh-9ksN6tg%40mail.gmail.com%3E
)
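For context on those reconnect loops, the Netty client's retry and backoff
behaviour is controlled by a handful of storm.yaml keys; a hedged sketch of the
0.9.x key names, with illustrative (not recommended) values:

    # storm.yaml - Netty client reconnect/backoff tuning (0.9.x), values illustrative
    storm.messaging.netty.max_retries: 30
    storm.messaging.netty.min_wait_ms: 100
    storm.messaging.netty.max_wait_ms: 1000
    storm.messaging.netty.buffer_size: 5242880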



Re: Reply: Storm Topology Halts

Posted by Oliver Hall <ol...@metabroadcast.com>.
Hi Michael,

We've been seeing a similar issue after upgrading from Storm 0.8.2 to
0.9.0.1. When we start our topology, every batch of events times out.
We're using Trident, with nodes set up on AWS (see
http://mail-archives.apache.org/mod_mbox/incubator-storm-user/201404.mbox/%3CCAMij6%3Dc1drvX7QR6gk7JYZG9gk0%3DHbS0fKcTOEdKpvr%2BTqcSyg%40mail.gmail.com%3E
for our mail to the list a few days back). We're not getting any errors
in the logs at all; are you seeing any?


Oli Hall


Re: Reply: Storm Topology Halts

Posted by Srinath C <sr...@gmail.com>.
My guess is that the issue is in marking a tuple tree as complete. If that
does not happen within topology.message.timeout.secs, the tuple is cleared
from the acker's rotating map and marked as failed. The bigger the tuple
tree, the longer the tuple has to wait to be marked complete - until every
bolt in the tree has acked it. By simplifying my topology, my tuple tree
became trivial (S -> B1): as soon as B1 acks the tuple, it is marked as
complete.
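As a sketch of that mechanism (plain 0.9.x bolt API; the class name is
illustrative): a bolt that emits anchored child tuples extends the spout
tuple's tree, so the tree completes only after downstream bolts ack those
children too, whereas a leaf bolt like the simplified B1 completes the tree
the moment it acks.

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class RelayBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            // Anchored emit: the child tuple joins the spout tuple's tree, so the
            // tree stays incomplete until downstream bolts ack the child as well.
            collector.emit(input, new Values(input.getValue(0)));
            // Acking only closes this bolt's edge of the tree.
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("value"));
        }
    }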




Re: Reply: Storm Topology Halts

Posted by Michael Chang <mi...@tellapart.com>.
Hi Srinath,

Thanks for the update.  I'm not quite sure why your change would remedy
the problem (it seems like there are now more tuples in flight in the
system), but it's great that you arrived at a working setup.

Michael



Re: Reply: Storm Topology Halts

Posted by Srinath C <sr...@gmail.com>.
Hi Michael,
    I experimented a bit by making changes to my topology, and now I'm
seeing consistent acking with very few failures on the spout.

    My topology had a spout S emitting tuples, and two BaseRichBolts, B1
(store tuple) and B2 (aggregate tuple), receiving tuples from the default
stream of the spout. I changed S to emit each tuple twice on different
streams - one stream with a message id, for reliable delivery to B1, and
another stream without a message id, going to B2.
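A minimal sketch of that spout change, using the plain 0.9.x API; the class
name, stream names, and the pollNextEvent() helper are illustrative, not the
actual code:

    import java.util.Map;
    import java.util.UUID;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    public class DualStreamSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            Object event = pollNextEvent();   // hypothetical source of events
            if (event == null) {
                return;
            }
            // With a message id: tracked by the ackers; ack()/fail() come back to the spout.
            collector.emit("reliable", new Values(event), UUID.randomUUID().toString());
            // Without a message id: fire-and-forget, never acked or failed.
            collector.emit("unreliable", new Values(event));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declareStream("reliable", new Fields("event"));
            declarer.declareStream("unreliable", new Fields("event"));
        }

        private Object pollNextEvent() {
            return null;   // placeholder for the real event source
        }
    }

B1 and B2 would then subscribe to the two streams separately, e.g. with
shuffleGrouping("spout", "reliable") for B1 and shuffleGrouping("spout",
"unreliable") for B2.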

   With this change there is a significant improvement in the number of
failed tuples. It's almost down to 1-2% of the total now, and even those
failures occurred only at peak tuple rates.

   I'd like to run more experiments to figure out what was wrong with my
earlier topology, but I'm time-constrained right now.
   Hope this helps, and let me know if you figure out anything.

Regards,
Srinath.




Re: Reply: Storm Topology Halts

Posted by Michael Chang <mi...@tellapart.com>.
Hey Srinath,

Yep, our ackers don't seem overloaded at all, and the behavior you are
seeing sounds exactly like what we are seeing here.



Re: Reply: Storm Topology Halts

Posted by Srinath C <sr...@gmail.com>.
I have been seeing this behaviour on 0.9.0.1 running on AWS (non-VPC).
All tuples get a fail() on the spout and I'm not sure why. Even a simple
case of spoutA -> boltB shows this behaviour after a continuous flow of
tuples.

So far, increasing the acker count hasn't helped. All I could figure out is
that fail() is called from backtype.storm.utils.RotatingMap#rotate, which I
believe means that topology.message.timeout.secs has been exceeded and the
tuple has not yet been marked as completed. I'm pretty sure there are no
exceptions in handling the tuples.
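For reference, the two knobs in play here map to these 0.9.x Config calls
(class name and values illustrative):

    import backtype.storm.Config;

    public class ReliabilityKnobs {
        public static Config tuned() {
            Config conf = new Config();
            conf.setNumAckers(4);            // number of acker executors tracking tuple trees
            conf.setMessageTimeoutSecs(60);  // pending tuples older than this are rotated out
                                             // of the RotatingMap and fail()ed to the spout
            return conf;
        }
    }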

Will update if I find any insights.




Re: Reply: Storm Topology Halts

Posted by Michael Chang <mi...@tellapart.com>.
Hey 朱春来,

The processing time on the bolts seems reasonable, as does the overall
complete latency of the spout.



Reply: Storm Topology Halts

Posted by 朱春来 <zh...@jd.com>.
Hi Michael Chang,

 

         Did you ack or fail the tuples in your bolts in a timely manner? Please
also check how long the bolt takes to process each tuple.
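As a sketch of the pattern being suggested (0.9.x bolt API; the class name and
the process() helper are illustrative): every tuple is acked or failed inside
execute() rather than being left to hit the message timeout.

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Tuple;

    public class PromptAckBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            try {
                process(input);          // hypothetical per-tuple work; keep it fast
                collector.ack(input);    // ack as soon as the work is done
            } catch (Exception e) {
                collector.fail(input);   // fail promptly rather than letting it time out
            }
        }

        private void process(Tuple input) {
            // placeholder for the actual per-tuple work
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // nothing emitted downstream in this sketch
        }
    }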

 

 

 
