Posted to user@storm.apache.org by Eran Chinthaka Withana <er...@gmail.com> on 2015/05/14 02:11:12 UTC

Storm topology getting stuck

Hi,

Storm version: 0.9.2

I'm running a topology where, based on an event, I try to sync one
database to another. After a kafka spout (with the db info in the
message):
- the first bolt emits a tuple for each table in the db,
- the second bolt reads the given table and emits batches of rows for that
table,
- the third bolt writes the data to the target database,
- the fourth bolt (fields grouping with the 3rd bolt) emits a
success/failure result per table,
- the last bolt (fields grouping with the 4th bolt) collects all the
table-level results and emits the final message (wiring sketched below).
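A simplified sketch of that wiring (not my actual code; the component ids
and the "table" grouping field are placeholders):

    import backtype.storm.topology.{IRichBolt, IRichSpout, TopologyBuilder}
    import backtype.storm.tuple.Fields

    // Wiring for the pipeline described above, with placeholder ids.
    def buildSyncTopology(kafkaSpout: IRichSpout, tableLister: IRichBolt,
                          tableReader: IRichBolt, rowWriter: IRichBolt,
                          tableStatus: IRichBolt, finalReport: IRichBolt) = {
      val builder = new TopologyBuilder
      builder.setSpout("db-events", kafkaSpout)
      builder.setBolt("table-lister", tableLister).shuffleGrouping("db-events")
      builder.setBolt("table-reader", tableReader).shuffleGrouping("table-lister")
      builder.setBolt("row-writer", rowWriter).shuffleGrouping("table-reader")
      builder.setBolt("table-status", tableStatus)
        .fieldsGrouping("row-writer", new Fields("table"))
      builder.setBolt("final-report", finalReport)
        .fieldsGrouping("table-status", new Fields("table"))
      builder.createTopology()
    }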

This topology runs without any issue for small databases. But when the db
gets slightly larger, the topology seems to get stuck after processing
some tuples, never proceeding beyond that point.

I saw a similar discussion here[1], but there the problem seemed to come
from too many pending spout messages. In my case, it's related to the
large number of tuples coming out of the bolts. As you can imagine, the
fan-out from the second bolt can be extremely high: in one case I was
sending as many as 1000 tuples from the second bolt to the third, and from
there to the 4th (the shape of that fan-out is sketched below).
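Concretely, the second bolt is shaped roughly like this (a sketch with
stand-in names; the real bolt reads batches of rows from the table):

    import java.util.{Map => JMap}
    import backtype.storm.task.{OutputCollector, TopologyContext}
    import backtype.storm.topology.OutputFieldsDeclarer
    import backtype.storm.topology.base.BaseRichBolt
    import backtype.storm.tuple.{Fields, Tuple, Values}

    // Every emitted batch is anchored to the incoming tuple, so all ~1000
    // children join the same tuple tree that the ackers track.
    class TableReaderBolt extends BaseRichBolt {
      private var collector: OutputCollector = _

      override def prepare(conf: JMap[_, _], ctx: TopologyContext,
                           out: OutputCollector): Unit = collector = out

      override def execute(input: Tuple): Unit = {
        val table = input.getStringByField("table")
        (1 to 1000).foreach { batchNo => // stand-in for reading real row batches
          collector.emit(input, new Values(table, Int.box(batchNo))) // anchored
        }
        collector.ack(input)
      }

      override def declareOutputFields(d: OutputFieldsDeclarer): Unit =
        d.declare(new Fields("table", "batch"))
    }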

I'm just wondering why this is getting stuck. Are there any buffer sizes
in play here? How can I fix this, ideally without changing the topology
design?
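For reference, the settings I have in mind are along these lines (a
sketch; the buffer defaults below are the 0.9.x values as far as I can
tell, and max spout pending is unset unless you configure it):

    import backtype.storm.Config

    val conf = new Config
    // Caps un-acked spout tuples in flight; if the cap is reached and
    // nothing completes, the spout stops emitting and the topology stalls.
    conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, Int.box(1000))
    // Per-executor disruptor queue sizes (must be powers of two).
    conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, Int.box(1024))
    conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, Int.box(1024))
    conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, Int.box(1024))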

Really appreciate your input here.

Thanks,
Eran Withana

Re: Storm topology getting stuck

Posted by Dima Dragan <di...@belleron.net>.
Hi Eran,

Have you checked the Storm UI metrics? Is any component overloaded on
capacity?
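(As far as I know, the UI computes capacity over the last 10 minutes as
roughly: executed count * average execute latency (ms) / window length
(ms). Anything close to 1.0 means that bolt's executors are saturated and
the queues upstream of it will back up.)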

Also, please check the log files for errors.

Best regards,
Dmytro Dragan

Re: Storm topology getting stuck

Posted by Nathan Leung <nc...@gmail.com>.
When you send a tuple from the spout, it registers a message id with the
acker. When a message fans out, the new tuple ids are XOR'd into the
existing ack value, and when tuples get acked their ids are XOR'd into the
same value again; the value returns to zero exactly when the whole tree
has been acked. So the fan-out itself should be quite lightweight.
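A toy illustration of the XOR bookkeeping (just the arithmetic, not
Storm's actual acker code):

    // The acker keeps one 64-bit value per spout tuple: every edge id is
    // XOR'd in once when emitted and once when acked, so the value returns
    // to zero exactly when the whole tuple tree has completed.
    object XorAckDemo extends App {
      var ackVal = 0L
      val childIds = Seq.fill(1000)(scala.util.Random.nextLong())
      childIds.foreach(id => ackVal ^= id) // fan-out: XOR'd in on emit
      childIds.foreach(id => ackVal ^= id) // XOR'd in again on ack
      println(s"tree fully acked: ${ackVal == 0L}") // prints: true
    }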

I've worked with a topology that had very, very high fan-outs, but we did
not use reliable messaging for it because of the time scale, so it's not
directly comparable. Maybe others who have experience with this setup can
chime in.

Re: Storm topology getting stuck

Posted by Eran Chinthaka Withana <er...@gmail.com>.
Hi Nathan

No, I haven't tried jstack yet.

But I'm wondering whether this is something that could happen within Storm
itself, due to the number of messages emitted for one spout message;
specifically, whether the internal executor buffers or the ackers are
having issues. Have you seen this before? Also, do you think this is
something Storm can handle?

Thanks,
Eran Chinthaka Withana


Re: Storm topology getting stuck

Posted by Nathan Leung <nc...@gmail.com>.
Did you try running jstack to see what your threads are doing?
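For example: find the worker pid with jps, then take a few samples with
"jstack <pid>" a minute or so apart. If the topology is truly wedged, the
same threads will show up blocked in the same place in every sample.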


Re: Storm topology getting stuck

Posted by Eran Chinthaka Withana <er...@gmail.com>.
Here is the complete config and topology builder code (written in Scala):
http://pastebin.com/xhJGDsXn

Re: Storm topology getting stuck

Posted by Eran Chinthaka Withana <er...@gmail.com>.
No, they are not; I set a very high timeout. For testing it is:

conf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, Int.box(1 * 60 * 60))

Thanks,
Eran Chinthaka Withana

On Wed, May 13, 2015 at 6:03 PM, Nathan Leung <nc...@gmail.com> wrote:

> Are your tuples from the spout timing out and being replayed? This could
> cause the topology to spin doing the same table/s over and over.

Re: Storm topology getting stuck

Posted by Nathan Leung <nc...@gmail.com>.
Are your tuples from the spout timing out and being replayed? This could
cause the topology to spin doing the same table/s over and over.
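One way to see replays directly, if you control the spout (a sketch with
made-up names; with the stock Kafka spout you would watch the "Failed"
column in the UI instead), is to override fail() and log the message id:

    import java.util.{Map => JMap}
    import backtype.storm.spout.SpoutOutputCollector
    import backtype.storm.task.TopologyContext
    import backtype.storm.topology.OutputFieldsDeclarer
    import backtype.storm.topology.base.BaseRichSpout
    import backtype.storm.tuple.{Fields, Values}

    // Demo spout: emits a sequence number as both payload and message id,
    // and logs every failure so timeouts/replays show up in the worker log.
    class ReplayLoggingSpout extends BaseRichSpout {
      private var collector: SpoutOutputCollector = _
      private var seq = 0L

      override def open(conf: JMap[_, _], ctx: TopologyContext,
                        out: SpoutOutputCollector): Unit = collector = out

      override def nextTuple(): Unit = {
        seq += 1
        // The second argument is the message id the ackers track.
        collector.emit(new Values(Long.box(seq)), Long.box(seq))
      }

      // Called when the tuple tree for msgId timed out or a bolt failed it;
      // a real spout would re-emit the tuple here.
      override def fail(msgId: AnyRef): Unit =
        System.err.println(s"tuple $msgId failed (timed out or failed by a bolt)")

      override def declareOutputFields(d: OutputFieldsDeclarer): Unit =
        d.declare(new Fields("seq"))
    }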