Posted to user@storm.apache.org by Scot Kronenfeld <sc...@gmail.com> on 2014/06/11 21:26:54 UTC

Python - When spout dies, bolts keep receiving input on STDIN

I am using Storm with my bolts and spouts written in Python.

When I run in a local test environment, I have a problem: if the spout
dies, the bolts consume 100% CPU and their memory usage grows steadily.
Here are the details:

My spout reads from MongoDB.  Sometimes it loses its cursor (due to a
network hiccup or something), raises an exception, and exits.  I can also
reliably reproduce the problem with "kill -9 <spout PID>".

Using strace and then a debugger, I found that the bolts are stuck in
this tight loop in the readMsg function of storm.py (which ships with
Storm):

    while True:
        line = sys.stdin.readline()[:-1]
        if line == "end":
            break
        msg = msg + line + "\n"

The readline() call is a blocking call, but the bolt keeps getting blank
lines as input.
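One explanation consistent with these symptoms (an assumption, not
confirmed in this thread): once stdin has reached end-of-file, Python's
readline() does not block; it returns an empty string immediately. Since
""[:-1] is "" and never equals "end", the loop above never exits and
appends one newline per pass. The sketch below replays the quoted loop
against an in-memory stream that is already at EOF. The max_iterations cap
is an addition (not in storm.py) so the demonstration terminates, and
read_msg_as_shipped is a hypothetical name.

```python
import io


def read_msg_as_shipped(stream, max_iterations=5):
    """Replay of the quoted storm.py loop, plus an iteration cap
    (NOT in storm.py) so that this demonstration terminates."""
    msg = ""
    for _ in range(max_iterations):
        # At EOF readline() returns "" at once, and ""[:-1] is still "".
        line = stream.readline()[:-1]
        if line == "end":
            break
        # Once EOF is reached, each pass adds one more newline to msg.
        msg = msg + line + "\n"
    return msg


# A stream that is already at EOF, standing in for a closed stdin pipe.
closed_stdin = io.StringIO("")
print(repr(read_msg_as_shipped(closed_stdin)))  # prints '\n\n\n\n\n'
```

Without the cap, msg grows by one character per pass forever, which would
account for both the 100% CPU and the slowly rising memory.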

Note: the memory growth happens because a newline is appended to msg on
every pass through the loop.  Since the input is parsed as JSON, it would
probably be safe to just stop appending the newline (I'm not 100% sure,
because that might break if a string embedded in the JSON contains
newlines).  But that still doesn't fix the core issue.
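On the embedded-newline worry: standard JSON (RFC 8259) requires control
characters inside strings, including newline, to be escaped, so a raw
newline can only appear as insignificant whitespace between tokens. Assuming
the messages Storm writes are standard JSON, joining the lines without the
newlines should decode to the same value. A quick check with Python's json
module:

```python
import json

# A payload whose string value contains a newline character.
payload = {"text": "first line\nsecond line"}
encoded = json.dumps(payload)

# json.dumps escapes the newline as backslash + "n", so the encoded
# message contains no raw newline at all.
print("\n" in encoded)  # prints False

# A pretty-printed (multi-line) encoding still decodes to the same value
# when its lines are joined with no separator, because the raw newlines
# there are only whitespace between tokens.
pretty = json.dumps(payload, indent=2)
joined = "".join(pretty.splitlines())
print(json.loads(joined) == payload)  # prints True
```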

I think the problem is that if the spout does not shut down cleanly,
something on the Java side keeps sending input to the bolts.  I'm about to
dig into the Java code, but I don't know any Java, so I figured it was
worth a quick message to the Storm list to see whether this is a known
problem, or whether anyone has a pointer to where I should look in the
Java code.  I haven't looked at the Storm source before; until now it has
been a black box to me.
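An alternative worth ruling out before reading the Java (an assumption, not
confirmed in this thread): nothing may be sending input at all. Once the
writing end of a pipe is closed, for example because the process holding
it exited, every readline() on the reading end returns an empty string
immediately. This sketch shows that with a child Python process standing in
for a bolt; CHILD and reads_after_writer_closes are hypothetical names.

```python
import subprocess
import sys

# Child program: call readline() several times and report what it got.
CHILD = r"""
import sys
results = [sys.stdin.readline() for _ in range(4)]
print(repr(results))
"""


def reads_after_writer_closes(data):
    # run() writes `data` to the child's stdin and then closes the pipe,
    # which is what a reader sees when the writing side goes away.
    completed = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=data,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


print(reads_after_writer_closes("end\n"))  # prints ['end\n', '', '', '']
```

Every read after the real data comes back as "" without blocking, which a
loop that strips the last character and compares against "end" cannot
distinguish from a blank line.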

Thanks,
scot

Re: Python - When spout dies, bolts keep receiving input on STDIN

Posted by Andrew Montalenti <an...@parsely.com>.
It also looks like a recent pull request fixes this issue in storm.py:

https://github.com/apache/incubator-storm/pull/140/files
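I have not checked exactly what that pull request changes. A minimal sketch
of the kind of fix involved, assuming the goal is to treat an empty
readline() result as end-of-file rather than as a blank line, could look
like this; read_msg is a hypothetical name, not the function in storm.py.

```python
import io
import json


def read_msg(stream):
    """Read one "end"-terminated message and decode it as JSON.

    Raises EOFError instead of looping forever when the stream is closed.
    """
    lines = []
    while True:
        line = stream.readline()
        if not line:  # "" means EOF; a genuinely blank line would be "\n"
            raise EOFError("stdin closed before 'end' was received")
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))


print(read_msg(io.StringIO('{"command": "next"}\nend\n')))
```

Raising instead of looping means an orphaned bolt process exits with a
traceback rather than spinning at 100% CPU.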


On Tue, Jun 17, 2014 at 7:33 AM, Andrew Montalenti <an...@parsely.com>
wrote:

> I believe this is a problem in the storm.py multilang prototype adapter
> bundled with Storm. We fixed this issue in a more full fledged multilang
> adapter that is available here:
>
> https://github.com/Parsely/streamparse
>
> You could try pip install streamparse and change your bolt to subclass
> streamparse.bolt.Bolt instead and see if the problem goes away. Full API
> docs here:
>
> http://streamparse.readthedocs.org/en/latest/api.html

Re: Python - When spout dies, bolts keep receiving input on STDIN

Posted by Andrew Montalenti <an...@parsely.com>.
I believe this is a problem in the storm.py multilang prototype adapter
bundled with Storm. We fixed this issue in a more full-fledged multilang
adapter, available here:

https://github.com/Parsely/streamparse

You could try running "pip install streamparse", changing your bolt to
subclass streamparse.bolt.Bolt instead, and seeing whether the problem goes
away. Full API docs are here:

http://streamparse.readthedocs.org/en/latest/api.html