Posted to user@storm.apache.org by Carlos Perelló Marín <ca...@serverdensity.com> on 2015/05/28 21:45:01 UTC

tuple size limitation?

Hi,

While working with Apache Storm 0.9.4 using Python + multilang, I found that
one tuple was hanging the topology. It took me a while to figure out what was
going on and why it stopped processing payloads, until I found that the
hung bolt was blocked waiting for input on its stdin (it hangs when calling
emit).

After inspecting the tuple that hung it, I found that it includes a JSON
string about 75 KB long. It's valid JSON, so it's not corrupted, but for
some reason it breaks when it's emitted.
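Storm's multilang protocol documentation describes each message exchanged with the shell component as a JSON document followed by a line containing only end. It also says that after an emit, Storm answers with the list of task ids the tuple was sent to unless need_task_ids is false, so a component blocked reading stdin inside emit may well be waiting for that reply from the Java side. The sketch below reimplements only the framing in plain Python, with hypothetical helper names (these are not storm.py functions), to show that the framing itself round-trips a tuple carrying a ~75 KB JSON string.

```python
import io
import json

# Framing as described in Storm's multilang protocol docs: one JSON document,
# then a line containing only "end". Helper names here are hypothetical.
def write_message(stream, msg):
    stream.write(json.dumps(msg))  # json.dumps output contains no raw newlines
    stream.write("\nend\n")
    stream.flush()

def read_message(stream):
    lines = []
    while True:
        line = stream.readline()
        if not line:
            # StringIO reports EOF; a real pipe with no data would block here instead.
            raise EOFError("stream ended before the 'end' marker")
        if line.rstrip("\n") == "end":
            return json.loads("".join(lines))
        lines.append(line)

# A tuple carrying a ~75 KB JSON string, like the one in the report.
payload = json.dumps({"data": "x" * 75_000})
buf = io.StringIO()
write_message(buf, {"command": "emit", "tuple": [payload]})
buf.seek(0)
received = read_message(buf)
print(len(received["tuple"][0]))
```

If this round trip works, the size alone is not a problem for the framing, which points the investigation at the transport or at the Java side's handling of the message.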

I'm using Kafka to inject tuples into my topology, and the KafkaSpout is
able to inject such a tuple, so I wonder whether it's just a limitation of
the multilang implementation...

Are there any hints on how to debug or fix it?

The worst thing is that there were no errors in either the supervisor or
the worker logs. I only found this because I inspected the processes
manually with strace and added log output to my code to find where it hung.
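Because the multilang protocol uses the component's stdout for protocol messages, debug output like the log lines mentioned above is safer written to a file than printed. A minimal sketch with Python's logging module follows; the log path and logger name are hypothetical. Attaching strace to the component (strace -p PID) alongside it would show whether the process sits in a read on file descriptor 0 while the last log line is "about to emit".

```python
import json
import logging
import os
import tempfile

# Hypothetical log location; stdout is reserved for multilang protocol messages.
LOG_PATH = os.path.join(tempfile.gettempdir(), "multilang_bolt_debug.log")

def make_debug_logger(path=LOG_PATH):
    """Return a logger that writes to a file instead of stdout."""
    logger = logging.getLogger("multilang_debug")
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s pid=%(process)d %(message)s"))
        logger.addHandler(handler)
    logger.propagate = False  # keep records away from any root stdout handler
    return logger

log = make_debug_logger()
tup = {"data": "x" * 75_000}
size = len(json.dumps(tup).encode("utf-8"))
log.debug("about to emit, encoded size=%d bytes", size)
# emit([tup]) would be called here in a real bolt
log.debug("emit returned")
```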

Thanks in advance!

-- 

Carlos Perelló Marín
https://www.serverdensity.com

Re: tuple size limitation?

Posted by Carlos Perelló Marín <ca...@serverdensity.com>.
Right, that's why I don't understand why serializing the tuple with JSON
before emitting it fixes the issue. If the whole message is going to be
serialized with JSON anyway, I would expect it to work. (I'm ignoring JSON
encoding/decoding performance; I'm just talking about functionality.)
Also, the Python dictionary doesn't contain any data type that JSON is
unable to handle, so that's not the issue.
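One Python-specific case that is easy to miss when ruling out unsupported types: by default json.dumps writes float('nan') and infinities as NaN and Infinity, which are not valid JSON, so a strict parser on the receiving side may reject them even though Python produced them without complaint. The thread does not establish that this is what happened here. If it were, it would also fit the observation that pre-serializing fixes it, since the value would then travel inside a quoted string; treat that as a hypothesis only. The sketch is a hypothetical helper (find_json_problems is not part of Storm) that walks a tuple value and lists such values, plus types the json module cannot encode at all.

```python
import json
import math

def find_json_problems(value, path="$"):
    """List (path, reason) pairs for values worth inspecting before emit.

    Hypothetical helper: flags non-finite floats (which json.dumps writes as
    NaN/Infinity, not valid JSON), types the json module cannot encode, and
    non-string dict keys (json.dumps silently turns them into strings).
    """
    problems = []
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                problems.append((path, "non-string key %r" % (key,)))
            problems.extend(find_json_problems(item, "%s.%s" % (path, key)))
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            problems.extend(find_json_problems(item, "%s[%d]" % (path, index)))
    elif isinstance(value, float) and not math.isfinite(value):
        problems.append((path, "non-finite float %r" % value))
    elif value is not None and not isinstance(value, (str, bool, int, float)):
        problems.append((path, "unsupported type %s" % type(value).__name__))
    return problems


def is_strict_json(value):
    """True if json.dumps accepts value with NaN/Infinity disallowed."""
    try:
        json.dumps(value, allow_nan=False)
        return True
    except (TypeError, ValueError):
        return False


sample = {"host": "web-1", "load": [0.5, float("nan")], "tags": {"a", "b"}}
for where, reason in find_json_problems(sample):
    print(where, reason)
print(is_strict_json(sample))
```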

On 29 May 2015 at 14:35, Nathan Leung <nc...@gmail.com> wrote:

> The default (and in old releases, the ONLY) multilang serializer is JSON,
> which is in fact slow.

Re: tuple size limitation?

Posted by Nathan Leung <nc...@gmail.com>.
The default (and in old releases, the ONLY) multilang serializer is JSON,
which is in fact slow.
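To put a rough number on Nathan's point for a tuple of about the size discussed here, a sketch like the following times a json.dumps plus json.loads round trip with timeit. The sample structure is made up; absolute numbers depend on the machine and Python version, and this measures only the Python side, not the deserialization in the Java ShellBolt.

```python
import json
import timeit

def json_roundtrip_seconds(obj, repeat=3, number=20):
    """Best average wall time of one json.dumps + json.loads of obj."""
    timer = timeit.Timer(lambda: json.loads(json.dumps(obj)))
    return min(timer.repeat(repeat=repeat, number=number)) / number

# Made-up tuple value whose JSON encoding is on the order of 75 KB.
sample = {"items": [{"id": i, "name": "metric-%d" % i, "value": i * 0.5}
                    for i in range(1500)]}
size = len(json.dumps(sample))
per_call = json_roundtrip_seconds(sample)
print("%d bytes, %.3f ms per round trip" % (size, per_call * 1000))
```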
On May 29, 2015 8:04 AM, "Andrew Xor" <an...@gmail.com> wrote:

> I think the Storm documentation clearly says that not only do you have to
> serialize your objects, but when using custom types it is better to
> implement your own serializer to avoid the "native" one, which is quite
> slow. I have not used Storm multilang though, to be honest.

Re: tuple size limitation?

Posted by Andrew Xor <an...@gmail.com>.
I think the Storm documentation clearly says that not only do you have to
serialize your objects, but when using custom types it is better to
implement your own serializer to avoid the "native" one, which is quite
slow. I have not used Storm multilang though, to be honest.
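Andrew's advice concerns the Java side, where Storm's serialization docs recommend registering serializers for custom types instead of letting Storm fall back to Java serialization. In a Python multilang component everything crosses the process boundary as JSON, so the rough counterpart is converting custom types to JSON-native values yourself. One way is the default hook of json.dumps, sketched below; the conversion policy in the hypothetical encode_extra function (ISO strings for dates, strings for Decimal, sorted lists for sets) is an illustrative choice, and the resulting string would be emitted the way Carlos's workaround does.

```python
import datetime
import decimal
import json

def encode_extra(obj):
    """Fallback for types json cannot encode natively (hypothetical policy)."""
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    if isinstance(obj, decimal.Decimal):
        return str(obj)  # keep exact digits rather than converting to float
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)  # assumes the members are mutually comparable
    raise TypeError("Object of type %s is not JSON serializable"
                    % type(obj).__name__)

record = {
    "ts": datetime.datetime(2015, 5, 28, 21, 45, 1),
    "price": decimal.Decimal("9.99"),
    "tags": {"b", "a"},
}
encoded = json.dumps(record, default=encode_extra, sort_keys=True)
print(encoded)
```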

Regards.

On Fri, May 29, 2015 at 2:33 PM, Carlos Perelló Marín <
carlos@serverdensity.com> wrote:

> Found the problem... I'm not serializing the JSON object back into a
> string, so when I call emit, it's a Python dictionary. It works most of the
> time, but for some reason we found several values that break it.
>
> I'm not 100% sure it's not a problem with Storm's multilang support, given
> that emit ends up doing a json.dumps() call anyway before sending the
> message to the ShellBolt or ShellSpout Java class, so it should not break
> the protocol.
>
> I have a workaround for my problem, but it would be nice to know whether
> it's a bug or the correct behavior, because having to serialize/deserialize
> that argument in every bolt would cost us some extra processing time.
>
> Thanks.

Re: tuple size limitation?

Posted by Carlos Perelló Marín <ca...@serverdensity.com>.
Found the problem... I'm not serializing the JSON object back into a
string, so when I call emit, it's a Python dictionary. It works most of the
time, but for some reason we found several values that break it.

I'm not 100% sure it's not a problem with Storm's multilang support, given
that emit ends up doing a json.dumps() call anyway before sending the message
to the ShellBolt or ShellSpout Java class, so it should not break the
protocol.

I have a workaround for my problem, but it would be nice to know whether it's
a bug or the correct behavior, because having to serialize/deserialize that
argument in every bolt would cost us some extra processing time.
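The workaround can be sketched as a pair of hypothetical helpers, one used before emit in the producing bolt and one in each consuming bolt. Besides the extra parse, the sketch shows a size cost: when a JSON string is embedded in the JSON emit message, every double quote inside it is escaped, so the message grows by one character per quote plus the two surrounding quotes. The command and tuple keys follow the emit message layout in the multilang protocol docs; the sample document is made up.

```python
import json

def pack_field(obj):
    """Producer side (hypothetical helper): encode the dict to a JSON string before emit."""
    return json.dumps(obj)

def unpack_field(text):
    """Consumer side (hypothetical helper): decode the string back into a dict."""
    return json.loads(text)

# Made-up document standing in for the real tuple value.
doc = {"host": "web-1", "metrics": [{"name": "cpu", "value": 0.42}] * 1000}

# Emit message carrying the dict as-is...
direct = json.dumps({"command": "emit", "tuple": [doc]})
# ...and carrying the pre-encoded string, where every '"' becomes '\"'.
wrapped = json.dumps({"command": "emit", "tuple": [pack_field(doc)]})

print(len(direct), len(wrapped))
```

For quote-heavy payloads like this one the growth is noticeable, which adds to the processing-time concern above.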

Thanks.

On 28 May 2015 at 22:35, Andrew Xor <an...@gmail.com> wrote:

> This is strange, as I have used Storm with quite large tuples with no such
> problem. Try to replicate it with a single spout that generates huge tuples
> and a single bolt as a consumer, and report back your results.

Re: tuple size limitation?

Posted by Andrew Xor <an...@gmail.com>.
This is strange, as I have used Storm with quite large tuples with no such
problem. Try to replicate it with a single spout that generates huge tuples
and a single bolt as a consumer, and report back your results.
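Before building the spout/bolt pair, a narrower check is possible without Storm at all: push one framed message of the problematic size through a real OS pipe to a child Python process and see whether an answer comes back. The child script below is a hypothetical stand-in for a bolt that uses the same JSON-plus-end framing; it exercises only the pipe and the framing, not ShellBolt. The timeout turns a hang into a subprocess.TimeoutExpired exception instead of a silent stall (capture_output and text need Python 3.7+).

```python
import json
import subprocess
import sys

# Hypothetical stand-in "bolt": reads one JSON message terminated by an
# "end" line from stdin and replies with the length of the first tuple value.
CHILD = r'''
import json, sys
lines = []
for line in sys.stdin:
    if line.rstrip("\n") == "end":
        break
    lines.append(line)
msg = json.loads("".join(lines))
sys.stdout.write(json.dumps(len(msg["tuple"][0])) + "\nend\n")
sys.stdout.flush()
'''

def send_through_pipe(value, timeout=10):
    """Send one framed message through a real OS pipe; return the child's reply."""
    frame = json.dumps({"command": "emit", "tuple": [value]}) + "\nend\n"
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=frame,
        capture_output=True,
        text=True,
        timeout=timeout,  # a hang raises TimeoutExpired instead of blocking forever
        check=True,
    )
    return json.loads(result.stdout.split("\nend\n")[0])

for size in (1_000, 64_000, 75_000, 200_000):
    print(size, send_through_pipe("x" * size))
```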

Regards
On Thu, May 28, 2015 at 10:59 PM Jeffery Maass <ma...@gmail.com> wrote:

> I would take the Kafka spout, JSON, and your code out of the equation and
> replicate the problem with a spout that generates strings of various
> lengths around 75 KB.

Re: tuple size limitation?

Posted by Jeffery Maass <ma...@gmail.com>.
I would take the Kafka spout, JSON, and your code out of the equation and
replicate the problem with a spout that generates strings of various
lengths around 75 KB.
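Sweeping lengths around 75 KB can be made systematic by bisecting the payload size. The helper below is a hypothetical sketch: works is a callback you would supply (for example, one that emits a string of the given length through a test topology and waits for the ack with a timeout), and the fake predicate in the example uses an arbitrary threshold.

```python
def smallest_failing_size(works, low, high):
    """Binary-search the smallest size in (low, high] for which works(size) is False.

    Assumes works(low) is True, works(high) is False, and that every size
    above the threshold fails. `works` is a hypothetical callback.
    """
    if not works(low):
        raise ValueError("low must be a passing size")
    if works(high):
        raise ValueError("high must be a failing size")
    while high - low > 1:
        mid = (low + high) // 2
        if works(mid):
            low = mid
        else:
            high = mid
    return high

# Fake predicate standing in for a real topology run; 70_000 is arbitrary.
print(smallest_failing_size(lambda n: n < 70_000, 1_000, 200_000))
```

Bisection only makes sense if failures are monotonic in size. Carlos later reports that only several specific values break the emit, which suggests content may matter more than length; in that case, varying content at a fixed size is the more informative follow-up.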

Thank you for your time!

+++++++++++++++++++++
Jeff Maass <ma...@gmail.com>
linkedin.com/in/jeffmaass
stackoverflow.com/users/373418/maassql
+++++++++++++++++++++


On Thu, May 28, 2015 at 2:45 PM, Carlos Perelló Marín <
carlos@serverdensity.com> wrote:

> Hi,
>
> While working with Apache Storm 0.9.4 using Python + multilang, I found
> that one tuple was hanging the topology. It took me a while to figure out
> what was going on and why it stopped processing payloads, until I found
> that the hung bolt was blocked waiting for input on its stdin (it hangs
> when calling emit).
>
> After inspecting the tuple that hung it, I found that it includes a JSON
> string about 75 KB long. It's valid JSON, so it's not corrupted, but for
> some reason it breaks when it's emitted.
>
> I'm using Kafka to inject tuples into my topology, and the KafkaSpout is
> able to inject such a tuple, so I wonder whether it's just a limitation of
> the multilang implementation...
>
> Are there any hints on how to debug or fix it?
>
> The worst thing is that there were no errors in either the supervisor or
> the worker logs. I only found this because I inspected the processes
> manually with strace and added log output to my code to find where it
> hung.
>
> Thanks in advance!