Posted to dev@eagle.apache.org by Edward Zhang <yo...@apache.org> on 2015/12/01 05:46:26 UTC

[Discussion] storm local-mode event object reuse bug

Hi Storm developers,

Today I hit a possible Storm issue that happens in local mode. In local
mode, an event object emitted by a spout does not appear to go through
serialization/deserialization; instead, the object, including its members,
is referenced directly by the downstream bolts. So when one bolt modifies
the event object, another bolt sees the change immediately.

For example, suppose the event object emitted by the spout includes a Java
Map and two bolts follow the spout. If one bolt modifies the Map, the other
bolt will see the modification, or will throw a
ConcurrentModificationException if it is iterating over the Map at the time.
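
The effect can be reproduced without Storm. What follows is a minimal sketch in plain Java, not Storm code: the class and method names are hypothetical stand-ins for bolts, and the single-threaded loop only simulates another bolt writing while this one iterates (with real threads the failure is timing-dependent).

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for two bolts that receive the same emitted object
// without a serialization round trip. No Storm classes are used.
final class SharedEventDemo {

    // Plays the role of a bolt that modifies its input in place.
    static void mutatingBolt(Map<String, Object> event) {
        event.put("enriched", Boolean.TRUE);
    }

    // Plays the role of a second bolt that only reads the event.
    static boolean readingBolt(Map<String, Object> event) {
        return event.containsKey("enriched");
    }

    // Simulates the second bolt iterating while the first bolt adds a key.
    // Returns true if the iteration failed with ConcurrentModificationException.
    static boolean iterateWhileOtherBoltWrites(Map<String, Object> event) {
        try {
            for (String key : event.keySet()) {
                mutatingBolt(event); // structural change in the middle of the iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> event = new HashMap<>();
        event.put("user", "alice");
        event.put("host", "node-1");

        // Both "bolts" are handed the same reference.
        mutatingBolt(event);
        System.out.println("second bolt sees change: " + readingBolt(event));

        Map<String, Object> another = new HashMap<>();
        another.put("user", "bob");
        another.put("host", "node-2");
        System.out.println("iteration failed: " + iterateWhileOtherBoltWrites(another));
    }
}
```

Both lines print true: the change made by one method is visible to the other because they hold the same Map instance.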

Please let us know whether this behavior should be corrected by the Storm
framework or by the Storm application. In the application, we can deep copy
the event when running in local mode; in the framework,
serialization/deserialization should probably always be executed.
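
As a rough illustration of the application-side option, here is a sketch of a deep copy done by a serialization round trip. It uses JDK serialization purely as a stand-in (Storm itself serializes tuples with Kryo), and the class name DeepCopy and method roundTrip are hypothetical helpers, not Storm APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;

// Hypothetical helper: deep-copies a Serializable object graph by writing it
// to a byte array and reading it back, as a cross-JVM hop would.
final class DeepCopy {

    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deep copy failed", e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, ArrayList<String>> original = new HashMap<>();
        original.put("tags", new ArrayList<>(Arrays.asList("a")));

        HashMap<String, ArrayList<String>> copy = roundTrip(original);
        copy.get("tags").add("b"); // changes only the copy, including nested values

        System.out.println("original tags: " + original.get("tags"));
        System.out.println("copy tags:     " + copy.get("tags"));
    }
}
```

The round trip costs CPU and allocations on every event, which is the trade-off against sharing the reference.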

Let me know your thoughts.

Thanks
Edward Zhang

Re: [Discussion] storm local-mode event object reuse bug

Posted by "Zhang, Edward (GDI Hadoop)" <yo...@ebay.com>.
In my opinion, this is not about the immutability of an object. It is about the contract between the Storm framework and the Storm application. In this case, it looks like application code has to deep copy every input object before modifying it, because the received instance may be shared and cannot safely be reused.

I think it is also fine if the contract is that the application should assume the event object it receives is possibly shared. But ImmutableMap would not solve the problem.

Thanks
Edward

From: "Grant Overby (groverby)" <gr...@cisco.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Tuesday, December 1, 2015 at 8:25
To: user <us...@storm.apache.org>
Subject: Re: [Discussion] storm local-mode event object reuse bug

Serialization isn’t free. Skipping it where possible, even in a cluster, is worth doing to conserve CPU resources.

Using immutable objects is cheaper. Assuming you’re coding in Java, consider using ImmutableMap, ImmutableMap.Builder, and similar classes in the Guava library from Google. http://docs.guava-libraries.googlecode.com/git-history/v18.0/javadoc/com/google/common/collect/ImmutableMap.html



Re: [Discussion] storm local-mode event object reuse bug

Posted by "Grant Overby (groverby)" <gr...@cisco.com>.
Serialization isn't free. Skipping it where possible, even in a cluster, is worth doing to conserve CPU resources.

Using immutable objects is cheaper. Assuming you're coding in Java, consider using ImmutableMap, ImmutableMap.Builder, and similar classes in the Guava library from Google. http://docs.guava-libraries.googlecode.com/git-history/v18.0/javadoc/com/google/common/collect/ImmutableMap.html
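
A small sketch of the idea, using only the JDK so that it runs without Guava: Collections.unmodifiableMap over a private copy is swapped in here for Guava's ImmutableMap. The class name ImmutableEvent and the method snapshot are hypothetical names chosen for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds a read-only snapshot of a map before it is emitted.
final class ImmutableEvent {

    // Copies the entries, then wraps the copy so no holder can modify it.
    // The copy is shallow: the values themselves must also be immutable.
    static <K, V> Map<K, V> snapshot(Map<K, V> source) {
        return Collections.unmodifiableMap(new HashMap<>(source));
    }

    public static void main(String[] args) {
        Map<String, Object> working = new HashMap<>();
        working.put("user", "alice");

        Map<String, Object> emitted = snapshot(working);

        // Later changes by the emitter do not leak into the snapshot.
        working.put("late", Boolean.TRUE);
        System.out.println("snapshot has late key: " + emitted.containsKey("late"));

        // A downstream attempt to modify the snapshot fails fast.
        try {
            emitted.put("enriched", Boolean.TRUE);
        } catch (UnsupportedOperationException e) {
            System.out.println("modification rejected");
        }
    }
}
```

A receiver that needs a changed version has to build its own copy, so no other holder of the snapshot is affected.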


Grant Overby
Software Engineer
From: Nathan Leung <nc...@gmail.com>
Reply-To: user <us...@storm.apache.org>
Date: Tuesday, December 1, 2015 at 9:30 AM
To: user <us...@storm.apache.org>
Subject: Re: [Discussion] storm local-mode event object reuse bug

It is bypassed by design.  As noted in https://storm.apache.org/apidocs/backtype/storm/task/OutputCollector.html, the emitted objects must be immutable.  If you're intent on modifying them, be very careful.


Re: [Discussion] storm local-mode event object reuse bug

Posted by Nathan Leung <nc...@gmail.com>.
It is bypassed by design.  As noted in
https://storm.apache.org/apidocs/backtype/storm/task/OutputCollector.html,
the emitted objects must be immutable.  If you're intent on modifying them,
be very careful.
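
One way to stay careful is to never modify a received object in place and to derive a new one instead. The sketch below is plain Java under that assumption; CopyBeforeModify and withField are hypothetical names, and the copy is shallow, so nested mutable values would still be shared.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for a bolt that needs an extended version of its input.
final class CopyBeforeModify {

    // Returns a new map with the extra field; the input map is left untouched.
    static Map<String, Object> withField(Map<String, Object> input, String key, Object value) {
        Map<String, Object> copy = new HashMap<>(input);
        copy.put(key, value);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> received = new HashMap<>();
        received.put("user", "alice");

        Map<String, Object> toEmit = withField(received, "enriched", Boolean.TRUE);

        System.out.println("input changed: " + received.containsKey("enriched"));
        System.out.println("output has field: " + toEmit.containsKey("enriched"));
    }
}
```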

On Tue, Dec 1, 2015 at 4:28 AM, Stephen Powis <sp...@salesforce.com> wrote:

> I believe that any time tuples are passed between bolts on the same JVM
> (either in local mode or in remote mode where the upstream and downstream
> bolts both reside on the same worker), serialization is bypassed by design.

Re: [Discussion] storm local-mode event object reuse bug

Posted by Stephen Powis <sp...@salesforce.com>.
I believe that any time tuples are passed between bolts on the same JVM
(either in local mode or in remote mode where the upstream and downstream
bolts both reside on the same worker), serialization is bypassed by design.

On Tue, Dec 1, 2015 at 1:46 PM, Edward Zhang <yo...@apache.org>
wrote:

> Hi Storm developers,
>
> Today I hit a possible Storm issue that happens in local mode. In local
> mode, an event object emitted by a spout does not appear to go through
> serialization/deserialization; instead, the object, including its members,
> is referenced directly by the downstream bolts. So when one bolt modifies
> the event object, another bolt sees the change immediately.
>
> For example, suppose the event object emitted by the spout includes a Java
> Map and two bolts follow the spout. If one bolt modifies the Map, the other
> bolt will see the modification, or will throw a
> ConcurrentModificationException if it is iterating over the Map at the time.
>
> Please let us know whether this behavior should be corrected by the Storm
> framework or by the Storm application. In the application, we can deep copy
> the event when running in local mode; in the framework,
> serialization/deserialization should probably always be executed.
>
> Let me know your thoughts.
>
> Thanks
> Edward Zhang
>