Posted to user@storm.apache.org by hjh <ap...@163.com> on 2015/01/05 01:18:48 UTC

about storm tuple

Hi, I am new to Storm and have run into a problem with tuples. In local
mode, do connected bolts share the same tuple object? For example, BoltA
emits a tuple to BoltB. While BoltB is processing the tuple (which it has
assigned to a private variable, say VAR), BoltA sends another tuple to
BoltB, and VAR changes immediately. Does that mean that in local mode
BoltA and BoltB share the same tuple instance? And how should I deal with
this situation?

PS. I use Java.

Any suggestions are warmly welcome.


Thank you very much!!!


Re: about storm tuple

Posted by hjh <ap...@163.com>.
Thank you very much!! I will try that!!


On 01/05/2015 01:01 AM, Nathan Leung wrote:
> Out of the box, Storm does not let you control which machine a bolt or
> spout runs on.  However, it's possible to write a custom scheduler to
> do this (see
> http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/ 
> for a guide).


Re: about storm tuple

Posted by Nathan Leung <nc...@gmail.com>.
Out of the box, Storm does not let you control which machine a bolt or
spout runs on.  However, it's possible to write a custom scheduler to do
this (see
http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
for a guide).

On Sun, Jan 4, 2015 at 11:51 PM, hjh <ap...@163.com> wrote:

> Thank you very much for the explanation. I am new to Storm, so I thought
> the whole topology ran like a message broker, and I designed the topology
> accordingly. By the way, in most cases we cannot decide which machine or
> thread a bolt or spout runs in, right? Does that mean we have to handle
> such situations carefully? Thank you very much!!

Re: about storm tuple

Posted by hjh <ap...@163.com>.
Thank you very much for the explanation. I am new to Storm, so I thought
the whole topology ran like a message broker, and I designed the topology
accordingly. By the way, in most cases we cannot decide which machine or
thread a bolt or spout runs in, right? Does that mean we have to handle
such situations carefully? Thank you very much!!

On 01/04/2015 11:34 PM, Nathan Leung wrote:
> First, every time you send a tuple, it should be a new tuple.  Second,
> when you send a tuple within the same process, the data is passed by
> reference rather than serialized and deserialized.  That means that
> with a local cluster, or even a remote cluster, some of your data will
> be sent by reference.  So if you send a static variable by reference
> and then change it, downstream bolts will see the change.
>
> Note that on a remote cluster, localOrShuffleGrouping will send within
> the same process whenever possible, and thereby avoid network,
> serialization, and deserialization costs.  ShuffleGrouping will also
> send in process if there are any downstream bolts in the same process,
> because it uses round robin.  FieldsGrouping behaves the same way if
> your key has a reasonable distribution.  You might be able to avoid
> passing by reference with none grouping or direct grouping, but I'm
> not really sure why you would explicitly try to avoid this.
>
> I would also note that passing and then changing a static variable can
> be tricky; hopefully you have protected it against concurrent
> modification from other bolt tasks within the same process.


Re: about storm tuple

Posted by Nathan Leung <nc...@gmail.com>.
First, every time you send a tuple, it should be a new tuple.  Second, when
you send a tuple within the same process, the data is passed by reference
rather than serialized and deserialized.  That means that with a local
cluster, or even a remote cluster, some of your data will be sent by
reference.  So if you send a static variable by reference and then change
it, downstream bolts will see the change.
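
To illustrate the point about references, here is a minimal sketch in plain
Java. It does not use any Storm classes; SharedReferenceDemo, emitByReference
and emitCopy are hypothetical names that only stand in for two bolts living
in the same JVM. It shows why reusing a mutable object after handing it over
changes what the receiver sees, and how handing over a fresh copy avoids it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for two bolts in the same JVM; no Storm classes are used.
class SharedReferenceDemo {

    // Simulates an in-process hand-off: the receiver gets the very same object.
    static List<String> emitByReference(List<String> payload) {
        return payload;
    }

    // Simulates a safer sender: every hand-off gets its own fresh copy.
    static List<String> emitCopy(List<String> payload) {
        return new ArrayList<>(payload);
    }

    public static void main(String[] args) {
        List<String> buffer = new ArrayList<>();
        buffer.add("first");

        List<String> seenByBoltB = emitByReference(buffer);
        List<String> copySeenByBoltB = emitCopy(buffer);

        // "BoltA" reuses its buffer for the next tuple.
        buffer.set(0, "second");

        System.out.println(seenByBoltB.get(0));     // prints "second"
        System.out.println(copySeenByBoltB.get(0)); // prints "first"
    }
}
```

In a real topology the equivalent fix would be to build new value objects for
each emitted tuple instead of reusing a shared or static one.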

Note that on a remote cluster, localOrShuffleGrouping will send within the
same process whenever possible, and thereby avoid network, serialization,
and deserialization costs.  ShuffleGrouping will also send in process if
there are any downstream bolts in the same process, because it uses round
robin.  FieldsGrouping behaves the same way if your key has a reasonable
distribution.  You might be able to avoid passing by reference with none
grouping or direct grouping, but I'm not really sure why you would
explicitly try to avoid this.
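
As a rough illustration of why only some tuples end up shared, here is a toy
model in plain Java. It is not how Storm is implemented: RoundRobinTransferDemo,
Task and roundTrip are hypothetical names, and Java serialization merely stands
in for whatever serializer the cluster uses. The sender picks downstream tasks
in round-robin order; a task in the same process receives the original
reference, while a task in another process receives a deserialized copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names come from Storm.
class RoundRobinTransferDemo {

    // A downstream task that is either in the sender's process or in another one.
    static final class Task {
        final boolean sameProcess;
        Object lastReceived;

        Task(boolean sameProcess) {
            this.sameProcess = sameProcess;
        }
    }

    private final List<Task> tasks;
    private int next = 0;

    RoundRobinTransferDemo(List<Task> tasks) {
        this.tasks = tasks;
    }

    // Picks the next task in round-robin order and delivers the payload to it.
    Task send(ArrayList<String> payload) {
        Task target = tasks.get(next);
        next = (next + 1) % tasks.size();
        // Same process: hand over the reference. Other process: hand over a copy.
        target.lastReceived = target.sameProcess ? payload : roundTrip(payload);
        return target;
    }

    // Serializes and deserializes the payload, standing in for a network hop.
    static Object roundTrip(Object payload) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(payload);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        List<Task> tasks = new ArrayList<>();
        tasks.add(new Task(true));   // same process as the sender
        tasks.add(new Task(false));  // different process
        RoundRobinTransferDemo sender = new RoundRobinTransferDemo(tasks);

        ArrayList<String> payload = new ArrayList<>();
        payload.add("v1");
        Task local = sender.send(payload);
        Task remote = sender.send(payload);

        System.out.println(local.lastReceived == payload);  // prints "true"
        System.out.println(remote.lastReceived == payload); // prints "false"
    }
}
```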

I would also note that passing and then changing a static variable can be
tricky; hopefully you have protected it against concurrent modification
from other bolt tasks within the same process.
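
One possible way to protect shared static state, sketched in plain Java with
no Storm classes. SharedCounterDemo and runTasks are hypothetical names, and
the threads only stand in for bolt tasks running in one process. The sketch
assumes the shared state is a simple counter, for which AtomicLong is enough;
richer state would need a lock or a concurrent collection instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example: threads stand in for bolt tasks in the same process.
class SharedCounterDemo {

    // Static state is visible to every task (thread) in the same JVM.
    private static final AtomicLong processed = new AtomicLong();

    // Runs taskCount threads that each count tuplesPerTask tuples, then returns the total.
    static long runTasks(int taskCount, int tuplesPerTask) {
        processed.set(0);
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < taskCount; t++) {
            Thread thread = new Thread(() -> {
                for (int i = 0; i < tuplesPerTask; i++) {
                    processed.incrementAndGet(); // atomic, so no updates are lost
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for tasks", e);
            }
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(4, 10_000)); // prints 40000
    }
}
```

With a plain static long and `processed++` instead, the total could come out
lower than expected because concurrent increments can overwrite each other.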

On Sun, Jan 4, 2015 at 7:18 PM, hjh <ap...@163.com> wrote:

> Hi, I am new to Storm and have run into a problem with tuples. In local
> mode, do connected bolts share the same tuple object? For example, BoltA
> emits a tuple to BoltB. While BoltB is processing the tuple (which it has
> assigned to a private variable, say VAR), BoltA sends another tuple to
> BoltB, and VAR changes immediately. Does that mean that in local mode
> BoltA and BoltB share the same tuple instance? And how should I deal with
> this situation?
>
> PS. I use Java.
>
> Any suggestions are warmly welcome.
>
> Thank you very much!!!