You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@storm.apache.org by Navin Ipe <na...@searchlighthealth.com> on 2016/04/21 07:29:16 UTC

Does Storm use Netty to access a class reference across worker JVM's?

In the below code,








*public static void main(String[] cmdArgs) {Config config = new
Config();config.setNumWorkers(5);        MongoManager mongoManager = new
MongoManager();TopologyBuilder builder = new
TopologyBuilder();builder.setSpout("someSpout", new
MongoSpout(mongoManger)));}*

Assuming there are many more spouts and blots created, I understand that each
worker will run in its own JVM
<http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/>,
which means that it will have its own memory space.

*Questions:*
*1.* So when the mongoManager reference is passed to MongoSpout, will
MongoSpout always be able to access the initialized members of mongoManager?
*2.* Isn't it likely that main() runs in a different JVM and a MongoSpout
will be in another JVM? How would Storm access mongoManager? Using Netty?
*3.* (optional help) I have the Storm source code. Could anyone point me to
the part that Storm does the inter-worker communication for accessing class
references?

-- 
Regards,
Navin

Re: Does Storm use Netty to access a class reference across worker JVM's?

Posted by Nathan Leung <nc...@gmail.com>.
To clarify my point is that config is NOT suitable for passing database
configuration. It can be used to pass database connection configuration
though (eg host/port values).
On Apr 22, 2016 6:41 AM, "Navin Ipe" <na...@searchlighthealth.com>
wrote:

> Thank you very much for your time and help, Nathan and John.
> Followed up on serialization, and it looks like everything can be
> serialized: http://stackoverflow.com/a/16851174/453673
> Will verify the database connection serialization also during
> implementation.
>
> On Thu, Apr 21, 2016 at 5:22 PM, Nathan Leung <nc...@gmail.com> wrote:
>
>> mongoManager is serialized and sent to your spout.  If it's not something
>> that's easily serializable (e.g. a database connection) then you will need
>> to initialize it in spout prepare() instead of the constructor.
>>
>> On Thu, Apr 21, 2016 at 7:34 AM, Navin Ipe <
>> navin.ipe@searchlighthealth.com> wrote:
>>
>>> Thanks John, but that's odd...in the code I shared, there's a reference
>>> to mongoManager being used in the Spout (the MongoSpout internally stores a
>>> reference to mongoManager). If there are no object references shared
>>> between executors, then when the topology I created is submitted to Storm,
>>> would Storm serialize or clone and store an instance of mongoManager (and
>>> all initialized values inside it) inside the Spout? Storm would surely have
>>> to do *something *to ensure that references aren't cut off when workers
>>> operate in different JVM's...
>>>
>>> On Thu, Apr 21, 2016 at 4:45 PM, <ho...@gmail.com> wrote:
>>>
>>>> Netty is used for communication between workers and then the LMAX
>>>> disruptor queue is used to route messages between Netty and the individual
>>>> executors such as the MongoSpout and KafkaBolt. AFAIK, there are not direct
>>>> object references shared between executors because all executors
>>>> communicate via Netty/LMAX (shuffle/fieldsGrouping) or LMAX
>>>> (localIrShuffleGrouping).
>>>>
>>>> --John
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Apr 21, 2016, at 1:29 AM, Navin Ipe <na...@searchlighthealth.com>
>>>> wrote:
>>>>
>>>> In the below code,
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *public static void main(String[] cmdArgs) {Config config = new
>>>> Config();config.setNumWorkers(5);        MongoManager mongoManager = new
>>>> MongoManager();TopologyBuilder builder = new
>>>> TopologyBuilder();builder.setSpout("someSpout", new
>>>> MongoSpout(mongoManger)));}*
>>>>
>>>> Assuming there are many more spouts and blots created, I understand
>>>> that each worker will run in its own JVM
>>>> <http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/>,
>>>> which means that it will have its own memory space.
>>>>
>>>> *Questions:*
>>>> *1.* So when the mongoManager reference is passed to MongoSpout, will
>>>> MongoSpout always be able to access the initialized members of mongoManager?
>>>> *2.* Isn't it likely that main() runs in a different JVM and a
>>>> MongoSpout will be in another JVM? How would Storm access mongoManager?
>>>> Using Netty?
>>>> *3.* (optional help) I have the Storm source code. Could anyone point
>>>> me to the part that Storm does the inter-worker communication for accessing
>>>> class references?
>>>>
>>>> --
>>>> Regards,
>>>> Navin
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Navin
>>>
>>
>>
>
>
> --
> Regards,
> Navin
>

Re: Does Storm use Netty to access a class reference across worker JVM's?

Posted by Navin Ipe <na...@searchlighthealth.com>.
Thank you very much for your time and help, Nathan and John.
Followed up on serialization, and it looks like everything can be
serialized: http://stackoverflow.com/a/16851174/453673
Will verify the database connection serialization also during
implementation.

On Thu, Apr 21, 2016 at 5:22 PM, Nathan Leung <nc...@gmail.com> wrote:

> mongoManager is serialized and sent to your spout.  If it's not something
> that's easily serializable (e.g. a database connection) then you will need
> to initialize it in spout prepare() instead of the constructor.
>
> On Thu, Apr 21, 2016 at 7:34 AM, Navin Ipe <
> navin.ipe@searchlighthealth.com> wrote:
>
>> Thanks John, but that's odd...in the code I shared, there's a reference
>> to mongoManager being used in the Spout (the MongoSpout internally stores a
>> reference to mongoManager). If there are no object references shared
>> between executors, then when the topology I created is submitted to Storm,
>> would Storm serialize or clone and store an instance of mongoManager (and
>> all initialized values inside it) inside the Spout? Storm would surely have
>> to do *something *to ensure that references aren't cut off when workers
>> operate in different JVM's...
>>
>> On Thu, Apr 21, 2016 at 4:45 PM, <ho...@gmail.com> wrote:
>>
>>> Netty is used for communication between workers and then the LMAX
>>> disruptor queue is used to route messages between Netty and the individual
>>> executors such as the MongoSpout and KafkaBolt. AFAIK, there are not direct
>>> object references shared between executors because all executors
>>> communicate via Netty/LMAX (shuffle/fieldsGrouping) or LMAX
>>> (localIrShuffleGrouping).
>>>
>>> --John
>>>
>>> Sent from my iPhone
>>>
>>> On Apr 21, 2016, at 1:29 AM, Navin Ipe <na...@searchlighthealth.com>
>>> wrote:
>>>
>>> In the below code,
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *public static void main(String[] cmdArgs) {Config config = new
>>> Config();config.setNumWorkers(5);        MongoManager mongoManager = new
>>> MongoManager();TopologyBuilder builder = new
>>> TopologyBuilder();builder.setSpout("someSpout", new
>>> MongoSpout(mongoManger)));}*
>>>
>>> Assuming there are many more spouts and blots created, I understand that each
>>> worker will run in its own JVM
>>> <http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/>,
>>> which means that it will have its own memory space.
>>>
>>> *Questions:*
>>> *1.* So when the mongoManager reference is passed to MongoSpout, will
>>> MongoSpout always be able to access the initialized members of mongoManager?
>>> *2.* Isn't it likely that main() runs in a different JVM and a
>>> MongoSpout will be in another JVM? How would Storm access mongoManager?
>>> Using Netty?
>>> *3.* (optional help) I have the Storm source code. Could anyone point
>>> me to the part that Storm does the inter-worker communication for accessing
>>> class references?
>>>
>>> --
>>> Regards,
>>> Navin
>>>
>>>
>>
>>
>> --
>> Regards,
>> Navin
>>
>
>


-- 
Regards,
Navin

Re: Does Storm use Netty to access a class reference across worker JVM's?

Posted by Nathan Leung <nc...@gmail.com>.
mongoManager is serialized and sent to your spout.  If it's not something
that's easily serializable (e.g. a database connection) then you will need
to initialize it in spout prepare() instead of the constructor.

On Thu, Apr 21, 2016 at 7:34 AM, Navin Ipe <na...@searchlighthealth.com>
wrote:

> Thanks John, but that's odd...in the code I shared, there's a reference to
> mongoManager being used in the Spout (the MongoSpout internally stores a
> reference to mongoManager). If there are no object references shared
> between executors, then when the topology I created is submitted to Storm,
> would Storm serialize or clone and store an instance of mongoManager (and
> all initialized values inside it) inside the Spout? Storm would surely have
> to do *something *to ensure that references aren't cut off when workers
> operate in different JVM's...
>
> On Thu, Apr 21, 2016 at 4:45 PM, <ho...@gmail.com> wrote:
>
>> Netty is used for communication between workers and then the LMAX
>> disruptor queue is used to route messages between Netty and the individual
>> executors such as the MongoSpout and KafkaBolt. AFAIK, there are not direct
>> object references shared between executors because all executors
>> communicate via Netty/LMAX (shuffle/fieldsGrouping) or LMAX
>> (localIrShuffleGrouping).
>>
>> --John
>>
>> Sent from my iPhone
>>
>> On Apr 21, 2016, at 1:29 AM, Navin Ipe <na...@searchlighthealth.com>
>> wrote:
>>
>> In the below code,
>>
>>
>>
>>
>>
>>
>>
>>
>> *public static void main(String[] cmdArgs) {Config config = new
>> Config();config.setNumWorkers(5);        MongoManager mongoManager = new
>> MongoManager();TopologyBuilder builder = new
>> TopologyBuilder();builder.setSpout("someSpout", new
>> MongoSpout(mongoManger)));}*
>>
>> Assuming there are many more spouts and blots created, I understand that each
>> worker will run in its own JVM
>> <http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/>,
>> which means that it will have its own memory space.
>>
>> *Questions:*
>> *1.* So when the mongoManager reference is passed to MongoSpout, will
>> MongoSpout always be able to access the initialized members of mongoManager?
>> *2.* Isn't it likely that main() runs in a different JVM and a
>> MongoSpout will be in another JVM? How would Storm access mongoManager?
>> Using Netty?
>> *3.* (optional help) I have the Storm source code. Could anyone point me
>> to the part that Storm does the inter-worker communication for accessing
>> class references?
>>
>> --
>> Regards,
>> Navin
>>
>>
>
>
> --
> Regards,
> Navin
>

Re: Does Storm use Netty to access a class reference across worker JVM's?

Posted by Navin Ipe <na...@searchlighthealth.com>.
Thanks John, but that's odd...in the code I shared, there's a reference to
mongoManager being used in the Spout (the MongoSpout internally stores a
reference to mongoManager). If there are no object references shared
between executors, then when the topology I created is submitted to Storm,
would Storm serialize or clone and store an instance of mongoManager (and
all initialized values inside it) inside the Spout? Storm would surely have
to do *something *to ensure that references aren't cut off when workers
operate in different JVM's...

On Thu, Apr 21, 2016 at 4:45 PM, <ho...@gmail.com> wrote:

> Netty is used for communication between workers and then the LMAX
> disruptor queue is used to route messages between Netty and the individual
> executors such as the MongoSpout and KafkaBolt. AFAIK, there are not direct
> object references shared between executors because all executors
> communicate via Netty/LMAX (shuffle/fieldsGrouping) or LMAX
> (localIrShuffleGrouping).
>
> --John
>
> Sent from my iPhone
>
> On Apr 21, 2016, at 1:29 AM, Navin Ipe <na...@searchlighthealth.com>
> wrote:
>
> In the below code,
>
>
>
>
>
>
>
>
> *public static void main(String[] cmdArgs) {Config config = new
> Config();config.setNumWorkers(5);        MongoManager mongoManager = new
> MongoManager();TopologyBuilder builder = new
> TopologyBuilder();builder.setSpout("someSpout", new
> MongoSpout(mongoManger)));}*
>
> Assuming there are many more spouts and blots created, I understand that each
> worker will run in its own JVM
> <http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/>,
> which means that it will have its own memory space.
>
> *Questions:*
> *1.* So when the mongoManager reference is passed to MongoSpout, will
> MongoSpout always be able to access the initialized members of mongoManager?
> *2.* Isn't it likely that main() runs in a different JVM and a MongoSpout
> will be in another JVM? How would Storm access mongoManager? Using Netty?
> *3.* (optional help) I have the Storm source code. Could anyone point me
> to the part that Storm does the inter-worker communication for accessing
> class references?
>
> --
> Regards,
> Navin
>
>


-- 
Regards,
Navin

Re: Does Storm use Netty to access a class reference across worker JVM's?

Posted by ho...@gmail.com.
Netty is used for communication between workers and then the LMAX disruptor queue is used to route messages between Netty and the individual executors such as the MongoSpout and KafkaBolt. AFAIK, there are not direct object references shared between executors because all executors communicate via Netty/LMAX (shuffle/fieldsGrouping) or LMAX (localIrShuffleGrouping).

--John

Sent from my iPhone

> On Apr 21, 2016, at 1:29 AM, Navin Ipe <na...@searchlighthealth.com> wrote:
> 
> In the below code, 
> 
> public static void main(String[] cmdArgs) {
> Config config = new Config();
> config.setNumWorkers(5);        
> MongoManager mongoManager = new MongoManager();
> 
> TopologyBuilder builder = new TopologyBuilder();
> builder.setSpout("someSpout", new MongoSpout(mongoManger)));
> }
> 
> Assuming there are many more spouts and blots created, I understand that each worker will run in its own JVM, which means that it will have its own memory space. 
> 
> Questions:
> 1. So when the mongoManager reference is passed to MongoSpout, will MongoSpout always be able to access the initialized members of mongoManager?
> 2. Isn't it likely that main() runs in a different JVM and a MongoSpout will be in another JVM? How would Storm access mongoManager? Using Netty?
> 3. (optional help) I have the Storm source code. Could anyone point me to the part that Storm does the inter-worker communication for accessing class references?
> 
> -- 
> Regards,
> Navin