Posted to user@storm.apache.org by Tyson Norris <tn...@adobe.com> on 2014/03/28 18:44:05 UTC

bolts stop receiving when numWorkers > 1

Hi - 
I am using Storm 0.9.1-incubating, and got a topology working locally that I am interested in scaling across multiple nodes.

However, when I set the number of workers in the topology config to more than 1, some bolts stop receiving input.

For example, in my topology:
        builder.setBolt("testBolt1", new TestRichBolt(1), 10)
                .fieldsGrouping(counterId, new Fields("rsid", "word"));
        builder.setBolt("testBolt2", new TestRichBolt(2), 10)
                .fieldsGrouping("testBolt1", new Fields("rsid", "word"));

This works fine when the number of workers is 1, but when the number of workers is 3, testBolt2 does not receive any input.

Any advice is appreciated. 

Thanks
Tyson

Re: bolts stop receiving when numWorkers > 1

Posted by Tyson Norris <tn...@adobe.com>.
More info on this:
I’ve been testing numerous combinations, and my current best guess is that it works only when my bolts are assigned to the same worker (the topology is set to use 3 workers; the single supervisor has 7 slots).

In a very simple case, with:
- testSpout on worker 6703
- testBolt1 on workers 6703, 6706
- testBolt2 on 6705, 6706

I see testBolt1 is receiving data on 6703.

I see testBolt2 is not receiving anything, although running with a single worker works consistently (which I think means that my component ids and stream ids are correct). I am testing with just String values, so I don’t think it should be a serialization issue.

I have noticed that redeploying the topology with minor changes (bolt name, stream name changes) sometimes makes it work; I believe this happens when the bolts end up running on the same worker.

Any ideas how to troubleshoot the route a tuple takes when it is emitted from a bolt and then disappears at the boundary to another worker (when it should be getting routed to another bolt)?

Thanks
Tyson



On Mar 28, 2014, at 2:15 PM, Tyson Norris <tn...@adobe.com> wrote:

Hi -
I see the same problem when running a single node with nimbus+supervisor as well as multiple nodes with extra supervisors.

Thanks
Tyson

Re: bolts stop receiving when numWorkers > 1

Posted by Tyson Norris <tn...@adobe.com>.
Hi -
I see the same problem when running a single node with nimbus+supervisor as well as multiple nodes with extra supervisors.

Thanks
Tyson
On Mar 28, 2014, at 12:41 PM, Mikhail Davidov <si...@gmail.com> wrote:

Tyson,

Are you running the topology with numWorkers > 1 locally or across several hosts?  I've had this happen when iptables were blocking workers from talking to each other on their assigned ports. There are no errors logged about this condition and the topology just seems to hang.



Re: bolts stop receiving when numWorkers > 1

Posted by Mikhail Davidov <si...@gmail.com>.
Tyson,

Are you running the topology with numWorkers > 1 locally or across several
hosts?  I've had this happen when iptables were blocking workers from
talking to each other on their assigned ports. There are no errors logged
about this condition and the topology just seems to hang.
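
One way to rule out the blocked-port condition described above is to attempt a plain TCP connection to each worker port from the other hosts. The following is a minimal sketch using only the Java standard library; the class name `WorkerPortCheck`, the default host `localhost`, and the 2-second timeout are illustrative assumptions, and the port numbers are simply the worker ports mentioned elsewhere in this thread. A successful connection only shows that the port is reachable, not that Storm is delivering tuples on it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Storm: checks whether a TCP port accepts connections.
final class WorkerPortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable: treat all as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: pass the supervisor host as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        // Worker ports taken from the example assignment in this thread.
        int[] workerPorts = {6703, 6705, 6706};
        for (int port : workerPorts) {
            System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 2000));
        }
    }
}
```

Running this from each supervisor host against every other supervisor host gives a quick picture of which worker-to-worker paths are open.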


On Fri, Mar 28, 2014 at 11:49 AM, Tyson Norris <tn...@adobe.com> wrote:

>  Some more info on this:
> I did run into this bug: https://issues.apache.org/jira/browse/STORM-187
> so I tried the same on a source build of 0.9.2-incubating-SNAPSHOT
> and have the same problem, so I am guessing it is unrelated.
>
>  I don't see any exceptions or other interesting items in the logs, just
> no indication of my bolts getting input (nothing shows in the logs and no input
> shows in the Storm UI for the bolt that works fine with a single worker).
>
>  I see some slightly different behavior depending on which grouping I use:
> .fieldsGrouping(counterId, new Fields("rsid", "word")); //DOES NOT work
> (no input received)
> .fieldsGrouping(counterId, new Fields("word")); //DOES work
> .fieldsGrouping(counterId, new Fields("rsid")); //DOES work
>
>  Is there some problem with using fieldsGrouping with multiple fields
> from the same component AND having multiple workers?
>
>  Should this instead be accomplished via:
> .fieldsGrouping(counterId, new Fields("word")).fieldsGrouping(counterId,
> new Fields("rsid"));
> (which seems to work, afaict)
>
>  My goal is to have each unique combination of word+rsid sent to the same
> task, and I thought that .fieldsGrouping(counterId, new Fields("rsid",
> "word")); would be sufficient.
>
>  Thanks
> Tyson

Re: bolts stop receiving when numWorkers > 1

Posted by Tyson Norris <tn...@adobe.com>.
Some more info on this:
I did run into this bug: https://issues.apache.org/jira/browse/STORM-187
so I tried the same on a source build of 0.9.2-incubating-SNAPSHOT
and have the same problem, so I am guessing it is unrelated.

I don’t see any exceptions or other interesting items in the logs, just no indication of my bolts getting input (nothing shows in the logs and no input shows in the Storm UI for the bolt that works fine with a single worker).

I see some slightly different behavior depending on which grouping I use:
.fieldsGrouping(counterId, new Fields("rsid", "word")); //DOES NOT work (no input received)
.fieldsGrouping(counterId, new Fields("word")); //DOES work
.fieldsGrouping(counterId, new Fields("rsid")); //DOES work

Is there some problem with using fieldsGrouping with multiple fields from the same component AND having multiple workers?

Should this instead be accomplished via:
.fieldsGrouping(counterId, new Fields("word")).fieldsGrouping(counterId, new Fields("rsid"));
(which seems to work, afaict)

My goal is to have each unique combination of word+rsid sent to the same task, and I thought that .fieldsGrouping(counterId, new Fields("rsid", "word")); would be sufficient.
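
As a conceptual model of the goal stated above, a fields grouping can be thought of as hashing the values of the selected fields and taking the result modulo the number of target tasks. The sketch below is an assumption about the general idea only, not Storm's actual implementation; the class `FieldsGroupingSketch`, the method `chooseTask`, and the sample values are hypothetical. Under this model, grouping on both "rsid" and "word" together is enough to send each unique combination to one task.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of field-based routing; Storm's real code may differ.
final class FieldsGroupingSketch {

    // Picks a task index in [0, numTasks) from the values of the grouping fields.
    // floorMod keeps the index non-negative even when the hash is negative.
    public static int chooseTask(List<?> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 10; // parallelism hint used for testBolt1 and testBolt2 in this thread

        // Sample (rsid, word) pairs; the values are made up for illustration.
        List<String> first = Arrays.asList("rsid-1", "storm");
        List<String> again = Arrays.asList("rsid-1", "storm");
        List<String> other = Arrays.asList("rsid-2", "storm");

        // Equal combinations always map to the same task index.
        System.out.println("first -> task " + chooseTask(first, numTasks));
        System.out.println("again -> task " + chooseTask(again, numTasks));
        // A different combination may or may not share a task with the first.
        System.out.println("other -> task " + chooseTask(other, numTasks));
    }
}
```

Note that in this model the routing decision depends only on the tuple values and the task count, not on which worker hosts the task.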

Thanks
Tyson


On Mar 28, 2014, at 10:44 AM, Tyson Norris <tn...@adobe.com> wrote:

Hi -
I am using Storm 0.9.1-incubating, and got a topology working locally that I am interested in scaling across multiple nodes.

However, when I set the number of workers in the topology config to more than 1, some bolts stop receiving input.

For example, in my topology:
       builder.setBolt("testBolt1", new TestRichBolt(1), 10)
               .fieldsGrouping(counterId, new Fields("rsid", "word"));
       builder.setBolt("testBolt2", new TestRichBolt(2), 10)
               .fieldsGrouping("testBolt1", new Fields("rsid", "word"));

This works fine when the number of workers is 1, but when the number of workers is 3, testBolt2 does not receive any input.

Any advice is appreciated.

Thanks
Tyson