Posted to user@storm.apache.org by Jan Sipke van der Veen <ja...@gmail.com> on 2014/10/07 11:44:48 UTC

Re: Automatic throttling of spout

Thanks for your reply Nathan.

I have experimented a bit more with this topology and have now set the
topology.max.spout.pending to a fixed value that makes sense for the
topology. I still see some strange behavior, however.

The cluster consists of four Storm worker nodes with two processor cores
each. The topology is created with these settings:

config.setMaxSpoutPending(12);
config.setNumWorkers(12);
builder.setSpout("elasticspout", new ElasticSpout(), 1).setNumTasks(1);
builder.setBolt("elasticbolt", new ElasticBolt(), 24)
       .setNumTasks(24)
       .shuffleGrouping("elasticspout");
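
For context, max spout pending only takes effect when the spout emits reliable
tuples, i.e. tuples with a message ID that the downstream bolt later acks. A
minimal sketch of what the spout side might look like; the actual ElasticSpout
internals are not shown in this thread, so fetchFromQueue() and the message-ID
handling below are assumptions:

import java.util.Map;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class ElasticSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private long msgId = 0;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String item = fetchFromQueue();  // stand-in for the real queue read (assumption)
        if (item != null) {
            // Passing a message ID as the second argument makes the tuple reliable,
            // which is what lets topology.max.spout.pending throttle nextTuple().
            collector.emit(new Values(item), msgId++);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("item"));
    }

    private String fetchFromQueue() { return "data"; }  // placeholder for the external queue
}

On the bolt side the tuple has to be acked (collector.ack(input)) for the
pending count to go back down; otherwise the spout stays throttled until the
tuple times out and is reported as failed.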

In the attached screenshot you can see that at first the CPU load of all
four workers is high (both cores are combined into the metric, so 1 means
that all cores are busy). But after a few minutes, the load drops on all
worker nodes because the number of emitted tuples is lower. The queue that
the spout fetches data from is filled with plenty of data, so there is no
throttling there.

Best regards,
Jan Sipke van der Veen

On Tue, Sep 30, 2014 at 5:26 PM, Nathan Leung <nc...@gmail.com> wrote:

> if you set topology.max.spout.pending and use reliable tuples then the
> spout will automatically throttle itself when its output queue grows to the
> configured size.
>
> On Tue, Sep 30, 2014 at 11:04 AM, Jan Sipke van der Veen <
> jansipke@gmail.com> wrote:
>
>> Hello,
>>
>> I am using a simple topology with a single spout and a single bolt to
>> test some ideas about automatically scaling the number of worker nodes in a
>> Storm cluster. The bolt is set up to use some processor time and the spout
>> sends out tuples at a rate which is slightly higher than the bolt can
>> process.
>>
>> The number of emitted tuples is indeed higher than the number of acked
>> tuples and some time later there are some failed tuples. Just what I
>> expected. However, after about 5 to 10 minutes, it seems that nextTuple()
>> isn't called as often as before and the number of emitted tuples drops to a
>> level that the bolt is able to keep up with.
>>
>> Is there some sort of automatic throttling of spouts that I'm not aware
>> of?
>>
>> Best regards,
>> Jan Sipke van der Veen
>>
>
>

Re: Automatic throttling of spout

Posted by Nathan Leung <nc...@gmail.com>.
12 is very low for max spout pending.  This means that the spout can have
at most 12 unacknowledged tuples in the topology at a time.  With
24 elastic bolts, you cannot even keep all your bolts busy simultaneously.
While this may be OK (your bolts outnumber your CPU cores), if you are
using a large number of bolts to hide something like network latency, then
this is not good.  A general rule of thumb is to start with a value of 1024
and go up or down from there depending on the needs of your topology.  If
the value is too low, you will throttle your topology.  If the value is too
high, the overall latency of your topology will increase.
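
For illustration, applying that rule of thumb to the configuration quoted
below would look something like this (1024 is only the suggested starting
point, not a value tuned for this particular topology):

Config config = new Config();
// Start around 1024 in-flight tuples and tune from there: lower it if
// end-to-end latency grows too much, raise it if the bolts are mostly
// hiding I/O or network latency.
config.setMaxSpoutPending(1024);
config.setNumWorkers(12);
builder.setSpout("elasticspout", new ElasticSpout(), 1).setNumTasks(1);
builder.setBolt("elasticbolt", new ElasticBolt(), 24)
       .setNumTasks(24)
       .shuffleGrouping("elasticspout");

With 24 bolt executors, any value below 24 guarantees that some executors sit
idle, which is the point made above.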

On Tue, Oct 7, 2014 at 5:44 AM, Jan Sipke van der Veen <ja...@gmail.com>
wrote:

> Thanks for your reply Nathan.
>
> I have experimented a bit more with this topology and have now set the
> topology.max.spout.pending to a fixed value that makes sense for the
> topology. I still see some strange behavior however.
>
> The cluster consists of four Storm worker nodes with two processor cores
> each. The topology is created with these settings:
>
> config.setMaxSpoutPending(12);
> config.setNumWorkers(12);
> builder.setSpout("elasticspout", new ElasticSpout(), 1).setNumTasks(1);
> builder.setBolt("elasticbolt", new ElasticBolt(),
> 24).setNumTasks(24).shuffleGrouping("elasticspout");
>
> In the attached screenshot you can see that at first the CPU load of all
> four workers is high (both cores are combined into the metric, so 1 means
> that all cores are busy). But after a few minutes, the load drops on all
> worker nodes because the number of emitted tuples is lower. The queue that
> the spout fetches data from is filled with plenty of data, so there is no
> throttling there.
>
> Best regards,
> Jan Sipke van der Veen
>
> On Tue, Sep 30, 2014 at 5:26 PM, Nathan Leung <nc...@gmail.com> wrote:
>
>> if you set topology.max.spout.pending and use reliable tuples then the
>> spout will automatically throttle itself when its output queue grows to the
>> configured size.
>>
>> On Tue, Sep 30, 2014 at 11:04 AM, Jan Sipke van der Veen <
>> jansipke@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I am using a simple topology with a single spout and a single bolt to
>>> test some ideas about automatically scaling the number of worker nodes in a
>>> Storm cluster. The bolt is set up to use some processor time and the spout
>>> sends out tuples at a rate which is slightly higher than the bolt can
>>> process.
>>>
>>> The number of emitted tuples is indeed higher than the number of acked
>>> tuples and some time later there are some failed tuples. Just what I
>>> expected. However, after about 5 to 10 minutes, it seems that nextTuple()
>>> isn't called as often as before and the number of emitted tuples drops to a
>>> level that the bolt is able to keep up with.
>>>
>>> Is there some sort of automatic throttling of spouts that I'm not aware
>>> of?
>>>
>>> Best regards,
>>> Jan Sipke van der Veen
>>>
>>
>>
>