Posted to user@storm.apache.org by David Crossland <da...@elastacloud.com> on 2014/03/16 12:59:55 UTC
Advice
Hi, I have a 3-node cluster: 2 medium instances and 1 small (I'm probably going to upgrade the small one to a medium), 10 cores total. My main bottleneck is a service bus that has approx. 3.5 million JSON string messages published to it per day. I don't seem to be consuming messages at a fast enough rate.
I've tried setting the parallelism hint to a number of values (8, 20, 64, 128), all pretty much stabs in the dark.
I'm looking for some advice as to how to configure this in my environment. I assume there would be some relationship between the number of cores and the amount of parallelism I should specify that would ensure best performance and throughput.
I wonder also how the number of worker roles fits into this. Again, I'm taking a bit of a stab in choosing 12, one for each slot associated with the supervisors.
Any pointers you can give me would be appreciated.
Thanks
David
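For scale, the daily volume quoted above can be turned into an average per-second rate. This is only a back-of-the-envelope sketch: it assumes messages arrive evenly over the day, which real traffic rarely does, and the function name is made up for illustration.

```python
# Rough sizing sketch: convert a daily message volume into the average
# consumption rate the topology must sustain. Assumes uniform arrival,
# so bursty traffic will need headroom above this figure.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def required_rate(messages_per_day: int) -> float:
    """Average messages per second needed to keep up with the daily volume."""
    return messages_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = required_rate(3_500_000)
    print(f"average rate: {rate:.1f} msg/s")  # about 40.5 msg/s
```

If the measured consumption rate is already above this average, the backlog is more likely caused by peaks or by a slow downstream component than by the overall volume.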
Re: Advice
Posted by David Crossland <da...@elastacloud.com>.
Hi,
Thanks for the reply. I played with the topology and I seem to have 'fixed' it. Still, any clarification you can offer on this topic would be very useful to me. It would be good to have a definitive way to calculate how many workers, executors and tasks are most efficient for a given number of cores.
I'm rereading the documentation to try and get a clearer picture too.
Regards
David
Re: Advice
Posted by Lajos <la...@protulae.com>.
Hi David,
As general advice, you would want one thread per core. You have to then
divide up the threads according to what components have more work to do.
It seems mysterious, but just requires that you first understand what
your components are doing and then do some testing with stats collection.
For example, I've found that a 1:2 ratio works best in some of my
topologies, i.e. the parallelism of the spout is say 2, but that for the
bolt that is doing the work is 4.
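One way to picture the "one thread per core, divided by workload" rule is a small helper that splits a thread budget according to relative weights. This is a hypothetical sketch, not part of Storm: the function name and the weights are assumptions, and the right weights can only come from measuring your own topology.

```python
from typing import Dict


def split_threads(total_threads: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split total_threads among components in proportion to their weights.

    Uses integer arithmetic: each component gets the floor of its share,
    and leftover threads go to the largest remainders first.
    Note: with a tiny budget or very skewed weights a component can end
    up with 0 threads, so callers should check the result.
    """
    if total_threads < 0 or any(w <= 0 for w in weights.values()):
        raise ValueError("thread budget must be >= 0 and weights positive")
    total_weight = sum(weights.values())
    alloc = {name: total_threads * w // total_weight for name, w in weights.items()}
    leftover = total_threads - sum(alloc.values())
    # Hand out the remaining threads by largest fractional remainder.
    by_remainder = sorted(
        weights, key=lambda name: total_threads * weights[name] % total_weight, reverse=True
    )
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


if __name__ == "__main__":
    # A 1:2 spout-to-bolt ratio over 6 threads gives the 2 and 4 mentioned above.
    print(split_threads(6, {"spout": 1, "bolt": 2}))
    # The same ratio over a 10-core budget.
    print(split_threads(10, {"spout": 1, "bolt": 2}))
```

The resulting numbers would then be used as the parallelism hints for each component, and adjusted after looking at the collected stats.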
Regarding worker processes versus threads, to be honest I haven't yet
seen enough data to say which is more important. At the end of the day,
you just want as little CPU contention as possible for the guys doing
more of the work.
I will post something on my site on this topic in the next day or so;
I'll reply back on this thread when I do.
Cheers,
Lajos
theconsultantcto.com
Enterprise Lucene/Solr