You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@storm.apache.org by Abhishek Raj <ab...@outlook.com> on 2015/07/01 21:15:54 UTC

Use case of localOrShuffleGrouping?

I understand the difference between "shuffle grouping" and "local or shuffle grouping". But I haven't yet come across any instances where I would prefer using localOrShuffleGrouping instead of shuffleGrouping. The way I see it, shuffleGrouping helps increasing the concurrency if the tasks of a given component are distributed across several workers.
Could anyone throw some light on why the additional grouping method "localOrShuffleGrouping" was created and probably some use cases?
Thanks!Abhishek. 		 	   		  

Re: Use case of localOrShuffleGrouping?

Posted by Nathan Leung <nc...@gmail.com>.
If your bolts are evenly spread and there is at least one on each worker
using localOrShuffleGrouping can be a huge optimization.

Consider this example.  You have 10 bolt A tasks, 10 bolt B tasks, and 10
workers.  Each worker has 1 of each task.  Bolt b subscribes to bolt a.  In
shuffle grouping, you have 1/10 chance that any given tuple will stay
within the worker.  90% of your inter-bolt traffic must be serialized, sent
over the network, and deserialized.  With localOrShuffleGrouping, that
number is 0%.

On Wed, Jul 1, 2015 at 3:15 PM, Abhishek Raj <ab...@outlook.com>
wrote:

> I understand the difference between "shuffle grouping" and "local or
> shuffle grouping". But I haven't yet come across any instances where I
> would prefer using localOrShuffleGrouping instead of shuffleGrouping. The
> way I see it, shuffleGrouping helps increasing the concurrency if the tasks
> of a given component are distributed across several workers.
>
> Could anyone throw some light on why the additional grouping method
> "localOrShuffleGrouping" was created and probably some use cases?
>
> Thanks!
> Abhishek.
>