Posted to user@storm.apache.org by Simon Cooper <si...@featurespace.co.uk> on 2014/01/27 14:14:21 UTC

Threading and partitioning on trident bolts

I've been looking at the trident code. If the following code is run on the same bolt:

TridentStream input = ...

TridentStream firstSection = input.each(new Fields(...), new FirstFilter());
TridentStream secondSection = input.each(new Fields(...), new SecondFilter());

From looking at FreshCollector and AppendCollector, it looks like the two filters will be run sequentially. Are there any plans or thoughts on using threading when a stream splits like this, i.e. running each split on a new thread in parallel? There may be some complications when joining streams together, but that's a simple problem that a couple of locks on the MultiReducerProcessor would solve. Has this been considered before?
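
To make the question concrete, here is a minimal sketch of the sequential behaviour described above. It is a toy model in plain Java, not Trident code: SequentialSplitSketch, run and demo are hypothetical names, and the two predicates only stand in for FirstFilter and SecondFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of a split stream inside one bolt; none of these names are Trident classes. */
final class SequentialSplitSketch {

    /**
     * Hands every tuple to each branch in turn on the calling thread and
     * records the order in which the branches were called.
     */
    static List<String> run(List<Integer> tuples, List<Predicate<Integer>> branches) {
        List<String> trace = new ArrayList<>();
        for (Integer tuple : tuples) {
            for (int i = 0; i < branches.size(); i++) {
                // The next branch cannot see this tuple until branch i has returned.
                boolean kept = branches.get(i).test(tuple);
                trace.add("branch" + i + "(" + tuple + ")=" + kept);
            }
        }
        return trace;
    }

    /** Runs two hypothetical filters over the tuples 1 and 2. */
    static List<String> demo() {
        Predicate<Integer> firstFilter = t -> t % 2 == 0; // assumed stand-in for FirstFilter
        Predicate<Integer> secondFilter = t -> t > 1;     // assumed stand-in for SecondFilter
        return run(Arrays.asList(1, 2), Arrays.asList(firstFilter, secondFilter));
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this model the second branch only sees a tuple once the first branch has finished with it, which is the ordering the question is about.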

SimonC

RE: Threading and partitioning on trident bolts

Posted by Simon Cooper <si...@featurespace.co.uk>.
I'm considering a situation in which a stream is split in two, as below. At the end of each stream is a statePersist that puts the tuple onto an external queue, and each statePersist output has very low latency requirements with respect to the input. If there are no changes in partitioning or parallelism, trident will put both streams onto the same bolt, which will delay the statePersist on the second stream until the first one is complete. Running the two streams in separate threads would fix this.
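
A rough sketch of the effect I'm after, in plain Java rather than Trident. ParallelBranchesSketch and its method are hypothetical names, and the latch only simulates a slow first branch; nothing here is an existing Storm API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Toy model: each branch of the split runs on its own thread. */
final class ParallelBranchesSketch {

    /** Returns true if the second branch completed without waiting for the first. */
    static boolean secondDoesNotWaitForFirst() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch secondDone = new CountDownLatch(1);
        try {
            // Simulated slow first branch: it returns true only if the second
            // branch has completed before its 5 second wait expires.
            Callable<Boolean> slowFirst = () -> secondDone.await(5, TimeUnit.SECONDS);
            // Fast second branch: just signals that it has finished.
            Runnable fastSecond = secondDone::countDown;

            Future<Boolean> first = pool.submit(slowFirst);
            Future<?> second = pool.submit(fastSecond);

            second.get(5, TimeUnit.SECONDS);
            return first.get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("branch failed or timed out", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(secondDoesNotWaitForFirst());
    }
}
```

If both branches ran one after the other on a single thread, the first would use up its whole wait before the second even started, which is the delay I want to avoid on the second statePersist.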

I don't understand your point about synchronization around OutputCollector; the wiki Concepts page says OutputCollector is thread safe? Some locking on stream joins would be required, but that's easy to do by protecting calls to the MultiReducer with a lock in MultiReducerProcessor.
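
The locking I have in mind would look roughly like the sketch below. It is plain Java under stated assumptions: CountingJoin is a hypothetical stand-in for a non-thread-safe join stage, not Trident's MultiReducer, and LockedJoin is a hypothetical wrapper, not MultiReducerProcessor.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of two branch threads feeding one join stage through a lock. */
final class LockedJoinSketch {

    /** Hypothetical join stage that is not safe to call from several threads. */
    static final class CountingJoin {
        private long seen = 0;

        void execute(int streamIndex, int tuple) {
            seen++; // unsynchronized read-modify-write
        }

        long seen() {
            return seen;
        }
    }

    /** Wrapper that lets only one branch thread into the join at a time. */
    static final class LockedJoin {
        private final ReentrantLock lock = new ReentrantLock();
        private final CountingJoin delegate = new CountingJoin();

        void execute(int streamIndex, int tuple) {
            lock.lock();
            try {
                delegate.execute(streamIndex, tuple);
            } finally {
                lock.unlock();
            }
        }

        long seen() {
            lock.lock();
            try {
                return delegate.seen();
            } finally {
                lock.unlock();
            }
        }
    }

    /** Starts two branch threads, each sending tuplesPerBranch tuples to the join. */
    static long runTwoBranches(int tuplesPerBranch) {
        LockedJoin join = new LockedJoin();
        Thread[] branches = new Thread[2];
        for (int i = 0; i < branches.length; i++) {
            final int streamIndex = i;
            branches[i] = new Thread(() -> {
                for (int t = 0; t < tuplesPerBranch; t++) {
                    join.execute(streamIndex, t);
                }
            });
            branches[i].start();
        }
        try {
            for (Thread branch : branches) {
                branch.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for branches", e);
        }
        return join.seen();
    }

    public static void main(String[] args) {
        System.out.println(runTwoBranches(10_000));
    }
}
```

With the lock in place no update from either branch is lost; the cost is that the branches serialize again at the join.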

Unless there's some way of forcing trident to start a new bolt without changing the partitioning or parallelism that I've missed?

SimonC

From: nathan.marz@gmail.com [mailto:nathan.marz@gmail.com] On Behalf Of Nathan Marz
Sent: 27 January 2014 20:11
To: user@storm.incubator.apache.org
Subject: Re: Threading and partitioning on trident bolts

Correct, they will be run sequentially within the bolt. Threading within a bolt adds a ton of complication (needing to synchronize around output collector, for instance), so that's really bad. What are you trying to achieve by adding more threading? You won't get better resource usage because the rest of the topology is parallelized. You might get better latency, but in that case it would be better to just work on scheduler improvements to accomplish that and have the "split" tasks run as separate tasks within the same worker.
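
As a very loose illustration of the "separate tasks" alternative, the plain Java sketch below gives each branch its own single-threaded executor and its own state, so no locking is needed. It does not use or model Storm's scheduler; SeparateTasksSketch, runBranch and demo are hypothetical names, and the two predicates are assumed stand-ins for the filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Toy model: each branch is its own single-threaded "task" with private state. */
final class SeparateTasksSketch {

    /** Runs one branch on the given executor and returns the tuples it kept. */
    static Future<List<Integer>> runBranch(ExecutorService task,
                                           List<Integer> tuples,
                                           Predicate<Integer> filter) {
        Callable<List<Integer>> work = () -> {
            List<Integer> kept = new ArrayList<>(); // touched by this task only
            for (Integer tuple : tuples) {
                if (filter.test(tuple)) {
                    kept.add(tuple);
                }
            }
            return kept;
        };
        return task.submit(work);
    }

    /** Sends the same tuples to two independent tasks and collects both results. */
    static List<List<Integer>> demo() {
        List<Integer> tuples = Arrays.asList(1, 2, 3, 4);
        ExecutorService firstTask = Executors.newSingleThreadExecutor();
        ExecutorService secondTask = Executors.newSingleThreadExecutor();
        try {
            Future<List<Integer>> first = runBranch(firstTask, tuples, t -> t % 2 == 0);
            Future<List<Integer>> second = runBranch(secondTask, tuples, t -> t > 2);
            List<List<Integer>> results = new ArrayList<>();
            results.add(first.get(5, TimeUnit.SECONDS));
            results.add(second.get(5, TimeUnit.SECONDS));
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("task failed or timed out", e);
        } finally {
            firstTask.shutdownNow();
            secondTask.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because each task owns its own list, the branches never contend on shared state, which is the property that makes this shape simpler than threading inside one bolt.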

On Mon, Jan 27, 2014 at 5:14 AM, Simon Cooper <si...@featurespace.co.uk> wrote:
I've been looking at the trident code. If the following code is run on the same bolt:

TridentStream input = ...

TridentStream firstSection = input.each(new Fields(...), new FirstFilter());
TridentStream secondSection = input.each(new Fields(...), new SecondFilter());

From looking at FreshCollector and AppendCollector, it looks like the two filters will be run sequentially. Are there any plans or thoughts on using threading when a stream splits like this, i.e. running each split on a new thread in parallel? There may be some complications when joining streams together, but that's a simple problem that a couple of locks on the MultiReducerProcessor would solve. Has this been considered before?

SimonC



--
Twitter: @nathanmarz
http://nathanmarz.com
