Posted to dev@storm.apache.org by AMir Firouzi <fi...@gmail.com> on 2017/08/01 09:44:37 UTC

possible to have supervisors without _eventlogger and _acker tasks

Hi guys,
I'm working on my own scheduler for Storm. What happens if I create a
worker process and put some bolt/spout tasks in it, but no _eventlogger
or _acker tasks? Is that a problem? Will tuples emitted from tasks in
this worker be skipped, or will they just use an _acker or _eventlogger
in another worker?

Thanks in advance

Re: possible to have supervisors without _eventlogger and _acker tasks

Posted by AMir Firouzi <fi...@gmail.com>.
Thanks again Bobby,
that's exactly what I'm doing right now. I try to schedule the components
in a way that reduces network latency, and after a while the scheduler
tries to make wiser decisions based on resource usage.

On Thu, Aug 3, 2017 at 7:49 PM Bobby Evans <ev...@yahoo-inc.com.invalid>
wrote:

> It should, especially for the ackers.
> The ackers receive lots and lots of small messages, and those messages
> come from all over your topology.  What is more, if you have
> topology.max.spout.pending set, how quickly the messages can get to the
> ackers and back to the spouts determines the throughput of your topology
> to some degree.  But it all depends on what the bottleneck actually is in
> your topologies.  If it is the network or network ping time, then
> scheduling all of the components of your topology close to each other is
> important.  If it is CPU or memory, then you need to spread them out more
> to get more free resources on other nodes.  This is kind of what RAS (the
> Resource Aware Scheduler) tries to do, but it works only from guesses
> supplied by the topology owner.  In future releases we expect to add
> elasticity to RAS so that it can look at the actual resources being used
> and take that into account when scheduling, because each topology is
> different.
>
> - Bobby

Re: possible to have supervisors without _eventlogger and _acker tasks

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
It should, especially for the ackers.
The ackers receive lots and lots of small messages, and those messages come from all over your topology.  What is more, if you have topology.max.spout.pending set, how quickly the messages can get to the ackers and back to the spouts determines the throughput of your topology to some degree.  But it all depends on what the bottleneck actually is in your topologies.  If it is the network or network ping time, then scheduling all of the components of your topology close to each other is important.  If it is CPU or memory, then you need to spread them out more to get more free resources on other nodes.  This is kind of what RAS (the Resource Aware Scheduler) tries to do, but it works only from guesses supplied by the topology owner.  In future releases we expect to add elasticity to RAS so that it can look at the actual resources being used and take that into account when scheduling, because each topology is different.

- Bobby
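
The point about max spout pending and ack round trips can be made concrete with Little's law: if each spout task may have at most N tuples in flight and a tuple takes T seconds from emit to final ack, that task cannot sustain more than N / T tuples per second. The sketch below is only a back-of-the-envelope estimate; the class and method names are hypothetical and not part of Storm, and the latency figures are made-up inputs.

```java
/** Hypothetical helper, not part of Storm: bounds spout throughput via Little's law. */
final class SpoutThroughputBound {

    /**
     * Upper bound, in tuples per second, for a single spout task.
     *
     * @param maxSpoutPending    value of topology.max.spout.pending (tuples in flight)
     * @param ackRoundTripMillis assumed time from emit until the spout sees the ack
     */
    static double maxTuplesPerSecond(int maxSpoutPending, double ackRoundTripMillis) {
        if (maxSpoutPending <= 0 || ackRoundTripMillis <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        // Little's law: throughput = items in flight / time each item spends in flight.
        return maxSpoutPending * 1000.0 / ackRoundTripMillis;
    }

    public static void main(String[] args) {
        // Same pending limit, two assumed round-trip times (for example, ackers
        // in the same rack versus ackers several network hops away).
        System.out.println(maxTuplesPerSecond(1000, 5.0));
        System.out.println(maxTuplesPerSecond(1000, 50.0));
    }
}
```

The bound only matters when the pending limit is actually reached; if CPU or memory is the bottleneck, shortening the ack path will not help, which matches the caveat in the message above.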


On Tuesday, August 1, 2017, 3:40:11 PM CDT, AMir Firouzi <fi...@gmail.com> wrote:

Thanks Bobby for your instant and informative reply.
I actually respect these rules: I schedule all of these loggers and ackers.
But right now my scheduler puts all the system tasks (logger and acker
tasks) into one worker on one machine, and I'm not getting the best
performance! I think it's because all of the other tasks have to transfer
data to these tasks on another machine, and network latency slows Storm
down. I'm wondering: if I put some of these system tasks near the other
(bolt/spout) tasks, would it affect the performance?
Thanks again for your answer.


Re: possible to have supervisors without _eventlogger and _acker tasks

Posted by AMir Firouzi <fi...@gmail.com>.
Thanks Bobby for your instant and informative reply.
I actually respect these rules: I schedule all of these loggers and ackers.
But right now my scheduler puts all the system tasks (logger and acker
tasks) into one worker on one machine, and I'm not getting the best
performance! I think it's because all of the other tasks have to transfer
data to these tasks on another machine, and network latency slows Storm
down. I'm wondering: if I put some of these system tasks near the other
(bolt/spout) tasks, would it affect the performance?
Thanks again for your answer.
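
One simple alternative to packing every system task into a single worker is to deal them out round-robin across the workers that already host the topology's spouts and bolts. The sketch below only illustrates that idea with plain strings; the executor labels and worker slot names are hypothetical, and a real scheduler would work with Storm's own scheduler types instead. Whether this actually improves performance depends on where the bottleneck is and should be measured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Storm API: spreads system executors over worker slots. */
final class SystemTaskSpread {

    /** Assigns executor i to worker slot (i mod number of slots). */
    static Map<String, List<String>> spread(List<String> systemExecutors,
                                            List<String> workerSlots) {
        if (workerSlots.isEmpty()) {
            throw new IllegalArgumentException("need at least one worker slot");
        }
        // LinkedHashMap keeps the slots in the order they were given.
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String slot : workerSlots) {
            assignment.put(slot, new ArrayList<>());
        }
        for (int i = 0; i < systemExecutors.size(); i++) {
            String slot = workerSlots.get(i % workerSlots.size());
            assignment.get(slot).add(systemExecutors.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Made-up labels: three acker executors, two workers on different hosts.
        List<String> ackers = Arrays.asList("acker-1", "acker-2", "acker-3");
        List<String> slots = Arrays.asList("host-a:6700", "host-b:6700");
        System.out.println(spread(ackers, slots));
    }
}
```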

On Tue, Aug 1, 2017 at 6:20 PM Bobby Evans <ev...@yahoo-inc.com.invalid>
wrote:

> By default there are no `_eventlogger` tasks.  To enable this feature,
> set topology.eventlogger.executors to a positive number.  Ackers are on
> by default, but can be disabled by setting topology.acker.executors to 0.
> You should respect these settings when scheduling a topology: if these
> tasks are supposed to be there and are not scheduled, messages will still
> be sent to them, but they will be lost.  In the case of acking, all of the
> tuples will time out.  In the case of the event logger, the UI will show
> it working, but nothing will ever come out.
> Now that is on a per-topology basis, not a per-worker basis.  These
> bolts are like any other bolt.  They can be in any worker your scheduler
> wants to put them in.  An acker bolt is inserted with a keyed grouping
> connected to just about everything in your topology, so where you place
> it is not that critical, as it is going to be talking to everything.
> The event logger bolts are similar, but use a fields grouping based on
> component id.
>
> https://github.com/apache/storm/blob/4c8a986f519cdf3e63bed47e9c4f723e4867267a/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L346-L357
> You could try to be smart and collocate each component with the logger
> for it, but honestly this feature already slows your topology down so
> much that it is probably not worth optimizing, as it will really only be
> used when you need to do some serious debugging.
>
>
> - Bobby

Re: possible to have supervisors without _eventlogger and _acker tasks

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
By default there are no `_eventlogger` tasks.  To enable this feature, set topology.eventlogger.executors to a positive number.  Ackers are on by default, but can be disabled by setting topology.acker.executors to 0.  You should respect these settings when scheduling a topology: if these tasks are supposed to be there and are not scheduled, messages will still be sent to them, but they will be lost.  In the case of acking, all of the tuples will time out.  In the case of the event logger, the UI will show it working, but nothing will ever come out.
Now that is on a per-topology basis, not a per-worker basis.  These bolts are like any other bolt.  They can be in any worker your scheduler wants to put them in.  An acker bolt is inserted with a keyed grouping connected to just about everything in your topology, so where you place it is not that critical, as it is going to be talking to everything.  The event logger bolts are similar, but use a fields grouping based on component id.
https://github.com/apache/storm/blob/4c8a986f519cdf3e63bed47e9c4f723e4867267a/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L346-L357
You could try to be smart and collocate each component with the logger for it, but honestly this feature already slows your topology down so much that it is probably not worth optimizing, as it will really only be used when you need to do some serious debugging.


- Bobby
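
As a rough illustration of the rule above, a custom scheduler could check the topology configuration before finalizing an assignment and refuse to leave a required system component unscheduled. This is a minimal sketch with a plain map standing in for the topology conf; the class and method names are hypothetical, and the component ids `__acker` and `__eventlogger` are an assumption about how these components are named internally, so substitute whatever ids your topology reports.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper, not part of Storm: finds system components a schedule forgot. */
final class SystemComponentCheck {
    // Config keys named in the message above.
    static final String ACKER_EXECUTORS = "topology.acker.executors";
    static final String EVENTLOGGER_EXECUTORS = "topology.eventlogger.executors";
    // Assumed component ids; adjust if your Storm version reports different ones.
    static final String ACKER_ID = "__acker";
    static final String EVENTLOGGER_ID = "__eventlogger";

    /** System components that the given topology conf expects to exist. */
    static Set<String> requiredSystemComponents(Map<String, Object> conf) {
        Set<String> required = new TreeSet<>();
        Object ackers = conf.get(ACKER_EXECUTORS);
        // Ackers are on by default: an unset value counts as enabled, only 0 disables.
        if (ackers == null || ((Number) ackers).intValue() > 0) {
            required.add(ACKER_ID);
        }
        Object loggers = conf.get(EVENTLOGGER_EXECUTORS);
        // Event loggers are off by default: they need an explicit positive count.
        if (loggers != null && ((Number) loggers).intValue() > 0) {
            required.add(EVENTLOGGER_ID);
        }
        return required;
    }

    /** Required system components that are absent from the scheduled set. */
    static Set<String> missing(Map<String, Object> conf, Set<String> scheduledComponents) {
        Set<String> result = new TreeSet<>(requiredSystemComponents(conf));
        result.removeAll(scheduledComponents);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(EVENTLOGGER_EXECUTORS, 1);      // event logging switched on
        Set<String> scheduled = new TreeSet<>();
        scheduled.add("my-spout");               // made-up user components
        scheduled.add("my-bolt");
        scheduled.add(ACKER_ID);
        // Anything printed here would silently lose messages if left unscheduled.
        System.out.println("Missing system components: " + missing(conf, scheduled));
    }
}
```

The check is per topology, not per worker: it only asks whether each required component is placed somewhere, which is all the message above requires.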


On Tuesday, August 1, 2017, 4:44:55 AM CDT, AMir Firouzi <fi...@gmail.com> wrote:

Hi guys,
I'm working on my own scheduler for Storm. What happens if I create a
worker process and put some bolt/spout tasks in it, but no _eventlogger
or _acker tasks? Is that a problem? Will tuples emitted from tasks in
this worker be skipped, or will they just use an _acker or _eventlogger
in another worker?

Thanks in advance