Posted to user@storm.apache.org by I PVP <ip...@hotmail.com> on 2016/04/28 03:35:35 UTC

Slots vs. Topology

Hi everyone,

Do I need to have one slot (supervisor.slots.ports) in storm.yaml for each topology?

What is the impact of having fewer slots than topologies?

So far my understanding is that the impact is that some topologies will never run. I have 12 topologies, and they all run fine only when the number of slots is equal to or higher than the number of topologies. When the number of slots is smaller, some topologies never start and the uptime for those topologies is blank.

Thanks

--
IPVP


Re: Slots vs. Topology

Posted by Nathan Leung <nc...@gmail.com>.
I recommend against aligning the number of slots on a supervisor with the
number of cores in its CPU(s).  In general I think that is too fine a
granularity, but then I am also used to supervisor machines that are
pretty big (12+ cores).


Re: Slots vs. Topology

Posted by "Matthias J. Sax" <mj...@apache.org>.
@Nathan: I am not sure what you recommend against? I did not recommend
anything so far...


From my point of view, I doubt there is a good general recommendation for
how to configure the number of slots per supervisor.

As Nathan correctly mentioned, if you have fewer workers, each worker
needs to do more work. All executor threads of your topology are
distributed evenly over all used workers.

Thus, it is a trade-off between network overhead and fault-tolerance.

1) If you use only a single worker (as an extreme example), there is no
network involved. If you have a machine with many cores, the single
worker JVM can use all those cores and utilize your machine quite well.
Of course, if something goes wrong, your whole topology crashes,
resulting in an expensive recovery.

2) If you use as many workers as executors (as the other extreme
example), each worker will run only a small portion of your topology. It
is unclear whether a single executor thread can utilize a full core
(this depends heavily on the work to be performed by the executor as
well as your expected throughput). Thus, you could have one or multiple
workers per core. The advantage is that a faulty worker is recovered
more easily -- furthermore, all other parts of your topology keep
processing data. The disadvantage is that you have inter-process
communication for worker JVMs on the same supervisor machine and most
likely a lot of network I/O for worker JVMs on different supervisor
machines.

So it is hard to say in general what the workload of a single worker
is -- it can range from only a single thread to all threads of a whole
topology. And thus it is hard to say how many workers you should
configure per supervisor.
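
To make the executor-to-worker ratio concrete, here is a minimal sketch
of a topology submission (the component names and parallelism numbers
are purely illustrative; it assumes Storm 1.x package names -- older
releases use backtype.storm -- and the TestWordSpout test spout that
ships with storm-core):

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.testing.TestWordSpout;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class RatioExample {

    // Trivial bolt that just consumes tuples; it stands in for real work.
    public static class NoopBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) { }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout(), 4);   // 4 executors
        builder.setBolt("noop", new NoopBolt(), 8)           // 8 executors
               .shuffleGrouping("words");

        Config conf = new Config();
        conf.setNumWorkers(3);  // request up to 3 slots (worker JVMs)

        // The 12 component executors are spread evenly over the 3 workers,
        // i.e. about 4 per worker (plus system executors such as ackers).
        StormSubmitter.submitTopology("ratio-example", conf,
                builder.createTopology());
    }
}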

Hope this helps.


-Matthias


Re: Slots vs. Topology

Posted by Nathan Leung <nc...@gmail.com>.
I would recommend against this.  Storm will automatically run multiple
threads for you, especially if you have more than one executor per
worker.  Every time data transfers between workers, it must be
serialized and deserialized.  On the other hand, if you have larger
workers and one goes down, your topology will have to do more work to
recover.


Re: Slots vs. Topology

Posted by I PVP <ip...@hotmail.com>.
Matthias ,

Thanks for the clear explanation.

Is there any initial guidance for aligning the number of slots a supervisor can handle with the machine's number of CPUs/cores?


--
IPVP


Re: Slots vs. Topology

Posted by "Matthias J. Sax" <mj...@apache.org>.
The number of slots defines the number of worker JVMs a supervisor can
start. And a single worker JVM only executes code of a single topology
(to isolate topologies for fault-tolerance reasons).
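
For reference, the slots of a supervisor are defined by the
supervisor.slots.ports list in its storm.yaml; the defaults shipped with
Storm give every supervisor four slots:

# storm.yaml (per supervisor node): one worker slot per listed port,
# so four ports means at most four worker JVMs on this node.
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703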

Thus, you need at least a single worker for each topology in your
cluster (i.e., the sum of all slots over all supervisors) -- assuming
each topology uses only a single worker.

It is not required to have a slot per supervisor per topology per se.

However, take into account the parameter "numberOfWorkers" that you can
set per topology. This is the maximum number of slots a topology can
occupy. If fewer workers are present, the topology will run using the
available ones. If you want all your topologies to be able to use this
maximum number of workers, you need to have enough slots in your
cluster. Otherwise, the first deployed topologies will occupy as many
workers as they are allowed, and no slots might be left over for
topologies deployed later.
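
As a back-of-the-envelope check (all numbers below are illustrative,
picked to match the 12-topology case from the original question):

public class SlotAccounting {
    public static void main(String[] args) {
        int supervisors = 3;          // machines running a supervisor daemon
        int slotsPerSupervisor = 4;   // length of supervisor.slots.ports
        int totalSlots = supervisors * slotsPerSupervisor;   // 12 cluster-wide

        int topologies = 12;          // as in the original question
        int workersPerTopology = 1;   // value passed to conf.setNumWorkers()
        int slotsNeeded = topologies * workersPerTopology;   // 12

        // If this prints false, some topologies stay unscheduled
        // (which shows up as a blank uptime in the Storm UI).
        System.out.println("enough slots: " + (slotsNeeded <= totalSlots));
    }
}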

-Matthias

