Posted to user@storm.apache.org by "Nick R. Katsipoulakis" <ni...@gmail.com> on 2015/09/02 20:02:04 UTC

Tasks are not starting

Hello all,

I am working on a project in which I submit a topology to my Storm cluster,
but for some reason some of my tasks do not start executing.

I can tell this is happening because every bolt I have needs to connect to
an external server and register with a service, yet some of the bolts never
connect.

I should mention that the number of tasks I have is larger than the number
of workers in my cluster. Also, when I check my worker log files, I see that
the workers that do not register are also not writing the initialization
messages I have them print at startup.

Any idea why this is happening? Could it be that my resources are not
enough to start all of the tasks?
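
For context, a number of tasks larger than the number of workers is normal in
Storm: tasks are multiplexed onto executor threads, which are in turn packed
into workers. The sketch below is only an illustration of that multiplexing
(a contiguous, nearly even split of task ids over executors), not Storm's
actual scheduler code:

```python
def split_tasks(task_ids, num_executors):
    """Divide task ids into contiguous, nearly even groups, one per executor.

    Illustrative only: mimics the idea that when tasks outnumber executors,
    some executor threads must serve several tasks.
    """
    base, extra = divmod(len(task_ids), num_executors)
    groups, start = [], 0
    for i in range(num_executors):
        size = base + (1 if i < extra else 0)
        groups.append(task_ids[start:start + size])
        start += size
    return groups

# 6 tasks onto 4 executors: two executors run 2 tasks each, two run 1
print(split_tasks(list(range(6)), 4))  # [[0, 1], [2, 3], [4], [5]]
```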

Thank you,
Nick

Re: Tasks are not starting

Posted by Abhishek Agarwal <ab...@gmail.com>.
Can you take a stack trace and post it here? Maybe that could give a clue
as to where the task is stuck.

On Thu, Sep 3, 2015 at 6:42 PM, John Yost <so...@gmail.com> wrote:

> Hi Nick,
>
> Gotcha, OK, the worker starts but the task does not.  That's interesting.
> I am pretty sure that, in my experience, if tasks assigned to a worker
> don't start, the worker is killed by the supervisor. Admittedly, I am
> still a Storm newbie, so I may be missing something. :) Please confirm
> whether the supervisor is killing the workers when the tasks don't start.
>
> --John
>
> On Thu, Sep 3, 2015 at 9:09 AM, Nick R. Katsipoulakis <
> nick.katsip@gmail.com> wrote:
>
>> Hello all,
>>
>> @John: The supervisor logs say nothing, except from the normal startup
>> messages.
>>
>> @Abhishek: No, the worker starts properly. The task itself does not start.
>>
>> Thanks,
>> Nick
>>
>> 2015-09-03 9:00 GMT-04:00 Abhishek Agarwal <ab...@gmail.com>:
>>
>>> When you say that tasks do not start, do you mean that worker process
>>> itself is not starting?
>>>
>>> On Thu, Sep 3, 2015 at 5:20 PM, John Yost <so...@gmail.com>
>>> wrote:
>>>
>>>> Hi Nick,
>>>>
>>>> What do the nimbus and supervisor logs say? One or both may contain
>>>> clues as to why your workers are not starting up.
>>>>
>>>> --John
>>>>
>>>> On Thu, Sep 3, 2015 at 4:44 AM, Matthias J. Sax <mj...@apache.org>
>>>> wrote:
>>>>
>>>>> I am currently working with version 0.11.0-SNAPSHOT and cannot observe
>>>>> the behavior you describe. If I submit a sample topology with 1 spout
>>>>> (dop=1) and 1 bolt (dop=10) connected via shuffle grouping, and have 12
>>>>> supervisors available (each with 12 worker slots), each of the 11
>>>>> executors is running on a single worker of a single supervisor (host).
>>>>>
>>>>> I have no idea why you observe a different behavior...
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 09/03/2015 12:20 AM, Nick R. Katsipoulakis wrote:
>>>>> > When I say co-locate, what I have seen in my experiments is the
>>>>> > following:
>>>>> >
>>>>> > If the number of executors can be served by the workers of one node,
>>>>> > the scheduler spawns all the executors in the workers of that node. I
>>>>> > have also seen that the default scheduler tries to fill up one node
>>>>> > before provisioning an additional one for the topology.
>>>>> >
>>>>> > Going back to your sentence "and the executors should be evenly
>>>>> > distributed over all available workers": I have to say that I do not
>>>>> > see that often in my experiments. Actually, I often come across
>>>>> > workers handling 2-3 executors/tasks while others do nothing. Am I
>>>>> > missing something? Is it just a coincidence in my experiments?
>>>>> >
>>>>> > Thank you,
>>>>> > Nick
>>>>> >
>>>>> >
>>>>> >
>>>>> > 2015-09-02 17:38 GMT-04:00 Matthias J. Sax <mjsax@apache.org
>>>>> > <ma...@apache.org>>:
>>>>> >
>>>>> >     I agree. The load is not high.
>>>>> >
>>>>> >     About the higher latencies: how many ackers did you configure?
>>>>> >     As a rule of thumb, there should be one acker per executor. If
>>>>> >     you have fewer ackers and an increasing number of executors, this
>>>>> >     might cause the increased latency, as the ackers could become a
>>>>> >     bottleneck.
>>>>> >
>>>>> >     What do you mean by "trying to co-locate tasks and executors as
>>>>> >     much as possible"? Tasks are logical units of work that are
>>>>> >     processed by executors (which are threads). Furthermore (as far
>>>>> >     as I know), the default scheduler assigns tasks and executors
>>>>> >     evenly to the available workers. In your case, as you set the
>>>>> >     number of tasks equal to the number of executors, each executor
>>>>> >     processes a single task, and the executors should be evenly
>>>>> >     distributed over all available workers.
>>>>> >
>>>>> >     However, you are right: intra-worker channels are "cheaper" than
>>>>> >     inter-worker channels. To exploit this, you should use
>>>>> >     local-or-shuffle grouping instead of shuffle. The disadvantage of
>>>>> >     local-or-shuffle might be missing load balancing; shuffle always
>>>>> >     ensures good load balancing.
>>>>> >
>>>>> >
>>>>> >     -Matthias
>>>>> >
>>>>> >
>>>>> >
>>>>> >     On 09/02/2015 10:31 PM, Nick R. Katsipoulakis wrote:
>>>>> >     > Well, my input load is 4 streams at 4000 tuples per second,
>>>>> and each
>>>>> >     > tuple is about 128 bytes long. Therefore, I do not think my
>>>>> load is too
>>>>> >     > much for my hardware.
>>>>> >     >
>>>>> >     > No, I am running only this topology in my cluster.
>>>>> >     >
>>>>> >     > For some reason, when I set the task to executor ratio to 1,
>>>>> my topology
>>>>> >     > does not hang at all. The strange thing now is that I see
>>>>> higher latency
>>>>> >     > with more executors and I am trying to figure this out. Also,
>>>>> I see that
>>>>> >     > the default scheduler is trying to co-locate tasks and
>>>>> executors as much
>>>>> >     > as possible. Is this true? If yes, is it because the
>>>>> intra-worker
>>>>> >     > latencies are much lower than the inter-worker latencies?
>>>>> >     >
>>>>> >     > Thanks,
>>>>> >     > Nick
>>>>> >     >
>>>>> >     > 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <mjsax@apache.org
>>>>> <ma...@apache.org>
>>>>> >     > <mailto:mjsax@apache.org <ma...@apache.org>>>:
>>>>> >     >
>>>>> >     >     So (for each node) you have 4 cores available for 1
>>>>> >     >     supervisor JVM and 2 worker JVMs that execute up to 5
>>>>> >     >     threads each (if the 40 executors are distributed evenly
>>>>> >     >     over all workers). Thus, about 12 threads for 4 cores. Of
>>>>> >     >     course, Storm starts a few more threads within each
>>>>> >     >     worker/supervisor.
>>>>> >     >
>>>>> >     >     If your load is not huge, this might be sufficient.
>>>>> >     >     However, with a high data rate, it might be problematic.
>>>>> >     >
>>>>> >     >     One more question: do you run a single topology in your
>>>>> cluster or
>>>>> >     >     multiple? Storm isolates topologies for fault-tolerance
>>>>> reasons. Thus, a
>>>>> >     >     single worker cannot process executors from different
>>>>> topologies. If you
>>>>> >     >     run out of workers, a topology might not start up
>>>>> completely.
>>>>> >     >
>>>>> >     >     -Matthias
>>>>> >     >
>>>>> >     >
>>>>> >     >
>>>>> >     >     On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
>>>>> >     >     > Hello Matthias, and thank you for your reply. See my
>>>>> >     >     > answers below:
>>>>> >     >     >
>>>>> >     >     > - I have 4 supervisor nodes in my AWS cluster of m4.xlarge
>>>>> >     >     > instances (4 cores per node). On top of that I have 3 more
>>>>> >     >     > nodes for ZooKeeper and Nimbus.
>>>>> >     >     > - 2 worker processes per supervisor node
>>>>> >     >     > - The task number for each bolt ranges from 1 to 4, and I
>>>>> >     >     > use a 1:1 task-to-executor assignment.
>>>>> >     >     > - The total number of executors for the topology ranges
>>>>> >     >     > from 14 to 41
>>>>> >     >     >
>>>>> >     >     > Thanks,
>>>>> >     >     > Nick
>>>>> >     >     >
>>>>> >     >     > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <
>>>>> mjsax@apache.org <ma...@apache.org> <mailto:mjsax@apache.org
>>>>> >     <ma...@apache.org>>
>>>>> >     >     > <mailto:mjsax@apache.org <ma...@apache.org>
>>>>> >     <mailto:mjsax@apache.org <ma...@apache.org>>>>:
>>>>> >     >     >
>>>>> >     >     >     Without any exception/error message it is hard to
>>>>> tell.
>>>>> >     >     >
>>>>> >     >     >     What is your cluster setup
>>>>> >     >     >       - Hardware, ie, number of cores per node?
>>>>> >     >     >       - How many node/supervisor are available?
>>>>> >     >     >       - Configured number of workers for the topology?
>>>>> >     >     >       - What is the number of task for each spout/bolt?
>>>>> >     >     >       - What is the number of executors for each
>>>>> spout/bolt?
>>>>> >     >     >
>>>>> >     >     >     -Matthias
>>>>> >     >     >
>>>>> >     >     >     On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
>>>>> >     >     >     > Hello all,
>>>>> >     >     >     >
>>>>> >     >     >     > I am working on a project in which I submit a
>>>>> topology
>>>>> >     to my
>>>>> >     >     Storm
>>>>> >     >     >     > cluster, but for some reason, some of my tasks do
>>>>> not
>>>>> >     start
>>>>> >     >     executing.
>>>>> >     >     >     >
>>>>> >     >     >     > I can see that the above is happening because every
>>>>> >     bolt I have
>>>>> >     >     >     needs to
>>>>> >     >     >     > connect to an external server and do a
>>>>> registration to a
>>>>> >     >     service.
>>>>> >     >     >     > However, some of the bolts do not seem to connect.
>>>>> >     >     >     >
>>>>> >     >     >     > I have to say that the number of tasks I have is
>>>>> >     larger than the
>>>>> >     >     >     number
>>>>> >     >     >     > of workers of my cluster. Also, I check my worker
>>>>> log
>>>>> >     files,
>>>>> >     >     and I see
>>>>> >     >     >     > that the workers that do not register, are also not
>>>>> >     writing some
>>>>> >     >     >     > initialization messages I have them print in the
>>>>> >     beginning.
>>>>> >     >     >     >
>>>>> >     >     >     > Any idea why this is happening? Can it be because
>>>>> my
>>>>> >     >     resources are not
>>>>> >     >     >     > enough to start off all of the tasks?
>>>>> >     >     >     >
>>>>> >     >     >     > Thank you,
>>>>> >     >     >     > Nick
>>>>> >     >     >
>>>>> >     >     >
>>>>> >     >     >
>>>>> >     >     >
>>>>> >     >     > --
>>>>> >     >     > Nikolaos Romanos Katsipoulakis,
>>>>> >     >     > University of Pittsburgh, PhD candidate
>>>>> >     >
>>>>> >     >
>>>>> >     >
>>>>> >     >
>>>>> >     > --
>>>>> >     > Nikolaos Romanos Katsipoulakis,
>>>>> >     > University of Pittsburgh, PhD candidate
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Nikolaos Romanos Katsipoulakis,
>>>>> > University of Pittsburgh, PhD candidate
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Abhishek Agarwal
>>>
>>>
>>
>>
>> --
>> Nikolaos Romanos Katsipoulakis,
>> University of Pittsburgh, PhD candidate
>>
>
>


-- 
Regards,
Abhishek Agarwal
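
The even distribution of executors over workers that Matthias describes above
can be pictured as a simple round robin. This is an illustrative
simplification, not Storm's actual EvenScheduler implementation, and the
node/port names are invented:

```python
def distribute_executors(executors, workers):
    """Round-robin executors onto worker slots: a simplified stand-in for
    the even assignment Storm's default scheduler aims for."""
    assignment = {w: [] for w in workers}
    for i, executor in enumerate(executors):
        assignment[workers[i % len(workers)]].append(executor)
    return assignment

# 11 executors over 8 worker slots (4 nodes x 2 workers, as in Nick's setup):
# no worker ends up with more than 2 executors, and none is left idle.
workers = [f"node{n}:670{p}" for n in range(4) for p in range(2)]
print(distribute_executors(list(range(11)), workers))
```

If the observed placement instead fills one node before touching the next
(as Nick reports), that points at a different scheduler or pinned slots, not
at this round-robin behavior.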

Re: Tasks are not starting

Posted by John Yost <so...@gmail.com>.
Hi Nick,

Gotcha, OK, the worker starts but the task does not.  That's interesting. I
am pretty sure that, in my experience, if tasks assigned to a worker don't
start, the worker is killed by the supervisor. Admittedly, I am still a
Storm newbie, so I may be missing something. :) Please confirm whether the
supervisor is killing the workers when the tasks don't start.

--John

On Thu, Sep 3, 2015 at 9:09 AM, Nick R. Katsipoulakis <nick.katsip@gmail.com
> wrote:

> Hello all,
>
> @John: The supervisor logs say nothing, except from the normal startup
> messages.
>
> @Abhishek: No, the worker starts properly. The task itself does not start.
>
> Thanks,
> Nick
>

Re: Tasks are not starting

Posted by "Nick R. Katsipoulakis" <ni...@gmail.com>.
Hello all,

@John: The supervisor logs say nothing, except for the normal startup
messages.

@Abhishek: No, the worker starts properly. The task itself does not start.

Thanks,
Nick

2015-09-03 9:00 GMT-04:00 Abhishek Agarwal <ab...@gmail.com>:

> When you say that tasks do not start, do you mean that worker process
> itself is not starting?


-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD candidate

Re: Tasks are not starting

Posted by "Nick R. Katsipoulakis" <ni...@gmail.com>.
Hello all,

For some reason, when I set the ratio of tasks to executors to 1:1, I do
not see the problem of non-starting tasks.

From what I understand, when I have an executor (thread) handling 2 tasks
and the first task blocks on I/O (in my custom code), the other task is
never executed by the executor, because the executor thread is stuck
waiting on that I/O.

Cheers,
Nick

2015-09-03 9:08 GMT-04:00 Ritesh Sinha <ku...@gmail.com>:

> If your workers are not starting, try executing the command that the
> supervisor runs to start the workers. You can get those commands from the
> supervisor logs. Execute the command to start a worker manually and see
> if you are getting any error.
>
> On Thu, Sep 3, 2015 at 6:30 PM, Abhishek Agarwal <ab...@gmail.com>
> wrote:
>
>> When you say that tasks do not start, do you mean that worker process
>> itself is not starting?
>>
>> On Thu, Sep 3, 2015 at 5:20 PM, John Yost <so...@gmail.com>
>> wrote:
>>
>>> Hi Nick,
>>>
>>> What do the nimbus and supervisor logs say? One or both may contain
>>> clues as to why your workers are not starting up.
>>>
>>> --John
>>>
>>> On Thu, Sep 3, 2015 at 4:44 AM, Matthias J. Sax <mj...@apache.org>
>>> wrote:
>>>
>>>> I am currently working with version 0.11.0-SNAPSHOT and cannot observe
>>>> the behavior you describe. If I submit a sample topology with 1 spout
>>>> (dop=1) and 1 bolt (dop=10) connected via shuffle grouping and have 12
>>>> supervisor available (each with 12 worker slots), each of the 11
>>>> executors is running on a single worker of a single supervisor (host).
>>>>
>>>> I have no idea why you observe a different behavior...
>>>>
>>>> -Matthias
>>>>
>>>> On 09/03/2015 12:20 AM, Nick R. Katsipoulakis wrote:
>>>> > When I say co-locate, what I have seen in my experiments is the
>>>> following:
>>>> >
>>>> > If the executor's number can be served by workers on one node, the
>>>> > scheduler spawns all the executors in the workers of one node. I have
>>>> > also seen that behavior in that the default scheduler tries to fill up
>>>> > one node before provisioning an additional one for the topology.
>>>> >
>>>> > Going back to your following sentence "and the executors should be
>>>> > evenly distributed over all available workers." I have to say that I
>>>> do
>>>> > not see that often in my experiments. Actually, I often come across
>>>> with
>>>> > workers handling 2 - 3 executors/tasks, and other doing nothing. Am I
>>>> > missing something? Is it just a coincidence that happened in my
>>>> experiments?
>>>> >
>>>> > Thank you,
>>>> > Nick
>>>> >
>>>> >
>>>> >
>>>> > 2015-09-02 17:38 GMT-04:00 Matthias J. Sax <mjsax@apache.org
>>>> > <ma...@apache.org>>:
>>>> >
>>>> >     I agree. The load is not high.
>>>> >
>>>> >     About higher latencies. How many ackers did you configure? As a
>>>> rule of
>>>> >     thumb there should be one acker per executor. If you have less
>>>> ackers,
>>>> >     and an increasing number of executors, this might cause the
>>>> increased
>>>> >     latency as the ackers could become a bottleneck.
>>>> >
>>>> >     What do you mean by "trying to co-locate tasks and executors as
>>>> much as
>>>> >     possible"? Tasks are logical units of work that are processed by
>>>> >     executors (which are threads). Furthermore (as far as I know), the
>>>> >     default scheduler does a evenly distributed assignment for tasks
>>>> and
>>>> >     executor to the available workers. In you case, as you set the
>>>> number of
>>>> >     task equal to the number of executors, each executors processes a
>>>> single
>>>> >     task, and the executors should be evenly distributed over all
>>>> available
>>>> >     workers.
>>>> >
>>>> >     However, you are right: intra-worker channels are "cheaper" than
>>>> >     inter-worker channels. In order to exploit this, you should use
>>>> >     shuffle-or-local grouping instead of shuffle. The disadvantage of
>>>> >     shuffle-or-local might be missing load-balancing. Shuffle always
>>>> ensures
>>>> >     good load balancing.
>>>> >
>>>> >
>>>> >     -Matthias
>>>> >
>>>> >
>>>> >
>>>> >     On 09/02/2015 10:31 PM, Nick R. Katsipoulakis wrote:
>>>> >     > Well, my input load is 4 streams at 4000 tuples per second, and
>>>> each
>>>> >     > tuple is about 128 bytes long. Therefore, I do not think my
>>>> load is too
>>>> >     > much for my hardware.
>>>> >     >
>>>> >     > No, I am running only this topology in my cluster.
>>>> >     >
>>>> >     > For some reason, when I set the task to executor ratio to 1, my
>>>> topology
>>>> >     > does not hang at all. The strange thing now is that I see
>>>> higher latency
>>>> >     > with more executors and I am trying to figure this out. Also, I
>>>> see that
>>>> >     > the default scheduler is trying to co-locate tasks and
>>>> executors as much
>>>> >     > as possible. Is this true? If yes, is it because the
>>>> intra-worker
>>>> >     > latencies are much lower than the inter-worker latencies?
>>>> >     >
>>>> >     > Thanks,
>>>> >     > Nick
>>>> >     >
>>>> >     > 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <mjsax@apache.org
>>>> <ma...@apache.org>
>>>> >     > <mailto:mjsax@apache.org <ma...@apache.org>>>:
>>>> >     >
>>>> >     >     So (for each node) you have 4 cores available for 1
>>>> supervisor JVM, 2
>>>> >     >     worker JVMs that execute up to 5 thread each (if 40
>>>> executors are
>>>> >     >     distributed evenly over all workers. Thus, about 12 threads
>>>> for 4 cores.
>>>> >     >     Of course, Storm starts a few more threads within each
>>>> >     >     worker/supervisor.
>>>> >     >
>>>> >     >     If your load is not huge, this might be sufficient.
>>>> However, having high
>>>> >     >     data rate, it might be problematic.
>>>> >     >
>>>> >     >     One more question: do you run a single topology in your
>>>> cluster or
>>>> >     >     multiple? Storm isolates topologies for fault-tolerance
>>>> reasons. Thus, a
>>>> >     >     single worker cannot process executors from different
>>>> topologies. If you
>>>> >     >     run out of workers, a topology might not start up
>>>> completely.
>>>> >     >
>>>> >     >     -Matthias
>>>> >     >
>>>> >     >
>>>> >     >
>>>> >     >     On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
>>>> >     >     > Hello Matthias and thank you for your reply. See my
>>>> answers below:
>>>> >     >     >
>>>> >     >     > - I have a 4 supervisor nodes in my AWS cluster of
>>>> m4.xlarge instances
>>>> >     >     > (4 cores per node). On top of that I have 3 more nodes
>>>> for zookeeper and
>>>> >     >     > nimbus.
>>>> >     >     > - 2 worker nodes per supervisor node
>>>> >     >     > - The task number for each bolt ranges from 1 to 4 and I
>>>> use 1:1 task to
>>>> >     >     > executor assignment.
>>>> >     >     > - The number of executors in total for the topology
>>>> ranges from 14 to 41
>>>> >     >     >
>>>> >     >     > Thanks,
>>>> >     >     > Nick
>>>> >     >     >
>>>> >     >     > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <
>>>> mjsax@apache.org <ma...@apache.org> <mailto:mjsax@apache.org
>>>> >     <ma...@apache.org>>
>>>> >     >     > <mailto:mjsax@apache.org <ma...@apache.org>
>>>> >     <mailto:mjsax@apache.org <ma...@apache.org>>>>:
>>>> >     >     >
>>>> >     >     >     Without any exception/error message it is hard to
>>>> tell.
>>>> >     >     >
>>>> >     >     >     What is your cluster setup
>>>> >     >     >       - Hardware, ie, number of cores per node?
>>>> >     >     >       - How many node/supervisor are available?
>>>> >     >     >       - Configured number of workers for the topology?
>>>> >     >     >       - What is the number of task for each spout/bolt?
>>>> >     >     >       - What is the number of executors for each
>>>> spout/bolt?
>>>> >     >     >
>>>> >     >     >     -Matthias
>>>> >     >     >
>>>> >     >     >     On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
>>>> >     >     >     > Hello all,
>>>> >     >     >     >
>>>> >     >     >     > I am working on a project in which I submit a
>>>> topology
>>>> >     to my
>>>> >     >     Storm
>>>> >     >     >     > cluster, but for some reason, some of my tasks do
>>>> not
>>>> >     start
>>>> >     >     executing.
>>>> >     >     >     >
>>>> >     >     >     > I can see that the above is happening because every
>>>> >     bolt I have
>>>> >     >     >     needs to
>>>> >     >     >     > connect to an external server and do a registration
>>>> to a
>>>> >     >     service.
>>>> >     >     >     > However, some of the bolts do not seem to connect.
>>>> >     >     >     >
>>>> >     >     >     > I have to say that the number of tasks I have is
>>>> >     larger than the
>>>> >     >     >     number
>>>> >     >     >     > of workers of my cluster. Also, I check my worker
>>>> log
>>>> >     files,
>>>> >     >     and I see
>>>> >     >     >     > that the workers that do not register, are also not
>>>> >     writing some
>>>> >     >     >     > initialization messages I have them print in the
>>>> >     beginning.
>>>> >     >     >     >
>>>> >     >     >     > Any idea why this is happening? Can it be because my
>>>> >     >     resources are not
>>>> >     >     >     > enough to start off all of the tasks?
>>>> >     >     >     >
>>>> >     >     >     > Thank you,
>>>> >     >     >     > Nick
>>>> >     >     >
>>>> >     >     >
>>>> >     >     >
>>>> >     >     >
>>>> >     >     > --
>>>> >     >     > Nikolaos Romanos Katsipoulakis,
>>>> >     >     > University of Pittsburgh, PhD candidate
>>>> >     >
>>>> >     >
>>>> >     >
>>>> >     >
>>>> >     > --
>>>> >     > Nikolaos Romanos Katsipoulakis,
>>>> >     > University of Pittsburgh, PhD candidate
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Nikolaos Romanos Katsipoulakis,
>>>> > University of Pittsburgh, PhD candidate
>>>>
>>>>
>>>
>>
>>
>> --
>> Regards,
>> Abhishek Agarwal
>>
>>
>


-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD candidate

Re: Tasks are not starting

Posted by Ritesh Sinha <ku...@gmail.com>.
If your workers are not starting, try executing the command that the
supervisor runs to start them. You can get those commands from the
supervisor logs. Run the command manually and see if you get any error.

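As a rough sketch of this (the log line below is a made-up sample; the
exact wording of supervisor log messages and the launch command vary by
Storm version and configuration), one way to pull the worker launch
command out of a supervisor log:

```python
import re

# Made-up sample supervisor log line, for illustration only.
sample_log = (
    "2015-09-03 13:00:00 b.s.d.supervisor [INFO] "
    "Launching worker with command: java -server -Xmx768m "
    "backtype.storm.daemon.worker mytopo-1-1441281600 sup-id 6700 worker-id"
)

def extract_worker_commands(log_text):
    """Return every command the supervisor logged when launching workers."""
    pattern = re.compile(r"Launching worker with command:\s*(.+)")
    return [m.group(1) for m in pattern.finditer(log_text)]

commands = extract_worker_commands(sample_log)
print(commands[0])  # run this manually to surface any startup error
```

Running the extracted command by hand on the supervisor node makes any
classpath or JVM startup error visible on the console instead of being
hidden by the supervisor.
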
On Thu, Sep 3, 2015 at 6:30 PM, Abhishek Agarwal <ab...@gmail.com>
wrote:

> When you say that tasks do not start, do you mean that worker process
> itself is not starting?
>
> On Thu, Sep 3, 2015 at 5:20 PM, John Yost <so...@gmail.com>
> wrote:
>
>> Hi Nick,
>>
>> What do the nimbus and supervisor logs say? One or both may contain clues
>> as to why your workers are not starting up.
>>
>> --John
>>
>> On Thu, Sep 3, 2015 at 4:44 AM, Matthias J. Sax <mj...@apache.org> wrote:
>>
>>> I am currently working with version 0.11.0-SNAPSHOT and cannot observe
>>> the behavior you describe. If I submit a sample topology with 1 spout
>>> (dop=1) and 1 bolt (dop=10) connected via shuffle grouping and have 12
>>> supervisor available (each with 12 worker slots), each of the 11
>>> executors is running on a single worker of a single supervisor (host).
>>>
>>> I have no idea why you observe a different behavior...
>>>
>>> -Matthias
>>>
>>> On 09/03/2015 12:20 AM, Nick R. Katsipoulakis wrote:
>>> > When I say co-locate, what I have seen in my experiments is the
>>> following:
>>> >
>>> > If the executor's number can be served by workers on one node, the
>>> > scheduler spawns all the executors in the workers of one node. I have
>>> > also seen that behavior in that the default scheduler tries to fill up
>>> > one node before provisioning an additional one for the topology.
>>> >
>>> > Going back to your following sentence "and the executors should be
>>> > evenly distributed over all available workers." I have to say that I do
>>> > not see that often in my experiments. Actually, I often come across
>>> with
>>> > workers handling 2 - 3 executors/tasks, and other doing nothing. Am I
>>> > missing something? Is it just a coincidence that happened in my
>>> experiments?
>>> >
>>> > Thank you,
>>> > Nick
>>> >
>>> >
>>> >
>>> > 2015-09-02 17:38 GMT-04:00 Matthias J. Sax <mjsax@apache.org
>>> > <ma...@apache.org>>:
>>> >
>>> >     I agree. The load is not high.
>>> >
>>> >     About higher latencies. How many ackers did you configure? As a
>>> rule of
>>> >     thumb there should be one acker per executor. If you have less
>>> ackers,
>>> >     and an increasing number of executors, this might cause the
>>> increased
>>> >     latency as the ackers could become a bottleneck.
>>> >
>>> >     What do you mean by "trying to co-locate tasks and executors as
>>> much as
>>> >     possible"? Tasks are logical units of work that are processed by
>>> >     executors (which are threads). Furthermore (as far as I know), the
>>> >     default scheduler does a evenly distributed assignment for tasks
>>> and
>>> >     executor to the available workers. In you case, as you set the
>>> number of
>>> >     task equal to the number of executors, each executors processes a
>>> single
>>> >     task, and the executors should be evenly distributed over all
>>> available
>>> >     workers.
>>> >
>>> >     However, you are right: intra-worker channels are "cheaper" than
>>> >     inter-worker channels. In order to exploit this, you should use
>>> >     shuffle-or-local grouping instead of shuffle. The disadvantage of
>>> >     shuffle-or-local might be missing load-balancing. Shuffle always
>>> ensures
>>> >     good load balancing.
>>> >
>>> >
>>> >     -Matthias
>>> >
>>> >
>>> >
>>> >     On 09/02/2015 10:31 PM, Nick R. Katsipoulakis wrote:
>>> >     > Well, my input load is 4 streams at 4000 tuples per second, and
>>> each
>>> >     > tuple is about 128 bytes long. Therefore, I do not think my load
>>> is too
>>> >     > much for my hardware.
>>> >     >
>>> >     > No, I am running only this topology in my cluster.
>>> >     >
>>> >     > For some reason, when I set the task to executor ratio to 1, my
>>> topology
>>> >     > does not hang at all. The strange thing now is that I see higher
>>> latency
>>> >     > with more executors and I am trying to figure this out. Also, I
>>> see that
>>> >     > the default scheduler is trying to co-locate tasks and executors
>>> as much
>>> >     > as possible. Is this true? If yes, is it because the intra-worker
>>> >     > latencies are much lower than the inter-worker latencies?
>>> >     >
>>> >     > Thanks,
>>> >     > Nick
>>> >     >
>>> >     > 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <mjsax@apache.org
>>> <ma...@apache.org>
>>> >     > <mailto:mjsax@apache.org <ma...@apache.org>>>:
>>> >     >
>>> >     >     So (for each node) you have 4 cores available for 1
>>> supervisor JVM, 2
>>> >     >     worker JVMs that execute up to 5 thread each (if 40
>>> executors are
>>> >     >     distributed evenly over all workers. Thus, about 12 threads
>>> for 4 cores.
>>> >     >     Of course, Storm starts a few more threads within each
>>> >     >     worker/supervisor.
>>> >     >
>>> >     >     If your load is not huge, this might be sufficient. However,
>>> having high
>>> >     >     data rate, it might be problematic.
>>> >     >
>>> >     >     One more question: do you run a single topology in your
>>> cluster or
>>> >     >     multiple? Storm isolates topologies for fault-tolerance
>>> reasons. Thus, a
>>> >     >     single worker cannot process executors from different
>>> topologies. If you
>>> >     >     run out of workers, a topology might not start up completely.
>>> >     >
>>> >     >     -Matthias
>>> >     >
>>> >     >
>>> >     >
>>> >     >     On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
>>> >     >     > Hello Matthias and thank you for your reply. See my
>>> answers below:
>>> >     >     >
>>> >     >     > - I have a 4 supervisor nodes in my AWS cluster of
>>> m4.xlarge instances
>>> >     >     > (4 cores per node). On top of that I have 3 more nodes for
>>> zookeeper and
>>> >     >     > nimbus.
>>> >     >     > - 2 worker nodes per supervisor node
>>> >     >     > - The task number for each bolt ranges from 1 to 4 and I
>>> use 1:1 task to
>>> >     >     > executor assignment.
>>> >     >     > - The number of executors in total for the topology ranges
>>> from 14 to 41
>>> >     >     >
>>> >     >     > Thanks,
>>> >     >     > Nick
>>> >     >     >
>>> >     >     > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <
>>> mjsax@apache.org <ma...@apache.org> <mailto:mjsax@apache.org
>>> >     <ma...@apache.org>>
>>> >     >     > <mailto:mjsax@apache.org <ma...@apache.org>
>>> >     <mailto:mjsax@apache.org <ma...@apache.org>>>>:
>>> >     >     >
>>> >     >     >     Without any exception/error message it is hard to tell.
>>> >     >     >
>>> >     >     >     What is your cluster setup
>>> >     >     >       - Hardware, ie, number of cores per node?
>>> >     >     >       - How many node/supervisor are available?
>>> >     >     >       - Configured number of workers for the topology?
>>> >     >     >       - What is the number of task for each spout/bolt?
>>> >     >     >       - What is the number of executors for each
>>> spout/bolt?
>>> >     >     >
>>> >     >     >     -Matthias
>>> >     >     >
>>> >     >     >     On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
>>> >     >     >     > Hello all,
>>> >     >     >     >
>>> >     >     >     > I am working on a project in which I submit a
>>> topology
>>> >     to my
>>> >     >     Storm
>>> >     >     >     > cluster, but for some reason, some of my tasks do not
>>> >     start
>>> >     >     executing.
>>> >     >     >     >
>>> >     >     >     > I can see that the above is happening because every
>>> >     bolt I have
>>> >     >     >     needs to
>>> >     >     >     > connect to an external server and do a registration
>>> to a
>>> >     >     service.
>>> >     >     >     > However, some of the bolts do not seem to connect.
>>> >     >     >     >
>>> >     >     >     > I have to say that the number of tasks I have is
>>> >     larger than the
>>> >     >     >     number
>>> >     >     >     > of workers of my cluster. Also, I check my worker log
>>> >     files,
>>> >     >     and I see
>>> >     >     >     > that the workers that do not register, are also not
>>> >     writing some
>>> >     >     >     > initialization messages I have them print in the
>>> >     beginning.
>>> >     >     >     >
>>> >     >     >     > Any idea why this is happening? Can it be because my
>>> >     >     resources are not
>>> >     >     >     > enough to start off all of the tasks?
>>> >     >     >     >
>>> >     >     >     > Thank you,
>>> >     >     >     > Nick
>>> >     >     >
>>> >     >     >
>>> >     >     >
>>> >     >     >
>>> >     >     > --
>>> >     >     > Nikolaos Romanos Katsipoulakis,
>>> >     >     > University of Pittsburgh, PhD candidate
>>> >     >
>>> >     >
>>> >     >
>>> >     >
>>> >     > --
>>> >     > Nikolaos Romanos Katsipoulakis,
>>> >     > University of Pittsburgh, PhD candidate
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Nikolaos Romanos Katsipoulakis,
>>> > University of Pittsburgh, PhD candidate
>>>
>>>
>>
>
>
> --
> Regards,
> Abhishek Agarwal
>
>

Re: Tasks are not starting

Posted by Abhishek Agarwal <ab...@gmail.com>.
When you say that tasks do not start, do you mean that the worker process
itself is not starting?

On Thu, Sep 3, 2015 at 5:20 PM, John Yost <so...@gmail.com> wrote:

> Hi Nick,
>
> What do the nimbus and supervisor logs say? One or both may contain clues
> as to why your workers are not starting up.
>
> --John
>
> On Thu, Sep 3, 2015 at 4:44 AM, Matthias J. Sax <mj...@apache.org> wrote:
>
>> I am currently working with version 0.11.0-SNAPSHOT and cannot observe
>> the behavior you describe. If I submit a sample topology with 1 spout
>> (dop=1) and 1 bolt (dop=10) connected via shuffle grouping and have 12
>> supervisor available (each with 12 worker slots), each of the 11
>> executors is running on a single worker of a single supervisor (host).
>>
>> I have no idea why you observe a different behavior...
>>
>> -Matthias
>>
>> On 09/03/2015 12:20 AM, Nick R. Katsipoulakis wrote:
>> > When I say co-locate, what I have seen in my experiments is the
>> following:
>> >
>> > If the executor's number can be served by workers on one node, the
>> > scheduler spawns all the executors in the workers of one node. I have
>> > also seen that behavior in that the default scheduler tries to fill up
>> > one node before provisioning an additional one for the topology.
>> >
>> > Going back to your following sentence "and the executors should be
>> > evenly distributed over all available workers." I have to say that I do
>> > not see that often in my experiments. Actually, I often come across with
>> > workers handling 2 - 3 executors/tasks, and other doing nothing. Am I
>> > missing something? Is it just a coincidence that happened in my
>> experiments?
>> >
>> > Thank you,
>> > Nick
>> >
>> >
>> >
>> > 2015-09-02 17:38 GMT-04:00 Matthias J. Sax <mjsax@apache.org
>> > <ma...@apache.org>>:
>> >
>> >     I agree. The load is not high.
>> >
>> >     About higher latencies. How many ackers did you configure? As a
>> rule of
>> >     thumb there should be one acker per executor. If you have less
>> ackers,
>> >     and an increasing number of executors, this might cause the
>> increased
>> >     latency as the ackers could become a bottleneck.
>> >
>> >     What do you mean by "trying to co-locate tasks and executors as
>> much as
>> >     possible"? Tasks are logical units of work that are processed by
>> >     executors (which are threads). Furthermore (as far as I know), the
>> >     default scheduler does a evenly distributed assignment for tasks and
>> >     executor to the available workers. In you case, as you set the
>> number of
>> >     task equal to the number of executors, each executors processes a
>> single
>> >     task, and the executors should be evenly distributed over all
>> available
>> >     workers.
>> >
>> >     However, you are right: intra-worker channels are "cheaper" than
>> >     inter-worker channels. In order to exploit this, you should use
>> >     shuffle-or-local grouping instead of shuffle. The disadvantage of
>> >     shuffle-or-local might be missing load-balancing. Shuffle always
>> ensures
>> >     good load balancing.
>> >
>> >
>> >     -Matthias
>> >
>> >
>> >
>> >     On 09/02/2015 10:31 PM, Nick R. Katsipoulakis wrote:
>> >     > Well, my input load is 4 streams at 4000 tuples per second, and
>> each
>> >     > tuple is about 128 bytes long. Therefore, I do not think my load
>> is too
>> >     > much for my hardware.
>> >     >
>> >     > No, I am running only this topology in my cluster.
>> >     >
>> >     > For some reason, when I set the task to executor ratio to 1, my
>> topology
>> >     > does not hang at all. The strange thing now is that I see higher
>> latency
>> >     > with more executors and I am trying to figure this out. Also, I
>> see that
>> >     > the default scheduler is trying to co-locate tasks and executors
>> as much
>> >     > as possible. Is this true? If yes, is it because the intra-worker
>> >     > latencies are much lower than the inter-worker latencies?
>> >     >
>> >     > Thanks,
>> >     > Nick
>> >     >
>> >     > 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <mjsax@apache.org
>> <ma...@apache.org>
>> >     > <mailto:mjsax@apache.org <ma...@apache.org>>>:
>> >     >
>> >     >     So (for each node) you have 4 cores available for 1
>> supervisor JVM, 2
>> >     >     worker JVMs that execute up to 5 thread each (if 40 executors
>> are
>> >     >     distributed evenly over all workers. Thus, about 12 threads
>> for 4 cores.
>> >     >     Of course, Storm starts a few more threads within each
>> >     >     worker/supervisor.
>> >     >
>> >     >     If your load is not huge, this might be sufficient. However,
>> having high
>> >     >     data rate, it might be problematic.
>> >     >
>> >     >     One more question: do you run a single topology in your
>> cluster or
>> >     >     multiple? Storm isolates topologies for fault-tolerance
>> reasons. Thus, a
>> >     >     single worker cannot process executors from different
>> topologies. If you
>> >     >     run out of workers, a topology might not start up completely.
>> >     >
>> >     >     -Matthias
>> >     >
>> >     >
>> >     >
>> >     >     On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
>> >     >     > Hello Matthias and thank you for your reply. See my answers
>> below:
>> >     >     >
>> >     >     > - I have a 4 supervisor nodes in my AWS cluster of
>> m4.xlarge instances
>> >     >     > (4 cores per node). On top of that I have 3 more nodes for
>> zookeeper and
>> >     >     > nimbus.
>> >     >     > - 2 worker nodes per supervisor node
>> >     >     > - The task number for each bolt ranges from 1 to 4 and I
>> use 1:1 task to
>> >     >     > executor assignment.
>> >     >     > - The number of executors in total for the topology ranges
>> from 14 to 41
>> >     >     >
>> >     >     > Thanks,
>> >     >     > Nick
>> >     >     >
>> >     >     > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <
>> mjsax@apache.org <ma...@apache.org> <mailto:mjsax@apache.org
>> >     <ma...@apache.org>>
>> >     >     > <mailto:mjsax@apache.org <ma...@apache.org>
>> >     <mailto:mjsax@apache.org <ma...@apache.org>>>>:
>> >     >     >
>> >     >     >     Without any exception/error message it is hard to tell.
>> >     >     >
>> >     >     >     What is your cluster setup
>> >     >     >       - Hardware, ie, number of cores per node?
>> >     >     >       - How many node/supervisor are available?
>> >     >     >       - Configured number of workers for the topology?
>> >     >     >       - What is the number of task for each spout/bolt?
>> >     >     >       - What is the number of executors for each spout/bolt?
>> >     >     >
>> >     >     >     -Matthias
>> >     >     >
>> >     >     >     On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
>> >     >     >     > Hello all,
>> >     >     >     >
>> >     >     >     > I am working on a project in which I submit a topology
>> >     to my
>> >     >     Storm
>> >     >     >     > cluster, but for some reason, some of my tasks do not
>> >     start
>> >     >     executing.
>> >     >     >     >
>> >     >     >     > I can see that the above is happening because every
>> >     bolt I have
>> >     >     >     needs to
>> >     >     >     > connect to an external server and do a registration
>> to a
>> >     >     service.
>> >     >     >     > However, some of the bolts do not seem to connect.
>> >     >     >     >
>> >     >     >     > I have to say that the number of tasks I have is
>> >     larger than the
>> >     >     >     number
>> >     >     >     > of workers of my cluster. Also, I check my worker log
>> >     files,
>> >     >     and I see
>> >     >     >     > that the workers that do not register, are also not
>> >     writing some
>> >     >     >     > initialization messages I have them print in the
>> >     beginning.
>> >     >     >     >
>> >     >     >     > Any idea why this is happening? Can it be because my
>> >     >     resources are not
>> >     >     >     > enough to start off all of the tasks?
>> >     >     >     >
>> >     >     >     > Thank you,
>> >     >     >     > Nick
>> >     >     >
>> >     >     >
>> >     >     >
>> >     >     >
>> >     >     > --
>> >     >     > Nikolaos Romanos Katsipoulakis,
>> >     >     > University of Pittsburgh, PhD candidate
>> >     >
>> >     >
>> >     >
>> >     >
>> >     > --
>> >     > Nikolaos Romanos Katsipoulakis,
>> >     > University of Pittsburgh, PhD candidate
>> >
>> >
>> >
>> >
>> > --
>> > Nikolaos Romanos Katsipoulakis,
>> > University of Pittsburgh, PhD candidate
>>
>>
>


-- 
Regards,
Abhishek Agarwal

Re: Tasks are not starting

Posted by John Yost <so...@gmail.com>.
Hi Nick,

What do the nimbus and supervisor logs say? One or both may contain clues
as to why your workers are not starting up.

--John

On Thu, Sep 3, 2015 at 4:44 AM, Matthias J. Sax <mj...@apache.org> wrote:

> I am currently working with version 0.11.0-SNAPSHOT and cannot observe
> the behavior you describe. If I submit a sample topology with 1 spout
> (dop=1) and 1 bolt (dop=10) connected via shuffle grouping and have 12
> supervisor available (each with 12 worker slots), each of the 11
> executors is running on a single worker of a single supervisor (host).
>
> I have no idea why you observe a different behavior...
>
> -Matthias
>
> On 09/03/2015 12:20 AM, Nick R. Katsipoulakis wrote:
> > When I say co-locate, what I have seen in my experiments is the
> following:
> >
> > If the executor's number can be served by workers on one node, the
> > scheduler spawns all the executors in the workers of one node. I have
> > also seen that behavior in that the default scheduler tries to fill up
> > one node before provisioning an additional one for the topology.
> >
> > Going back to your following sentence "and the executors should be
> > evenly distributed over all available workers." I have to say that I do
> > not see that often in my experiments. Actually, I often come across with
> > workers handling 2 - 3 executors/tasks, and other doing nothing. Am I
> > missing something? Is it just a coincidence that happened in my
> experiments?
> >
> > Thank you,
> > Nick
> >
> >
> >
> > 2015-09-02 17:38 GMT-04:00 Matthias J. Sax <mjsax@apache.org
> > <ma...@apache.org>>:
> >
> >     I agree. The load is not high.
> >
> >     About higher latencies. How many ackers did you configure? As a rule
> of
> >     thumb there should be one acker per executor. If you have less
> ackers,
> >     and an increasing number of executors, this might cause the increased
> >     latency as the ackers could become a bottleneck.
> >
> >     What do you mean by "trying to co-locate tasks and executors as much
> as
> >     possible"? Tasks are logical units of work that are processed by
> >     executors (which are threads). Furthermore (as far as I know), the
> >     default scheduler does a evenly distributed assignment for tasks and
> >     executor to the available workers. In you case, as you set the
> number of
> >     task equal to the number of executors, each executors processes a
> single
> >     task, and the executors should be evenly distributed over all
> available
> >     workers.
> >
> >     However, you are right: intra-worker channels are "cheaper" than
> >     inter-worker channels. In order to exploit this, you should use
> >     shuffle-or-local grouping instead of shuffle. The disadvantage of
> >     shuffle-or-local might be missing load-balancing. Shuffle always
> ensures
> >     good load balancing.
> >
> >
> >     -Matthias
> >
> >
> >
> >     On 09/02/2015 10:31 PM, Nick R. Katsipoulakis wrote:
> >     > Well, my input load is 4 streams at 4000 tuples per second, and
> each
> >     > tuple is about 128 bytes long. Therefore, I do not think my load
> is too
> >     > much for my hardware.
> >     >
> >     > No, I am running only this topology in my cluster.
> >     >
> >     > For some reason, when I set the task to executor ratio to 1, my
> topology
> >     > does not hang at all. The strange thing now is that I see higher
> latency
> >     > with more executors and I am trying to figure this out. Also, I
> see that
> >     > the default scheduler is trying to co-locate tasks and executors
> as much
> >     > as possible. Is this true? If yes, is it because the intra-worker
> >     > latencies are much lower than the inter-worker latencies?
> >     >
> >     > Thanks,
> >     > Nick
> >     >
> >     > 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <mjsax@apache.org
> <ma...@apache.org>
> >     > <mailto:mjsax@apache.org <ma...@apache.org>>>:
> >     >
> >     >     So (for each node) you have 4 cores available for 1 supervisor
> JVM, 2
> >     >     worker JVMs that execute up to 5 thread each (if 40 executors
> are
> >     >     distributed evenly over all workers. Thus, about 12 threads
> for 4 cores.
> >     >     Of course, Storm starts a few more threads within each
> >     >     worker/supervisor.
> >     >
> >     >     If your load is not huge, this might be sufficient. However,
> having high
> >     >     data rate, it might be problematic.
> >     >
> >     >     One more question: do you run a single topology in your
> cluster or
> >     >     multiple? Storm isolates topologies for fault-tolerance
> reasons. Thus, a
> >     >     single worker cannot process executors from different
> topologies. If you
> >     >     run out of workers, a topology might not start up completely.
> >     >
> >     >     -Matthias
> >     >
> >     >
> >     >
> >     >     On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
> >     >     > Hello Matthias and thank you for your reply. See my answers
> below:
> >     >     >
> >     >     > - I have a 4 supervisor nodes in my AWS cluster of m4.xlarge
> instances
> >     >     > (4 cores per node). On top of that I have 3 more nodes for
> zookeeper and
> >     >     > nimbus.
> >     >     > - 2 worker nodes per supervisor node
> >     >     > - The task number for each bolt ranges from 1 to 4 and I use
> 1:1 task to
> >     >     > executor assignment.
> >     >     > - The number of executors in total for the topology ranges
> from 14 to 41
> >     >     >
> >     >     > Thanks,
> >     >     > Nick
> >     >     >
> >     >     > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <mjsax@apache.org
> <ma...@apache.org> <mailto:mjsax@apache.org
> >     <ma...@apache.org>>
> >     >     > <mailto:mjsax@apache.org <ma...@apache.org>
> >     <mailto:mjsax@apache.org <ma...@apache.org>>>>:
> >     >     >
> >     >     >     Without any exception/error message it is hard to tell.
> >     >     >
> >     >     >     What is your cluster setup
> >     >     >       - Hardware, ie, number of cores per node?
> >     >     >       - How many node/supervisor are available?
> >     >     >       - Configured number of workers for the topology?
> >     >     >       - What is the number of task for each spout/bolt?
> >     >     >       - What is the number of executors for each spout/bolt?
> >     >     >
> >     >     >     -Matthias
> >     >     >
> >     >     >     On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
> >     >     >     > Hello all,
> >     >     >     >
> >     >     >     > I am working on a project in which I submit a topology
> >     to my
> >     >     Storm
> >     >     >     > cluster, but for some reason, some of my tasks do not
> >     start
> >     >     executing.
> >     >     >     >
> >     >     >     > I can see that the above is happening because every
> >     bolt I have
> >     >     >     needs to
> >     >     >     > connect to an external server and do a registration to
> a
> >     >     service.
> >     >     >     > However, some of the bolts do not seem to connect.
> >     >     >     >
> >     >     >     > I have to say that the number of tasks I have is
> >     larger than the
> >     >     >     number
> >     >     >     > of workers of my cluster. Also, I check my worker log
> >     files,
> >     >     and I see
> >     >     >     > that the workers that do not register, are also not
> >     writing some
> >     >     >     > initialization messages I have them print in the
> >     beginning.
> >     >     >     >
> >     >     >     > Any idea why this is happening? Can it be because my
> >     >     resources are not
> >     >     >     > enough to start off all of the tasks?
> >     >     >     >
> >     >     >     > Thank you,
> >     >     >     > Nick
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     > --
> >     >     > Nikolaos Romanos Katsipoulakis,
> >     >     > University of Pittsburgh, PhD candidate
> >     >
> >     >
> >     >
> >     >
> >     > --
> >     > Nikolaos Romanos Katsipoulakis,
> >     > University of Pittsburgh, PhD candidate
> >
> >
> >
> >
> > --
> > Nikolaos Romanos Katsipoulakis,
> > University of Pittsburgh, PhD candidate
>
>

Re: Tasks are not starting

Posted by "Nick R. Katsipoulakis" <ni...@gmail.com>.
Hello Matthias,

After carefully reviewing my log files, I have to say you were right. I was
wrong to think that all the tasks were spawned on the same node (worker);
I was confused by the fact that only one node's metrics file was populated.
However, that does not mean all my tasks were spawned on one node. It simply
reflects the fact that I had a single metrics-accumulation task in my
topology, registered through the following config parameter:

conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
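
For context, this is how the consumer is registered in full (a minimal sketch
against the 0.9.x-era `backtype.storm` API):

```java
import backtype.storm.Config;
import backtype.storm.metric.LoggingMetricsConsumer;

public class MetricsConfigExample {
    public static Config buildConfig() {
        Config conf = new Config();
        // Parallelism hint of 1: exactly one metrics-consumer task is spawned,
        // so metrics from all workers end up in the log file of the single
        // worker that happens to host this task.
        conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
        return conf;
    }
}
```

That single task is why only one node's metrics file gets populated.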


Thanks again for your helpful comments.

Cheers,
Nick

2015-09-03 4:44 GMT-04:00 Matthias J. Sax <mj...@apache.org>:

> I am currently working with version 0.11.0-SNAPSHOT and cannot observe
> the behavior you describe. If I submit a sample topology with 1 spout
> (dop=1) and 1 bolt (dop=10) connected via shuffle grouping and have 12
> supervisor available (each with 12 worker slots), each of the 11
> executors is running on a single worker of a single supervisor (host).
>
> I have no idea why you observe a different behavior...
>
> -Matthias
>
> On 09/03/2015 12:20 AM, Nick R. Katsipoulakis wrote:
> > When I say co-locate, what I have seen in my experiments is the
> following:
> >
> > If the executor's number can be served by workers on one node, the
> > scheduler spawns all the executors in the workers of one node. I have
> > also seen that behavior in that the default scheduler tries to fill up
> > one node before provisioning an additional one for the topology.
> >
> > Going back to your following sentence "and the executors should be
> > evenly distributed over all available workers." I have to say that I do
> > not see that often in my experiments. Actually, I often come across with
> > workers handling 2 - 3 executors/tasks, and other doing nothing. Am I
> > missing something? Is it just a coincidence that happened in my
> experiments?
> >
> > Thank you,
> > Nick
> >
> >
> >
> > 2015-09-02 17:38 GMT-04:00 Matthias J. Sax <mjsax@apache.org
> > <ma...@apache.org>>:
> >
> >     I agree. The load is not high.
> >
> >     About higher latencies. How many ackers did you configure? As a rule
> of
> >     thumb there should be one acker per executor. If you have less
> ackers,
> >     and an increasing number of executors, this might cause the increased
> >     latency as the ackers could become a bottleneck.
> >
> >     What do you mean by "trying to co-locate tasks and executors as much
> as
> >     possible"? Tasks a logical units of works that are processed by
> >     executors (which are threads). Furthermore (as far as I know), the
> >     default scheduler does a evenly distributed assignment for tasks and
> >     executor to the available workers. In you case, as you set the
> number of
> >     task equal to the number of executors, each executors processes a
> single
> >     task, and the executors should be evenly distributed over all
> available
> >     workers.
> >
> >     However, you are right: intra-worker channels are "cheaper" than
> >     inter-worker channels. In order to exploit this, you should use
> >     shuffle-or-local grouping instead of shuffle. The disadvantage of
> >     shuffle-or-local might be missing load-balancing. Shuffle always
> ensures
> >     good load balancing.
> >
> >
> >     -Matthias
> >
> >
> >
> >     On 09/02/2015 10:31 PM, Nick R. Katsipoulakis wrote:
> >     > Well, my input load is 4 streams at 4000 tuples per second, and
> each
> >     > tuple is about 128 bytes long. Therefore, I do not think my load
> is too
> >     > much for my hardware.
> >     >
> >     > No, I am running only this topology in my cluster.
> >     >
> >     > For some reason, when I set the task to executor ratio to 1, my
> topology
> >     > does not hang at all. The strange thing now is that I see higher
> latency
> >     > with more executors and I am trying to figure this out. Also, I
> see that
> >     > the default scheduler is trying to co-locate tasks and executors
> as much
> >     > as possible. Is this true? If yes, is it because the intra-worker
> >     > latencies are much lower than the inter-worker latencies?
> >     >
> >     > Thanks,
> >     > Nick
> >     >
> >     > 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <mjsax@apache.org
> <ma...@apache.org>
> >     > <mailto:mjsax@apache.org <ma...@apache.org>>>:
> >     >
> >     >     So (for each node) you have 4 cores available for 1 supervisor
> JVM, 2
> >     >     worker JVMs that execute up to 5 thread each (if 40 executors
> are
> >     >     distributed evenly over all workers. Thus, about 12 threads
> for 4 cores.
> >     >     Or course, Storm starts a few more threads within each
> >     >     worker/supervisor.
> >     >
> >     >     If your load is not huge, this might be sufficient. However,
> having high
> >     >     data rate, it might be problematic.
> >     >
> >     >     One more question: do you run a single topology in your
> cluster or
> >     >     multiple? Storm isolates topologies for fault-tolerance
> reasons. Thus, a
> >     >     single worker cannot process executors from different
> topologies. If you
> >     >     run out of workers, a topology might not start up completely.
> >     >
> >     >     -Matthias
> >     >
> >     >
> >     >
> >     >     On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
> >     >     > Hello Matthias and thank you for your reply. See my answers
> below:
> >     >     >
> >     >     > - I have a 4 supervisor nodes in my AWS cluster of m4.xlarge
> instances
> >     >     > (4 cores per node). On top of that I have 3 more nodes for
> zookeeper and
> >     >     > nimbus.
> >     >     > - 2 worker nodes per supervisor node
> >     >     > - The task number for each bolt ranges from 1 to 4 and I use
> 1:1 task to
> >     >     > executor assignment.
> >     >     > - The number of executors in total for the topology ranges
> from 14 to 41
> >     >     >
> >     >     > Thanks,
> >     >     > Nick
> >     >     >
> >     >     > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <mjsax@apache.org
> <ma...@apache.org> <mailto:mjsax@apache.org
> >     <ma...@apache.org>>
> >     >     > <mailto:mjsax@apache.org <ma...@apache.org>
> >     <mailto:mjsax@apache.org <ma...@apache.org>>>>:
> >     >     >
> >     >     >     Without any exception/error message it is hard to tell.
> >     >     >
> >     >     >     What is your cluster setup
> >     >     >       - Hardware, ie, number of cores per node?
> >     >     >       - How many node/supervisor are available?
> >     >     >       - Configured number of workers for the topology?
> >     >     >       - What is the number of task for each spout/bolt?
> >     >     >       - What is the number of executors for each spout/bolt?
> >     >     >
> >     >     >     -Matthias
> >     >     >
> >     >     >     On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
> >     >     >     > Hello all,
> >     >     >     >
> >     >     >     > I am working on a project in which I submit a topology
> >     to my
> >     >     Storm
> >     >     >     > cluster, but for some reason, some of my tasks do not
> >     start
> >     >     executing.
> >     >     >     >
> >     >     >     > I can see that the above is happening because every
> >     bolt I have
> >     >     >     needs to
> >     >     >     > connect to an external server and do a registration to
> a
> >     >     service.
> >     >     >     > However, some of the bolts do not seem to connect.
> >     >     >     >
> >     >     >     > I have to say that the number of tasks I have is
> >     larger than the
> >     >     >     number
> >     >     >     > of workers of my cluster. Also, I check my worker log
> >     files,
> >     >     and I see
> >     >     >     > that the workers that do not register, are also not
> >     writing some
> >     >     >     > initialization messages I have them print in the
> >     beginning.
> >     >     >     >
> >     >     >     > Any idea why this is happening? Can it be because my
> >     >     resources are not
> >     >     >     > enough to start off all of the tasks?
> >     >     >     >
> >     >     >     > Thank you,
> >     >     >     > Nick
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     > --
> >     >     > Nikolaos Romanos Katsipoulakis,
> >     >     > University of Pittsburgh, PhD candidate
> >     >
> >     >
> >     >
> >     >
> >     > --
> >     > Nikolaos Romanos Katsipoulakis,
> >     > University of Pittsburgh, PhD candidate
> >
> >
> >
> >
> > --
> > Nikolaos Romanos Katsipoulakis,
> > University of Pittsburgh, PhD candidate
>
>


-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD candidate

Re: Tasks are not starting

Posted by "Matthias J. Sax" <mj...@apache.org>.
I am currently working with version 0.11.0-SNAPSHOT and cannot observe
the behavior you describe. If I submit a sample topology with 1 spout
(dop=1) and 1 bolt (dop=10) connected via shuffle grouping and have 12
supervisor available (each with 12 worker slots), each of the 11
executors is running on a single worker of a single supervisor (host).
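
The sample topology I describe can be sketched as follows (TestSpout and
TestBolt are hypothetical placeholders for any spout/bolt implementation):

```java
import backtype.storm.Config;
import backtype.storm.generated.StormTopology;
import backtype.storm.topology.TopologyBuilder;

public class SampleTopology {
    public static StormTopology build() {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new TestSpout(), 1);   // dop = 1
        builder.setBolt("bolt", new TestBolt(), 10)      // dop = 10
               .shuffleGrouping("spout");
        return builder.createTopology();
    }

    public static Config config() {
        Config conf = new Config();
        // With 11 workers requested and 12 supervisors available, each of
        // the 11 executors can be scheduled onto its own worker/host.
        conf.setNumWorkers(11);
        return conf;
    }
}
```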

I have no idea why you observe a different behavior...

-Matthias

On 09/03/2015 12:20 AM, Nick R. Katsipoulakis wrote:
> When I say co-locate, what I have seen in my experiments is the following:
> 
> If the executor's number can be served by workers on one node, the
> scheduler spawns all the executors in the workers of one node. I have
> also seen that behavior in that the default scheduler tries to fill up
> one node before provisioning an additional one for the topology.
> 
> Going back to your following sentence "and the executors should be
> evenly distributed over all available workers." I have to say that I do
> not see that often in my experiments. Actually, I often come across with
> workers handling 2 - 3 executors/tasks, and other doing nothing. Am I
> missing something? Is it just a coincidence that happened in my experiments?
> 
> Thank you,
> Nick
> 
> 
> 
> 2015-09-02 17:38 GMT-04:00 Matthias J. Sax <mjsax@apache.org
> <ma...@apache.org>>:
> 
>     I agree. The load is not high.
> 
>     About higher latencies. How many ackers did you configure? As a rule of
>     thumb there should be one acker per executor. If you have less ackers,
>     and an increasing number of executors, this might cause the increased
>     latency as the ackers could become a bottleneck.
> 
>     What do you mean by "trying to co-locate tasks and executors as much as
>     possible"? Tasks a logical units of works that are processed by
>     executors (which are threads). Furthermore (as far as I know), the
>     default scheduler does a evenly distributed assignment for tasks and
>     executor to the available workers. In you case, as you set the number of
>     task equal to the number of executors, each executors processes a single
>     task, and the executors should be evenly distributed over all available
>     workers.
> 
>     However, you are right: intra-worker channels are "cheaper" than
>     inter-worker channels. In order to exploit this, you should use
>     shuffle-or-local grouping instead of shuffle. The disadvantage of
>     shuffle-or-local might be missing load-balancing. Shuffle always ensures
>     good load balancing.
> 
> 
>     -Matthias
> 
> 
> 
>     On 09/02/2015 10:31 PM, Nick R. Katsipoulakis wrote:
>     > Well, my input load is 4 streams at 4000 tuples per second, and each
>     > tuple is about 128 bytes long. Therefore, I do not think my load is too
>     > much for my hardware.
>     >
>     > No, I am running only this topology in my cluster.
>     >
>     > For some reason, when I set the task to executor ratio to 1, my topology
>     > does not hang at all. The strange thing now is that I see higher latency
>     > with more executors and I am trying to figure this out. Also, I see that
>     > the default scheduler is trying to co-locate tasks and executors as much
>     > as possible. Is this true? If yes, is it because the intra-worker
>     > latencies are much lower than the inter-worker latencies?
>     >
>     > Thanks,
>     > Nick
>     >
>     > 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <mjsax@apache.org <ma...@apache.org>
>     > <mailto:mjsax@apache.org <ma...@apache.org>>>:
>     >
>     >     So (for each node) you have 4 cores available for 1 supervisor JVM, 2
>     >     worker JVMs that execute up to 5 thread each (if 40 executors are
>     >     distributed evenly over all workers. Thus, about 12 threads for 4 cores.
>     >     Or course, Storm starts a few more threads within each
>     >     worker/supervisor.
>     >
>     >     If your load is not huge, this might be sufficient. However, having high
>     >     data rate, it might be problematic.
>     >
>     >     One more question: do you run a single topology in your cluster or
>     >     multiple? Storm isolates topologies for fault-tolerance reasons. Thus, a
>     >     single worker cannot process executors from different topologies. If you
>     >     run out of workers, a topology might not start up completely.
>     >
>     >     -Matthias
>     >
>     >
>     >
>     >     On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
>     >     > Hello Matthias and thank you for your reply. See my answers below:
>     >     >
>     >     > - I have a 4 supervisor nodes in my AWS cluster of m4.xlarge instances
>     >     > (4 cores per node). On top of that I have 3 more nodes for zookeeper and
>     >     > nimbus.
>     >     > - 2 worker nodes per supervisor node
>     >     > - The task number for each bolt ranges from 1 to 4 and I use 1:1 task to
>     >     > executor assignment.
>     >     > - The number of executors in total for the topology ranges from 14 to 41
>     >     >
>     >     > Thanks,
>     >     > Nick
>     >     >
>     >     > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <mjsax@apache.org <ma...@apache.org> <mailto:mjsax@apache.org
>     <ma...@apache.org>>
>     >     > <mailto:mjsax@apache.org <ma...@apache.org>
>     <mailto:mjsax@apache.org <ma...@apache.org>>>>:
>     >     >
>     >     >     Without any exception/error message it is hard to tell.
>     >     >
>     >     >     What is your cluster setup
>     >     >       - Hardware, ie, number of cores per node?
>     >     >       - How many node/supervisor are available?
>     >     >       - Configured number of workers for the topology?
>     >     >       - What is the number of task for each spout/bolt?
>     >     >       - What is the number of executors for each spout/bolt?
>     >     >
>     >     >     -Matthias
>     >     >
>     >     >     On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
>     >     >     > Hello all,
>     >     >     >
>     >     >     > I am working on a project in which I submit a topology
>     to my
>     >     Storm
>     >     >     > cluster, but for some reason, some of my tasks do not
>     start
>     >     executing.
>     >     >     >
>     >     >     > I can see that the above is happening because every
>     bolt I have
>     >     >     needs to
>     >     >     > connect to an external server and do a registration to a
>     >     service.
>     >     >     > However, some of the bolts do not seem to connect.
>     >     >     >
>     >     >     > I have to say that the number of tasks I have is
>     larger than the
>     >     >     number
>     >     >     > of workers of my cluster. Also, I check my worker log
>     files,
>     >     and I see
>     >     >     > that the workers that do not register, are also not
>     writing some
>     >     >     > initialization messages I have them print in the
>     beginning.
>     >     >     >
>     >     >     > Any idea why this is happening? Can it be because my
>     >     resources are not
>     >     >     > enough to start off all of the tasks?
>     >     >     >
>     >     >     > Thank you,
>     >     >     > Nick
>     >     >
>     >     >
>     >     >
>     >     >
>     >     > --
>     >     > Nikolaos Romanos Katsipoulakis,
>     >     > University of Pittsburgh, PhD candidate
>     >
>     >
>     >
>     >
>     > --
>     > Nikolaos Romanos Katsipoulakis,
>     > University of Pittsburgh, PhD candidate
> 
> 
> 
> 
> -- 
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD candidate


Re: Tasks are not starting

Posted by "Nick R. Katsipoulakis" <ni...@gmail.com>.
When I say co-locate, what I have seen in my experiments is the following:

If the executor's number can be served by workers on one node, the
scheduler spawns all the executors in the workers of one node. I have also
seen that behavior in that the default scheduler tries to fill up one node
before provisioning an additional one for the topology.

Going back to your following sentence "and the executors should be evenly
distributed over all available workers." I have to say that I do not see
that often in my experiments. Actually, I often come across workers
handling 2-3 executors/tasks, with others doing nothing. Am I missing
something? Is it just a coincidence in my experiments?

Thank you,
Nick



2015-09-02 17:38 GMT-04:00 Matthias J. Sax <mj...@apache.org>:

> I agree. The load is not high.
>
> About higher latencies. How many ackers did you configure? As a rule of
> thumb there should be one acker per executor. If you have less ackers,
> and an increasing number of executors, this might cause the increased
> latency as the ackers could become a bottleneck.
>
> What do you mean by "trying to co-locate tasks and executors as much as
> possible"? Tasks a logical units of works that are processed by
> executors (which are threads). Furthermore (as far as I know), the
> default scheduler does a evenly distributed assignment for tasks and
> executor to the available workers. In you case, as you set the number of
> task equal to the number of executors, each executors processes a single
> task, and the executors should be evenly distributed over all available
> workers.
>
> However, you are right: intra-worker channels are "cheaper" than
> inter-worker channels. In order to exploit this, you should use
> shuffle-or-local grouping instead of shuffle. The disadvantage of
> shuffle-or-local might be missing load-balancing. Shuffle always ensures
> good load balancing.
>
>
> -Matthias
>
>
>
> On 09/02/2015 10:31 PM, Nick R. Katsipoulakis wrote:
> > Well, my input load is 4 streams at 4000 tuples per second, and each
> > tuple is about 128 bytes long. Therefore, I do not think my load is too
> > much for my hardware.
> >
> > No, I am running only this topology in my cluster.
> >
> > For some reason, when I set the task to executor ratio to 1, my topology
> > does not hang at all. The strange thing now is that I see higher latency
> > with more executors and I am trying to figure this out. Also, I see that
> > the default scheduler is trying to co-locate tasks and executors as much
> > as possible. Is this true? If yes, is it because the intra-worker
> > latencies are much lower than the inter-worker latencies?
> >
> > Thanks,
> > Nick
> >
> > 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <mjsax@apache.org
> > <ma...@apache.org>>:
> >
> >     So (for each node) you have 4 cores available for 1 supervisor JVM, 2
> >     worker JVMs that execute up to 5 thread each (if 40 executors are
> >     distributed evenly over all workers. Thus, about 12 threads for 4
> cores.
> >     Or course, Storm starts a few more threads within each
> >     worker/supervisor.
> >
> >     If your load is not huge, this might be sufficient. However, having
> high
> >     data rate, it might be problematic.
> >
> >     One more question: do you run a single topology in your cluster or
> >     multiple? Storm isolates topologies for fault-tolerance reasons.
> Thus, a
> >     single worker cannot process executors from different topologies. If
> you
> >     run out of workers, a topology might not start up completely.
> >
> >     -Matthias
> >
> >
> >
> >     On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
> >     > Hello Matthias and thank you for your reply. See my answers below:
> >     >
> >     > - I have a 4 supervisor nodes in my AWS cluster of m4.xlarge
> instances
> >     > (4 cores per node). On top of that I have 3 more nodes for
> zookeeper and
> >     > nimbus.
> >     > - 2 worker nodes per supervisor node
> >     > - The task number for each bolt ranges from 1 to 4 and I use 1:1
> task to
> >     > executor assignment.
> >     > - The number of executors in total for the topology ranges from 14
> to 41
> >     >
> >     > Thanks,
> >     > Nick
> >     >
> >     > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <mjsax@apache.org
> <ma...@apache.org>
> >     > <mailto:mjsax@apache.org <ma...@apache.org>>>:
> >     >
> >     >     Without any exception/error message it is hard to tell.
> >     >
> >     >     What is your cluster setup
> >     >       - Hardware, ie, number of cores per node?
> >     >       - How many node/supervisor are available?
> >     >       - Configured number of workers for the topology?
> >     >       - What is the number of task for each spout/bolt?
> >     >       - What is the number of executors for each spout/bolt?
> >     >
> >     >     -Matthias
> >     >
> >     >     On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
> >     >     > Hello all,
> >     >     >
> >     >     > I am working on a project in which I submit a topology to my
> >     Storm
> >     >     > cluster, but for some reason, some of my tasks do not start
> >     executing.
> >     >     >
> >     >     > I can see that the above is happening because every bolt I
> have
> >     >     needs to
> >     >     > connect to an external server and do a registration to a
> >     service.
> >     >     > However, some of the bolts do not seem to connect.
> >     >     >
> >     >     > I have to say that the number of tasks I have is larger than
> the
> >     >     number
> >     >     > of workers of my cluster. Also, I check my worker log files,
> >     and I see
> >     >     > that the workers that do not register, are also not writing
> some
> >     >     > initialization messages I have them print in the beginning.
> >     >     >
> >     >     > Any idea why this is happening? Can it be because my
> >     resources are not
> >     >     > enough to start off all of the tasks?
> >     >     >
> >     >     > Thank you,
> >     >     > Nick
> >     >
> >     >
> >     >
> >     >
> >     > --
> >     > Nikolaos Romanos Katsipoulakis,
> >     > University of Pittsburgh, PhD candidate
> >
> >
> >
> >
> > --
> > Nikolaos Romanos Katsipoulakis,
> > University of Pittsburgh, PhD candidate
>
>


-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD candidate

Re: Tasks are not starting

Posted by "Matthias J. Sax" <mj...@apache.org>.
I agree. The load is not high.

About higher latencies. How many ackers did you configure? As a rule of
thumb there should be one acker per executor. If you have fewer ackers,
and an increasing number of executors, this might cause the increased
latency as the ackers could become a bottleneck.
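
For example, the acker count is set in the topology config (a sketch; 41 is
just the upper end of the executor counts mentioned earlier in this thread):

```java
import backtype.storm.Config;

Config conf = new Config();
// One acker executor per topology executor, per the rule of thumb above.
// The default is one acker per worker, which can bottleneck acking at
// higher executor counts.
conf.setNumAckers(41);
```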

What do you mean by "trying to co-locate tasks and executors as much as
possible"? Tasks are logical units of work that are processed by
executors (which are threads). Furthermore (as far as I know), the
default scheduler does an evenly distributed assignment of tasks and
executors to the available workers. In your case, as you set the number of
tasks equal to the number of executors, each executor processes a single
task, and the executors should be evenly distributed over all available
workers.

However, you are right: intra-worker channels are "cheaper" than
inter-worker channels. In order to exploit this, you should use
shuffle-or-local grouping instead of shuffle. The disadvantage of
shuffle-or-local might be missing load-balancing. Shuffle always ensures
good load balancing.
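
Switching a connection to shuffle-or-local looks like this (a sketch;
MySpout/MyBolt are placeholders for your components):

```java
import backtype.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new MySpout(), 4);
// localOrShuffleGrouping prefers a consumer executor in the same worker
// when one exists (cheap intra-worker queue); otherwise it falls back to
// shuffling tuples over the network.
builder.setBolt("bolt", new MyBolt(), 8)
       .localOrShuffleGrouping("spout");
```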


-Matthias



On 09/02/2015 10:31 PM, Nick R. Katsipoulakis wrote:
> Well, my input load is 4 streams at 4000 tuples per second, and each
> tuple is about 128 bytes long. Therefore, I do not think my load is too
> much for my hardware.
> 
> No, I am running only this topology in my cluster.
> 
> For some reason, when I set the task to executor ratio to 1, my topology
> does not hang at all. The strange thing now is that I see higher latency
> with more executors and I am trying to figure this out. Also, I see that
> the default scheduler is trying to co-locate tasks and executors as much
> as possible. Is this true? If yes, is it because the intra-worker
> latencies are much lower than the inter-worker latencies?
> 
> Thanks, 
> Nick
> 
> 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <mjsax@apache.org
> <ma...@apache.org>>:
> 
>     So (for each node) you have 4 cores available for 1 supervisor JVM, 2
>     worker JVMs that execute up to 5 thread each (if 40 executors are
>     distributed evenly over all workers. Thus, about 12 threads for 4 cores.
>     Or course, Storm starts a few more threads within each
>     worker/supervisor.
> 
>     If your load is not huge, this might be sufficient. However, having high
>     data rate, it might be problematic.
> 
>     One more question: do you run a single topology in your cluster or
>     multiple? Storm isolates topologies for fault-tolerance reasons. Thus, a
>     single worker cannot process executors from different topologies. If you
>     run out of workers, a topology might not start up completely.
> 
>     -Matthias
> 
> 
> 
>     On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
>     > Hello Matthias and thank you for your reply. See my answers below:
>     >
>     > - I have a 4 supervisor nodes in my AWS cluster of m4.xlarge instances
>     > (4 cores per node). On top of that I have 3 more nodes for zookeeper and
>     > nimbus.
>     > - 2 worker nodes per supervisor node
>     > - The task number for each bolt ranges from 1 to 4 and I use 1:1 task to
>     > executor assignment.
>     > - The number of executors in total for the topology ranges from 14 to 41
>     >
>     > Thanks,
>     > Nick
>     >
>     > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <mjsax@apache.org <ma...@apache.org>
>     > <mailto:mjsax@apache.org <ma...@apache.org>>>:
>     >
>     >     Without any exception/error message it is hard to tell.
>     >
>     >     What is your cluster setup
>     >       - Hardware, ie, number of cores per node?
>     >       - How many node/supervisor are available?
>     >       - Configured number of workers for the topology?
>     >       - What is the number of task for each spout/bolt?
>     >       - What is the number of executors for each spout/bolt?
>     >
>     >     -Matthias
>     >
>     >     On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
>     >     > Hello all,
>     >     >
>     >     > I am working on a project in which I submit a topology to my
>     Storm
>     >     > cluster, but for some reason, some of my tasks do not start
>     executing.
>     >     >
>     >     > I can see that the above is happening because every bolt I have
>     >     needs to
>     >     > connect to an external server and do a registration to a
>     service.
>     >     > However, some of the bolts do not seem to connect.
>     >     >
>     >     > I have to say that the number of tasks I have is larger than the
>     >     number
>     >     > of workers of my cluster. Also, I check my worker log files,
>     and I see
>     >     > that the workers that do not register, are also not writing some
>     >     > initialization messages I have them print in the beginning.
>     >     >
>     >     > Any idea why this is happening? Can it be because my
>     resources are not
>     >     > enough to start off all of the tasks?
>     >     >
>     >     > Thank you,
>     >     > Nick
>     >
>     >
>     >
>     >
>     > --
>     > Nikolaos Romanos Katsipoulakis,
>     > University of Pittsburgh, PhD candidate
> 
> 
> 
> 
> -- 
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD candidate


Re: Tasks are not starting

Posted by "Nick R. Katsipoulakis" <ni...@gmail.com>.
Well, my input load is 4 streams at 4000 tuples per second, and each tuple
is about 128 bytes long. Therefore, I do not think my load is too much for
my hardware.

No, I am running only this topology in my cluster.

For some reason, when I set the task-to-executor ratio to 1:1, my topology
does not hang at all. The strange thing now is that I see higher latency
with more executors, and I am trying to figure this out. Also, I see that
the default scheduler tries to co-locate executors on as few workers as
possible. Is this true? If yes, is it because the intra-worker latencies
are much lower than the inter-worker latencies?
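
For reference, the 1:1 task-to-executor setup discussed in this thread is
configured when the topology is built. A minimal sketch against the Storm API
of that era (backtype.storm); the component and class names here are
hypothetical, not from the thread:

```java
import backtype.storm.Config;
import backtype.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
// 4 executors and 4 tasks => a 1:1 task-to-executor ratio.
// If setNumTasks() is omitted, the task count defaults to the
// parallelism hint, which is already 1:1.
builder.setBolt("register-bolt", new MyRegisteringBolt(), 4)
       .setNumTasks(4)
       .shuffleGrouping("input-spout");

Config conf = new Config();
// 2 worker processes on each of the 4 supervisor nodes
conf.setNumWorkers(8);
```

This is a configuration fragment, not a complete topology; it assumes the
spout and bolt implementations exist and that the topology is submitted via
StormSubmitter as usual.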

Thanks,
Nick

2015-09-02 16:27 GMT-04:00 Matthias J. Sax <mj...@apache.org>:

> So (for each node) you have 4 cores available for 1 supervisor JVM and 2
> worker JVMs that execute up to 5 threads each (if 40 executors are
> distributed evenly over all workers). Thus, about 12 threads for 4 cores.
> Of course, Storm starts a few more threads within each worker/supervisor.
>
> If your load is not huge, this might be sufficient. However, at a high
> data rate, it might be problematic.
>
> One more question: do you run a single topology in your cluster or
> multiple? Storm isolates topologies for fault-tolerance reasons. Thus, a
> single worker cannot process executors from different topologies. If you
> run out of workers, a topology might not start up completely.
>
> -Matthias
>
>
>
> On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
> > Hello Matthias and thank you for your reply. See my answers below:
> >
> > - I have 4 supervisor nodes in my AWS cluster of m4.xlarge instances
> > (4 cores per node). On top of that I have 3 more nodes for zookeeper and
> > nimbus.
> > - 2 worker processes per supervisor node
> > - The task count for each bolt ranges from 1 to 4, and I use a 1:1
> > task-to-executor assignment.
> > - The number of executors in total for the topology ranges from 14 to 41
> >
> > Thanks,
> > Nick
> >
> > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <mjsax@apache.org
> > <ma...@apache.org>>:
> >
> >     Without any exception/error message it is hard to tell.
> >
> >     What is your cluster setup?
> >       - Hardware, i.e., number of cores per node?
> >       - How many nodes/supervisors are available?
> >       - Configured number of workers for the topology?
> >       - What is the number of tasks for each spout/bolt?
> >       - What is the number of executors for each spout/bolt?
> >
> >     -Matthias
> >
> >     On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
> >     > Hello all,
> >     >
> >     > I am working on a project in which I submit a topology to my Storm
> >     > cluster, but for some reason, some of my tasks do not start
> executing.
> >     >
> >     > I can see that the above is happening because every bolt I have
> >     needs to
> >     > connect to an external server and do a registration to a service.
> >     > However, some of the bolts do not seem to connect.
> >     >
> >     > I have to say that the number of tasks I have is larger than the
> >     number
> >     > of workers of my cluster. Also, I check my worker log files, and I
> see
> >     > that the workers that do not register are also not writing some
> >     > initialization messages I have them print in the beginning.
> >     >
> >     > Any idea why this is happening? Can it be because my resources are
> not
> >     > enough to start off all of the tasks?
> >     >
> >     > Thank you,
> >     > Nick
> >
> >
> >
> >
> > --
> > Nikolaos Romanos Katsipoulakis,
> > University of Pittsburgh, PhD candidate
>
>


-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD candidate

Re: Tasks are not starting

Posted by "Matthias J. Sax" <mj...@apache.org>.
So (for each node) you have 4 cores available for 1 supervisor JVM and 2
worker JVMs that execute up to 5 threads each (if 40 executors are
distributed evenly over all workers). Thus, about 12 threads for 4 cores.
Of course, Storm starts a few more threads within each worker/supervisor.

If your load is not huge, this might be sufficient. However, at a high
data rate, it might be problematic.

One more question: do you run a single topology in your cluster or
multiple? Storm isolates topologies for fault-tolerance reasons. Thus, a
single worker cannot process executors from different topologies. If you
run out of workers, a topology might not start up completely.
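
The per-node thread count above can be worked out explicitly; a quick sketch
using the numbers from this thread (4 supervisor nodes, 2 workers each, and
the upper end of the 14-41 executor range):

```java
public class ThreadLoad {
    public static void main(String[] args) {
        int workersPerNode = 2;                 // worker JVMs per supervisor node
        int totalWorkers = 4 * workersPerNode;  // 4 supervisor nodes => 8 workers
        int totalExecutors = 40;                // upper end of the 14-41 range

        // Executors spread evenly across all workers (ceiling division)
        int executorsPerWorker = (totalExecutors + totalWorkers - 1) / totalWorkers;
        // Per node: 1 supervisor JVM plus the executor threads of its workers,
        // before counting Storm's own internal threads
        int threadsPerNode = 1 + workersPerNode * executorsPerWorker;

        System.out.println(executorsPerWorker + " executors per worker, ~"
                + threadsPerNode + " executor threads per 4-core node");
    }
}
```

Running it prints `5 executors per worker, ~11 executor threads per 4-core
node`, which matches the "about 12 threads for 4 cores" estimate once Storm's
internal threads are added.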

-Matthias



On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
> Hello Matthias and thank you for your reply. See my answers below:
> 
> - I have 4 supervisor nodes in my AWS cluster of m4.xlarge instances
> (4 cores per node). On top of that I have 3 more nodes for zookeeper and
> nimbus.
> - 2 worker processes per supervisor node
> - The task count for each bolt ranges from 1 to 4, and I use a 1:1
> task-to-executor assignment.
> - The number of executors in total for the topology ranges from 14 to 41
> 
> Thanks,
> Nick
> 
> 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <mjsax@apache.org
> <ma...@apache.org>>:
> 
>     Without any exception/error message it is hard to tell.
> 
>     What is your cluster setup?
>       - Hardware, i.e., number of cores per node?
>       - How many nodes/supervisors are available?
>       - Configured number of workers for the topology?
>       - What is the number of tasks for each spout/bolt?
>       - What is the number of executors for each spout/bolt?
> 
>     -Matthias
> 
>     On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
>     > Hello all,
>     >
>     > I am working on a project in which I submit a topology to my Storm
>     > cluster, but for some reason, some of my tasks do not start executing.
>     >
>     > I can see that the above is happening because every bolt I have
>     needs to
>     > connect to an external server and do a registration to a service.
>     > However, some of the bolts do not seem to connect.
>     >
>     > I have to say that the number of tasks I have is larger than the
>     number
>     > of workers of my cluster. Also, I check my worker log files, and I see
>     > that the workers that do not register are also not writing some
>     > initialization messages I have them print in the beginning.
>     >
>     > Any idea why this is happening? Can it be because my resources are not
>     > enough to start off all of the tasks?
>     >
>     > Thank you,
>     > Nick
> 
> 
> 
> 
> -- 
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD candidate


Re: Tasks are not starting

Posted by "Nick R. Katsipoulakis" <ni...@gmail.com>.
Hello Matthias and thank you for your reply. See my answers below:

- I have 4 supervisor nodes in my AWS cluster of m4.xlarge instances (4
cores per node). On top of that I have 3 more nodes for zookeeper and
nimbus.
- 2 worker processes per supervisor node
- The task count for each bolt ranges from 1 to 4, and I use a 1:1
task-to-executor assignment.
- The number of executors in total for the topology ranges from 14 to 41

Thanks,
Nick

2015-09-02 15:42 GMT-04:00 Matthias J. Sax <mj...@apache.org>:

> Without any exception/error message it is hard to tell.
>
> What is your cluster setup?
>   - Hardware, i.e., number of cores per node?
>   - How many nodes/supervisors are available?
>   - Configured number of workers for the topology?
>   - What is the number of tasks for each spout/bolt?
>   - What is the number of executors for each spout/bolt?
>
> -Matthias
>
> On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
> > Hello all,
> >
> > I am working on a project in which I submit a topology to my Storm
> > cluster, but for some reason, some of my tasks do not start executing.
> >
> > I can see that the above is happening because every bolt I have needs to
> > connect to an external server and do a registration to a service.
> > However, some of the bolts do not seem to connect.
> >
> > I have to say that the number of tasks I have is larger than the number
> > of workers of my cluster. Also, I check my worker log files, and I see
> > that the workers that do not register are also not writing some
> > initialization messages I have them print in the beginning.
> >
> > Any idea why this is happening? Can it be because my resources are not
> > enough to start off all of the tasks?
> >
> > Thank you,
> > Nick
>
>


-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD candidate

Re: Tasks are not starting

Posted by "Matthias J. Sax" <mj...@apache.org>.
Without any exception/error message it is hard to tell.

What is your cluster setup?
  - Hardware, i.e., number of cores per node?
  - How many nodes/supervisors are available?
  - Configured number of workers for the topology?
  - What is the number of tasks for each spout/bolt?
  - What is the number of executors for each spout/bolt?

-Matthias

On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
> Hello all, 
> 
> I am working on a project in which I submit a topology to my Storm
> cluster, but for some reason, some of my tasks do not start executing. 
> 
> I can see that the above is happening because every bolt I have needs to
> connect to an external server and do a registration to a service.
> However, some of the bolts do not seem to connect.
> 
> I have to say that the number of tasks I have is larger than the number
> of workers of my cluster. Also, I check my worker log files, and I see
> that the workers that do not register are also not writing some
> initialization messages I have them print in the beginning.
> 
> Any idea why this is happening? Can it be because my resources are not
> enough to start off all of the tasks?
> 
> Thank you,
> Nick