Posted to user@storm.apache.org by Girish Joshi <gj...@groupon.com> on 2015/03/20 00:11:45 UTC

Need help with troubleshooting a worker not processing messages

I am trying to troubleshoot an issue with our storm cluster where a worker
process on one of the machines in the cluster does not perform any work.
All the counts (emitted/transferred/executed) for all executors in that
worker are 0 as shown below. Even if I restart the worker, storm supervisor
starts a new one and that does not process any work either.

[Storm UI executor row: id [120-120], uptime 26m 17s, host storm6-prod, port
6702 <http://watson-storm6-prod.lup1:8000/log?file=worker-6702.log>; all
counts (emitted, transferred, executed, acked) and latencies are 0]

The supervisor log shows that the worker is started, and the worker log just
has a bunch of zookeeper messages printed every minute.

2015-03-19 22:25:07 s.k.ZkCoordinator [INFO] Refreshing partition manager
connections
2015-03-19 22:25:07 s.k.ZkCoordinator [INFO] Deleted partition managers: []
2015-03-19 22:25:07 s.k.ZkCoordinator [INFO] New partition managers: []
2015-03-19 22:25:07 s.k.ZkCoordinator [INFO] Finished refreshing

I am looking for some debugging help and have the following questions. Any
suggestions would be appreciated.

- From the storm UI, it looks like the worker process is up and running and
is assigned executors for all bolts and spouts in the topology. But it does
not get any messages to work on. Is there a way I can find out why the storm
infrastructure is not routing any messages to the bolts running in that
process? For the spouts, since they are reading from kafka, I could
understand that there are no partitions left for this worker to read from,
so they have nothing to read. But I would expect messages from kafka spouts
in other workers to be routed to the bolts in this worker process.

- Is there a way I can enable debug logging for storm which can tell me why
a particular worker process is not getting any messages/tuples to execute?

Thanks,

Girish.

Re: Need help with troubleshooting a worker not processing messages

Posted by Girish Joshi <gj...@groupon.com>.
Thanks Taylor for your response.

In my case, I have seen that 4 of my 15 kafka executors do not process any
data. I will check what the kafka partition count is, but it looks like it
may be just 11, in which case I should reduce the number of kafka executors.

Around 50 of the 550 mapperBoltExecutors I have do not process anything, and
I am now guessing that is because my maxSpoutPending (500) is low, so there
are not enough tuples in flight to keep all 550 mapperBoltExecutors busy.

Do you know if maxSpoutPending is the maximum number of unacked tuples from
a single spout executor, or from all spout executors combined? If it is the
latter, then my guess makes sense, since if only 500 tuples can be unacked
at a time, at most 500 bolts are needed to process them.

kafkaSpoutExecutors: 15
mapperBoltExecutors: 550
workers: 9
maxSpoutPending: 500
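For reference, the Storm docs describe topology.max.spout.pending as a limit per spout task, not a combined one across all spouts. A quick back-of-envelope sketch with the numbers above (the 11-active-task figure is a guess from this thread, assuming one active task per kafka partition):

```java
public class PendingMath {
    // Upper bound on unacked tuples in flight at once, assuming
    // topology.max.spout.pending is enforced per spout task.
    static int maxInFlight(int activeSpoutTasks, int maxSpoutPending) {
        return activeSpoutTasks * maxSpoutPending;
    }

    public static void main(String[] args) {
        // ~11 active spout tasks (one per kafka partition) x 500 pending
        System.out.println(maxInFlight(11, 500)); // 5500 -- enough for 550 bolts
        // If the limit were global instead, only 500 tuples could be
        // pending at once -- too few to keep 550 bolt executors busy.
        System.out.println(maxInFlight(1, 500));  // 500
    }
}
```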



On Thu, Mar 19, 2015 at 8:55 PM, P. Taylor Goetz <pt...@gmail.com> wrote:

> More information about your topology would help, but..
>
> I’ll assume you’re using a core API topology (spouts/bolts).
>
> On the kafka spout side, does the spout parallelism == the # of kafka
> partitions? (It should.)
>
>  On the bolt side, are you using fields groupings at all, and if so, what
> does the distribution of those fields look like?
>
> To change the logging level, edit the logback config files in
> .//storm/logback; if running locally, add or edit a logback config file in
> your project.
>
> -Taylor
>

Re: Need help with troubleshooting a worker not processing messages

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
More information about your topology would help, but..

I’ll assume you’re using a core API topology (spouts/bolts).

On the kafka spout side, does the spout parallelism == the # of kafka partitions? (It should.)
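That parallelism is set where the spout is declared; a minimal storm-kafka (0.9.x-era) wiring sketch, with the zookeeper address, topic, and component names made up for illustration:

```java
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;
import backtype.storm.topology.TopologyBuilder;

public class WiringSketch {
    public static void main(String[] args) {
        int numPartitions = 11; // must match the kafka topic's partition count
        SpoutConfig cfg = new SpoutConfig(
                new ZkHosts("zk1:2181"), // hypothetical zk connect string
                "events",                // hypothetical topic name
                "/kafka-spout",          // zk root for offset storage
                "my-topology");          // consumer id
        TopologyBuilder builder = new TopologyBuilder();
        // parallelism hint == # of partitions, per the advice above
        builder.setSpout("kafka-spout", new KafkaSpout(cfg), numPartitions);
    }
}
```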

 On the bolt side, are you using fields groupings at all, and if so, what does the distribution of those fields look like?

To change the logging level, edit the logback config files in .//storm/logback; if running locally, add or edit a logback config file in your project.
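As a sketch, turning up Storm's logging there means adding logger entries to the existing logback config file; the logger names below are an assumption based on the 0.9.x package layout, so check them against the cluster.xml that ships with your version:

```xml
<!-- Fragment to merge into the existing logback config (e.g. cluster.xml).
     backtype.storm is Storm's own package prefix; storm.kafka covers the
     kafka spout. -->
<logger name="backtype.storm" level="DEBUG"/>
<logger name="storm.kafka" level="DEBUG"/>
```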

-Taylor