Posted to user@storm.apache.org by "Mitchell Rathbun (BLOOMBERG/ 731 LEX)" <mr...@bloomberg.net> on 2019/05/17 16:38:36 UTC

Start command not propagated by Nimbus

We have a topology that was never started, even though Nimbus received the start command. The supervisor never received a command to start this topology, so the issue wasn't in our topology code. In the Nimbus log, I see:

2019-05-11 10:07:28,087 INFO  nimbus [pool-14-thread-16] Activating WingmanTopology4246: WingmanTopology4246-251-1557583643

A number of topologies were started around the same time, and for most of them the following message was logged next:

[timer] Setting new assignment for topology id <Topology Name>:................

However, we did not see this logged for the topology that wasn't started. When the cluster was stopped, we saw:

2019-05-11 10:36:04,447 INFO  nimbus [pool-14-thread-4] Delaying event :remove for 5 secs for WingmanTopology4246-251-1557583643
2019-05-11 10:36:04,457 INFO  nimbus [pool-14-thread-4] Adding topo to history log: WingmanTopology4246-251-1557583643


What could have caused this? There were 16 topologies submitted to be run in total, and our storm.yaml file allocates more than enough slots under supervisor.slots.ports.
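
To find every topology in this state rather than spotting one by hand, the Nimbus log can be scanned for "Activating" lines that have no matching "Setting new assignment" line. The following is a minimal sketch, assuming the log line formats quoted above; the function name and the regular expressions are illustrative and not part of Storm, and the token after "topology id" is compared against both the topology name and the topology id because the excerpt above does not show which one is logged.

```python
import re

# Patterns assume the nimbus.log line formats quoted in this thread.
ACTIVATING = re.compile(r"Activating (?P<name>\S+): (?P<topo_id>\S+)")
ASSIGNED = re.compile(
    r"Setting new assignment for topology id (?P<topo>[^:\s]+):"
)


def unassigned_topologies(log_lines):
    """Return ids of topologies that were activated but never assigned."""
    activated = {}    # topology id -> topology name
    assigned = set()  # names or ids seen in assignment messages
    for line in log_lines:
        match = ACTIVATING.search(line)
        if match:
            activated[match.group("topo_id")] = match.group("name")
            continue
        match = ASSIGNED.search(line)
        if match:
            assigned.add(match.group("topo"))
    # A topology counts as assigned if either its id or its name was logged.
    return sorted(
        topo_id
        for topo_id, name in activated.items()
        if topo_id not in assigned and name not in assigned
    )
```

It could be run against a log file with something like `with open("nimbus.log") as f: print(unassigned_topologies(f))`, where the file path is an assumption about the local setup.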

Re: Start command not propagated by Nimbus

Posted by Ethan Li <et...@gmail.com>.
Hi Mitchell,


Does the UI show that this topology is fully scheduled? Are you using ResourceAwareScheduler? If so, it’s possible that the scheduler could not find enough resources to schedule this topology.

Also, after starting the topology, you could log in to ZooKeeper and check whether there is an assignment belonging to this topology. You can also read its content to get a better idea of what happened, but you will need to write some code to do that since it’s serialized.
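
A minimal sketch of the existence check, assuming assignments are stored under `<storm.zookeeper.root>/assignments/<topology-id>` with the default root `/storm` (please verify this against your Storm version and storm.yaml). The helper name is hypothetical; it only reports whether the znode exists and how many bytes it holds, and does not try to deserialize the content.

```python
def assignment_status(zk, topology_id, root="/storm"):
    """Report whether an assignment znode exists for topology_id.

    zk is any client exposing exists(path) and get(path), for example a
    started kazoo.client.KazooClient. root is assumed to match the
    storm.zookeeper.root setting of the cluster.
    """
    path = f"{root}/assignments/{topology_id}"
    if zk.exists(path) is None:
        return f"no assignment znode at {path}"
    data, _stat = zk.get(path)
    size = len(data or b"")
    return f"assignment znode at {path} holds {size} serialized bytes"


# Example with the third-party kazoo client (host name is a placeholder):
#
#   from kazoo.client import KazooClient
#   zk = KazooClient(hosts="zk-host:2181")
#   zk.start()
#   print(assignment_status(zk, "WingmanTopology4246-251-1557583643"))
#   zk.stop()
```

If no znode is reported for the topology id, Nimbus never wrote an assignment for it, which would be consistent with the missing "Setting new assignment" log line.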

It’s really hard to tell the root cause without more information. Could you provide all the related nimbus.log and supervisor.log files so I can take a look?

Best,
Ethan
