Posted to dev@slider.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2017/08/17 00:35:00 UTC

[jira] [Created] (SLIDER-1236) 10 second sleep before installation

Sergey Shelukhin created SLIDER-1236:
----------------------------------------

             Summary: 10 second sleep before installation
                 Key: SLIDER-1236
                 URL: https://issues.apache.org/jira/browse/SLIDER-1236
             Project: Slider
          Issue Type: Bug
            Reporter: Sergey Shelukhin


Noticed when starting LLAP on a 2-node cluster. Slider AM logs:
{noformat}
2017-05-22 22:04:33,047 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Registration response: RegistrationResponse{response=OK, responseId=0, statusCommands=null}
...
2017-05-22 22:04:34,946 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Registration response: RegistrationResponse{response=OK, responseId=0, statusCommands=null}
{noformat}
Then the AM logs nothing useful for several seconds, until:
{noformat}
2017-05-22 22:04:43,099 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Installing LLAP on container_1495490227300_0002_01_000002.
{noformat}

The corresponding logs from both agents each show a gap of almost exactly 10 seconds, between "Response from server = OK" and the next heartbeat.
After its gap, each agent sends a heartbeat to the AM. About 30 ms after a container's gap ends, presumably upon receiving that heartbeat, the AM starts installing LLAP on that container.



{noformat}
INFO 2017-05-22 22:04:33,055 Controller.py:180 - Registered with the server with {u'exitstatus': 0,
INFO 2017-05-22 22:04:33,055 Controller.py:630 - Response from server = OK
INFO 2017-05-22 22:04:43,065 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}
INFO 2017-05-22 22:04:43,065 AgentToggleLogger.py:40 - Sending heartbeat with response id: 0 and timestamp: 1495490683064. Command(s) in progress: False. Components mapped: False

INFO 2017-05-22 22:04:34,948 Controller.py:180 - Registered with the server with {u'exitstatus': 0,
INFO 2017-05-22 22:04:34,948 Controller.py:630 - Response from server = OK
INFO 2017-05-22 22:04:44,959 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}
INFO 2017-05-22 22:04:44,960 AgentToggleLogger.py:40 - Sending heartbeat with response id: 0 and timestamp: 1495490684959. Command(s) in progress: False. Components mapped: False
{noformat}
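
The size of the gap can be checked directly from the timestamps in the agent log lines above. The following is a minimal sketch in Python, not part of Slider; the helper names (parse_timestamps, largest_gap_seconds) are hypothetical, and it assumes every relevant line carries a timestamp in the "YYYY-MM-DD HH:MM:SS,mmm" form shown in the logs.

```python
import re
from datetime import datetime

# Timestamp layout assumed from the agent log lines quoted in this issue.
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_timestamps(lines):
    """Return the first timestamp found on each line that contains one."""
    stamps = []
    for line in lines:
        match = TS_PATTERN.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(0), TS_FORMAT))
    return stamps


def largest_gap_seconds(lines):
    """Return the largest gap, in seconds, between consecutive timestamps."""
    stamps = parse_timestamps(lines)
    gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    return max(gaps) if gaps else 0.0


# Lines copied from the two agent logs above.
agent_one = [
    "INFO 2017-05-22 22:04:33,055 Controller.py:630 - Response from server = OK",
    "INFO 2017-05-22 22:04:43,065 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}",
]
agent_two = [
    "INFO 2017-05-22 22:04:34,948 Controller.py:630 - Response from server = OK",
    "INFO 2017-05-22 22:04:44,959 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}",
]

gap_one = largest_gap_seconds(agent_one)
gap_two = largest_gap_seconds(agent_two)
print(gap_one, gap_two)
```

Both gaps come out at just over 10 seconds, which is consistent with a fixed 10 second wait on the agent side between registration and the first heartbeat.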


I've observed the same on multiple different clusters.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)