Posted to dev@slider.apache.org by "Gour Saha (JIRA)" <ji...@apache.org> on 2017/08/17 07:05:00 UTC

[jira] [Resolved] (SLIDER-1236) Unnecessary 10 second sleep before installation

     [ https://issues.apache.org/jira/browse/SLIDER-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha resolved SLIDER-1236.
-------------------------------
    Resolution: Fixed

> Unnecessary 10 second sleep before installation
> -----------------------------------------------
>
>                 Key: SLIDER-1236
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1236
>             Project: Slider
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Gour Saha
>
> Noticed when starting LLAP on a 2-node cluster. Slider AM logs:
> {noformat}
> 2017-05-22 22:04:33,047 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Registration response: RegistrationResponse{response=OK, responseId=0, statusCommands=null}
> ...
> 2017-05-22 22:04:34,946 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Registration response: RegistrationResponse{response=OK, responseId=0, statusCommands=null}
> {noformat}
> Then nothing useful goes on for a while, until:
> {noformat}
> 2017-05-22 22:04:43,099 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Installing LLAP on container_1495490227300_0002_01_000002.
> {noformat}
> The corresponding logs from both agents each show a gap of almost exactly 10 seconds.
> After its gap, each agent sends a heartbeat to the AM; about 30 ms after the end of each agent's gap, presumably once it has heard from that agent, the AM starts installing LLAP on that agent's container.
> {noformat}
> INFO 2017-05-22 22:04:33,055 Controller.py:180 - Registered with the server with {u'exitstatus': 0,
> INFO 2017-05-22 22:04:33,055 Controller.py:630 - Response from server = OK
> INFO 2017-05-22 22:04:43,065 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}
> INFO 2017-05-22 22:04:43,065 AgentToggleLogger.py:40 - Sending heartbeat with response id: 0 and timestamp: 1495490683064. Command(s) in progress: False. Components mapped: False
> INFO 2017-05-22 22:04:34,948 Controller.py:180 - Registered with the server with {u'exitstatus': 0,
> INFO 2017-05-22 22:04:34,948 Controller.py:630 - Response from server = OK
> INFO 2017-05-22 22:04:44,959 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}
> INFO 2017-05-22 22:04:44,960 AgentToggleLogger.py:40 - Sending heartbeat with response id: 0 and timestamp: 1495490684959. Command(s) in progress: False. Components mapped: False
> {noformat}
> I've observed the same on multiple different clusters.
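
To check for the same gap on another cluster, the timestamps in an agent log can be compared mechanically. The following is only a minimal sketch and not part of Slider: the function name `find_gaps` and the `threshold_seconds` parameter are hypothetical, and it assumes each agent's log is scanned separately and uses the "YYYY-MM-DD HH:MM:SS,mmm" timestamp format seen in the excerpts above.

```python
import re
from datetime import datetime

# Matches the "YYYY-MM-DD HH:MM:SS,mmm" timestamps used in the quoted logs.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def find_gaps(lines, threshold_seconds=5.0):
    """Return (gap_seconds, previous_line, next_line) for each pair of
    consecutive timestamped lines more than threshold_seconds apart.

    Hypothetical helper for illustration; not part of Slider.
    """
    gaps = []
    previous = None  # (timestamp, line) of the last timestamped line seen
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines without a timestamp
        stamp = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")
        if previous is not None:
            delta = (stamp - previous[0]).total_seconds()
            if delta > threshold_seconds:
                gaps.append((delta, previous[1], line))
        previous = (stamp, line)
    return gaps


# Lines from the first agent's log, copied from the report above.
agent_log = [
    "INFO 2017-05-22 22:04:33,055 Controller.py:630 - Response from server = OK",
    "INFO 2017-05-22 22:04:43,065 AgentToggleLogger.py:40 - Queue result: "
    "{'componentStatus': [], 'reports': []}",
]

gaps = find_gaps(agent_log)
for seconds, before, after in gaps:
    print(f"{seconds:.3f}s gap between:\n  {before}\n  {after}")
```

The 5-second default threshold is an arbitrary choice meant to sit well below the roughly 10-second idle period reported here while ignoring ordinary sub-second spacing between log lines.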



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)