Posted to dev@slider.apache.org by "Gour Saha (JIRA)" <ji...@apache.org> on 2017/08/17 07:03:00 UTC
[jira] [Commented] (SLIDER-1236) Unnecessary 10 second sleep before installation

    [ https://issues.apache.org/jira/browse/SLIDER-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130026#comment-16130026 ] 

Gour Saha commented on SLIDER-1236:
-----------------------------------

I am planning to use a separate constant for the delay after registration and before the first heartbeat, and to set it to 10 ms (there is really no need for agents to sit here for 10 secs).
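The roughly 10-second wait can be read directly off the agent timestamps in the issue description below. A minimal sketch: it subtracts each agent's "Registered with the server" timestamp from its "Sending heartbeat" timestamp. The timestamps are copied from the quoted logs. `gap_seconds` is a hypothetical helper written for this check.

```python
from datetime import datetime

# Log timestamp format used by the agent, e.g. "2017-05-22 22:04:33,055".
# %f accepts the 3-digit millisecond field and right-pads it to microseconds.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# (registered, first heartbeat sent) for each agent, copied from the issue logs.
AGENT_LOGS = {
    "agent 1": ("2017-05-22 22:04:33,055", "2017-05-22 22:04:43,065"),
    "agent 2": ("2017-05-22 22:04:34,948", "2017-05-22 22:04:44,960"),
}


def gap_seconds(registered: str, first_heartbeat: str) -> float:
    """Seconds between registration and the first heartbeat."""
    start = datetime.strptime(registered, TS_FORMAT)
    end = datetime.strptime(first_heartbeat, TS_FORMAT)
    return (end - start).total_seconds()


for agent, (registered, heartbeat) in AGENT_LOGS.items():
    print(f"{agent}: {gap_seconds(registered, heartbeat):.3f} s")
    # agent 1: 10.010 s
    # agent 2: 10.012 s
```

Both gaps are 10 seconds plus a few milliseconds of overhead. This fits a fixed 10-second sleep after registration.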

I also observed that the delay between subsequent heartbeats is 10 secs, which is fairly high as well, so I am going to bring it down to 1 sec. I tested this configuration, and all the apps I tested with come up much faster (all app containers were up and running in ~7-8 secs). All UTs are also passing.
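Below is a minimal Python sketch of the proposed split, assuming a simple agent heartbeat loop. The constant names and `run_heartbeats` are hypothetical and are not taken from Slider's agent code (`Controller.py`). The sketch only illustrates the idea: the wait before the first heartbeat and the wait between later heartbeats become two independent settings.

```python
import time
from typing import Callable, List

# Hypothetical constants mirroring the proposed values (not Slider's real names).
DELAY_AFTER_REGISTRATION_SEC = 0.01  # 10 ms pause before the first heartbeat
HEARTBEAT_INTERVAL_SEC = 1.0         # 1 s pause between subsequent heartbeats


def run_heartbeats(send_heartbeat: Callable[[int], None],
                   num_heartbeats: int,
                   sleep: Callable[[float], None] = time.sleep,
                   after_registration: float = DELAY_AFTER_REGISTRATION_SEC,
                   interval: float = HEARTBEAT_INTERVAL_SEC) -> List[float]:
    """Send heartbeats after registration and return the waits performed.

    The first wait uses the short post-registration delay; every later wait
    uses the regular heartbeat interval, so each can be tuned separately.
    """
    waits: List[float] = []
    for n in range(num_heartbeats):
        delay = after_registration if n == 0 else interval
        sleep(delay)
        waits.append(delay)
        send_heartbeat(n)  # n is just the heartbeat index in this sketch
    return waits


if __name__ == "__main__":
    sent: List[int] = []
    # A no-op sleep keeps the demo instant.
    waits = run_heartbeats(sent.append, 3, sleep=lambda _: None)
    print(waits)  # [0.01, 1.0, 1.0]
    print(sent)   # [0, 1, 2]
```

Passing `sleep` in as a parameter keeps the loop testable without real delays. Suppose a single 10-second value were used for both waits. Then the first heartbeat could not go out until about 10 seconds after registration, and that is the gap visible in the quoted logs.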

> Unnecessary 10 second sleep before installation
> -----------------------------------------------
>
>                 Key: SLIDER-1236
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1236
>             Project: Slider
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Gour Saha
>
> Noticed when starting LLAP on a 2-node cluster. Slider AM logs:
> {noformat}
> 2017-05-22 22:04:33,047 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Registration response: RegistrationResponse{response=OK, responseId=0, statusCommands=null}
> ...
> 2017-05-22 22:04:34,946 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Registration response: RegistrationResponse{response=OK, responseId=0, statusCommands=null}
> {noformat}
> Then nothing useful goes on for a while, until:
> {noformat}
> 2017-05-22 22:04:43,099 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Installing LLAP on container_1495490227300_0002_01_000002.
> {noformat}
> If you look at the corresponding logs from both agents, you can see that they both have a gap of almost exactly 10 sec.
> After the gap, they talk back to the AM; ~30 ms after each container's gap ends (presumably once it has heard from that container), the AM starts installing LLAP.
> {noformat}
> INFO 2017-05-22 22:04:33,055 Controller.py:180 - Registered with the server with {u'exitstatus': 0,
> INFO 2017-05-22 22:04:33,055 Controller.py:630 - Response from server = OK
> INFO 2017-05-22 22:04:43,065 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}
> INFO 2017-05-22 22:04:43,065 AgentToggleLogger.py:40 - Sending heartbeat with response id: 0 and timestamp: 1495490683064. Command(s) in progress: False. Components mapped: False
> INFO 2017-05-22 22:04:34,948 Controller.py:180 - Registered with the server with {u'exitstatus': 0,
> INFO 2017-05-22 22:04:34,948 Controller.py:630 - Response from server = OK
> INFO 2017-05-22 22:04:44,959 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}
> INFO 2017-05-22 22:04:44,960 AgentToggleLogger.py:40 - Sending heartbeat with response id: 0 and timestamp: 1495490684959. Command(s) in progress: False. Components mapped: False
> {noformat}
> I've observed the same on multiple different clusters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)