You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ambari.apache.org by "amarnath reddy pappu (JIRA)" <ji...@apache.org> on 2019/01/03 19:13:00 UTC

[jira] [Created] (AMBARI-25087) Heartbeat lost - Heartbeat building is taking time

amarnath reddy pappu created AMBARI-25087:
---------------------------------------------

             Summary: Heartbeat lost - Heartbeat building is taking time
                 Key: AMBARI-25087
                 URL: https://issues.apache.org/jira/browse/AMBARI-25087
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.6.2
            Reporter: amarnath reddy pappu


In the past XXXX has reported several issues where they complained about Agents loosing the heartbeat but re-starting the agent is fixing the issue.

In the logs it shows that building heart beat message and sending to request is taking close to 40-50seconds.


{noformat}
INFO 2018-12-17 22:30:48,746 Controller.py:304 - Heartbeat (response id = 82) with server is running...
INFO 2018-12-17 22:30:48,851 Controller.py:311 - Building heartbeat message
INFO 2018-12-17 22:31:29,374 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-12-17 22:31:30,017 Hardware.py:188 - Some mount points were ignored: /run, /dev/shm, /run/lock, /sys/fs/cgroup, /run/user/2008, /run/user/202, /run/user/2002, /run/user/2001, /run/user/203, /run/user/2007, /run/user/2019
INFO 2018-12-17 22:31:51,672 Controller.py:333 - Heartbeat response received (id = 83)
INFO 2018-12-17 22:28:44,155 Controller.py:311 - Building heartbeat message
INFO 2018-12-17 22:29:14,883 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-12-17 22:29:17,051 Hardware.py:188 - Some mount points were ignored: /run, /dev/shm, /run/lock, /sys/fs/cgroup, /run/user/2008, /run/user/202, /run/user/2002, /run/user/2001, /run/user/203, /run/user/2007, /run/user/2019
{noformat}


We have generated multiple thread dumps to see what is happening while thread is waiting at that point. in multiple dumps thread is kept waiting at below log line.


{noformat}
# ThreadID: 139846973957888
File: "/usr/lib/python2.7/threading.py", line 774, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
  self.run()
File: "/usr/lib/ambari-agent/lib/ambari_agent/Controller.py", line 501, in run
  self.registerAndHeartbeat()
File: "/usr/lib/ambari-agent/lib/ambari_agent/Controller.py", line 529, in registerAndHeartbeat
  self.heartbeatWithServer()
File: "/usr/lib/ambari-agent/lib/ambari_agent/Controller.py", line 313, in heartbeatWithServer
  data = json.dumps(self.heartbeat.build(self.responseId, send_state, self.hasMappedComponents))
File: "/usr/lib/ambari-agent/lib/ambari_agent/Heartbeat.py", line 83, in build
  logger.debug("Heartbeat: %s", pformat(heartbeat))
File: "/usr/lib/python2.7/pprint.py", line 63, in pformat
  return PrettyPrinter(indent=indent, width=width, depth=depth).pformat(object)
File: "/usr/lib/python2.7/pprint.py", line 122, in pformat
  self._format(object, sio, 0, 0, {}, 0)
File: "/usr/lib/python2.7/pprint.py", line 164, in _format
  allowance + 1, context, level)
{noformat}


So for some reason pformat is taking lot of time ( 30-35 seconds ) to format the message ( Have not enabled the DEBUG logging)


{noformat}
    if int(id) == 0:
      componentsMapped = False

    logger.debug("Building Heartbeat: {responseId = %s, timestamp = %s, "
                "commandsInProgress = %s, componentsMapped = %s,"
                "recoveryTimestamp = %s}",
        str(id), str(timestamp), repr(commandsInProgress), repr(componentsMapped), str(recovery_timestamp))

    logger.debug("Heartbeat: %s", pformat(heartbeat))

    hostInfo = HostInfo(self.config)
{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)