Posted to dev@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2015/05/07 23:54:59 UTC

[jira] [Created] (HIVE-10649) LLAP: AM gets stuck completely if one node is dead

Sergey Shelukhin created HIVE-10649:
---------------------------------------

             Summary: LLAP: AM gets stuck completely if one node is dead
                 Key: HIVE-10649
                 URL: https://issues.apache.org/jira/browse/HIVE-10649
             Project: Hive
          Issue Type: Sub-task
            Reporter: Sergey Shelukhin
            Assignee: Siddharth Seth


See HIVE-10648.
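When the AM cannot connect to a node, it appears to stall. In the log below, the TaskCommunicator thread keeps retrying the connection to the dead node, and the task attempt is killed with COMMUNICATION_ERROR only after the retries run out.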
{noformat}
2015-05-07 12:13:46,679 INFO [Dispatcher thread: Central] impl.TaskImpl: task_1429683757595_0784_1_00_000276 Task Transitioned from SCHEDULED to RUNNING due to event T_ATTEMPT_LAUNCHED
2015-05-07 12:13:46,811 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 10 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:46,955 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting to refresh ServiceInstanceSet 1611673583
2015-05-07 12:13:47,811 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 11 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:48,812 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 12 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:49,813 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 13 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:50,813 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 14 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:51,814 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 15 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:52,814 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 16 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:53,815 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 17 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:54,816 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 18 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:55,816 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 19 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:56,817 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 20 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:56,971 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting to refresh ServiceInstanceSet 1611673583
2015-05-07 12:13:57,817 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 21 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:58,818 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 22 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:59,819 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 23 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:00,819 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 24 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:01,820 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 25 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:02,821 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 26 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:03,821 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 27 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:04,822 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 28 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:05,823 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 29 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:06,823 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 30 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:06,984 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting to refresh ServiceInstanceSet 1611673583
2015-05-07 12:14:07,824 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 31 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:08,824 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 32 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:09,825 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 33 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:10,825 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 34 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:11,826 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 35 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:12,826 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 36 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:13,827 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 37 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:14,827 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 38 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:15,828 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 39 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:16,828 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 40 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:16,996 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting to refresh ServiceInstanceSet 1611673583
2015-05-07 12:14:17,829 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 41 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:18,830 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 42 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:19,830 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 43 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:20,831 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 44 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:21,832 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 45 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:22,832 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 46 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:23,833 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 47 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:24,833 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 48 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:25,834 INFO [TaskCommunicator # 3] ipc.Client: Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already tried 49 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:25,836 INFO [TaskCommunicator # 3] tezplugins.LlapTaskCommunicator: Unable to run task: attempt_1429683757595_0784_1_00_000017_0 on containerId: container_222212222_0784_01_000018, Communication Error
2015-05-07 12:14:25,841 INFO [Dispatcher thread: Central] history.HistoryEventHandler: [HISTORY][DAG:dag_1429683757595_0784_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=Map 1, taskAttemptId=attempt_1429683757595_0784_1_00_000017_0, startTime=1431026014322, finishTime=1431026065838, timeTaken=51516, status=KILLED, errorEnum=COMMUNICATION_ERROR, diagnostics=Communication Error, counters=Counters: 1, org.apache.tez.common.counters.DAGCounter, DATA_LOCAL_TASKS=1
{noformat}
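
To put a rough number on the stall, the retry policy printed in the log above can be multiplied out. The following is only a sketch: the function name worst_case_stall_ms is hypothetical, the regular expression is written against the exact message text shown above, and the result ignores the time each connection attempt itself takes, so it is a lower bound rather than a measured value.

```python
import re

# Matches the retry-policy text exactly as it appears in the ipc.Client lines above.
POLICY_RE = re.compile(
    r"RetryUpToMaximumCountWithFixedSleep\(maxRetries=(\d+), sleepTime=(\d+) MILLISECONDS\)"
)


def worst_case_stall_ms(log_line):
    """Return maxRetries * sleepTime in ms for a retry log line, or None if no policy is found.

    Hypothetical helper for illustration; it only counts the fixed sleeps
    between attempts, not the duration of the attempts themselves.
    """
    match = POLICY_RE.search(log_line)
    if match is None:
        return None
    max_retries, sleep_ms = (int(group) for group in match.groups())
    return max_retries * sleep_ms


# Last retry line from the log above, copied verbatim.
line = (
    "2015-05-07 12:14:25,834 INFO [TaskCommunicator # 3] ipc.Client: "
    "Retrying connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. "
    "Already tried 49 time(s); retry policy is "
    "RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)"
)

print(worst_case_stall_ms(line))  # 50000
```

That is about 50 seconds per unreachable node for one communicator thread, which is in line with timeTaken=51516 in the TASK_ATTEMPT_FINISHED event above.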



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)