Posted to mapreduce-issues@hadoop.apache.org by "xieguiming (Commented) (JIRA)" <ji...@apache.org> on 2012/04/11 17:31:19 UTC

[jira] [Commented] (MAPREDUCE-4074) Client continuously retries to RM When RM goes down before launching Application Master

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251674#comment-13251674 ] 

xieguiming commented on MAPREDUCE-4074:
---------------------------------------

The cause is the following function:

{code:title=ClientServiceDelegate.java|borderStyle=solid}

  private synchronized Object invoke(String method, Class argClass,
      Object args) throws YarnRemoteException {
    Method methodOb = null;
    try {
      methodOb = MRClientProtocol.class.getMethod(method, argClass);
    } catch (SecurityException e) {
      throw new YarnException(e);
    } catch (NoSuchMethodException e) {
      throw new YarnException("Method name mismatch", e);
    }
    while (true) {
      try {
        return methodOb.invoke(getProxy(), args);
      } catch (YarnRemoteException yre) {
        LOG.warn("Exception thrown by remote end.", yre);
        throw yre;
      } catch (InvocationTargetException e) {
        if (e.getTargetException() instanceof YarnRemoteException) {
          LOG.warn("Error from remote end: " + e
              .getTargetException().getLocalizedMessage());
          LOG.debug("Tracing remote error ", e.getTargetException());
          throw (YarnRemoteException) e.getTargetException();
        }
        LOG.debug("Failed to contact AM/History for job " + jobId + 
            " retrying..", e.getTargetException());
        // Force reconnection by setting the proxy to null.
        realProxy = null;
      } catch (Exception e) {
        LOG.debug("Failed to contact AM/History for job " + jobId
            + "  Will retry..", e);
        // Force reconnection by setting the proxy to null.
        realProxy = null;
      }
    }
  }
{code}

When the RM goes down, a java.lang.reflect.UndeclaredThrowableException is thrown. It is not a YarnRemoteException, so the while (true) loop resets the proxy and retries indefinitely.
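
One possible direction is to bound the retries instead of looping forever. The following is only a minimal sketch under assumptions, not the actual fix for this issue: the class BoundedRetry, the method invokeWithRetries, the exception RetriesExhaustedException, and the maxAttempts and sleepMillis parameters are all hypothetical names and are not part of ClientServiceDelegate or any Hadoop API.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hadoop: retries a call a bounded number of times.
final class BoundedRetry {

    /** Thrown once every attempt has failed; the cause is the last failure seen. */
    static final class RetriesExhaustedException extends Exception {
        RetriesExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the call until it succeeds or maxAttempts attempts have failed.
     * Sleeps sleepMillis between attempts, but not after the last one.
     */
    static <T> T invokeWithRetries(Callable<T> call, int maxAttempts, long sleepMillis)
            throws RetriesExhaustedException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (InterruptedException e) {
                // Do not treat interruption as a retryable failure.
                throw e;
            } catch (Exception e) {
                // Covers runtime exceptions such as UndeclaredThrowableException.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        // Break out of the loop with an error instead of retrying forever.
        throw new RetriesExhaustedException(
            "Giving up after " + maxAttempts + " attempts", last);
    }
}
```

In the function above, the body of the while (true) loop would play the role of the Callable, and the caller would see an exception after the configured number of attempts rather than an endless retry.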

                
> Client continuously retries to RM When RM goes down before launching Application Master
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4074
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4074
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.23.1
>            Reporter: Devaraj K
>
> The client continuously retries connecting to the RM and logs the messages below when the RM goes down before launching the App Master.
> I feel an exception should be thrown, or the loop should break, after a finite number of retries.
> {code:xml}
> 28/03/12 07:15:03 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 0 time(s).
> 28/03/12 07:15:04 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 1 time(s).
> 28/03/12 07:15:05 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 2 time(s).
> 28/03/12 07:15:06 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 3 time(s).
> 28/03/12 07:15:07 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 4 time(s).
> 28/03/12 07:15:08 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 5 time(s).
> 28/03/12 07:15:09 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 6 time(s).
> 28/03/12 07:15:10 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 7 time(s).
> 28/03/12 07:15:11 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 8 time(s).
> 28/03/12 07:15:12 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 9 time(s).
> 28/03/12 07:15:13 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 0 time(s).
> 28/03/12 07:15:14 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 1 time(s).
> 28/03/12 07:15:15 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 2 time(s).
> 28/03/12 07:15:16 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 3 time(s).
> 28/03/12 07:15:17 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 4 time(s).
> {code}
