Posted to common-dev@hadoop.apache.org by "Alex Ivanov (JIRA)" <ji...@apache.org> on 2016/09/25 08:56:20 UTC

[jira] [Created] (HADOOP-13652) ZKDelegationTokenSecretManager doesn't seem to honor ZK connection/session timeouts

Alex Ivanov created HADOOP-13652:
------------------------------------

             Summary: ZKDelegationTokenSecretManager doesn't seem to honor ZK connection/session timeouts
                 Key: HADOOP-13652
                 URL: https://issues.apache.org/jira/browse/HADOOP-13652
             Project: Hadoop Common
          Issue Type: Bug
          Components: kms
            Reporter: Alex Ivanov


Judging by some of the errors I've seen from KMS during ZooKeeper connection issues, the following timeout defaults do not appear to be picked up.
{code}
package org.apache.hadoop.security.token.delegation;

public abstract class ZKDelegationTokenSecretManager<TokenIdent extends AbstractDelegationTokenIdentifier>
    extends AbstractDelegationTokenSecretManager<TokenIdent> {
  public static final int ZK_DTSM_ZK_SESSION_TIMEOUT_DEFAULT = 10000;
  public static final int ZK_DTSM_ZK_CONNECTION_TIMEOUT_DEFAULT = 10000;
...
}
{code}

Instead, the connection and session timeouts in effect appear to be 15 and 60 seconds respectively, which are the Curator defaults.
{code}
package org.apache.curator.framework;

public class CuratorFrameworkFactory
{
    private static final int DEFAULT_SESSION_TIMEOUT_MS = Integer.getInteger("curator-default-session-timeout", 60 * 1000);
    private static final int DEFAULT_CONNECTION_TIMEOUT_MS = Integer.getInteger("curator-default-connection-timeout", 15 * 1000);
...
}
{code}
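To illustrate how the Curator defaults quoted above are resolved, here is a minimal stand-alone sketch that repeats the same Integer.getInteger lookups. The class name CuratorDefaultTimeouts is hypothetical and the code does not touch Curator itself; it only shows that each default comes from a JVM system property when one is set and falls back to 60000 / 15000 ms otherwise.

```java
// Hypothetical stand-alone class; it only mirrors the lookups quoted above.
final class CuratorDefaultTimeouts {

    // Same property name and fallback as DEFAULT_SESSION_TIMEOUT_MS.
    static int sessionTimeoutMs() {
        return Integer.getInteger("curator-default-session-timeout", 60 * 1000);
    }

    // Same property name and fallback as DEFAULT_CONNECTION_TIMEOUT_MS.
    static int connectionTimeoutMs() {
        return Integer.getInteger("curator-default-connection-timeout", 15 * 1000);
    }

    public static void main(String[] args) {
        // Prints whatever is in effect for this JVM.
        System.out.println("session timeout (ms):    " + sessionTimeoutMs());
        System.out.println("connection timeout (ms): " + connectionTimeoutMs());
    }
}
```

In Curator the values are static final fields, so they are read once when CuratorFrameworkFactory is loaded; a property would therefore have to be set before that point, for example with -D on the JVM command line. Whether that is an acceptable workaround for KMS is something I have not verified.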

It looks like DelegationTokenAuthenticationFilter sets the Curator client on ZKDelegationTokenSecretManager, and that may be the cause:
{code}
package org.apache.hadoop.security.token.delegation.web;

public class DelegationTokenAuthenticationFilter
    extends AuthenticationFilter {

  protected void initializeAuthHandler(String authHandlerClassName,
      FilterConfig filterConfig) throws ServletException {
    ZKDelegationTokenSecretManager.setCurator((CuratorFramework)
        filterConfig.getServletContext().getAttribute(ZKSignerSecretProvider.
            ZOOKEEPER_SIGNER_SECRET_PROVIDER_CURATOR_CLIENT_ATTRIBUTE));
    super.initializeAuthHandler(authHandlerClassName, filterConfig);
    ZKDelegationTokenSecretManager.setCurator(null);
  }
...
}
{code}
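The suspected mechanism can be modelled without Hadoop or Curator. The sketch below is an assumption about how the injection might interact with the configured timeouts, not the actual Hadoop code: Client, SecretManager, setClient and initWithInjectedClient are all hypothetical stand-ins. It shows that if the secret manager reuses an externally supplied client, the timeouts it was configured with are never applied.

```java
// Simplified model of the suspected behaviour; every name here is hypothetical.
final class InjectedClientSketch {

    /** Stand-in for a ZooKeeper client whose timeouts are fixed at creation. */
    static final class Client {
        final int sessionTimeoutMs;
        final int connectionTimeoutMs;

        Client(int sessionTimeoutMs, int connectionTimeoutMs) {
            this.sessionTimeoutMs = sessionTimeoutMs;
            this.connectionTimeoutMs = connectionTimeoutMs;
        }
    }

    /** Stand-in for the secret manager. */
    static final class SecretManager {
        // Externally supplied client, if any (assumed to be held per thread).
        private static final ThreadLocal<Client> INJECTED = new ThreadLocal<>();

        static void setClient(Client client) {
            INJECTED.set(client);
        }

        final Client client;

        SecretManager(int configuredSessionMs, int configuredConnectionMs) {
            Client injected = INJECTED.get();
            // An injected client wins; the configured timeouts are then unused.
            this.client = (injected != null)
                ? injected
                : new Client(configuredSessionMs, configuredConnectionMs);
        }
    }

    /** Mirrors the set / initialize / reset sequence in the filter above. */
    static Client initWithInjectedClient(Client external) {
        SecretManager.setClient(external);
        try {
            return new SecretManager(10000, 10000).client;
        } finally {
            SecretManager.setClient(null);
        }
    }

    public static void main(String[] args) {
        Client external = new Client(60 * 1000, 15 * 1000); // built with library defaults
        Client used = initWithInjectedClient(external);
        System.out.println("session=" + used.sessionTimeoutMs
            + " connection=" + used.connectionTimeoutMs);
    }
}
```

If the real class behaves like this model, the fix would have to be made wherever the shared client is built rather than in the secret manager's own defaults. That needs to be confirmed against the actual source.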

Example errors:
{code}
2016-09-25 01:46:33,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15001)
2016-09-25 01:46:33,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15001)
2016-09-25 01:46:34,028 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15976)
2016-09-25 01:46:34,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (16001)
2016-09-25 01:46:37,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (19001)
2016-09-25 01:46:40,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (22001)
2016-09-25 01:46:49,055 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (31003)
2016-09-25 01:46:52,029 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (33977)
2016-09-25 01:47:05,344 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (47292)
2016-09-25 01:47:09,345 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (51292)
2016-09-25 01:47:24,346 WARN  ConnectionState - Connection attempt unsuccessful after 66294 (greater than max timeout of 60000). Resetting connection and trying again with a new connection.
2016-09-25 01:47:43,740 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15001)
2016-09-25 01:47:43,740 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15001)
{code}

There are also some sporadic connection issues between KMS and ZooKeeper, which I'm still trying to pinpoint. Essentially, KMS can get into a perpetual connect/disconnect cycle, from which it either recovers eventually or is brought back by a restart. I'm mentioning this in case it is related to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
