Posted to common-issues@hadoop.apache.org by "Alex Ivanov (JIRA)" <ji...@apache.org> on 2016/09/25 08:57:21 UTC
[jira] [Updated] (HADOOP-13652) ZKDelegationTokenSecretManager doesn't seem to honor ZK connection/session timeouts
[ https://issues.apache.org/jira/browse/HADOOP-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Ivanov updated HADOOP-13652:
---------------------------------
Description:
Judging by some of the errors I've seen from KMS during ZooKeeper connection issues, the following timeout defaults don't seem to be picked up.
{code}
package org.apache.hadoop.security.token.delegation;

public abstract class ZKDelegationTokenSecretManager<TokenIdent extends AbstractDelegationTokenIdentifier>
    extends AbstractDelegationTokenSecretManager<TokenIdent> {
  public static final int ZK_DTSM_ZK_SESSION_TIMEOUT_DEFAULT = 10000;
  public static final int ZK_DTSM_ZK_CONNECTION_TIMEOUT_DEFAULT = 10000;
  ...
}
{code}
Instead, the connection and session timeouts in effect are 15 and 60 seconds respectively, which are the Curator defaults.
{code}
package org.apache.curator.framework;

public class CuratorFrameworkFactory
{
    private static final int DEFAULT_SESSION_TIMEOUT_MS = Integer.getInteger("curator-default-session-timeout", 60 * 1000);
    private static final int DEFAULT_CONNECTION_TIMEOUT_MS = Integer.getInteger("curator-default-connection-timeout", 15 * 1000);
    ...
}
{code}
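To illustrate how those Curator defaults resolve, here is a minimal standalone sketch using only the JDK (no Curator on the classpath). The class name CuratorDefaultsSketch and its helper methods are hypothetical; only the two property names and the fallback values come from the excerpt above. Integer.getInteger reads a JVM system property, so the 60 s / 15 s values apply unless those properties are set.

```java
public class CuratorDefaultsSketch {
    // Property names and fallback values copied from the CuratorFrameworkFactory excerpt.
    public static final String SESSION_PROP = "curator-default-session-timeout";
    public static final String CONNECTION_PROP = "curator-default-connection-timeout";

    // Hypothetical helpers: they re-read the property on every call, whereas the
    // excerpt stores the value once in a static final field at class-load time.
    public static int sessionTimeoutMs() {
        return Integer.getInteger(SESSION_PROP, 60 * 1000);
    }

    public static int connectionTimeoutMs() {
        return Integer.getInteger(CONNECTION_PROP, 15 * 1000);
    }

    public static void main(String[] args) {
        System.out.println("session=" + sessionTimeoutMs()
            + " connection=" + connectionTimeoutMs());

        // Simulates launching the JVM with -Dcurator-default-connection-timeout=10000.
        System.setProperty(CONNECTION_PROP, "10000");
        System.out.println("connection after override=" + connectionTimeoutMs());
    }
}
```

This only shows where the 15000 and 60000 values can come from; it says nothing about whether overriding the JVM properties is a suitable workaround for KMS.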
It looks like DelegationTokenAuthenticationFilter sets the Curator client on ZKDelegationTokenSecretManager itself, and that may be what causes the issue:
{code}
package org.apache.hadoop.security.token.delegation.web;

public class DelegationTokenAuthenticationFilter
    extends AuthenticationFilter {
  protected void initializeAuthHandler(String authHandlerClassName,
      FilterConfig filterConfig) throws ServletException {
    ZKDelegationTokenSecretManager.setCurator((CuratorFramework)
        filterConfig.getServletContext().getAttribute(ZKSignerSecretProvider.
            ZOOKEEPER_SIGNER_SECRET_PROVIDER_CURATOR_CLIENT_ATTRIBUTE));
    super.initializeAuthHandler(authHandlerClassName, filterConfig);
    ZKDelegationTokenSecretManager.setCurator(null);
  }
  ...
}
{code}
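The suspicion above can be sketched as a toy model. Everything in it is hypothetical: FakeClient, FakeSecretManager and the thread-local hand-off are stand-ins invented for illustration, not Hadoop or Curator code, and I have not confirmed that the real classes behave this way. It only shows how a set-then-clear hand-off of a pre-built client could leave the manager's own timeout defaults unused.

```java
public class ExternalClientSketch {
    /** Stand-in for a ZooKeeper client; it only records the timeouts it was built with. */
    public static final class FakeClient {
        public final int sessionTimeoutMs;
        public final int connectionTimeoutMs;

        public FakeClient(int sessionTimeoutMs, int connectionTimeoutMs) {
            this.sessionTimeoutMs = sessionTimeoutMs;
            this.connectionTimeoutMs = connectionTimeoutMs;
        }
    }

    /** Hypothetical secret manager that prefers an injected client over building its own. */
    public static final class FakeSecretManager {
        static final int SESSION_TIMEOUT_DEFAULT = 10000;
        static final int CONNECTION_TIMEOUT_DEFAULT = 10000;

        // Assumption: the hand-off is modelled as a thread-local slot.
        private static final ThreadLocal<FakeClient> INJECTED = new ThreadLocal<>();

        public static void setClient(FakeClient client) {
            INJECTED.set(client);
        }

        public final FakeClient client;

        public FakeSecretManager() {
            FakeClient injected = INJECTED.get();
            // The manager's own defaults are only used when nothing was injected.
            this.client = (injected != null)
                ? injected
                : new FakeClient(SESSION_TIMEOUT_DEFAULT, CONNECTION_TIMEOUT_DEFAULT);
        }
    }

    /** Mirrors the set / initialize / clear sequence from the filter excerpt. */
    public static FakeSecretManager initWithInjectedClient(FakeClient shared) {
        FakeSecretManager.setClient(shared);
        try {
            return new FakeSecretManager();
        } finally {
            FakeSecretManager.setClient(null);
        }
    }

    public static void main(String[] args) {
        FakeClient shared = new FakeClient(60000, 15000); // built elsewhere with other timeouts
        FakeSecretManager viaFilter = initWithInjectedClient(shared);
        FakeSecretManager standalone = new FakeSecretManager();
        System.out.println("via filter connection timeout: " + viaFilter.client.connectionTimeoutMs);
        System.out.println("standalone connection timeout: " + standalone.client.connectionTimeoutMs);
    }
}
```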
Example errors:
{code}
2016-09-25 01:46:33,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15001)
2016-09-25 01:46:33,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15001)
2016-09-25 01:46:34,028 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15976)
2016-09-25 01:46:34,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (16001)
2016-09-25 01:46:37,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (19001)
2016-09-25 01:46:40,053 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (22001)
2016-09-25 01:46:49,055 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (31003)
2016-09-25 01:46:52,029 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (33977)
2016-09-25 01:47:05,344 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (47292)
2016-09-25 01:47:09,345 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (51292)
2016-09-25 01:47:24,346 WARN ConnectionState - Connection attempt unsuccessful after 66294 (greater than max timeout of 60000). Resetting connection and trying again with a new connection.
2016-09-25 01:47:43,740 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15001)
2016-09-25 01:47:43,740 ERROR ConnectionState - Connection timed out for connection string (host1, host2, host3) and timeout (15000) / elapsed (15001)
{code}
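As a cross-check on the log excerpt above, the following sketch (JDK only) parses a "Connection timed out" line, extracts the timeout value, and subtracts the elapsed time from the timestamp to estimate when the connection attempt began. The class and method names are hypothetical, and the regular expression is fitted only to the lines quoted here, not to every ConnectionState message format.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConnectionLogSketch {
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    // Groups: 1 = timestamp, 2 = timeout in ms, 3 = elapsed in ms.
    private static final Pattern TIMED_OUT = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) ERROR ConnectionState - "
        + ".* timeout \\((\\d+)\\) / elapsed \\((\\d+)\\)$");

    private static Matcher match(String line) {
        Matcher m = TIMED_OUT.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a 'Connection timed out' line: " + line);
        }
        return m;
    }

    /** Returns the connection timeout (ms) reported in the log line. */
    public static int timeoutMs(String line) {
        return Integer.parseInt(match(line).group(2));
    }

    /** Estimates when the connection attempt began: log timestamp minus elapsed ms. */
    public static LocalDateTime attemptStart(String line) {
        Matcher m = match(line);
        LocalDateTime loggedAt = LocalDateTime.parse(m.group(1), TS);
        return loggedAt.minus(Long.parseLong(m.group(3)), ChronoUnit.MILLIS);
    }

    public static void main(String[] args) {
        String line = "2016-09-25 01:46:33,053 ERROR ConnectionState - Connection timed out "
            + "for connection string (host1, host2, host3) and timeout (15000) / elapsed (15001)";
        System.out.println("timeout=" + timeoutMs(line) + " ms, attempt began at "
            + attemptStart(line).format(TS));
    }
}
```

Applied to the entries above, the reported timeout is 15000 rather than the 10000 that ZKDelegationTokenSecretManager declares, and the first entries all work out to an attempt that began at about 01:46:18,052.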
There are also sporadic connection issues between KMS and ZooKeeper, which I'm still trying to pinpoint. Essentially, KMS can get into a perpetual connect/disconnect cycle, from which it either recovers eventually on its own or after a restart. I'm mentioning this in case it is related to this JIRA.
> ZKDelegationTokenSecretManager doesn't seem to honor ZK connection/session timeouts
> -----------------------------------------------------------------------------------
>
> Key: HADOOP-13652
> URL: https://issues.apache.org/jira/browse/HADOOP-13652
> Project: Hadoop Common
> Issue Type: Bug
> Components: kms
> Reporter: Alex Ivanov