Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2018/11/15 17:44:00 UTC

[jira] [Commented] (HADOOP-14982) Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env

    [ https://issues.apache.org/jira/browse/HADOOP-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688422#comment-16688422 ] 

Steve Loughran commented on HADOOP-14982:
-----------------------------------------

I'm seeing this, or something similar, today on a 3.1+ branch.
{code}
2018-11-15 17:34:44,548 [main] INFO  tools.DistCp (DistCp.java:run(144)) - Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[s3a://hwdev-steve-ireland-new/examples], targetPath=s3a://hwdev-steve-new/dest, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false}, sourcePaths=[s3a://hwdev-steve-ireland-new/examples], targetPathExists=true, preserveRawXattrsfalse
2018-11-15 17:34:45,039 [main] INFO  client.AHSProxy (AHSProxy.java:createAHSProxy(42)) - Connecting to Application History server at host-000003/172.27.20.152:10200
2018-11-15 17:34:45,716 [main] WARN  shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(116)) - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2018-11-15 17:34:46,138 [main] WARN  ipc.Client (Client.java:run(752)) - Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2018-11-15 17:34:46,439 [main] WARN  ipc.Client (Client.java:run(752)) - Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2018-11-15 17:34:46,442 [main] INFO  retry.RetryInvocationHandler (RetryInvocationHandler.java:log(411)) - java.io.IOException: DestHost:destPort host-000004:8020 , LocalHost:localPort HW13176.local/192.168.99.1:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS], while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over host-000004/172.27.18.67:8020 after 1 failover attempts. Trying to failover after sleeping for 900ms.
2018-11-15 17:34:47,637 [main] WARN  ipc.Client (Client.java:run(752)) - Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2018-11-15 17:34:47,638 [main] INFO  retry.RetryInvocationHandler (RetryInvocationHandler.java:log(411)) - java.io.IOException: DestHost:destPort host-000003:8020 , LocalHost:localPort HW13176.local/192.168.99.1:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS], while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over host-000003/172.27.20.152:8020 after 2 failover attempts. Trying to failover after sleeping for 1096ms.
2018-11-15 17:34:49,033 [main] WARN  ipc.Client (Client.java:run(752)) - Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2018-11-15 17:34:49,034 [main] INFO  retry.RetryInvocationHandler (RetryInvocationHandler.java:log(411)) - java.io.IOException: DestHost:destPort host-000004:8020 , LocalHost:localPort HW13176.local/192.168.99.1:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS], while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over host-000004/172.27.18.67:8020 after 3 failover attempts. Trying to failover after sleeping for 5110ms.
2018-11-15 17:34:54,437 [main] WARN  ipc.Client (Client.java:run(752)) - Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2018-11-15 17:34:54,440 [main] INFO  retry.RetryInvocationHandler (RetryInvocationHandler.java:log(411)) - java.io.IOException: DestHost:destPort host-000003:8020 , LocalHost:localPort HW13176.local/192.168.99.1:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS], while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over host-000003/172.27.20.152:8020 after 4 failover attempts. Trying to failover after sleeping for 8056ms.
^C^C^C^C^C^C^C^C^C^C^C^C2018-11-15 17:35:02,922 [main] WARN  ipc.Client (Client.java:run(752)) - Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2018-11-15 17:35:02,924 [main] INFO  retry.RetryInvocationHandler (RetryInvocationHandler.java:log(411)) - java.io.IOException: DestHost:destPort host-000004:8020 , LocalHost:localPort HW13176.local/192.168.99.1:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS], while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over host-000004/172.27.18.67:8020 after 5 failover attempts. Trying to failover after sleeping for 19775ms.
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C2018-11-15 17:35:22,994 [main] WARN  ipc.Client (Client.java:run(752)) - Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2018-11-15 17:35:22,997 [main] INFO  retry.RetryInvocationHandler (RetryInvocationHandler.java:log(411)) - java.io.IOException: DestHost:destPort host-000003:8020 , LocalHost:localPort HW13176.local/192.168.99.1:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS], while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over host-000003/172.27.20.152:8020 after 6 failover attempts. Trying to failover after sleeping for 11610ms.
2018-11-15 17:35:25,590 [Thread-0] WARN  util.ShutdownHookManager (ShutdownHookManager.java:executeShutdown(128)) - ShutdownHook 'Cleanup' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
	at java.util.concurrent.FutureTask.get(FutureTask.java:205)
	at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
fish: 'bin/hadoop distcp \
 s3a://hwde…' terminated by signal SIGKILL (Forced quit)
{code}

Note also how the ^C presses were ignored because of the shutdown block; while there is a limit there, I got bored and killed the process. During that IPC sleep-and-retry code, interrupts should be acted on.
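
To make the interrupt point concrete, here is a minimal sketch of a retry loop whose backoff sleep reacts to an interrupt instead of swallowing it. It is not Hadoop's {{RetryInvocationHandler}}: the class {{InterruptibleRetry}}, the interface {{IoCall}} and the method {{callWithRetries}} are hypothetical names invented for this illustration, and it assumes the waiting thread really does get interrupted on shutdown.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical helper (not a Hadoop class): retries with an interruptible sleep. */
final class InterruptibleRetry {

    /** An operation that may fail with an IOException. */
    interface IoCall<T> {
        T call() throws IOException;
    }

    private InterruptibleRetry() {
    }

    /**
     * Invokes the call up to maxAttempts times, sleeping sleepMillis between
     * attempts. An interrupt during the sleep ends the loop immediately.
     */
    static <T> T callWithRetries(IoCall<T> call, int maxAttempts, long sleepMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (InterruptedIOException e) {
                // The call itself was interrupted: do not retry.
                throw e;
            } catch (IOException e) {
                last = e;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    // Restore the flag so callers can see the interrupt, then give up.
                    Thread.currentThread().interrupt();
                    InterruptedIOException iioe = new InterruptedIOException(
                            "Interrupted while waiting to retry after attempt " + attempt);
                    iioe.initCause(ie);
                    throw iioe;
                }
            }
        }
        throw last;
    }
}
```

The design choice here is to surface the interrupt as an {{InterruptedIOException}} and to re-set the thread's interrupt flag, so that code further up the stack can stop as well.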

Anyway, {{AccessControlException}} exceptions shouldn't be wrapped and should (probably) trigger a failure.
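
A rough sketch of what "trigger a failure" could look like as a retry decision: look through the wrapped exceptions for an access-control failure and, if one is found, fail rather than fail over. This is not Hadoop's {{RetryPolicy}} API. {{FailFastPolicy}}, {{Action}} and {{findCause}} are hypothetical names, and {{StandInAccessControlException}} is a stand-in for {{org.apache.hadoop.security.AccessControlException}} so that the example compiles without Hadoop on the classpath. It also assumes the wrapping preserves the cause chain.

```java
import java.io.IOException;

/** Hypothetical sketch (not Hadoop's RetryPolicy): fail fast on access-control errors. */
final class FailFastPolicy {

    /** What the caller should do after a failed call. */
    enum Action { FAIL, FAILOVER_AND_RETRY }

    /** Stand-in for Hadoop's AccessControlException; an assumption for this sketch only. */
    static final class StandInAccessControlException extends IOException {
        private static final long serialVersionUID = 1L;

        StandInAccessControlException(String message) {
            super(message);
        }
    }

    private FailFastPolicy() {
    }

    /** Returns the first exception of the given type in the cause chain, or null. */
    static <T extends Throwable> T findCause(Throwable failure, Class<T> type) {
        int depth = 0;
        // The depth bound guards against cyclic cause chains.
        for (Throwable cur = failure; cur != null && depth < 32; cur = cur.getCause(), depth++) {
            if (type.isInstance(cur)) {
                return type.cast(cur);
            }
        }
        return null;
    }

    /** Authentication failures are not fixed by switching servers, so do not fail over. */
    static Action decide(Throwable failure) {
        return findCause(failure, StandInAccessControlException.class) != null
                ? Action.FAIL
                : Action.FAILOVER_AND_RETRY;
    }
}
```

With something like this, the doubly wrapped "Client cannot authenticate via:[TOKEN, KERBEROS]" failure in the log above would map to {{FAIL}}, while a plain connection failure would still fail over.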

> Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14982
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14982
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>             Fix For: 3.1.0, 2.10.0
>
>         Attachments: HADOOP-14892-001.patch, HADOOP-14892-002.patch, HADOOP-14982-003.patch
>
>
> If HA is configured for the Resource Manager in a secure environment, using the mapred client goes into a loop if the user is not authenticated with Kerberos.
> {noformat}
> [root@pb6sec-1 ~]# mapred job -list
> 17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
> 17/10/25 06:37:43 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 17/10/25 06:37:43 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 1 failover attempts. Trying to failover after sleeping for 160ms.
> 17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25
> 17/10/25 06:37:43 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 2 failover attempts. Trying to failover after sleeping for 582ms.
> 17/10/25 06:37:44 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
> 17/10/25 06:37:44 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 17/10/25 06:37:44 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 3 failover attempts. Trying to failover after sleeping for 977ms.
> 17/10/25 06:37:45 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25
> 17/10/25 06:37:45 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 4 failover attempts. Trying to failover after sleeping for 1667ms.
> 17/10/25 06:37:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
> 17/10/25 06:37:46 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 17/10/25 06:37:46 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 5 failover attempts. Trying to failover after sleeping for 2776ms.
> 17/10/25 06:37:49 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25
> 17/10/25 06:37:49 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 6 failover attempts. Trying to failover after sleeping for 1055ms.
> 17/10/25 06:37:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
> 17/10/25 06:37:50 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 17/10/25 06:37:50 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 7 failover attempts. Trying to failover after sleeping for 2608ms.
> ...
> {noformat}
> The reason is that the retry handler sees a {{ConnectException}}, then fails over to the inactive RM. This obviously doesn't work, so it comes back to the active and the whole process starts again. The RetryHandler should examine if the {{ConnectException}} is actually caused by a {{GSSException}} (and probably check the "No valid credentials provided" message) and if so, it should not perform a failover.
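
The check proposed in the description could be sketched roughly as below. This is only an illustration, not the patch attached to the issue: {{KerberosFailureCheck}} and {{isMissingCredentials}} are hypothetical names. It uses the JDK's {{org.ietf.jgss.GSSException}} and compares the major code against {{GSSException.NO_CRED}} instead of matching the "No valid credentials provided" message text, and it assumes the cause chain survives the IPC layer's wrapping.

```java
import org.ietf.jgss.GSSException;

/** Hypothetical sketch: detect a missing-Kerberos-credentials failure in a cause chain. */
final class KerberosFailureCheck {

    private KerberosFailureCheck() {
    }

    /**
     * Returns true if the first GSSException in the cause chain reports
     * NO_CRED, i.e. the client has no usable credentials.
     */
    static boolean isMissingCredentials(Throwable failure) {
        int depth = 0;
        // The depth bound guards against cyclic cause chains.
        for (Throwable cur = failure; cur != null && depth < 32; cur = cur.getCause(), depth++) {
            if (cur instanceof GSSException) {
                return ((GSSException) cur).getMajor() == GSSException.NO_CRED;
            }
        }
        return false;
    }
}
```

A retry handler could consult such a check before deciding to fail over, and stop retrying when it returns true.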


