You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org> on 2015/05/09 02:54:00 UTC

[jira] [Resolved] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-5903.
------------------------------------------------
    Resolution: Invalid

My comment [above|https://issues.apache.org/jira/browse/MAPREDUCE-5903?focusedCommentId=14152049&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14152049] still holds. This is not supported today.

If you don't want to create user-accounts, then you can do the following
 - Find a local unix user to map all kerberos/LDAP authenticated users to
 - Set yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users to true
 - Set yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user configuration to the local unix user.

For e.g., the default for this is nobody, which means all jobs will run as the nobody unix user. Clearly this will have other security concerns as all jobs run as the same user.

Closing this as invalid for now.

> If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5903
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>         Environment: hadoop: 2.4.0.2.1.2.0
>            Reporter: Victor Kim
>            Priority: Critical
>              Labels: shuffle
>
> I have 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers, Kerberos is enabled, have hdfs, yarn, mapred principals\keytabs. ResourceManager and NodeManager are ran under yarn user, using yarn Kerberos principal. 
> Use case 1: WordCount, submit job using yarn UGI (i.e. superuser, the one having Kerberos principal on all boxes). Result: job successfully completed.
> Use case 2: WordCount, submit job using LDAP user impersonation via yarn UGI. Result: Map tasks are completed SUCCESSfully, Reduce task fails with ShuffleError Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).
> The use case with user impersonation used to work on earlier versions, without YARN (with JT&TT).
> I found similar issue with Kerberos AUTH involved here: https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
> And here https://issues.apache.org/jira/browse/MAPREDUCE-4030 it's marked as resolved, which is not the case when Kerberos Authentication is enabled.
> The exception trace from YarnChild JVM:
> 2014-05-21 12:49:35,687 FATAL [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed with too many fetch failures and insufficient progress!
> 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
>         at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:416)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)