You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2017/11/28 00:54:01 UTC

[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

     [ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated HADOOP-15059:
--------------------------------
    Attachment: HADOOP-15059.001.patch

Attaching a patch that has the container launch process write two token files, the legacy token format in the existing container_tokens file and the new version 1 format in a new container_tokens-v1 file.  I left the container localizer path alone since localizers are running the same code as the nodemanager and therefore can directly support the new v1 format as-is.

The basic idea is to tack on the "-v1" suffix to the legacy token pathname to form the new v1 token pathname.  The container launcher and container executors both do this, so the interface between them did not have to change.  The legacy path is passed between them to indicate where both token files can be located (once the suffix is applied to form the new token path).  It's definitely not the cleanest, but it was relatively simple to implement.  I refactored some names in the container start context to make it more clear which path is being used.

This needs a lot more testing, but I was able to run a sleep job on a simple security pseudo-distributed cluster and manually verified both container token files were being written and each was the proper format.  I also manually forced the launcher to omit the new environment variable for the version 1 file, forcing the UGI to load the legacy token file, and that worked as well.  I have not had a chance yet to test the rolling-upgrade-with-tarball scenario nor the native container-executor changes, but I thought it was far enough along to at least get some feedback.

If others could take a look at the patch and/or take it for a test drive that would be great.


> 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-15059
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15059
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>            Reporter: Junping Du
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: HADOOP-15059.001.patch
>
>
> I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed because following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1511295641738_0003_000001
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
> 	at org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
> 	at org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
> 	at org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
> 	at org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_000001/container_tokens
> 	at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
> 	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
> 	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
> 	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
> 	at org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
> 	... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
> 	at org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
> 	at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
> 	... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to token incompatiblity change between 2.9 and 3.0. As we claim "rolling upgrade" is supported in Hadoop 3, we should fix this before we ship 3.0 otherwise all MR running applications will get stuck during/after upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org