Posted to mapreduce-issues@hadoop.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2015/01/31 10:45:34 UTC

[jira] [Commented] (MAPREDUCE-6238) MR2 can't run local jobs with -libjars command options which is a regression from MR1

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299739#comment-14299739 ] 

Hadoop QA commented on MAPREDUCE-6238:
--------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12695742/MAPREDUCE-6238.000.patch
  against trunk revision 26c2de3.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings.

    {color:red}-1 release audit{color}.  The applied patch generated 1 release audit warning.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5135//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5135//artifact/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5135//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5135//console

This message is automatically generated.

> MR2 can't run local jobs with -libjars command options which is a regression from MR1
> -------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6238
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6238
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>         Attachments: MAPREDUCE-6238.000.patch
>
>
> MR2 can't run local jobs with the -libjars command option, which is a regression from MR1.
> When running an MR2 job with -jt local and -libjars, the job fails with java.io.FileNotFoundException: File does not exist: hdfs://XXXXXXXXXXXXXXX.jar.
> The same command works in MR1.
> The problem is that when MR2 runs a local job using LocalJobRunner, JobSubmitter#jtFs is the local filesystem,
> so copyRemoteFiles returns from [the middle of the function|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L138]
> because the source and destination filesystems are the same.
> {code}
>     if (compareFs(remoteFs, jtFs)) {
>       return originalPath;
>     }
> {code}
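> As a self-contained illustration (the host and jar path are hypothetical; this is not code from the patch), the following shows that for a -jt local submission both filesystems resolve to the local filesystem even when fs.defaultFS points at HDFS, so the libjar is never copied and keeps its local path:
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class LocalJobLibjarDemo {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     conf.set("fs.defaultFS", "hdfs://namenode:8020"); // hypothetical cluster default fs
>     FileSystem jtFs = FileSystem.getLocal(conf);      // what JobSubmitter#jtFs is for a local job
>     Path libjar = new Path("file:///tmp/mylib.jar");  // hypothetical -libjars entry
>     FileSystem remoteFs = libjar.getFileSystem(conf); // also the local filesystem
>     // Both URIs are file:///, so compareFs(remoteFs, jtFs) is true and
>     // copyRemoteFiles takes the early return above, leaving the local path untouched.
>     System.out.println(jtFs.getUri());      // file:///
>     System.out.println(remoteFs.getUri());  // file:///
>   }
> }
> {code}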
> The following code at [JobSubmitter.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L219]
> then adds the destination file to the DistributedCache, which introduces a bug for local jobs.
> {code}
>         Path newPath = copyRemoteFiles(libjarsDir, tmp, conf, replication);
>         DistributedCache.addFileToClassPath(
>             new Path(newPath.toUri().getPath()), conf);
> {code}
> Because new Path(newPath.toUri().getPath()) loses the filesystem information from newPath, the file added to the DistributedCache is qualified against the default URI filesystem (hdfs), as the following code shows. This causes the
> FileNotFoundException when the file is accessed later in
> [determineTimestampsAndCacheVisibilities|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L270]
> {code}
>   public static void addFileToClassPath(Path file, Configuration conf)
>     throws IOException {
>     addFileToClassPath(file, conf, file.getFileSystem(conf));
>   }
>   public static void addFileToClassPath
>            (Path file, Configuration conf, FileSystem fs)
>         throws IOException {
>     String classpath = conf.get(MRJobConfig.CLASSPATH_FILES);
>     conf.set(MRJobConfig.CLASSPATH_FILES, classpath == null ? file.toString()
>              : classpath + "," + file.toString());
>     URI uri = fs.makeQualified(file).toUri();
>     addCacheFile(uri, conf);
>   }
> {code}
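> To make the scheme loss concrete (a standalone sketch with a hypothetical jar path, not code from the patch), new Path(newPath.toUri().getPath()) keeps only the path component, so the subsequent file.getFileSystem(conf) call resolves it against fs.defaultFS:
> {code}
> import org.apache.hadoop.fs.Path;
>
> public class PathSchemeDemo {
>   public static void main(String[] args) {
>     Path localJar = new Path("file:///tmp/mylib.jar");    // what copyRemoteFiles returned for a local job
>     Path stripped = new Path(localJar.toUri().getPath()); // scheme and authority are dropped
>     System.out.println(localJar); // file:/tmp/mylib.jar
>     System.out.println(stripped); // /tmp/mylib.jar -> later qualified against fs.defaultFS (hdfs)
>   }
> }
> {code}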
> Compare to the following [MR1 code|https://github.com/apache/hadoop/blob/branch-1/src/mapred/org/apache/hadoop/mapred/JobClient.java#L811]:
> {code}
>         Path newPath = copyRemoteFiles(fs, libjarsDir, tmp, job, replication);
>         DistributedCache.addFileToClassPath(
>           new Path(newPath.toUri().getPath()), job, fs);
> {code}
> This shows why MR1 doesn't have the issue: it passes the local filesystem into DistributedCache#addFileToClassPath instead of relying on the default URI filesystem (hdfs).
> We should do the same in MR2.
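> A minimal sketch of that change (the attached patch may differ in detail) is to pass JobSubmitter#jtFs into the three-argument DistributedCache#addFileToClassPath shown above, so the cached URI is qualified against the submit-time filesystem rather than the default one:
> {code}
>         Path newPath = copyRemoteFiles(libjarsDir, tmp, conf, replication);
>         // Qualify against jtFs (the local filesystem for -jt local) instead of
>         // letting addFileToClassPath fall back to fs.defaultFS.
>         DistributedCache.addFileToClassPath(
>             new Path(newPath.toUri().getPath()), conf, jtFs);
> {code}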



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)