Posted to common-dev@hadoop.apache.org by "Derek Wollenstein (JIRA)" <ji...@apache.org> on 2009/03/09 21:33:50 UTC
[jira] Created: (HADOOP-5443) -archives option in JobConf doesn't support symlink for an uncompressed archive directory

-archives option in JobConf doesn't support symlink for an uncompressed archive directory
-----------------------------------------------------------------------------------------

                 Key: HADOOP-5443
                 URL: https://issues.apache.org/jira/browse/HADOOP-5443
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.19.1
            Reporter: Derek Wollenstein
            Priority: Minor


According to http://hadoop.apache.org/core/docs/r0.19.1/streaming.html#Large+files+and+archives+in+Hadoop+Streaming, it should be possible to have an archive unpacked into the working directory of a job under a given alias. The documentation says:
"The -archives option allows you to copy jars locally to the cwd of tasks and automatically unjar the files. For example:

-archives hdfs://host:fs_port/user/testfile.jar#testlink3

In the example above, a symlink testlink3 is created in the current working directory of tasks. This symlink points to the directory that stores the unjarred contents of the uploaded jar file."

This feature currently breaks because the entire string, including the alias, is validated as a filename by the GenericOptionsParser.
I've pasted a stack trace (with modified filenames/hosts) below.

java.io.FileNotFoundException: File hdfs://host:fs_port/user/testfile.jar#testlink3 does not exist.
        at org.apache.hadoop.util.GenericOptionsParser.validateFiles(GenericOptionsParser.java:319)
        at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:247)
        at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:345)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:136)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:32)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

This breaks a number of jobs that worked with the cacheArchives option in Hadoop Streaming.
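
To illustrate the failure mode described above, here is a minimal sketch of separating the "#alias" fragment from the archive location before any existence check is made. It is not the GenericOptionsParser code and not a proposed patch: the class name ArchiveAliasSketch, the helpers aliasOf and locationOf, and the host/port "namenode:8020" are all hypothetical, and it uses only java.net.URI from the JDK.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: split an -archives entry into location and alias.
public class ArchiveAliasSketch {

    /** Returns the "#alias" part of an archive spec, or null if none was given. */
    static String aliasOf(String spec) {
        return URI.create(spec).getFragment();
    }

    /** Returns the spec without its fragment; only this part names a real file. */
    static String locationOf(String spec) {
        URI uri = URI.create(spec);
        try {
            // Rebuild the URI from its components, passing null for the fragment.
            return new URI(uri.getScheme(), uri.getAuthority(),
                           uri.getPath(), uri.getQuery(), null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed example value; the host and port are made up for illustration.
        String spec = "hdfs://namenode:8020/user/testfile.jar#testlink3";
        System.out.println(locationOf(spec)); // prints hdfs://namenode:8020/user/testfile.jar
        System.out.println(aliasOf(spec));    // prints testlink3
    }
}
```

Under these assumptions, an existence check would be applied to the value returned by locationOf, while the alias would be kept aside for creating the symlink.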
