Posted to common-user@hadoop.apache.org by "steven.z zhuang" <st...@gmail.com> on 2011/04/26 16:33:15 UTC

access hdfs from streaming job

hi, list,
            I have this very old, simple question which I cannot figure
out quickly, so I turn to you guys.
            In my Perl Hadoop streaming job, I want to access a file in
HDFS. What I did is as follows:
                     1. fork a subprocess and try to dump the file into a
local FS file; this failed.
                       perl code:           `hadoop fs -get XXX local-dir`;
                          here by local dir, I tried both a full path and a
short relative path; neither worked, and I am sure the user running
hadoop has access to the target dir.

                          I did get something from the command, but the file
content is just the same exception message that item 2 below shows.

                     2. try to open a pipe to the following command; this
failed too:
                              perl code :              open(FH, "hadoop fs
-cat XXX |") .....

                        what is weird is that I can read from the pipe, and
the content is something like the following:
Exception in thread "main" java.lang.NoClassDefFoundError:
"-Dhadoop/tasklog/taskid=attempt_201104251856_0216_m_000001_0
Caused by: java.lang.ClassNotFoundException:
"-Dhadoop.tasklog.taskid=attempt_201104251856_0216_m_000001_0
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
Could not find the main class:
"-Dhadoop.tasklog.taskid=attempt_201104251856_0216_m_000001_0.   ....
                        it seems to me it's some path problem, but I don't
know what caused it.

                    if anyone knows how to fix this, please help. thanks!
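One thing that helps in situations like this (a minimal sketch, not Hadoop-specific): check the child command's exit status and capture its stderr separately, instead of trusting whatever lands on stdout. The failing `sh -c` below is just a stand-in for `hadoop fs -get XXX local-dir`, since the real command needs a cluster.

```shell
# Sketch: keep the child's stdout and stderr apart and check its exit
# status. `sh -c 'echo oops >&2; exit 1'` stands in for the failing
# `hadoop fs -get XXX local-dir`.
out=$(sh -c 'echo oops >&2; exit 1' 2>err.txt)
status=$?
echo "status=$status"          # prints: status=1
echo "stderr=$(cat err.txt)"   # prints: stderr=oops
```

With this in place, a failed copy shows up as a nonzero status plus a diagnostic in `err.txt`, rather than as exception text silently written into the output file.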

--
Steven

Re: access hdfs from streaming job

Posted by "steven.z zhuang" <st...@gmail.com>.
OK, I will try to answer this question myself.

This is caused by the env variable HADOOP_CLIENT_OPTS being double-quoted
in mapred/org/apache/hadoop/mapred/TaskRunner.java, which in turn makes
the command line in the streaming job look like this:
        nohup /dist/JAVA_HOME/bin/java -Dproc_fs -Xmx4000m
"-Dhadoop.tasklog.taskid=attempt_201104251856_0303_m_000001_0
-Dhadoop.tasklog.iscleanup=false
-Dhadoop.tasklog.totalLogFileSize=0" -Dhadoop.log.dir=/blah/logs

Because HADOOP_CLIENT_OPTS is expanded inside double quotes here, java
receives the whole string as one argument (and tries to load it as the
main class) instead of as a set of options.
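The effect can be reproduced without Hadoop at all; it is plain shell word-splitting (the helper name `count_args` below is mine, for illustration only):

```shell
# When several -D options live in one variable, a QUOTED expansion hands
# them to the program as ONE argument; unquoted, the shell splits them
# into separate options. This is exactly what bites java here.
OPTS="-Dhadoop.tasklog.taskid=attempt_x -Dhadoop.tasklog.iscleanup=false"

count_args() { echo $#; }

quoted=$(count_args "$OPTS")    # 1 argument: java would read it as a class name
unquoted=$(count_args $OPTS)    # 2 arguments: java would read two -D options
echo "quoted=$quoted unquoted=$unquoted"   # prints: quoted=1 unquoted=2
```

This matches the exception above: the "class" java could not find is literally the quoted `"-Dhadoop.tasklog.taskid=...` string.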

Does anyone know why this variable, HADOOP_CLIENT_OPTS, is treated
specially compared with other variables? Anyway, it seems to be just a
trouble-making idea.

Does anyone know how to fix this?
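One possible workaround (my own sketch, not tested against a real cluster): clear HADOOP_CLIENT_OPTS in the environment of the hadoop CLI that the task forks, so the launcher never expands the mis-quoted value. The `sh -c` child below only reports its view of the variable; in the real job it would be the `hadoop fs -get XXX local-dir` command itself. `env -u` is a GNU/BSD option, not guaranteed by POSIX.

```shell
# Workaround sketch: drop HADOOP_CLIENT_OPTS from the child environment
# with `env -u` before invoking the hadoop CLI from inside a task.
# The child here just reports whether it still sees the variable.
export HADOOP_CLIENT_OPTS="-Dhadoop.tasklog.taskid=attempt_x"
child_view=$(env -u HADOOP_CLIENT_OPTS sh -c 'echo "${HADOOP_CLIENT_OPTS-unset}"')
echo "$child_view"    # prints: unset
```

From Perl the equivalent would be deleting the key from %ENV before the backticks or piped open, which keeps the fix local to the streaming script instead of patching TaskRunner.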
