Posted to common-user@hadoop.apache.org by "steven.z zhuang" <st...@gmail.com> on 2011/04/26 16:33:15 UTC
access hdfs from streaming job
Hi, list,
I have a very old, simple question that I could not figure out quickly, so I am turning to you.
In my Perl Hadoop streaming job, I want to access a file in HDFS. Here is what I tried:
1. Fork a subprocess and dump the file into a local file system file. This failed.
   Perl code: `hadoop fs -get XXX local-dir`;
   For local-dir I tried both a full path and a short relative path; neither worked. I am sure the user running Hadoop has access to the target directory.
   I did get some output from the command, but the file content is just the same exception message shown in item 2 below.
2. Open a pipe to the following command. This also failed:
   Perl code: open(FH, "hadoop fs -cat XXX | ") .....
   What is weird is that I can read from the pipe, and the content is something like the following:
   Exception in thread "main" java.lang.NoClassDefFoundError: "-Dhadoop/tasklog/taskid=attempt_201104251856_0216_m_000001_0
   Caused by: java.lang.ClassNotFoundException: "-Dhadoop.tasklog.taskid=attempt_201104251856_0216_m_000001_0
       at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
       at java.security.AccessController.doPrivileged(Native Method)
       at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
   Could not find the main class: "-Dhadoop.tasklog.taskid=attempt_201104251856_0216_m_000001_0. ....
It seems to me to be some kind of path problem, but I don't know what causes it.
If anyone knows how to fix this, please help. Thanks!
--
Steven
Re: access hdfs from streaming job
Posted by "steven.z zhuang" <st...@gmail.com>.
OK, I will try to answer this question myself.
This is caused by the environment variable HADOOP_CLIENT_OPTS being double-quoted in mapred/org/apache/hadoop/mapred/TaskRunner.java, which in turn makes the command line in a streaming job look like this:
nohup /dist/JAVA_HOME/bin/java -Dproc_fs -Xmx4000m
"-Dhadoop.tasklog.taskid=attempt_201104251856_0303_m_000001_0
-Dhadoop.tasklog.iscleanup=false
-Dhadoop.tasklog.totalLogFileSize=0" -Dhadoop.log.dir=/blah/logs
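HADOOP_CLIENT_OPTS is double-quoted here, so java does not take it as a set of options. The quote characters appear to be part of the value itself, so the first token java sees starts with " rather than -, and java treats it as the name of the main class. That matches the NoClassDefFoundError in my first mail.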
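The effect can be reproduced without Hadoop. The following is only a minimal sketch: it assumes the variable's value contains literal double-quote characters (as the command line above suggests) and that the launching script expands it unquoted; the sample value is copied from the command line above and trimmed to two options.

```shell
#!/bin/sh
# Sample value with literal double quotes embedded in it (an assumption
# based on the command line quoted in this thread).
HADOOP_CLIENT_OPTS='"-Dhadoop.tasklog.taskid=attempt_201104251856_0303_m_000001_0 -Dhadoop.tasklog.iscleanup=false"'

# Unquoted expansion: the shell splits the value on whitespace, but it does
# NOT re-interpret the embedded quote characters as quoting.
set -- $HADOOP_CLIENT_OPTS
arg_count=$#
first_arg=$1
last_arg=$2

printf 'number of arguments: %s\n' "$arg_count"
printf 'first argument:      %s\n' "$first_arg"
printf 'last argument:       %s\n' "$last_arg"
```

The first argument comes out with a leading double quote, so it no longer starts with a dash. A program that expects options to begin with a dash would take such a token as its first non-option argument, which for java is the main class name.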
Does anyone know why the variable HADOOP_CLIENT_OPTS is treated differently from the other variables?
Anyway, it seems to be just a trouble-making idea.
Does anyone know how to fix this?
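One possible workaround, offered only as an untested sketch and not as a confirmed fix: strip the surrounding quotes from HADOOP_CLIENT_OPTS inside the task before calling the hadoop command. The helper name strip_outer_quotes and the sample value are hypothetical, and the sketch assumes the quotes are literal characters in the variable's value.

```shell
#!/bin/sh
# Hypothetical helper: remove one leading and one trailing double quote
# from a string, if they are present.
strip_outer_quotes() {
    value=$1
    value=${value#\"}   # drop a leading double quote
    value=${value%\"}   # drop a trailing double quote
    printf '%s\n' "$value"
}

# Sample value standing in for what the task environment is assumed to hold.
HADOOP_CLIENT_OPTS='"-Dhadoop.tasklog.taskid=attempt_X -Dhadoop.tasklog.iscleanup=false"'

HADOOP_CLIENT_OPTS=$(strip_outer_quotes "$HADOOP_CLIENT_OPTS")
export HADOOP_CLIENT_OPTS

# The real call would follow here, for example:
# hadoop fs -cat XXX
```

A blunter variant would be to unset HADOOP_CLIENT_OPTS before the call, at the cost of losing the task-log options for that child process. Either way the change lives in the streaming script, so TaskRunner.java stays untouched.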