You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-user@hadoop.apache.org by Shrish <s....@lancaster.ac.uk> on 2011/07/22 17:18:49 UTC

Problem with Hadoop Streaming -file option for Java class files

I am struggling with a  issue in hadoop streaming in the "-file" option.

First I tried the very basic example in streaming:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar
contrib/streaming/hadoop-streaming-0.20.203.0.jar -mapper
org.apache.hadoop.mapred.lib.IdentityMapper \ -reducer /bin/wc -inputformat
KeyValueTextInputFormat -input gutenberg/* -output gutenberg-outputtstchk22

which worked absolutely fine.

Then I copied the IdentityMapper.java source code and compiled it. Then I placed
this class file in the /home/hadoop folder and executed the following in the
terminal.

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar
contrib/streaming/hadoop-streaming-0.20.203.0.jar -file ~/IdentityMapper.class
-mapper IdentityMapper.class \ -reducer /bin/wc -inputformat
KeyValueTextInputFormat -input gutenberg/* -output gutenberg-outputtstch6

The execution failed with the following error in the stderr file:

java.io.IOException: Cannot run program "IdentityMapper.class":
java.io.IOException: error=2, No such file or directory

Then again I tried it by copying the IdentityMapper.class file in the hadoop
installation and executed the following:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar
contrib/streaming/hadoop-streaming-0.20.203.0.jar -file IdentityMapper.class
-mapper IdentityMapper.class \ -reducer /bin/wc -inputformat
KeyValueTextInputFormat -input gutenberg/* -output gutenberg-outputtstch5

But unfortunately again I got the same error.

It would be great if you can help me with it as I cannot move any further
without overcoming this.


***I am trying this after I tried hadoop-streaming for a different class file
which failed, so to identify if there is something wrong with the class file
itself or with the way I am using it


Thanking you in anticipation


Re: Problem with Hadoop Streaming -file option for Java class files

Posted by Robert Evans <ev...@YAHOO-INC.COM>.
>From a practical standpoint if you just leave off the -mapper you will get an IdentityMapper being run in streaming.  I don't believe that -mapper will understand something.class as a class file that should be loaded and used as the mapper.  I think you need to specify the class, including the package to get it to load like you did with org.apache.hadoop.mapred.lib.IdentityMapper.  I am not sure what changes you made to IdentiyMapper.java before recompiling but in order to get it on the classpath you probably need to ship it as a jar not as a single file.  I believe that you can use -libJars to ship it and add it to the classpath of the JVM, but I am not positive of that.

--Bobby Evans

On 7/22/11 10:18 AM, "Shrish" <s....@lancaster.ac.uk> wrote:



I am struggling with a  issue in hadoop streaming in the "-file" option.

First I tried the very basic example in streaming:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar
contrib/streaming/hadoop-streaming-0.20.203.0.jar -mapper
org.apache.hadoop.mapred.lib.IdentityMapper \ -reducer /bin/wc -inputformat
KeyValueTextInputFormat -input gutenberg/* -output gutenberg-outputtstchk22

which worked absolutely fine.

Then I copied the IdentityMapper.java source code and compiled it. Then I placed
this class file in the /home/hadoop folder and executed the following in the
terminal.

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar
contrib/streaming/hadoop-streaming-0.20.203.0.jar -file ~/IdentityMapper.class
-mapper IdentityMapper.class \ -reducer /bin/wc -inputformat
KeyValueTextInputFormat -input gutenberg/* -output gutenberg-outputtstch6

The execution failed with the following error in the stderr file:

java.io.IOException: Cannot run program "IdentityMapper.class":
java.io.IOException: error=2, No such file or directory

Then again I tried it by copying the IdentityMapper.class file in the hadoop
installation and executed the following:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar
contrib/streaming/hadoop-streaming-0.20.203.0.jar -file IdentityMapper.class
-mapper IdentityMapper.class \ -reducer /bin/wc -inputformat
KeyValueTextInputFormat -input gutenberg/* -output gutenberg-outputtstch5

But unfortunately again I got the same error.

It would be great if you can help me with it as I cannot move any further
without overcoming this.


***I am trying this after I tried hadoop-streaming for a different class file
which failed, so to identify if there is something wrong with the class file
itself or with the way I am using it


Thanking you in anticipation