Posted to common-user@hadoop.apache.org by Reik Schatz <re...@bwin.org> on 2010/03/10 13:11:46 UTC

using StreamInputFormat, StreamXmlRecordReader with your custom Jobs

Hi, I am playing around with version 0.20.2 of Hadoop. I have written 
and packaged a Job using a custom Mapper and Reducer. The input format 
in my Job is set to StreamInputFormat, and I also set the property 
stream.recordreader.class to 
org.apache.hadoop.streaming.StreamXmlRecordReader.
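For reference, the setup described above might look roughly like the driver sketch below. This is not the poster's actual code: the record delimiter tags and the driver shape are assumptions, and it uses the old "mapred" API that StreamInputFormat in 0.20.x is written against.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.streaming.StreamInputFormat;

// Hypothetical driver sketch (not the poster's actual EmailCountingJob).
public class EmailCountingJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(EmailCountingJob.class);
    conf.setInputFormat(StreamInputFormat.class);
    conf.set("stream.recordreader.class",
             "org.apache.hadoop.streaming.StreamXmlRecordReader");
    // The tag names are assumptions; use whatever bounds one record.
    conf.set("stream.recordreader.begin", "<email>");
    conf.set("stream.recordreader.end", "</email>");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}
```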

This is how I want to start my job:
hadoop jar custom-1.0-SNAPSHOT.jar EmailCountingJob /input /output

The problem is that in this case all classes from 
hadoop-0.20.2-streaming.jar are missing (ClassNotFoundException). I 
tried using -libjars without luck.
hadoop jar -libjars PATH/hadoop-0.20.2-streaming.jar 
custom-1.0-SNAPSHOT.jar EmailCountingJob /input /output
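One likely reason this fails: -libjars is a generic option handled by GenericOptionsParser, so it is only honored when the driver runs through ToolRunner, and "hadoop jar" expects the jar as its first argument, so the option belongs after the main class. A corrected invocation might look like this (the streaming jar path is an assumed example):

```shell
# -libjars goes after the main class; the driver must parse generic
# options (e.g. by implementing Tool and running via ToolRunner).
# The path to the streaming jar below is assumed for illustration.
hadoop jar custom-1.0-SNAPSHOT.jar EmailCountingJob \
  -libjars /path/to/hadoop-0.20.2-streaming.jar \
  /input /output
```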

Is there any way to use the streaming classes in your own Jobs without 
copying them into your project and packaging them into your own jar?


/Reik

Re: using StreamInputFormat, StreamXmlRecordReader with your custom Jobs

Posted by Reik Schatz <re...@bwin.org>.
Uh, do I have to copy the jar file manually into HDFS before invoking 
the hadoop jar command that starts my own job?



Utkarsh Agarwal wrote:

> I think you can use DistributedCache to specify the location of the jar
> after you have it in hdfs..

Re: using StreamInputFormat, StreamXmlRecordReader with your custom Jobs

Posted by Utkarsh Agarwal <un...@gmail.com>.
I think you can use DistributedCache to specify the location of the jar
once you have it in HDFS.
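A rough sketch of that approach, assuming the jar has already been uploaded to an HDFS path such as /libs (the path and helper method are hypothetical). Note that DistributedCache affects the task classpath; the submitting client may still need the streaming jar on its own classpath (e.g. via HADOOP_CLASSPATH).

```java
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical fragment: put the streaming jar on the task classpath
// via DistributedCache. Upload it to HDFS first, e.g.:
//   hadoop fs -put hadoop-0.20.2-streaming.jar /libs/
public class EmailCountingJob {
  public static void configureCache(JobConf conf) throws Exception {
    // The HDFS path is an assumption for this sketch.
    DistributedCache.addFileToClassPath(
        new Path("/libs/hadoop-0.20.2-streaming.jar"), conf);
  }
}
```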
