Posted to user@avro.apache.org by Miki Tebeka <mi...@gmail.com> on 2011/10/04 01:21:10 UTC

Re: Avro and Hadoop streaming

I *think* streaming support was added only in 1.6
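
One way to check that guess is to look inside the jar itself. A jar is a zip archive, so the class named in the error quoted below should show up as an entry if that release ships it. This is only a minimal Python sketch, not something taken from the thread; the helper names class_entry and jar_has_class are made up for illustration.

```python
import zipfile

# Fully qualified class name taken from the streaming error message.
INPUT_FORMAT = "org.apache.avro.mapred.AvroAsTextInputFormat"


def class_entry(class_name):
    """Convert a dotted Java class name to its path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jar_has_class(jar_path, class_name=INPUT_FORMAT):
    """Return True if the jar (a zip archive) contains the given class."""
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry(class_name) in jar.namelist()
```

Calling jar_has_class("avro-mapred-1.5.1.jar") on a local copy of the jar would show whether that release contains the class at all. A False result would point to the Avro version rather than to the classpath options.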

On Mon, Jul 11, 2011 at 5:36 PM, Mona Gandhi <mo...@apture.com> wrote:
> I tried using the command that Miki posted, the only difference being the version of Avro (1.5.1 instead of 1.6.0). I can't seem to get it to work.
>
> /home/hadoop/hadoop/bin/hadoop jar /home/hadoop/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar -files avro-1.5.1.jar,avro-mapred-1.5.1.jar -libjars avro-1.5.1.jar,avro-mapred-1.5.1.jar -mapper test-mapper.py -reducer test-reducer.py -jobconf mapred.job.name=AvroTestJob --numReduceTasks 3 -file test-mapper.py -file test-reducer.py  -inputformat org.apache.avro.mapred.AvroAsTextInputFormat -input avroevents -output AvroOutput
>
>
> Error: -inputformat : class not found : org.apache.avro.mapred.AvroAsTextInputFormat
> Streaming Job Failed!
>
>
> Thanks for all the help!
>
> On Jun 15, 2011, at 10:36 AM, Miki Tebeka wrote:
>
>> Found the magic (-files and -libjars):
>>
>> jars=avro-1.6.0-SNAPSHOT.jar,avro-mapred-1.6.0-SNAPSHOT.jar
>>
>> hadoop jar hadoop-streaming-0.20.2-cdh3u0.jar \
>>    -files $jars \
>>    -libjars $jars \
>>    -input /in/avro \
>>    -output /out/avro \
>>    -mapper avro-mapper.py \
>>    -reducer avro-reducer.py \
>>    -file avro-mapper.py \
>>    -file avro-reducer.py \
>>    -inputformat org.apache.avro.mapred.AvroAsTextInputFormat
>>
>> Thanks for all the help!
>>
>> On Wed, Jun 15, 2011 at 9:53 AM, Scott Carey <sc...@richrelevance.com> wrote:
>>> Hadoop has an old version of Avro in it.  You must place the 1.6.0 jar
>>> (and relevant dependencies, or the avro-tools.jar with all dependencies
>>> bundled) in a location that gets picked up first in the task classpath.
>>>
>>> Packaging it in the job jar works. I'm not sure if putting it in the
>>> distributed cache and loading it as a library that way would.
>>>
>>> On 6/15/11 9:30 AM, "Matt Pouttu-Clarke" <Ma...@icrossing.com> wrote:
>>>
>>>> You have to package it in the job jar file under a /lib directory.
>>>>
>>>>
>>>> On 6/15/11 9:26 AM, "Miki Tebeka" <mi...@gmail.com> wrote:
>>>>
>>>>> Still didn't work.
>>>>>
>>>>> I'm pretty new to the Hadoop world. I probably need to place the Avro jar
>>>>> somewhere on the classpath of the nodes,
>>>>> but I have no idea how to do that.
>>>>>
>>>>> On Wed, Jun 15, 2011 at 3:33 AM, Harsh J <ha...@cloudera.com> wrote:
>>>>>> Miki,
>>>>>>
>>>>>> You'll need to provide the entire canonical class name
>>>>>> (org.apache.avro.mapred...).
>>>>>>
>>>>>> On Wed, Jun 15, 2011 at 5:31 AM, Miki Tebeka <mi...@gmail.com> wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I've tried to run a job with the following command:
>>>>>>>
>>>>>>> hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
>>>>>>>    -input /in/avro \
>>>>>>>    -output $out \
>>>>>>>    -mapper avro-mapper.py \
>>>>>>>    -reducer avro-reducer.py \
>>>>>>>    -file avro-mapper.py \
>>>>>>>    -file avro-reducer.py \
>>>>>>>    -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
>>>>>>>    -inputformat AvroAsTextInputFormat
>>>>>>>
>>>>>>> However I get
>>>>>>> -inputformat : class not found : AvroAsTextInputFormat
>>>>>>>
>>>>>>> I'm probably missing something obvious.
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> --
>>>>>>> Miki
>>>>>>>
>>>>>>> On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting <cu...@apache.org> wrote:
>>>>>>>> Miki,
>>>>>>>>
>>>>>>>> Have you looked at AvroAsTextInputFormat?
>>>>>>>>
>>>>>>>>
>>>>>>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>>>>>>>>
>>>>>>>> Also, release 1.5.2 will include AvroTextOutputFormat:
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/AVRO-830
>>>>>>>>
>>>>>>>> Are these perhaps what you're looking for?
>>>>>>>>
>>>>>>>> Doug
>>>>>>>>
>>>>>>>> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I'd like to use hadoop streaming with Avro files.
>>>>>>>>> My plan is to write an inputformat class that emits json records,
>>>>>>>>> one per line. This way the streaming application can read one
>>>>>>>>> record per line.
>>>>>>>>>
>>>>>>>>> (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs)
>>>>>>>>>
>>>>>>>>> I couldn't find any documentation/help about writing inputformat
>>>>>>>>> classes. Can someone point me in the right direction?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> --
>>>>>>>>> Miki
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Harsh J
>>>>>>
>>>>
>>>>
>>>
>>>
>
>
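
For reference, here is a rough sketch of what a streaming mapper like the avro-mapper.py mentioned in the thread might look like. The script itself is never shown in the thread, so everything here is an assumption. In particular it assumes that AvroAsTextInputFormat hands each record to the mapper as JSON text in the key with an empty value, so each stdin line is the JSON followed by a tab. The field name "event" and the function names parse_records and run are hypothetical.

```python
import json
import sys

# Hypothetical record field to count by; replace with a field from your schema.
FIELD = "event"


def parse_records(lines):
    """Yield one decoded JSON record per non-empty input line."""
    for line in lines:
        # Streaming delivers "key<TAB>value". Assumption: the JSON text is the
        # key and the value is empty, so keep only the part before the first tab.
        text = line.rstrip("\n").split("\t", 1)[0].strip()
        if text:
            yield json.loads(text)


def run(lines, out):
    """Write "<field value><TAB>1" for every record that has FIELD."""
    for record in parse_records(lines):
        key = record.get(FIELD) if isinstance(record, dict) else None
        if key is not None:
            out.write("%s\t1\n" % key)
```

To use this as the -mapper script, the file would end with a call to run(sys.stdin, sys.stdout). Splitting on the first tab relies on the JSON text containing no raw tab characters, which holds when tabs inside strings are escaped as \t.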