Posted to user@avro.apache.org by Aleksey Maslov <Al...@Lab49.com> on 2011/03/11 06:54:37 UTC

How to direct Reducer to write avro objects to avro sequence file?

Hi,
(using hadoop 0.20.2 and avro 1.4.1)

I have defined a simple Avro object 'AvroObj' (a record of strings),
compiled the schema, and
set up a simple MR job whose mapper takes <Object, Text> as input and emits
<Text, IntWritable>,
and whose reducer takes those <Text, IntWritable> pairs and ...
What I would like to achieve is to have the reducer emit <NullWritable, AvroObj>
pairs into an Avro sequence file,

so that the next MR job will open that Avro file and read Avro objects, not
text lines, out of it.

I have looked through the (H ed.2) book and a few online samples but can't
figure out how to do it.
Some online sources mention job config settings like:
        job.setOutputFormatClass(AvroOutputFormat.class);        
        AvroOutputFormat.setCompressOutput(conf, false);

But this doesn't compile: setCompressOutput asks for the deprecated JobConf
object, and
"setOutputFormatClass" gives an error about its parameter (AvroOutputFormat.class
is not applicable).

Could someone enlighten me on how to have the reducer write to an Avro sequence file?

Cheers;

--
View this message in context: http://apache-avro.679487.n3.nabble.com/How-to-direct-Reducer-to-write-avro-objects-to-avro-sequence-file-tp2663706p2663706.html
Sent from the Avro - Users mailing list archive at Nabble.com.

Re: How to direct Reducer to write avro objects to avro sequence file?

Posted by Scott Carey <sc...@richrelevance.com>.
The Avro unit tests have several examples:

http://svn.apache.org/viewvc/avro/tags/release-1.5.0/lang/java/mapred/src/test/java/org/apache/avro/mapred/TestWordCount.java?view=markup


On 3/14/11 1:54 PM, "Aleksey Maslov" <Al...@Lab49.com> wrote:

>I am confused. 
>
>The patch applies only to Avro 1.3.3, Hadoop 0.20.2
>(https://issues.apache.org/jira/browse/AVRO-593);
>I am running avro 1.4.1 - shouldn't the patch be already in that base
>version?
>I tried integrate, but there are files missing from 1.4.1 that patch tries
>to modify (ex: AvroSerialization.java);
>---
>
>I am not following your explanation - could you please give me a quick
>example (or a link to one) of how to use avro objects as hadoop's
>key/value
>types for MR jobs ?
>
>Thanks,
>Aleksey


Re: How to direct Reducer to write avro objects to avro sequence file?

Posted by Aleksey Maslov <Al...@Lab49.com>.
I am confused. 

The patch applies only to Avro 1.3.3, Hadoop 0.20.2
(https://issues.apache.org/jira/browse/AVRO-593);
I am running Avro 1.4.1 - shouldn't the patch already be in that base
version?
I tried to integrate it, but files that the patch tries to modify
(e.g. AvroSerialization.java) are missing from 1.4.1.
---

I am not following your explanation - could you please give me a quick
example (or a link to one) of how to use Avro objects as Hadoop's key/value
types for MR jobs?

Thanks,
Aleksey



Re: How to direct Reducer to write avro objects to avro sequence file?

Posted by Harsh J <qw...@gmail.com>.
By 'Avro sequence files' do you mean Avro data files?

The Avro mapred classes currently support only the older, stable API
(org.apache.hadoop.mapred, which has been undeprecated in 0.20.3 and is
supported in 0.21 as well, so there are no worries in using it, really).
AVRO-593 tracks a new-API implementation of Avro's mapred support, but it
should be fairly easy to write your own wrappers for these classes after a
bit of reading, since the changes are mostly superficial.
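
To make the old-API route concrete, here is a minimal sketch modeled on the
TestWordCount test from the Avro 1.5.0 tag linked elsewhere in this thread.
It is not a definitive implementation. Assumptions: the AvroJob method names
are the 1.5.0 ones (the names in 1.4.1 may differ), the input is an Avro data
file of strings rather than plain text lines, and the class name
AvroWordCountSketch, the job name, and the two path arguments are made up for
illustration. It needs the Avro mapred and Hadoop jars on the classpath.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical class name; structure follows Avro 1.5.0's TestWordCount.
public class AvroWordCountSketch {

  /** Splits each input string into words and emits (word, 1) pairs. */
  public static class MapImpl extends AvroMapper<Utf8, Pair<Utf8, Long>> {
    @Override
    public void map(Utf8 text, AvroCollector<Pair<Utf8, Long>> collector,
                    Reporter reporter) throws IOException {
      StringTokenizer tokens = new StringTokenizer(text.toString());
      while (tokens.hasMoreTokens()) {
        collector.collect(new Pair<Utf8, Long>(new Utf8(tokens.nextToken()), 1L));
      }
    }
  }

  /** Sums the counts for each word and emits one Avro record per word. */
  public static class ReduceImpl
      extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 word, Iterable<Long> counts,
                       AvroCollector<Pair<Utf8, Long>> collector,
                       Reporter reporter) throws IOException {
      long sum = 0;
      for (long count : counts) {
        sum += count;
      }
      collector.collect(new Pair<Utf8, Long>(word, sum));
    }
  }

  public static void main(String[] args) throws Exception {
    // Old (org.apache.hadoop.mapred) API: the job is described by a JobConf.
    JobConf job = new JobConf(AvroWordCountSketch.class);
    job.setJobName("avro-wordcount-sketch"); // hypothetical job name

    // Input: Avro data file of strings. Output: Avro data file of pairs.
    AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
    AvroJob.setOutputSchema(job,
        new Pair<Utf8, Long>(new Utf8(""), 0L).getSchema());

    AvroJob.setMapperClass(job, MapImpl.class);
    AvroJob.setReducerClass(job, ReduceImpl.class);

    // args[0] = input path, args[1] = output path (assumed for illustration).
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    FileOutputFormat.setCompressOutput(job, false);

    JobClient.runJob(job);
  }
}
```

For a generated class such as the 'AvroObj' in the original question, the
reducer's output type and the schema passed to AvroJob.setOutputSchema would
presumably be that class and its schema (for example AvroObj.SCHEMA$) in place
of the Pair used here; that substitution is untested in this sketch.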

On Fri, Mar 11, 2011 at 11:24 AM, Aleksey Maslov
<Al...@lab49.com> wrote:
> Hi,
> (using hadoop 0.20.2 and avro 1.4.1)
>
> I have defined a simple avro object 'AvroObj' (a record of strings),
> compiled the schema and
> setup a simple MR job that takes as input <Object, Text> and emits
> <Text, IntWritable>
> and reducer that takes said <Text, IntWritable> and ...
> I would like to achieve is - have reducer emit <NullWritable, AvroObj>
> pairs into an avro sequence file;
>
> so the next mr job will open that avro file and read-in avro objects, not
> text lines, out of it;
>
> I have looked through the (H ed.2) book and few online samples but can't
> figure out how to do it;
> some online sources mention job config settings like:
>        job.setOutputFormatClass(AvroOutputFormat.class);
>        AvroOutputFormat.setCompressOutput(conf, false);
>
> But this doesn't compile - setCompressOutput asks for deprecated JobConf
> object, and
> "setOutputFormatClass" gives error about its param - param not applicable to
> AvroOutputFormat.class;
>
> Could someone enlighten me how to have reducer write to avro sequence file ?
>
> Cheers;



-- 
Harsh J
www.harshj.com