Posted to user@avro.apache.org by Markus Weimer <we...@yahoo-inc.com> on 2011/04/08 22:14:42 UTC
Map-only conversion job
Hi,
I seem to hit a case not covered by the mapred package documentation:
I'd like to read from a TextInputFormat and produce AVRO data in a
map-only job. How do I do that?
Thanks,
Markus
Re: Map-only conversion job
Posted by Doug Cutting <cu...@apache.org>.
On 04/12/2011 02:18 PM, Markus Weimer wrote:
>> In short, the way to do this is to:
>> - use an org.apache.hadoop.mapred.Mapper<K,V,AvroWrapper<O>,NullWritable>
>> - call AvroJob.setOutputSchema(job,schema) with O's schema
>>
>> Does that make sense? If that works for you, I can add it to the
>> javadoc.
>
> Yes, it worked. Incidentally, it also reduced my file size to 33% of my
> previous custom-avro-writable-in-sequence-file approach.
Great! I'll update the documentation and add a test for this case.
https://issues.apache.org/jira/browse/AVRO-802
Thanks,
Doug
Re: Map-only conversion job
Posted by Markus Weimer <we...@yahoo-inc.com>.
Hi Doug,
>> I seem to hit a case not covered by the mapred package documentation:
>> I'd like to read from a TextInputFormat and produce AVRO data in a
>> map-only job. How do I do that?
>
> In short, the way to do this is to:
> - use an
> org.apache.hadoop.mapred.Mapper<K,V,AvroWrapper<O>,NullWritable>
> - call AvroJob.setOutputSchema(job,schema) with O's schema
>
> Does that make sense? If that works for you, I can add it to the
> javadoc.
Yes, it worked. Incidentally, it also reduced my file size to 33% of my
previous custom-avro-writable-in-sequence-file approach.
Thanks,
Markus
Re: Map-only conversion job
Posted by Doug Cutting <cu...@apache.org>.
On 04/08/2011 01:14 PM, Markus Weimer wrote:
> I seem to hit a case not covered by the mapred package documentation:
> I'd like to read from a TextInputFormat and produce AVRO data in a
> map-only job. How do I do that?
In short, the way to do this is to:
- use an org.apache.hadoop.mapred.Mapper<K,V,AvroWrapper<O>,NullWritable>
- call AvroJob.setOutputSchema(job,schema) with O's schema
Does that make sense? If that works for you, I can add it to the javadoc.
Doug
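For readers who want to see the two steps above in one place, here is a minimal sketch of such a map-only job. It is not code from this thread. The class names TextToAvro and LineMapper, the job name, and the choice of a plain Avro "string" schema (one Utf8 datum per input line) are assumptions made for illustration. It assumes the Hadoop "mapred" API and the avro-mapred jar are on the classpath.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroWrapper;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

// Hypothetical example class: converts text lines to an Avro data file.
public class TextToAvro {

  // Assumed output schema "O": each input line becomes one Avro string.
  static final Schema OUTPUT_SCHEMA = Schema.create(Schema.Type.STRING);

  // A plain Hadoop Mapper<K,V,AvroWrapper<O>,NullWritable>, as described above.
  public static class LineMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroWrapper<Utf8>, NullWritable> {
    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<AvroWrapper<Utf8>, NullWritable> out,
                    Reporter reporter) throws IOException {
      // Wrap the datum; the value slot is unused, so emit NullWritable.
      out.collect(new AvroWrapper<Utf8>(new Utf8(line.toString())),
                  NullWritable.get());
    }
  }

  // Builds the job configuration without submitting it.
  public static JobConf configure(JobConf job, Path in, Path out) {
    job.setJobName("text-to-avro");
    job.setInputFormat(TextInputFormat.class);   // non-Avro input
    job.setMapperClass(LineMapper.class);
    job.setNumReduceTasks(0);                    // map-only: no reducers
    AvroJob.setOutputSchema(job, OUTPUT_SCHEMA); // schema of O
    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);
    return job;
  }

  public static void main(String[] args) throws IOException {
    JobConf job = configure(new JobConf(TextToAvro.class),
                            new Path(args[0]), new Path(args[1]));
    JobClient.runJob(job);
  }
}
```

A real conversion would more likely emit a record type and pass that record's schema to AvroJob.setOutputSchema instead of the string schema used here.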