Posted to user@avro.apache.org by Markus Weimer <we...@yahoo-inc.com> on 2011/04/08 22:14:42 UTC

Map-only conversion job

Hi,

I seem to hit a case not covered by the mapred package documentation:
I'd like to read from a TextInputFormat and produce Avro data in a
map-only job. How do I do that?

Thanks,

Markus

Re: Map-only conversion job

Posted by Doug Cutting <cu...@apache.org>.

On 04/12/2011 02:18 PM, Markus Weimer wrote:
>> In short, the way to do this is to:
>> - use a org.apache.hadoop.mapred.Mapper<K,V,AvroWrapper<O>,NullWritable>
>> - call AvroJob.setOutputSchema(job,schema) with O's schema
>>
>> Does that make sense?  If that works for you, I can add it to the
>> javadoc.
> 
> Yes, it worked. Incidentally, it also reduced my file size to 33% of my
> previous custom-avro-writable-in-sequence-file approach.

Great!  I'll update the documentation and add a test for this case.

https://issues.apache.org/jira/browse/AVRO-802

Thanks,

Doug

Re: Map-only conversion job

Posted by Markus Weimer <we...@yahoo-inc.com>.

Hi Doug,

>> I seem to hit a case not covered by the mapred package documentation:
>> I'd like to read from a TextInputFormat and produce Avro data in a
>> map-only job. How do I do that?
>
> In short, the way to do this is to:
> - use a org.apache.hadoop.mapred.Mapper<K,V,AvroWrapper<O>,NullWritable>
> - call AvroJob.setOutputSchema(job,schema) with O's schema
>
> Does that make sense?  If that works for you, I can add it to the  
> javadoc.

Yes, it worked. Incidentally, it also reduced my file size to 33% of my
previous custom-avro-writable-in-sequence-file approach.

Thanks,

Markus

Re: Map-only conversion job

Posted by Doug Cutting <cu...@apache.org>.

On 04/08/2011 01:14 PM, Markus Weimer wrote:
> I seem to hit a case not covered by the mapred package documentation:
> I'd like to read from a TextInputFormat and produce Avro data in a
> map-only job. How do I do that?

In short, the way to do this is to:
- use a org.apache.hadoop.mapred.Mapper<K,V,AvroWrapper<O>,NullWritable>
- call AvroJob.setOutputSchema(job,schema) with O's schema
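
As a rough illustration, the two steps above might be put together as follows. This is only a sketch, not tested against a cluster: the class names TextToAvro and LineMapper, the job name, and the choice of a plain Avro "string" schema (one datum per input line) are assumptions made for the example, and it needs the Hadoop and Avro mapred jars on the classpath. Setting the number of reduce tasks to zero is what makes the job map-only.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroWrapper;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

// Hypothetical example class: converts text lines to Avro strings.
public class TextToAvro {

  // Assumption for this sketch: O is a plain Avro string.
  static final Schema SCHEMA = Schema.create(Schema.Type.STRING);

  // Plain Hadoop mapper: the output key is AvroWrapper<O>, the value NullWritable.
  public static class LineMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroWrapper<Utf8>, NullWritable> {
    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<AvroWrapper<Utf8>, NullWritable> out,
                    Reporter reporter) throws IOException {
      // Wrap the line as an Avro datum; the NullWritable value carries no data.
      out.collect(new AvroWrapper<Utf8>(new Utf8(line.toString())),
                  NullWritable.get());
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf job = new JobConf(TextToAvro.class);
    job.setJobName("text-to-avro");

    job.setInputFormat(TextInputFormat.class);   // non-Avro input
    job.setMapperClass(LineMapper.class);
    job.setNumReduceTasks(0);                    // map-only: no reducers

    // Declare O's schema as the job's output schema.
    AvroJob.setOutputSchema(job, SCHEMA);

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    JobClient.runJob(job);
  }
}
```

For a record-valued O, the same shape should apply with a record schema passed to setOutputSchema and the mapper emitting matching datums.
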

Does that make sense?  If that works for you, I can add it to the javadoc.

Doug