Posted to dev@avro.apache.org by "Garrett Wu (JIRA)" <ji...@apache.org> on 2010/10/29 22:28:19 UTC

[jira] Commented: (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2

    [ https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926441#action_12926441 ] 

Garrett Wu commented on AVRO-593:
---------------------------------

I'm also interested in using the newer mapreduce API with Avro, so I'm trying to write an AvroWritable and some input and output format classes that know how to deal with the schemas.  I should have a patch next week.  The idea is:

- Introduce new classes AvroKey and AvroValue that implement Writable.
- Users can call AvroJob.setInputKeySchema(), AvroJob.setInputValueSchema(), AvroJob.setMapOutputKeySchema(), AvroJob.setMapOutputValueSchema(), AvroJob.setReduceOutputKeySchema(), AvroJob.setReduceOutputValueSchema() as needed.
- Provide AvroContainerFileInputFormat, AvroContainerFileOutputFormat, AvroSequenceFileInputFormat, and AvroSequenceFileOutputFormat, which read and write the schemas for the data appropriately.  For sequence files, the schema can be stored in the file header's metadata.
- Users can write Mappers and Reducers as they normally would.  Note that this differs slightly from the org.apache.avro.mapred.* way of doing things -- I don't plan to supply special AvroMapper and AvroReducer base classes or a new Serialization, since the AvroKey/AvroValue classes are Writable just like any other hadoop key/value type.
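
To make the plan above concrete, here is a rough sketch, not the patch itself.  Everything in it is an assumption for illustration: WritableSketch is a local stand-in with the same two methods as org.apache.hadoop.io.Writable (so it compiles without Hadoop), AvroKeySketch carries a plain String where the real AvroKey would carry an Avro datum encoded against its schema, a Map stands in for the job configuration, and the configuration key name is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

// Local stand-in mirroring the two methods of org.apache.hadoop.io.Writable,
// declared here only so the sketch compiles without Hadoop on the classpath.
interface WritableSketch {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical key wrapper. A String stands in for the Avro datum; a real
// version would encode the datum with Avro against the configured schema.
final class AvroKeySketch implements WritableSketch {
    private String datum;

    AvroKeySketch() { }  // no-arg constructor so the framework can instantiate it

    AvroKeySketch(String datum) { this.datum = datum; }

    String datum() { return datum; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(datum);  // placeholder encoding, not Avro binary
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        datum = in.readUTF();
    }
}

// Hypothetical AvroJob-style helper: the schema travels as JSON text in the
// job configuration. The Map and the key name below are assumptions.
final class AvroJobSketch {
    static final String MAP_OUTPUT_KEY_SCHEMA = "sketch.avro.schema.map.output.key";

    static void setMapOutputKeySchema(Map<String, String> conf, String schemaJson) {
        conf.put(MAP_OUTPUT_KEY_SCHEMA, schemaJson);
    }

    static String getMapOutputKeySchema(Map<String, String> conf) {
        return conf.get(MAP_OUTPUT_KEY_SCHEMA);
    }
}

// Helpers that serialize to and from a byte array, roughly what happens to a
// key between the map and reduce phases.
final class WritableRoundTrip {
    static byte[] toBytes(WritableSketch w) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static void fromBytes(WritableSketch w, byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch is that the wrapper is an ordinary Writable, so a plain Mapper or Reducer can emit it without a special base class or Serialization; only the wrapper and the input/output formats need to know about the schema.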

> Avro mapreduce apis incompatible with hadoop 0.20.2
> ---------------------------------------------------
>
>                 Key: AVRO-593
>                 URL: https://issues.apache.org/jira/browse/AVRO-593
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.3.2, 1.3.3
>         Environment: Avro 1.3.3, Hadoop 0.20.2
>            Reporter: Steve Severance
>
> The Avro APIs for Hadoop use the deprecated Hadoop mapreduce API. A new Avro mapreduce API should be implemented for Hadoop 0.20 and higher.
