Posted to dev@avro.apache.org by "Steve Severance (JIRA)" <ji...@apache.org> on 2010/07/07 20:16:51 UTC

[jira] Created: (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2

Avro mapreduce apis incompatible with hadoop 0.20.2
---------------------------------------------------

                 Key: AVRO-593
                 URL: https://issues.apache.org/jira/browse/AVRO-593
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.3.3, 1.3.2
         Environment: Avro 1.3.3, Hadoop 0.20.2
            Reporter: Steve Severance


The Avro APIs for Hadoop use the older Hadoop MapReduce API (org.apache.hadoop.mapred), which has been deprecated. A new Avro MapReduce API should be implemented for Hadoop 0.20 and higher.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926450#action_12926450 ] 

Scott Carey commented on AVRO-593:
----------------------------------

FYI: Wrapping Avro serialization 'inside' of Writable will work, but at a non-trivial performance cost. Writable requires finer-grained reads and writes from the underlying stream, which prevents optimal buffering for Avro.
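
One way to picture the cost described above is the following minimal sketch. It is not code from Avro or Hadoop: the Writable interface here is a local stand-in that mirrors the two methods of Hadoop's org.apache.hadoop.io.Writable, and BytesWrappedDatum and roundTrip are hypothetical names. It assumes the datum has already been encoded to bytes, and shows that each record must be written and read as its own exact, length-prefixed unit, so the wrapped serializer cannot read ahead across records.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WritableWrappingSketch {

    /** Local stand-in mirroring the two methods of Hadoop's Writable (assumption: kept local so the sketch compiles without Hadoop). */
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical wrapper holding a datum that was already encoded to bytes. */
    static final class BytesWrappedDatum implements Writable {
        private byte[] encoded = new byte[0];

        void set(byte[] bytes) { encoded = bytes.clone(); }

        byte[] get() { return encoded.clone(); }

        @Override
        public void write(DataOutput out) throws IOException {
            // A length prefix is needed because the reader cannot otherwise
            // tell where this record ends in the shared stream.
            out.writeInt(encoded.length);
            out.write(encoded);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Consume exactly this record's bytes and nothing more; reading
            // ahead would swallow bytes that belong to the next record.
            int length = in.readInt();
            encoded = new byte[length];
            in.readFully(encoded);
        }
    }

    /** Writes a payload through the wrapper and reads it back. */
    static byte[] roundTrip(byte[] payload) throws IOException {
        BytesWrappedDatum source = new BytesWrappedDatum();
        source.set(payload);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        source.write(new DataOutputStream(buffer));

        BytesWrappedDatum target = new BytesWrappedDatum();
        target.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        return target.get();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "example record".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(payload, roundTrip(payload)));
    }
}
```

The per-record length prefix and exact-size read are the kind of fine-grained stream access the comment refers to; a serializer that owns the stream can instead buffer many records at once.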



[jira] Commented: (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886111#action_12886111 ] 

Scott Carey commented on AVRO-593:
----------------------------------

The old mapred API is being un-deprecated for 0.21 and is not going away soon. The new mapreduce API is not yet finished.

However, we will eventually need to support the newer API.



[jira] Commented: (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886145#action_12886145 ] 

Scott Carey commented on AVRO-593:
----------------------------------

Is there a specific use case where this is failing for you, or is the use of deprecated APIs itself the problem?

I suppose that integrating Avro with another library that is on the newer API could be an issue.



[jira] Commented: (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2

Posted by "Garrett Wu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926441#action_12926441 ] 

Garrett Wu commented on AVRO-593:
---------------------------------

I'm also interested in using the newer mapreduce API with Avro, so I'm trying to write an AvroWritable and some input and output format classes that know how to deal with the schemas.  I should have a patch next week, but the idea is:

- Introduce new classes AvroKey and AvroValue that implement Writable.
- Users can call AvroJob.setInputKeySchema(), AvroJob.setInputValueSchema(), AvroJob.setMapOutputKeySchema(), AvroJob.setMapOutputValueSchema(), AvroJob.setReduceOutputKeySchema(), AvroJob.setReduceOutputValueSchema() as needed.
- Provide AvroContainerFileInputFormat, AvroContainerFileOutputFormat, AvroSequenceFileInputFormat, and AvroSequenceFileOutputFormat, which read and write the schemas for the data appropriately.  The schema in the sequence files can be stored in the header's metadata.
- Users can write Mappers and Reducers as they normally would.  Note that this differs slightly from the org.apache.avro.mapred.* way of doing things -- I don't plan to supply special AvroMapper and AvroReducer base classes or a new Serialization, since the AvroKey/AvroValue classes are Writable just like any other Hadoop key/value type.
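
As a rough illustration of the schema-setter idea in the list above, here is a minimal sketch. It is not the patch and not the Avro API: a plain Map stands in for the Hadoop job configuration, schemas are passed as JSON text rather than Schema objects, and the class name, property names, and getSchema helper are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaConfigSketch {

    // Hypothetical property names, invented for this illustration only.
    static final String INPUT_KEY_SCHEMA = "sketch.avro.input.key.schema";
    static final String MAP_OUTPUT_VALUE_SCHEMA = "sketch.avro.map.output.value.schema";

    /** Records the input key schema (as JSON text) in the job configuration. */
    static void setInputKeySchema(Map<String, String> conf, String schemaJson) {
        conf.put(INPUT_KEY_SCHEMA, schemaJson);
    }

    /** Records the map output value schema (as JSON text) in the job configuration. */
    static void setMapOutputValueSchema(Map<String, String> conf, String schemaJson) {
        conf.put(MAP_OUTPUT_VALUE_SCHEMA, schemaJson);
    }

    /** Looks a schema up again, as an input or output format might do at task time. */
    static String getSchema(Map<String, String> conf, String property) {
        String schemaJson = conf.get(property);
        if (schemaJson == null) {
            throw new IllegalStateException("No schema configured for " + property);
        }
        return schemaJson;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setInputKeySchema(conf, "\"string\"");
        setMapOutputValueSchema(conf, "\"long\"");
        System.out.println(getSchema(conf, INPUT_KEY_SCHEMA));
    }
}
```

Keeping the schemas in the job configuration means only the setters are called "as needed"; a format class that finds no schema for its slot can fail early with a clear message.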



[jira] Updated: (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2

Posted by "Garrett Wu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Garrett Wu updated AVRO-593:
----------------------------

    Attachment: AVRO-593.patch

Thanks for the info, Scott.

To avoid putting Avro serialization 'inside' of Writables, I came up with this patch, which tries to keep features and changes to a bare minimum.  Let me know what you think.
