Posted to dev@avro.apache.org by "Scott Carey (JIRA)" <ji...@apache.org> on 2010/04/22 07:14:51 UTC

[jira] Issue Comment Edited: (AVRO-512) define and implement mapreduce connector protocol

    [ https://issues.apache.org/jira/browse/AVRO-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859659#action_12859659 ] 

Scott Carey edited comment on AVRO-512 at 4/22/10 1:13 AM:
-----------------------------------------------------------

I agree that Avro should not require MapReduce -- specifically, the Maven POM should not cause consumers to pull in MapReduce by default.

But I think we already prevent that.  The POM generated by the build marks hadoop-core as "optional", meaning downstream projects that consume Avro won't automatically pull in the Hadoop jar.  Another option with a similar effect is to declare the dependency with "provided" scope instead of "compile", which makes the jar available for build and test but does not bundle it or pass it on transitively.  That is probably preferable for MapReduce.  If users want those APIs, they have to obtain their own copy of the hadoop-core jar or declare the dependency themselves.
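For illustration, the two declarations would look roughly like this in the POM (the coordinates are the usual ones for hadoop-core; the version is just a placeholder, not necessarily what the build emits):

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>0.20.2</version>  <!-- placeholder version -->
      <optional>true</optional>  <!-- not pulled in transitively by consumers -->
    </dependency>

or, with provided scope instead:

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>0.20.2</version>  <!-- placeholder version -->
      <scope>provided</scope>    <!-- on the compile/test classpath, never bundled -->
    </dependency>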

Putting the code in Hadoop itself is probably a problem, unless we want to release new versions of 0.18, 0.19, 0.20, etc.  Placing it in Hadoop means that changes to Avro's lower-level APIs will break compatibility with the version in Hadoop.  Honestly, some of those APIs are going to keep evolving, and dot releases of Avro can break these APIs (but not encoded formats).  Until these APIs are more locked down, it is better to keep packages like this in the Avro project.

-----------
Going slightly off topic now:

A few other libraries Avro bundles have similar issues -- optional side features should be flagged either "provided" or "optional" in the Maven POM.  Alternatively, the project could be split up into a few jars:

avro-core
->  avro-genavro
->  avro-protocol
->  avro-mapred
->  avro-reflect

probably covers the main dependency chunks.  avro-core could get away with depending only on Jackson, slf4j, and commons-lang, I think -- meaning the generic and specific APIs, file formats, etc. would all still work.
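As a rough sketch of that layout (the module names are taken from the list above and are purely hypothetical at this point), the aggregator POM would just enumerate the pieces:

    <modules>
      <module>avro-core</module>     <!-- jackson, slf4j, commons-lang only -->
      <module>avro-genavro</module>
      <module>avro-protocol</module>
      <module>avro-mapred</module>   <!-- the only module declaring hadoop-core -->
      <module>avro-reflect</module>
    </modules>

with each of the other modules depending on avro-core, so a consumer that only wants the core APIs and file formats never sees the Hadoop jar at all.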


> define and implement mapreduce connector protocol
> -------------------------------------------------
>
>                 Key: AVRO-512
>                 URL: https://issues.apache.org/jira/browse/AVRO-512
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>
> Avro should provide Hadoop Mapper and Reducer implementations that connect to a subprocess in another programming language, transmitting raw binary values to and from that process.  This should be modeled after Hadoop Pipes.  It would allow one to easily write efficient mapreduce programs in non-Java languages that process Avro-format data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.