You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2010/05/19 01:12:57 UTC

[jira] Created: (AVRO-545) Move mapreduce bindings out of avro jar

Move mapreduce bindings out of avro jar
---------------------------------------

                 Key: AVRO-545
                 URL: https://issues.apache.org/jira/browse/AVRO-545
             Project: Avro
          Issue Type: Bug
          Components: java
            Reporter: Owen O'Malley


MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908243#action_12908243 ] 

Owen O'Malley commented on AVRO-545:
------------------------------------

The problem is the standard one that jars shouldn't recursively depend on each other. It leads to lots of issues. In particular, if Hadoop changes a method, it can require a new release of avro to use the two together. This is a classic anti-pattern for software. You never want projects to recursively depend on each other.

The fact that your pom file lies about its dependencies makes this obvious. If you made the correct pom file, it would be impossible to release either Avro or Hadoop except simultaneously.



> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Sub-task
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 1.4.1
>
>         Attachments: avro-545.patch
>
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908263#action_12908263 ] 

Scott Carey commented on AVRO-545:
----------------------------------

Related to: https://issues.apache.org/jira/browse/AVRO-647


None of the bits of Avro that Hadoop will use depend on Hadoop.
None of the bits of Avro that use Hadoop will be called by Hadoop.
If this was not so, then moving it out of the jar would not be a possible solution to the problem.

There is no circular dependency between Hadoop itself and Avro unless Hadoop decides to use classes in o.a.a.mapred.  
Unless a user decided to call those in a Task. But then this inclusion might actually be desired!

Because of the way that Hadoop works, putting all of its dependencies in the front of the classpath for a Task, no user user will be able to run a newer version of Avro than what is in Hadoop.  With the mapred package broken out, at least a user might have a chance of using a different version of that, provided it was compatible with the 'avro-core' version Hadoop was using; but the safe bet would be to force the exact same version and bundle it with hadoop.

So at first I thought this was important to break out for Hadoop's sake, but now I don't.  Its important for Avro's sake for users and applications that don't use Hadoop.

It might be the other way around.  Using Avro in Hadoop is blocked by Hadoop sorting out its classpath issues since it currently forces all user Tasks to run with its dependencies  (there is no separate classloader for Tasks, for instance).
There is a Hadoop ticket for that, I can't seem to locate it right now.


Avro does not lie about its dependency on Hadoop.  There is never a time that an Avro user needs hadoop.  Although "provided" scope might be more appropriate than "optional", its functionally the same in this case.
The only way that a user can execute any avro.mapred code is to run from inside Hadoop, where the hadoop jars and dependencies come from hadoop and not from any packaging the user may attempt.  Specifying the dependency as a runtime dependency would be a lie -- the execution context (Hadoop) is expected to provide it.



> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Sub-task
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 1.4.1
>
>         Attachments: avro-545.patch
>
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-545:
------------------------------

    Status: Open  (was: Patch Available)

A new jar should be a separately publishable maven artifact.  So ivy.xml needs to be updated to add this as a separate configuration, a pom and source jar need to be generated, the mvn-install and mvn-deploy targets need to be updated, etc.

I also don't yet see the functionaly problem that Avro's jar referring to Hadoop classes creates for Hadoop.  Can you please describe the problem scenario in more detail?  Thanks!

> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 1.4.1
>
>         Attachments: avro-545.patch
>
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908379#action_12908379 ] 

Doug Cutting commented on AVRO-545:
-----------------------------------

> If you are really going to ignore my patch...

I did not ignore your patch.  I reviewed it.  It is incomplete.  Folks who use the MR API in Avro 1.4.0, retrieving it from Maven central, need to have an upgrade path that lets them still get the code from Maven central.  A new jar needs a new POM, javadoc, source, plus updates to Ant targets that publish artifacts, etc.  I would also vastly prefer that we make such an incompatible change once, rather than piecemeal, providing a complete fix for AVRO-647.

> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Sub-task
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 1.4.1
>
>         Attachments: avro-545.patch
>
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908377#action_12908377 ] 

Owen O'Malley commented on AVRO-545:
------------------------------------

This is a regression. The 1.4.0 pom does not accurately reflect the dependencies. That is broken. Users should be able to depend on your pom and automatically get the transitive dependencies. Clearly, if you weren't breaking the rules by introducing a cycle in the dependency graph you *would* include the Hadoop jars in your pom.

I also noticed that your avro-tool jar includes everything from the build classes, which includes everything from the main avro jar. I assume that is unintentional.

If you are really going to ignore my patch and not fix this until the end of the year. I'll need to fork Avro back into Hadoop and I'd really rather avoid that. I already have more than enough work cleaning up Hadoop.

> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Sub-task
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 1.4.1
>
>         Attachments: avro-545.patch
>
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated AVRO-545:
-------------------------------

               Status: Patch Available  (was: Open)
    Affects Version/s: 1.4.0
        Fix Version/s: 1.4.1

> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Owen O'Malley
>             Fix For: 1.4.1
>
>         Attachments: avro-545.patch
>
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-545:
------------------------------

        Parent: AVRO-647
    Issue Type: Sub-task  (was: Bug)

> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Sub-task
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 1.4.1
>
>         Attachments: avro-545.patch
>
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868919#action_12868919 ] 

Owen O'Malley commented on AVRO-545:
------------------------------------

Note that I'm just asking that the Avro jar that Hadoop depends on should not depend on any of the Hadoop jars. Moving the classes to the tools jar or some other jar would be fine.

> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Owen O'Malley
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated AVRO-545:
-------------------------------

    Attachment: avro-545.patch

This patch moves the mapred classes to a different jar.

> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Owen O'Malley
>         Attachments: avro-545.patch
>
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey reassigned AVRO-545:
--------------------------------

    Assignee: Scott Carey  (was: Owen O'Malley)

> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Sub-task
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Owen O'Malley
>            Assignee: Scott Carey
>            Priority: Blocker
>             Fix For: 1.5.0
>
>         Attachments: avro-545.patch
>
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated AVRO-545:
-------------------------------

    Priority: Blocker  (was: Major)

This blocks Avro 1.4.1's adoption by Hadoop.

> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Owen O'Malley
>            Priority: Blocker
>             Fix For: 1.4.1
>
>         Attachments: avro-545.patch
>
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley reassigned AVRO-545:
----------------------------------

    Assignee: Owen O'Malley

> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 1.4.1
>
>         Attachments: avro-545.patch
>
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908268#action_12908268 ] 

Doug Cutting commented on AVRO-545:
-----------------------------------

> if Hadoop changes a method, it can require a new release of avro to use the two together

That seems true regardless of whether Avro's jars are split.  If Hadoop changes a method, Hadoop could continue to use its existing version of Avro, since Hadoop itself does not call the Avro methods that might call that changed method.  It would break users of Avro's MR API, but they'd be broken anyway until Avro's MR code is updated.  Also, the APIs that Avro uses are APIs that Hadoop has promised not to change, and Avro will not consider updating to newer Hadoop APIs until 0.22 has mostly replaced 0.20 in production, i.e., not for a while.

I don't see that upgrading Hadoop to Avro 1.4.0 would create a regression for users of Hadoop.  Without a regression, I don't see how this should be a blocking issue.

The argument for breaking up jars that Scott made in AVRO-647 is that it would permit our POMs to more precisely specify dependencies, so that, e.g., folks who don't use RPC don't pull in Jetty, which sounds like a useful improvement.  So I'd be okay committing a patch that creates POMs for new jars, etc.  But I'd rather see this done once, as a complete fix for AVRO-647 rather than piecemeal here, so that we only force Avro users to update their dependencies once.  I also believe that such an incompatible change would be more appropriate for Avro 1.5.0 than 1.4.1.  By the time that Hadoop removes its 0.20 APIs we'll long since have split Avro into multiple jars.

So my proposal is that we remove this as a blocker for Avro 1.4.1 but add AVRO-647 to the plan for Avro 1.5.0, which should be out before the end of the year.

> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Sub-task
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 1.4.1
>
>         Attachments: avro-545.patch
>
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868912#action_12868912 ] 

Owen O'Malley commented on AVRO-545:
------------------------------------

I understand the mechanics of how it has been accomplished. It is bad system design to make circular dependencies between projects. I consider this a blocker for Avro 1.4 being used by Hadoop Common, HDFS, or MapReduce.

> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Owen O'Malley
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-545) Move mapreduce bindings out of avro jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868901#action_12868901 ] 

Scott Carey commented on AVRO-545:
----------------------------------

The mapreduce dependency in Avro is specified as "optional", which means that consumers of avro do not automatically pull it in as a dependency.  In other words, its not a transitive dependency.  

We can move it to another jar "avro-mapred.jar", but there is only a KB or two difference in jar size at the moment.

{code}
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>0.20.2</version>
      <optional>true</optional>
    </dependency>
{code}

In short, if Hadoop 0.22 depends on Avro 1.4, it does not imply that Hadoop 0.22 depends on Hadoop 0.20.2 due to the above "optional" flag.


http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html

>From the maven site:
{quote}
How do optional dependencies work?

Project-A -> Project-B

The diagram above says that Project-A depends on Project-B. When A declares B as an optional dependency in its POM, this relationship remains unchanged. Its just like a normal build where Project-B will be added in its classpath.

Project-X -> Project-A

But when another project(Project-X) declares Project-A as a dependency in its POM, the optional dependency takes effect. You'll notice that Project-B is not included in the classpath of Project-X; you will need to declare it directly in your POM in order for B to be included in X's classpath.

{quote}

> Move mapreduce bindings out of avro jar
> ---------------------------------------
>
>                 Key: AVRO-545
>                 URL: https://issues.apache.org/jira/browse/AVRO-545
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Owen O'Malley
>
> MapReduce should not depend on any jars (eg. avro's main jar) that also depend on the MapReduce jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.