Posted to dev@avro.apache.org by "Holger Hoffstätte (JIRA)" <ji...@apache.org> on 2011/01/14 16:27:45 UTC

[jira] Created: (AVRO-735) Split packages across artifacts

Split packages across artifacts
-------------------------------

                 Key: AVRO-735
                 URL: https://issues.apache.org/jira/browse/AVRO-735
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.5.0
            Reporter: Holger Hoffstätte
             Fix For: 1.5.0


I was glad to see the ongoing work for a more modular build (thanks Scott Carey!). Whilst looking into the cross-platform IPC facilities for use in OSGi I noticed something that makes OSGi compatibility (and maintenance) more difficult than necessary, for no good reason. I plan to submit OSGi bundle patches later (though not necessarily for the 1.5.0 release) so this is a necessary prelude.

The term "split packages" refers to the situation where two or more artifacts contain classes in the same package, which means that the classes from those artifacts are more or less randomly munged together at runtime. This unfortunate situation is "mostly" without consequence in "normal" flat-classpath Java (assuming no class names overlap!), but bad for OSGi, since class visibility & wiring are based on package visibility. Split packages generally make any form of automatic package resolution (for deployment) almost impossible.

As far as I can see there are several classes in packages across artifacts that can easily be moved a bit without really disturbing anything. Some examples:

org.apache.avro.specific is defined by avro, compiler AND ipc

org.apache.avro.ipc (!) is defined in avro and contains classes that could go into avro:avro.io (the buffers) or avro-ipc:org.apache.avro.ipc
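
To make the overlap concrete, here is a minimal sketch of how split packages could be detected from per-artifact class lists. The class name SplitPackageFinder, its methods, and the class lists in main are assumptions made up for illustration; they are not taken from the Avro build or from the actual jar contents.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative only: reports packages whose classes are spread over more than one artifact. */
final class SplitPackageFinder {

    /** Returns the package part of a fully qualified class name ("" for the default package). */
    static String packageOf(String className) {
        int lastDot = className.lastIndexOf('.');
        return lastDot < 0 ? "" : className.substring(0, lastDot);
    }

    /**
     * Maps each package that appears in two or more artifacts to the artifacts containing it.
     * Input: artifact name -> fully qualified class names found in that artifact.
     */
    static Map<String, Set<String>> findSplitPackages(Map<String, List<String>> classesByArtifact) {
        Map<String, Set<String>> artifactsByPackage = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : classesByArtifact.entrySet()) {
            for (String className : entry.getValue()) {
                artifactsByPackage
                    .computeIfAbsent(packageOf(className), p -> new TreeSet<>())
                    .add(entry.getKey());
            }
        }
        // Keep only the packages that more than one artifact contributes to.
        artifactsByPackage.values().removeIf(artifacts -> artifacts.size() < 2);
        return artifactsByPackage;
    }

    public static void main(String[] args) {
        // Hypothetical class lists; the real jar contents are not shown in this report.
        Map<String, List<String>> jars = Map.of(
            "avro", List.of("org.apache.avro.Schema", "org.apache.avro.specific.SpecificData"),
            "avro-ipc", List.of("org.apache.avro.ipc.Responder", "org.apache.avro.specific.SpecificRequestor"));
        // Prints {org.apache.avro.specific=[avro, avro-ipc]}
        System.out.println(findSplitPackages(jars));
    }
}
```

In practice the class lists would come from scanning the jar entries; the sketch leaves that out to keep the grouping logic visible.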

It seems that the previously unmodular package membership of classes has been carried over during the artifact separation. I'd like to see this cleaned up as well before the 1.5.0 release, as this is a breaking change. However, most of the overlaps can be fixed easily with IDE refactorings like package renaming or by moving classes.

Please let me know if this is an acceptable change and if you want me to provide help/patches etc.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-735) Split packages across artifacts

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982823#action_12982823 ] 

Scott Carey commented on AVRO-735:
----------------------------------

> I understand, but why does this preclude doing the easy/nonbreaking things now? I mean, I really cannot see any reason why avro:..ipc has to contain generic IOStream classes just because they are also used in ipc. Move them to avro:..io and done. Same for the AvroRemoteException - it can stay in avro alright, but there is no harm by moving it into avro..io or wherever. Bang: one down.

The only consideration there is public classes. If a class is public, moving it is an API breakage. We can do that, but until now there was little motivation. This affects OSGi users and not others. I'll have a look at moving the easy stuff like the IOStream and AvroRemoteException bits which were already discussed in the past (but we saw no harm in leaving them alone).


> So have public and private APIs?! No need to rely on package overlaps for that.

Someone would have to document it or use some annotation framework to declare what is public and private. And the vast majority of users will not use such an annotation framework and will ONLY look at what can be accessed and the Javadoc. I feel that it's always best to use the built-in Java visibility when possible. I'm not sure OSGi alone is enough of a driver to expose non-public APIs.


> I think that just shows that those three really belong together, and that dependency problems come from ipc, mapred and the tools. avro, compiler and ipc together are still pretty small.

Their dependencies are not small. Furthermore, as I said, it is _not possible_ to build all three at once with Maven. You need the output artifacts of one to generate code to compile another. It is possible to do this in two modules (avro + compiler, ipc), but not one. Three makes the most sense since we want to separate out the compiler dependencies. If you aren't using the compiler, why would you require pulling in its unique dependencies? We want to be able to use any dependency we wish in the compiler without forcing the Maven/Ivy user to have those on their classpath by default.


> As an example, it's fairly easy to embed the avro/compiler/ipc trifecta and just block the imports that a bundle doesn't need (assuming the bundle has service-like standalone functionality).

If it's easy to make an OSGi bundle with a couple of jars then this is definitely preferred over making a custom bundled jar artifact. Most likely this would be avro + ipc unless you need the compiler functionality to create Java classes from .avpr, .avsc or .avdl files.


I propose that we look at the low hanging fruit stuff for 1.5.0 -- simple things like AvroRemoteException that reduce the number of packages that overlap amongst artifacts.  As long as this doesn't force something dangerous to be publicly visible I'm fine with that.  We'll need others to agree however.

Items such as moving netty/jetty and tracing out on their own are definitely not in scope for 1.5.0, nor is attempting to untangle ipc and avro. The current structure may not be ideal for OSGi but it is a HUGE improvement for most Ivy and Maven users.


[jira] Commented: (AVRO-735) Split packages across artifacts

Posted by "Holger Hoffstätte (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982587#action_12982587 ] 

Holger Hoffstätte commented on AVRO-735:
----------------------------------------


> The initial move did not try and solve this problem because it is
> trickier than it looks and would complicate the initial split ticket
> significantly.

That's fine - probably would have done the same thing initially. What I don't understand is: if the separate artifacts are tied to each other anyway, why split them up in the first place?
If the dependency for ipc, mapred and tools were the motivation then maybe avro, compiler and ipc should have stayed one project, with ipc's transport-specific dependencies separated out.
Sorry, not blaming anyone..came too late to the party to suggest otherwise :/

> There are some cases that would be simple moves, but there are others
> where a simple move would require changing classes or methods with
> package scope visibility to public, which is not acceptable.  To
> avoid this there is more refactoring required and some API breakages.

I understand, but why does this preclude doing the easy/nonbreaking things now? I mean, I really cannot see any reason why avro:..ipc has to contain generic IOStream classes just because they are also used in ipc. Move them to avro:..io and done. Same for the AvroRemoteException - it can stay in avro alright, but there is no harm by moving it into avro..io or wherever. Bang: one down.

> This is a change that I agree we should attempt, but I'm not
> convinced that we should do so before 1.5.0 or that it is even
> possible.  If it is we could introduce the resulting API breakages in
> a later release.  1.5.0 may be very soon.

..which is why I wanted to fix the easy (really trivial) things now; I am not at all suggesting the full surgery in the last minute. My understanding is that 1.5 is already another break compared to previous versions (see the Hadoop-related drama). Selling more breaks later will just get harder and harder.
Can't really put the horseshoes under the horse when it's out of the barn..

> Conceptually, the requirement that we can't share packages across
> jars means that avro-ipc can only use public APIs to work with avro
> -- and that may never be desirable.

So have public and private APIs?! No need to rely on package overlaps for that.

> It's not possible to build avro-ipc and avro using Maven in the same
> project -- avro-ipc requires compiling schema files into Java
> classes.  In order to compile those schema files, the build needs to
> have already created the avro-compiler artifact which depends on
> avro.

I think that just shows that those three really belong together, and that dependency problems come from ipc, mapred and the tools. avro, compiler and ipc together are still pretty small.

> Would it be possible for OSGi to simply not support a smaller bundle
> than avro + avro-ipc?  I think all other components can separate
> cleanly by package. Alternatively, we could build a variation
> avro-ipc.jar that shades in avro.jar that could be the smallest unit
> for OSGi.  This however would mean that all Avro users have to pull
> in jetty and netty even if they aren't using those features.

I'm not convinced that trying to build "special" artifacts is going to fix anything in either the short, medium or long run. As an example, it's fairly easy to embed the avro/compiler/ipc trifecta and just block the imports that a bundle doesn't need (assuming the bundle has service-like standalone functionality). This would only be necessary for no good reason whatsoever, increase bloat and cost everyone's time over and over again.
I fully agree that not every jar has to be bundleized by itself (as some people try and complain about..), but if the jar is useless on its own without a set of add-ons - why are they separate in the first place?

Maybe I should have explained my initial motivation for all this earlier :)
I intend to use avro-ipc as a transport layer for OSGi RemoteServices, and probably would have been fine with split packages etc. since I can just embed the jars into the transport bundle and block stuff I don't need, as described above. But since 1.5.0 is already a breaking release I figured we can fix the easy things now, so that I can go spelunking on the not-so-easy things afterwards, for 1.6/2.0.

> Another approach would be to trim the dependencies from avro-ipc down
> by removing implementations like netty and jetty.  Then we could have
> a separate jar with those implementations, which could be in a
> different package.

This would have been my step 3 or 5 :)
Definitely a good way forward and also very useful for non-OSGi (plain maven etc.) users.

Not sure if that helped? I don't want to hold up the release.



[jira] Commented: (AVRO-735) Split packages across artifacts

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985234#action_12985234 ] 

Scott Carey commented on AVRO-735:
----------------------------------

I did not expect to be able to make the packages completely distinct per artifact, but the patch in the child task AVRO-737 does just that.

There is more to do on the test side, but that can happen after 1.5.0.


[jira] Commented: (AVRO-735) Split packages across artifacts

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981857#action_12981857 ] 

Scott Carey commented on AVRO-735:
----------------------------------

The initial move did not try and solve this problem because it is trickier than it looks and would complicate the initial split ticket significantly.

There are some cases that would be simple moves, but there are others where a simple move would require changing classes or methods with package scope visibility to public, which is not acceptable. To avoid this there is more refactoring required and some API breakages.

This is a change that I agree we should attempt, but I'm not convinced that we should do so before 1.5.0 or that it is even possible.  If it is we could introduce the resulting API breakages in a later release.  1.5.0 may be very soon.

Conceptually, the requirement that we can't share packages across jars means that avro-ipc can only use public APIs to work with avro -- and that may never be desirable.
It's not possible to build avro-ipc and avro using Maven in the same project -- avro-ipc requires compiling schema files into Java classes. In order to compile those schema files, the build needs to have already created the avro-compiler artifact, which depends on avro.

Would it be possible for OSGi to simply not support a smaller bundle than avro + avro-ipc?  I think all other components can separate cleanly by package.
Alternatively, we could build a variation avro-ipc.jar that shades in avro.jar that could be the smallest unit for OSGi.  This however would mean that all Avro users have to pull in jetty and netty even if they aren't using those features.

Another approach would be to trim the dependencies from avro-ipc down by removing implementations like netty and jetty.  Then we could have a separate jar with those implementations, which could be in a different package.



[jira] Resolved: (AVRO-735) Split packages across artifacts

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey resolved AVRO-735.
------------------------------

    Resolution: Fixed
      Assignee: Scott Carey

This was resolved in AVRO-737.   I had expected to only partially resolve this there, but ended up completing all of it.
