Posted to dev@avro.apache.org by "Scott Carey (JIRA)" <ji...@apache.org> on 2010/09/01 03:25:53 UTC

[jira] Created: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar
--------------------------------------------------------------

                 Key: AVRO-647
                 URL: https://issues.apache.org/jira/browse/AVRO-647
             Project: Avro
          Issue Type: Improvement
          Components: java
            Reporter: Scott Carey
            Assignee: Scott Carey


Our dependencies are starting to get a little complicated on the Java side.

I propose we build two (possibly more) jars related to our major dependencies and functions.

1. avro.jar  (or perhaps avro-core.jar)
This contains all of the core avro functionality for _using_ avro as a library.  This excludes the specific compiler, avro idl, and other build-time or development tools, as well as avro packages for third party integration such as hadoop.  This jar should then have a minimal set of dependencies (jackson, jetty, SLF4J ?).

2. avro-dev.jar
This would contain compilers, idl, development tools, etc.  Most applications will not need this, but build systems and developers will.

3. avro-hadoop.jar
This would contain the hadoop API and possibly pig/hive/whatever related to that.  This makes it easier for pig/hive/hadoop to consume avro-core without circular dependencies. 
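
For a downstream project, the payoff would look roughly like the following pom fragment -- a sketch only; the artifactId and version here are illustrative, not decided:
{noformat}
<!-- hypothetical consumer pom: pull in only the core artifact,
     without dragging in hadoop or the compiler/dev tooling -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-core</artifactId>
  <version>1.5.0-SNAPSHOT</version>
</dependency>
{noformat}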


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904866#action_12904866 ] 

Scott Carey commented on AVRO-647:
----------------------------------

Thoughts?

I know how to do the above with Maven directly, but I'm not as familiar with Ivy.  Would we need one ivy.xml file per jar/pom combination we want to build?  For some things this clearly breaks up by package:

o.a.a.mapred
o.a.a.mapred.tether 
o.a.a.pig
 >>> avro-hadoop.jar

But some things, such as the dev tools, would be more difficult.  I'm not sure we would choose to separate those from core.   We could instead specify the dev dependencies such as javacc as 'optional' in the pom / 'transitive=false' in ivy.
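
For example, marking javacc optional might look like this (a sketch; the javacc coordinates and version are assumptions), with the rough Ivy equivalent below it:
{noformat}
<!-- Maven: javacc is a build-time tool; 'optional' keeps it out of consumers' transitive graphs -->
<dependency>
  <groupId>net.java.dev.javacc</groupId>
  <artifactId>javacc</artifactId>
  <version>4.2</version>
  <optional>true</optional>
</dependency>

<!-- Ivy: transitive="false" resolves javacc itself but none of its own dependencies -->
<dependency org="net.java.dev.javacc" name="javacc" rev="4.2" transitive="false"/>
{noformat}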

Two things jump out as definitely important to separate:
1. Hadoop, etc.
2. A future maven plugin for idl/specific compilers.

Before I add a pig dependency I'd like to sort out our packaging and dependency strategy here.




[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905172#action_12905172 ] 

Doug Cutting commented on AVRO-647:
-----------------------------------

> Instead, we could simply document this all clearly so that users are armed with the information necessary to configure their builds to exclude transitive dependencies they don't use.

That might be a useful short-term strategy: make more dependencies optional and document which features require what dependencies.




[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913276#action_12913276 ] 

Doug Cutting commented on AVRO-647:
-----------------------------------

> ByteBufferInputStream and ByteBufferOutputStream are used by BinaryDecoder and BinaryEncoder and we should consider moving them to util or io.

Those should perhaps move to the io package.

> In order to make a 'core' library I moved Requestor and Responder to avro-ipc.

You moved these along with the various implementations of Requestor and Responder, so the jar splits don't correspond to java packages, right?  If we embrace that approach generally, then we wouldn't move any classes to different packages at this stage.  Rather the different trees and jars can overlap in the java packages they contain.  The only incompatibility we create at this point will be in packaging, not in any APIs.  It would be good to separate API changes from packaging changes.

So we'd then leave ByteBufferInputStream, ByteBufferOutputStream and AvroRemoteException in the ipc package, but include them in the core jar & tree.



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904864#action_12904864 ] 

Philip Zeyliger commented on AVRO-647:
--------------------------------------

Definitely +1 to the idea.



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905124#action_12905124 ] 

Scott Carey commented on AVRO-647:
----------------------------------

bq. Finally, to be clear, is there a motive for this beyond better expressing dependencies? Functionally sticking everything in a single jar with lots of optional dependencies works fine, but folks then have to guess which dependencies they actually need, and that's the primary problem this seeks to solve. Is that right, or are there other problems too?

That is the main case here.  Dependencies become more explicit.  Users should be able to consume the parts they need without too much accidental baggage.  Instead, we could simply document this all clearly so that users are armed with the information necessary to configure their builds to exclude transitive dependencies they don't use.
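
For example, a user who doesn't need the HTTP transport could exclude it today with something like this (a sketch; the jetty coordinates shown are an assumption about what the current avro pom pulls in):
{noformat}
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.4.0</version>
  <exclusions>
    <!-- drop the HTTP transport pieces this project never uses -->
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{noformat}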

However, Avro is by nature something that many projects will depend on, and portions of Avro may in turn depend on some of those same projects.  In particular, making it easy to avoid circular dependencies is a plus.  As we have seen (https://issues.apache.org/jira/browse/AVRO-545), even if it is possible to use ivy/maven features to prevent circular dependencies, it makes users uneasy.

The guidelines I use for my projects are as follows:
* If the cascaded set of dependencies is large and likely to conflict with other things, it should be easy to separate (for Avro, this is the hadoop dependency).
* If the dependency is physically large (large jar file), consider making it easy to separate.
* If the dependency is for a minor, rarely used feature, be careful.  For example, Jackson 1.0.1 being used by hadoop 0.20+ for dumping configuration files to JSON causes problems.

So for the case of Reflect, if paranamer doesn't have a lot of cascaded dependencies itself and is not a large jar on its own, then including it in avro-data is not going to be a big deal.

bq. If we separate jars, it might be good to split the build-time classpath in the same manner, by splitting the src tree. 

We have three choices, I think:
1.  Leave the source tree as-is, and have the build use ant file excludes/includes to define what is packaged in each one (see the sketch after this list).   Managing the excludes/includes will be troublesome and would be easier if the split were cleanly done by package.  Not much else would have to change -- the compile and test phases would stay the same.  There would also be the downside that tests would not implicitly test the packaging boundaries.
2.  Break it into different source trees and continue using ant/ivy.  This is more work and means we would be breaking up tests and compile phases too.
3.  Break it into different source trees and use maven.  Maven is a natural fit for this sort of thing and I'm experienced with it, but it is not trivial and others here aren't as familiar with it.  To wire up IDL and the Specific compiler,  Maven plugins would be required.  Interop testing would probably still require ant. 
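
A sketch of what option 1 might look like for avro-hadoop.jar (the fileset dir, property names, and exact include patterns are illustrative):
{noformat}
<!-- build avro-hadoop.jar from the already-compiled classes, selected by package pattern -->
<jar destfile="${build.dir}/avro-hadoop.jar">
  <fileset dir="${build.classes}">
    <include name="org/apache/avro/mapred/**"/>
    <include name="org/apache/avro/pig/**"/>
  </fileset>
</jar>
{noformat}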




[jira] Updated: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-647:
-----------------------------

    Attachment: AVRO-647.patch
                migrateAvro.sh

h3.  Avro maven build patch and jar split-up.
This is a mostly complete patch for splitting the Java portion of the avro project up into 6 sub-projects.
This requires a maven plugin, and so solves AVRO-159 and part of AVRO-572 as well.

The project structure is as follows:

|| path from lang/java | artifactId | name | artifact type | notes |
| / | avro-parent | Apache Avro Parent | pom | parent, inherits from Apache master pom, sets common build properties and versions |
| /avro/ | avro | Apache Avro | jar | discussed as "avro-core" previously |
| /compiler/ | avro-compiler | Apache Avro Compiler | jar | Avro IDL compiler and Specific compiler, including ant tasks | 
| /maven-plugin/ | avro-maven-plugin | Apache Avro Maven Plugin | maven-plugin | Maven mojos for avpr > java; avsc > java; avdl > java; |
| /ipc/ | avro-ipc | Apache Avro IPC | jar | Avro IPC components, protocols, transceivers, etc |
| /mapred/ | avro-mapred | Apache Avro Mapred API | jar | An org.apache.hadoop.mapred API using Avro serialization |
| /tools/ | avro-tools | Apache Avro Tools | jar (with dependencies) | A single jar containing all of Avro and dependencies, with command line tools |
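
For reference, the parent pom's module list corresponding to the table above (a sketch; the actual pom in the patch is authoritative):
{noformat}
<modules>
  <module>avro</module>
  <module>compiler</module>
  <module>maven-plugin</module>
  <module>ipc</module>
  <module>mapred</module>
  <module>tools</module>
</modules>
{noformat}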

h4. Status
* Compiles; all tests run other than some IPC ones I need help with.  Those don't work for me (and have not for 6+ months on my machine).
* This is not integrated with the other language builds yet.  There is a little work left there to tie the master build to this.
* This does not yet delete the old directory structure, so side-by-side comparison is possible.
* There are other changes / enhancements to the build and test process that can leverage this.  I'm trying to get a commit done with the basics soon; we can open other tickets for cleanup and enhancements.  This is a big checkin with guaranteed merge issues.  If we can get most of it in, that will solve the merge difficulties.
* I have not gotten the 'with dependencies' part of avro-tools complete, but that should not block reviewers from having a look.

h3. Patching Instructions
Example instructions.  Change to the lang/java directory, run the shell script, then the patch, then add the new items.  The patch and script are based off of lang/java.
{noformat}
$ cd lang/java
$ ./migrateAvro.sh
$ patch -p0 < ../../AVRO-647.patch
$ svn add pom.xml avro/pom.xml compiler/pom.xml maven-plugin/pom.xml ipc/pom.xml mapred/pom.xml tools/pom.xml
$ svn add maven-plugin/src/main/java/org/apache/avro/mojo/*
{noformat}

h3. Building Instructions: command-line

To clean build all components without testing and install them in your local repository:
{noformat}
$ mvn clean install -Dtest=false -DfailIfNoTests=false
{noformat}

To compile only:
{noformat}
$ mvn compile
{noformat}

To run tests:
{noformat}
$ mvn test
{noformat}

To install to local repo, including running tests:
{noformat}
$ mvn install
{noformat}

Other useful mvn commands:
{noformat}
$ mvn clean
$ mvn validate
$ mvn help:effective-pom
$ mvn site
$ mvn generate-resources
{noformat}

To download all available javadoc and source of dependent projects into your local repo:
{noformat}
$ mvn dependency:resolve -Dclassifier=javadoc
$ mvn dependency:resolve -Dclassifier=sources
{noformat}

h3. Building Instructions: Eclipse
Use Eclipse 3.6 Helios: http://www.eclipse.org/downloads/
Use the m2Eclipse plugin, latest version.
* Load the projects into the workspace using the "Import ..." dialog, and select "Existing Maven Projects"
* Select the lang/java directory, and it should show all 7 projects including the parent.  Import all of these.
* After the load and first build, it will not completely compile.  To fix it up to compile, select all of the projects and right-click.  Select *Maven > Update Project Configuration*.

h4. More maven information:

These are a good start:
http://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html
http://maven.apache.org/guides/introduction/introduction-to-the-pom.html

For those new and experienced, "Maven By Example" is a very good intro -- especially chapters 3+
http://www.sonatype.com/books/mvnex-book/reference/public-book.html

Apache's maven policies, tips, etc:
http://www.apache.org/dev/publishing-maven-artifacts.html#inherit-parent

Plugins used include:
http://mojo.codehaus.org/javacc-maven-plugin/
http://maven.apache.org/plugins/maven-surefire-plugin/
http://maven.apache.org/plugins/maven-checkstyle-plugin/
http://paranamer.codehaus.org/

Other useful plugins:
http://mojo.codehaus.org/build-helper-maven-plugin/usage.html
http://mojo.codehaus.org/cobertura-maven-plugin/
http://maven.apache.org/plugins/maven-shade-plugin/

h4. Documentation
Much of this message is preliminary documentation.  Please comment on it as well. 



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922215#action_12922215 ] 

Scott Carey commented on AVRO-647:
----------------------------------

I'll have time in the first week of November to work on it and produce a patch to review.    

About half the work is shared in both solutions:   Splitting up the projects into directories, moving classes around, keeping track of all the 'svn mv' and 'svn add' commands that will be required.   That, in combination with figuring out some of the more complicated testing bits, is what primarily stalled me.
However, I think I have already gotten past the most difficult parts related to Requestor/Responder, but I won't know for sure until I try to tie the rest of it together.





[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928723#action_12928723 ] 

Scott Carey commented on AVRO-647:
----------------------------------

Another minor detail, naming:

avro-ipc or avro-rpc ?   I really don't care.
avro-mapred -- we might end up with avro-mapreduce as well for the newer api, so I stuck with the package name of the hadoop api. 



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913116#action_12913116 ] 

Scott Carey commented on AVRO-647:
----------------------------------

More experimentation on the Avro front has brought out some interesting quirks in our dependencies.


1.  I'm not sure it makes sense to separate IDL and Specific from Core.   It turns out that the only extra library required as a runtime dependency for those two is commons-lang, and we could simply copy the _one_ class and method used there into our code to avoid the dependency.   Javacc is a build-time-only dependency that should not show up in our POM at all.   Paranamer-ant is the same.  Both have Maven plugins.  The upcoming templating version of SpecificCompiler might change what we want to do, though.

2.  A LOT of our testing requires use of the Specific Compiler.  Most of the ipc package depends on output of the Specific Compiler to compile; Requestor/Responder are at the heart of that.  This would require that these be in a separate artifact.   The Maven artifacts would be 

avro-core (possibly with IDL)
avro-compile (optional; current version can be in core, template-based one may require separation or shading)
avro-maven-plugin (Maven plugins for idl, specific compiler; depends on core and compile)
avro-ant  (the two classes for Ant tasks; depends on core, compile)
avro-ipc  (IPC  w/ netty/jetty; depends on core, compile, uses maven-plugin;  most testing is not possible until here!)
avro-mapred (including tether, or that separate?)
avro-tools


That is a lot of stuff, but really only 4 libraries that others can depend on, two build tools, and one command-line tool. 
The part that is a bit of a problem is that most of our testing of core can't happen in the core project because of its dependencies on specific compiler output. 



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913281#action_12913281 ] 

Scott Carey commented on AVRO-647:
----------------------------------

Yes, the path I have gone down so far has split the classes in packages amongst the jars.  Moving Requestor means moving several things that go with it.  There is no way at this time to split by package in most places.

Having it be easier in a few places would be nice, however.  The list of 'svn cp's to run before applying the patch is getting very messy!  If there are a few places where an entire package can move save for one or two classes, it might be worthwhile to move them.    specific/reflect/generic are going to be split no matter what as far as I can tell -- and rightfully so at this time.



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913211#action_12913211 ] 

Doug Cutting commented on AVRO-647:
-----------------------------------

> separating specific, generic, and reflect is meaningful.

I agree they're logically separate, but I think we want to avoid slicing things into 20 logically distinct jars.

> There are also dependencies on the o.a.a.ipc package from all over the place due to having utility classes there that should be in .util instead. [ ... ]

Which classes are you thinking of?  I think we should resist the tendency to move things into util when we can't figure out where they belong.

Generic, specific and reflect all depend on ipc for Requestor and Responder.  The complicated bit is that ipc depends on the specific compiler for Handshake{Request,Response}.  So perhaps {Generic,Specific,Reflect}{Requestor,Responder} should all move to ipc to remove that circularity.  That would make the build easier.




[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913184#action_12913184 ] 

Philip Zeyliger commented on AVRO-647:
--------------------------------------

BTW, I don't know how easy it is to separate (I suspect not easy), but separating specific, generic, and reflect is meaningful.

For testing, I think it's not harmful, in large part, for the test targets to depend on everything.



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922229#action_12922229 ] 

Doug Cutting commented on AVRO-647:
-----------------------------------

Sounds great.  Thanks for the update!



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912771#action_12912771 ] 

Scott Carey commented on AVRO-647:
----------------------------------

It sounds like there is at least consensus to split the source tree up.  This will make either Ant or Maven easier to deal with to get the job done.   So that rules out #1.

{quote}
> To wire up IDL and the Specific compiler, Maven plugins would be required. Interop testing would probably still require ant.

Can you please explain these more?
{quote}

IDL and the Specific compiler depend on Avro core to run.  We have a multi-step build:  build the classes that don't {depends.on.generated}, then generate some stuff, then build those classes.

In Maven, it's not strictly required, but very difficult, to do something like the above without declaring the dependency and making it its own artifact.  Basically, the easy way is to split things up into core, rpc, idl, mapred, and tools and build them in the right order as separate components with explicit dependencies.
The easy way to do code generation is to make a maven plugin like AVRO-159 and use it in the build.  Fortunately, that means that Maven plugins for Specific and IDL are part of our own build and thus natural for us to maintain.
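
Wiring such a plugin into a consuming module would then look roughly like this (a sketch; the goal name and binding are hypothetical placeholders, not the final plugin API):
{noformat}
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>${project.version}</version>
  <executions>
    <execution>
      <!-- generate java sources from schemas/IDL before compilation -->
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
    </execution>
  </executions>
</plugin>
{noformat}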

I have made a pom.xml that will build avro, but it excludes the {depends.on.generated} stuff and doesn't do any tests that require code generation or interop.

I haven't looked at how to do interop testing yet, but it seems like something that is at a higher level than the Java build.  Maven doesn't naturally pull data from anywhere that is not within the project or a declared artifact.   That might end up being easier to wire up with the other language builds using ant or shell scripts.




[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928442#action_12928442 ] 

Scott Carey commented on AVRO-647:
----------------------------------

more remaining bits:

adding RAT to the build (see the sketch below).
http://incubator.apache.org/rat/apache-rat-plugin/index.html

fixing checkstyle
adding the apache license header to the pom files -- though the parent pom build process does seem to add that somehow.
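
Adding RAT via its Maven plugin would be something like this (a sketch; the plugin coordinates and binding phase should be checked against the docs linked above):
{noformat}
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <executions>
    <execution>
      <!-- fail the build if source files are missing the Apache license header -->
      <phase>verify</phase>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>
{noformat}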


more notes:
I had to copy a couple of common test classes; we might want a build-only artifact for test-tools and test-resources.



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912725#action_12912725 ] 

Philip Zeyliger commented on AVRO-647:
--------------------------------------

I would be +1 full-maven for Java.  Amongst the evils available, it's one of the least objectionable.  I'm using it on another project now, and, well, I hate that I don't know what it's doing half the time, but it removes a considerable amount of the Ivy and Ant boilerplate.



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905063#action_12905063 ] 

Doug Cutting commented on AVRO-647:
-----------------------------------

A breakdown by use-case might be:
 - avro-data (core & data files)
 - avro-rpc (includes netty, jetty) depends on avro-data
 - avro-mapred (mapreduce APIs) depends on avro-data
 - avro-mapred-tether (RPC-based mapred API) depends on avro-mapred & avro-rpc
 - avro-dev (specific & idl compiler, ant tasks) depends on avro-data

About dependencies:
 - paranamer is used by reflect, to get the names of method parameters.  Perhaps avro-reflect should be made a separate jar?
 - velocity is used by RPC stats charting stuff and by AVRO-648 (template-based specific compiler)
 - commons-lang is used by the IDL compiler for StringEscapeUtils

If we separate jars, it might be good to split the build-time classpath in the same manner, by splitting the src tree.  The build order would then be: data, mapred, dev, rpc, mapred-tether, since rpc depends on dev to compile the handshake.  Note that this would split packages among trees, as specific has some data classes and some rpc classes.

Finally, to be clear, is there a motive for this beyond better expressing dependencies?  Functionally sticking everything in a single jar with lots of optional dependencies works fine, but folks then have to guess which dependencies they actually need, and that's the primary problem this seeks to solve.  Is that right, or are there other problems too?



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922163#action_12922163 ] 

Doug Cutting commented on AVRO-647:
-----------------------------------

Scott, are you going to be able to complete this as a Maven conversion of the Java build, or should I tackle it with Ivy & Ant?



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905085#action_12905085 ] 

Philip Zeyliger commented on AVRO-647:
--------------------------------------

I may be missing something: what's http-client used for in the tools category?



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913250#action_12913250 ] 

Scott Carey commented on AVRO-647:
----------------------------------

bq. Which classes are you thinking of?

ByteBufferInputStream and ByteBufferOutputStream are used by BinaryDecoder and BinaryEncoder and we should consider moving them to util or io.
AvroRemoteException is referenced in many places as well.  

{quote}
Generic, specific and reflect all depend on ipc for Requestor and Responder. The complicated bit is that ipc depends on the specific compiler for Handshake{Request,Response}. So perhaps {Generic,Specific,Reflect}{Requestor,Responder} should all move to ipc to remove that circularity. That would make the build easier.
{quote}

In order to make a 'core' library I moved Requestor and Responder to avro-ipc.  It was the cleanest break that allowed the Generic/Specific/Reflect API to otherwise remain.

Moving them all to ipc doesn't remove the circularity: you still can't build Requestor/Responder without first building SpecificCompiler and generating classes.   With Specific in 'core', ant tasks / maven plugins for the SpecificCompiler can be built off of core, and then ipc can be built after generating the classes that Requestor/Responder need using the just-built ant/maven tool.

Unless we figure out how to extract the dependency on generated code in Requestor/Responder (wrappers?), it looks like we have to build the SpecificCompiler before Requestor/Responder. 




[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905134#action_12905134 ] 

Scott Carey commented on AVRO-647:
----------------------------------

Yeah, for my build I'm actually using a custom avro-maven-plugin based on the earlier versions of that plugin (the early versions compiled only avpr, not avsc).  So that part should not be too hard.  It would be a very radical change from ant/ivy though, and there are bound to be some tricky things in a change that big.
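
As a rough sketch of what that wiring looks like (the plugin coordinates, goal name and parameters below are made up for illustration, not the actual plugin's API), source generation from .avsc/.avpr files would bind to the generate-sources phase roughly like this:

{code:xml}
<!-- Illustrative only: groupId/artifactId, goal and configuration names are hypothetical. -->
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>${avro.version}</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>compile</goal>  <!-- hypothetical goal name -->
      </goals>
      <configuration>
        <sourceDirectory>${basedir}/src/main/avro</sourceDirectory>
        <outputDirectory>${project.build.directory}/generated-sources/avro</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}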



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904876#action_12904876 ] 

Scott Carey commented on AVRO-647:
----------------------------------

So this is a rundown of what I know of the dependencies and what features use them:

core requirements:
jackson -- JSON
SLF4J -- logging
jetty -- HTTP transport
netty -- Socket transport

development:
javacc

tools:
commons-httpclient
jopt-simple

build/test only:
junit
maven
ant-eclipse
rat
checkstyle

I'm not sure about these:
paranamer
velocity
commons-lang
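
To make the grouping concrete, here is a sketch of how it could be expressed as ivy configurations (configuration names, module coordinates and revisions below are illustrative only, not what we would necessarily ship):

{code:xml}
<!-- Illustrative ivy.xml fragment; confs, coordinates and revisions are examples only. -->
<configurations>
  <conf name="core"/>
  <conf name="dev"   extends="core"/>
  <conf name="tools" extends="core"/>
  <conf name="test"  extends="core"/>
</configurations>
<dependencies>
  <dependency org="org.codehaus.jackson" name="jackson-mapper-asl" rev="1.4.2"  conf="core->default"/>
  <dependency org="org.slf4j"            name="slf4j-api"          rev="1.5.11" conf="core->default"/>
  <dependency org="net.java.dev.javacc"  name="javacc"             rev="4.2"    conf="dev->default"/>
  <dependency org="net.sf.jopt-simple"   name="jopt-simple"        rev="3.2"    conf="tools->default"/>
  <dependency org="junit"                name="junit"              rev="4.8.1"  conf="test->default"/>
</dependencies>
{code}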




[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912786#action_12912786 ] 

Philip Zeyliger commented on AVRO-647:
--------------------------------------

To be clear, I'm too much of a maven incompetent to volunteer.  I would be happy to test it out after the fact, though.

BTW, it would be totally acceptable and desirable for the maven plugins for avro code generation to be part of Avro's build.  Patrick, who wrote the plugin, would be happy to contribute it, if he hasn't already.  That solves a versioning problem for the plugin, too.



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905131#action_12905131 ] 

Philip Zeyliger commented on AVRO-647:
--------------------------------------

If you go the mvn route (the one thing I love about maven is that it reliably puts the sources of the jars we depend on in my Eclipse workspace), http://github.com/phunt/avro-maven-plugin handles some of the Specific compiler integration.



[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912794#action_12912794 ] 

Scott Carey commented on AVRO-647:
----------------------------------

I think the key here is whether the maintenance can be shared easily.  Only one person has to build this out (me for Maven, or likely Doug if Ant/Ivy).

Both take significantly less effort and expertise to modify and tweak once they're mostly set up.

Even if we go with Maven, Ant will be around to deal with the things Maven doesn't do well.  They are complementary tools.

At this point I've got Maven working as far as it will easily go without moving source trees around and splitting up the build.  That is a significantly larger time investment.  It doesn't look too difficult to keep going, however.





[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913198#action_12913198 ] 

Scott Carey commented on AVRO-647:
----------------------------------

Do you mean separating all three out from the 'inner' decoder/encoder/schema layer?  Or separating them out individually?

Separating Specific from the rest was easy.  However, it turned out to be only a handful of classes with no external dependencies, so there wasn't much point.

There are also dependencies on the o.a.a.ipc package from all over the place, because utility classes live there that should be in .util instead.  The first thing I might try is some refactoring to clean that up.




[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912721#action_12912721 ] 

Scott Carey commented on AVRO-647:
----------------------------------

Are there any strong feelings on the three choices above?  To some extent I favor just going all the way to a maven build.  That makes dependency management easy, but it does add baggage otherwise and has a learning curve for some. 






[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928818#action_12928818 ] 

Doug Cutting commented on AVRO-647:
-----------------------------------

> avro-ipc or avro-rpc ?

I don't care either way.  I don't think it's worth renaming the Java package, yet RPC is the better-known term and the one we tend to use in documentation.  It's unfortunate to have two terms for the same thing, but I don't see how we can easily rectify that now.

> avro-mapred - [ ...] I stuck with the package name of the hadoop api.

+1





[jira] Issue Comment Edited: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913250#action_12913250 ] 

Scott Carey edited comment on AVRO-647 at 9/21/10 5:02 PM:
-----------------------------------------------------------

bq. Which classes are you thinking of?

ByteBufferInputStream and ByteBufferOutputStream are used by BinaryDecoder and BinaryEncoder and we should consider moving them to util or io.
AvroRemoteException is referenced in many places as well.  

{quote}
Generic, specific and reflect all depend on ipc for Requestor and Responder. The complicated bit is that ipc depends on the specific compiler for Handshake{Request,Response}. So perhaps {Generic,Specific,Reflect}{Requestor,Responder} should all move to ipc to remove that circularity. That would make the build easier.
{quote}

In order to make a 'core' library I moved Requestor and Responder to avro-ipc.  It was the cleanest break that allowed the Generic/Specific/Reflect API to otherwise remain.

Moving them all to ipc doesn't remove the circularity; you still can't build Requestor/Responder without first building the SpecificCompiler and generating classes.  With Specific in 'core', ant tasks / maven plugins for the SpecificCompiler can be built off of core, and then ipc can be built after using that just-built ant/maven tool to generate the classes that Requestor/Responder need.

Unless we figure out how to extract the dependency on generated code in Requestor/Responder (wrappers?), it looks like we have to build the SpecificCompiler before Requestor/Responder. 


      was (Author: scott_carey):
    bq. Which classes are you thinking of?

ByteBufferInputStream and ByteBufferOutputStream are used by BinaryDecoderEncoder and we should consider moving them to util or io.
AvroRemoteException is referenced in many places as well.  

{quote}
Generic, specific and reflect all depend on ipc for Requestor and Responder. The complicated bit is that ipc depends on the specific compiler for Handshake{Request,Response}. So perhaps {Generic,Specific,Reflect}{Requestor,Responder} should all move to ipc to remove that circularity. That would make the build easier.
{quote}

In order to make a 'core' library I moved Requestor and Responder to avro-ipc.  It was the cleanest break that allowed the Generic/Specific/Reflect API to otherwise remain.

Moving them all to ipc doesn't remove the circularity, you still can't build Requestor/Responder without first building SpecificCompiler and generating classes.   With Specific in 'core' ant tasks / maven plugins for the SpecificCompiler can be built off of core, and then ipc can be built after generating the classes that Requestor/Responder need using the just-built ant/maven tool.

Unless we figure out how to extract the dependency on generated code in Requestor/Responder (wrappers?), it looks like we have to build the SpecificCompiler before Requestor/Responder. 

  


[jira] Updated: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-647:
-----------------------------

    Fix Version/s: 1.5.0

> Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar
> --------------------------------------------------------------
>
>                 Key: AVRO-647
>                 URL: https://issues.apache.org/jira/browse/AVRO-647
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.5.0
>
>         Attachments: AVRO-647.patch, migrateAvro.sh


[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912745#action_12912745 ] 

Doug Cutting commented on AVRO-647:
-----------------------------------

I'm +0 for a full-Maven Java build.  I'd not oppose it if someone else implements it, it's easy to maintain, it supports what's required, etc.

If I were to do it myself, I'd probably use Ant, split the tree into four (core, idl+rpc, mapred, tools), have each import a shared build.xml file, then have a top-level build.xml that calls the others.  I would be willing to do this over the coming month if no one else volunteers.
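
A minimal sketch of that structure (directory names, file names and targets below are illustrative):

{code:xml}
<!-- top-level build.xml: delegates to each sub-build in turn; names are illustrative -->
<project name="avro-java" default="dist" basedir=".">
  <target name="dist">
    <subant target="dist">
      <filelist dir="."
                files="core/build.xml,ipc/build.xml,mapred/build.xml,tools/build.xml"/>
    </subant>
  </target>
</project>

<!-- and each sub-build would begin with something like:
     <import file="../shared-build.xml"/>
     to pick up the common targets -->
{code}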

But if someone else (Scott?, Philip?) volunteers to implement this using Maven, I'd not get in their way.

> To wire up IDL and the Specific compiler, Maven plugins would be required. Interop testing would probably still require ant.

Can you please explain these more?

