Posted to dev@bigtop.apache.org by "Erich Schubert (JIRA)" <ji...@apache.org> on 2012/09/25 16:55:07 UTC

[jira] [Created] (BIGTOP-713) use newer debhelper and source format 3.0 (quilt) for Debian and Ubuntu packaging

Erich Schubert created BIGTOP-713:
-------------------------------------

Summary: use newer debhelper and source format 3.0 (quilt) for Debian and Ubuntu packaging
Key: BIGTOP-713
URL: https://issues.apache.org/jira/browse/BIGTOP-713
Project: Bigtop
Issue Type: Improvement
Components: Debian
Affects Versions: 0.5.0
Reporter: Erich Schubert
Priority: Minor

debhelper can automate a lot of common things in Debian package creation.

The current packages use an old style of debhelper that is often unnecessarily complicated, which makes it harder to fix things.

For example, current Hadoop (0.23.3) does not compile on Debian because of the new GCC version. The fix is a simple "#include <unistd.h>" in the HadoopPipes.cc file.

Modern Debian packaging with "quilt" has an excellent mechanism for managing such patches. However, in order to use this with the current Bigtop packaging, one has to:

1. create debian/source/format to use "3.0 (quilt)"
2. manually add quilt patching to the debian/rules targets
3. make sure the .debian.tar.gz is also copied instead of the old .diff.gz
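
The first of these steps, together with registering a patch in debian/patches/series, could be sketched as the shell function below. It only writes the two control files by hand. The function name init_quilt_format and the patch file name in the usage line are hypothetical, and the patch itself (the added #include) still has to be created separately, for example with quilt.

```shell
#!/bin/sh
# init_quilt_format PKGDIR PATCHNAME
# Switch the source tree in PKGDIR to source format "3.0 (quilt)" and
# register PATCHNAME in the patch series (illustrative helper, not part of Bigtop).
init_quilt_format() {
    pkgdir=$1
    patch=$2
    mkdir -p "$pkgdir/debian/source" "$pkgdir/debian/patches"
    # debian/source/format holds exactly one line naming the source format.
    printf '3.0 (quilt)\n' > "$pkgdir/debian/source/format"
    # debian/patches/series lists the patches in the order they are applied;
    # append the name only if it is not listed yet.
    grep -qxF "$patch" "$pkgdir/debian/patches/series" 2>/dev/null ||
        printf '%s\n' "$patch" >> "$pkgdir/debian/patches/series"
}
```

Usage, assuming the current directory is the unpacked source tree: init_quilt_format . fix-hadooppipes-unistd.patch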

You will be surprised how many things debhelper does well on its own with a rules file consisting of little more than the automagic:

#!/usr/bin/make -f
%:
	dh $@

Furthermore, "java-wrappers" is a Debian and Ubuntu package that helps with setting up classpaths and choosing the JVM. It can do all of bigtop-utils and more, and it is used by other Java packages. IMHO it should be preferred instead.

If the packaging were more Debian-standard, it would be a lot easier to get the packages accepted into Debian mainline at some point. It may even be desirable to build the various hadoop components (-common, -yarn etc.) independently if they are isolated well enough upstream.

Don't get me wrong. I think the packages are pretty good already. In particular, I like the split into namenode and datanode packages and the use of update-alternatives. I just found it rather hard to get a grip on the process and to get my fixes into the package. For example, I had to manually set JAVA_HOME before building, some build dependencies were missing (cmake, but it probably is a new requirement), and some paths have changed (probably due to the promotion of YARN to a top-level project?).
I understand that you want to have as much common code for all distributions as possible, as opposed to having per-distribution packaging. However, if every project ships its own equivalent of java-wrappers and its own build process, the result is not really better than packaging that is at least consistent within each distribution.
But ideally, there should be very little packaging code needed anyway, and most things should be done by an appropriate installation process upstream.

And seriously, /usr/lib/hadoop/lib is a **mess**. There even is a package in there with a "*" in the file name. Plus, a lot of these jars are available in Debian, and could be shared across packages if the packages accepted having them managed by the distribution instead of shipping their own...

Even within the bigtop packages this leads to a totally unnecessary overlap:

995720 Sep 25 14:18 /usr/lib/hadoop-hdfs/lib/snappy-java-1.0.3.2.jar
995720 Sep 25 14:18 /usr/lib/hadoop-mapreduce/lib/snappy-java-1.0.3.2.jar
995720 Sep 25 14:18 /usr/lib/hadoop-yarn/lib/snappy-java-1.0.3.2.jar
[...]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (BIGTOP-713) use newer debhelper and source format 3.0 (quilt) for Debian and Ubuntu packaging

Posted by "James Page (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/BIGTOP-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462911#comment-13462911 ] 

James Page commented on BIGTOP-713:
-----------------------------------

Re source formats for Debian packages - we discussed this on the -dev mailing list a while back; I even volunteered to take a look, but #fail I've not managed to spend any time on it other than teaching the build process to understand the *.debian.tar.gz that source/format 3.0 produces.


I need to commit some time to looking at this for the next release.


                

[jira] [Commented] (BIGTOP-713) use newer debhelper and source format 3.0 (quilt) for Debian and Ubuntu packaging

Posted by "Roman Shaposhnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/BIGTOP-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464960#comment-13464960 ] 

Roman Shaposhnik commented on BIGTOP-713:
-----------------------------------------

bq. so with a 0 patching policy this means you cannot build Hadoop Pipes on current Debian

No disagreement there -- in a patch discussion I'd like to separate policy from capability. IOW, I completely agree that having a capability that would allow you to manage patches via the same infrastructure that Bigtop provides would be extremely helpful to projects like Debian. Given Bigtop's cross-distro nature I would like this capability to be cross-distro as well, but perhaps we can map it efficiently to the Debian/RPM toolsets. The question here, of course, is who would be doing the actual work ;-) And that's where we get back to a policy discussion -- as a matter of policy Bigtop doesn't do patches for OUR binary artifacts (the actual DEB/RPM packages that we publish), hence there's not much incentive for us to invest, but we'd love for this contribution to come from some of our community members (hint-hint ;-))

bq. conflicting library versions (And in some cases, it is easiest to patch (or recompile, for binary packages) some dependent software, to only have to provide one version of a library)

Unfortunately, our experience has been that it is incredibly difficult to harmonize the versions of jars across a stack as gigantic as the Hadoop ecosystem has become. It basically comes down to things downright breaking if you try to substitute versions.

bq. Debian java packaging already manages a symlink farm of the type:

Wait, are you saying that it is possible to install as many versions of foo.jar on Debian as I want? If so, please elaborate.

bq. A typical java wrappers script looks like this

I'm going to take a look. Stay tuned.


                

[jira] [Commented] (BIGTOP-713) use newer debhelper and source format 3.0 (quilt) for Debian and Ubuntu packaging

Posted by "Roman Shaposhnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/BIGTOP-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463491#comment-13463491 ] 

Roman Shaposhnik commented on BIGTOP-713:
-----------------------------------------

First of all, we would love for some of the core Debian developers/maintainers to help us with making Bigtop a better citizen there, especially when it comes to managing Java. If you're interested -- it would be extremely nice to have a thread on the bigtop-dev mailing list aimed at helping us implement some of the improvements.

bq. For example, current Hadoop (0.23.3) 

Just to clarify something: Hadoop 0.23.3 is *really* not a current release of Hadoop. If you want stable pre-YARN Hadoop go with 1.X code line, if you want YARN and the latest HDFS goodness go with 2.X. I don't think outside of a few use cases there's any reason to use Hadoop 0.23.3 today.

bq. Modern Debian packaging with "quilt" has an excellent mechanism for managing such patches.

Bigtop has had a policy of 0 patching so far. We never ever patch upstream components for our own releases. That said, providing such capability would be useful for folks who need Bigtop as a Hadoop stack management system.

bq. Furthermore, "java-wrappers" is a Debian and Ubuntu package 

Could you elaborate on what functionality you'd suggest we leverage from there? Also, since we have to support a variety of different distros -- is something similar available in the rest of them? BIGTOP-276 aims at solving the most thorny issue of all -- classpath management in the presence of conflicting requirements (e.g. Hadoop wanting X.Y version of guava.jar and Zookeeper wanting A.B version, etc.).

bq. And seriously, /usr/lib/hadoop/lib is a **mess**. There even is a package in there with a "*" in the file name.

Couldn't agree more :-( As I said -- anything that can help us sort out the classpath hell should be discussed on BIGTOP-276. We definitely shouldn't be shipping *identical* jars (at least symlinks should be done) but I really don't think we can get rid of the requirement of shipping different versions of the same jar to satisfy the requirements of different projects in the Hadoop ecosystem (that is also the reason why we can't simply depend on the jars provided by the distribution).
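
As a rough illustration of the "at least symlinks" idea, the following shell sketch replaces byte-identical copies of a jar with symlinks to the first copy found. It is not how Bigtop does or would do this; the function name dedupe_jars is hypothetical, the directories are assumed to be given as absolute paths (so that the link targets resolve), and file names are assumed to contain no tabs or newlines.

```shell
#!/bin/sh
# dedupe_jars DIR...
# Replace byte-identical *.jar files under the given absolute directories
# with symlinks to the first copy (in sorted path order).
dedupe_jars() {
    seen=$(mktemp) || return 1
    find "$@" -type f -name '*.jar' | sort | while IFS= read -r jar; do
        # Cheap key: CRC and size as printed by cksum for stdin.
        key=$(cksum < "$jar")
        # First file already recorded under the same key, if any.
        first=$(awk -F'\t' -v k="$key" '$1 == k { print $2; exit }' "$seen")
        # Confirm with a full comparison before linking, since CRCs can collide.
        if [ -n "$first" ] && cmp -s "$first" "$jar"; then
            ln -sf "$first" "$jar"
        else
            printf '%s\t%s\n' "$key" "$jar" >> "$seen"
        fi
    done
    rm -f "$seen"
}
```

Usage against a staging directory could look like: dedupe_jars /path/to/staging/usr/lib. Absolute targets pointing into a staging directory would not be correct inside an installed package, so a real implementation would have to rewrite the targets relative to the final install location.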

Anyway, any kind of help will definitely be appreciated provided that changes are applicable to Lucid+ Ubuntus and lenny+ debians.
                

[jira] [Commented] (BIGTOP-713) use newer debhelper and source format 3.0 (quilt) for Debian and Ubuntu packaging

Posted by "Erich Schubert (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/BIGTOP-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463579#comment-13463579 ] 

Erich Schubert commented on BIGTOP-713:
---------------------------------------

bq. Hadoop 0.23.3

I was under the impression that 0.23.3 is the current hadoop release. The version numbering of hadoop is a mess. If you read the changelog for 2.0.0-alpha, the first line identifies it as 0.23.1 - same for 2.0.1-alpha. So I was under the assumption that 0.23.3 - the latest release, 2 months after 2.0.1-alpha - was actually the newest version. Just that nobody rebranded it as 2.0.2-alpha or so. And upstream subversion uses 3.0.0 everywhere IIRC.

bq. patching

Debian would also love to have 0 patches. However, if you want to get a bug fixed quickly for your users, it is often best to fix it, make a patch, and send that out to your users for testing and to upstream for inclusion. Debian changelogs are full of entries like "remove patches ..., included upstream" (and also of patches that upstream solved differently). In fact, the compile fix I mentioned - fixed in Hadoop SVN the same way - is a good example of the need for patching. It won't compile otherwise, so with a 0 patching policy this means you cannot build Hadoop Pipes on current Debian, because its GCC is too new.

bq. conflicting library versions

Again, this is a problem that does not only affect Hadoop. In my personal opinion, it is a consequence of how dependencies are handled in the Java community. You leave it to your users (and maven) to get all the jars you need. If people cared more about using one system to manage the dependencies for all of the Java software they use, they would be more aware of such conflicts. And of course they also occur with binary libraries. It is common for distributions to take care of this, and they will also try to offer multiple versions of a library when these are incompatible.
And in some cases, it is easiest to patch (or recompile, for binary packages) some dependent software, to only have to provide one version of a library.

Debian java packaging already manages a symlink farm of the type:
xml-apis-ext.jar -> xml-apis-ext-1.4.01.jar

So packages can use "any version of xml-apis-ext.jar", for example. For explicit version dependencies, you would have a versioned depend on the package, obviously. Most of the version dependencies are a ">= x.y" type, a few are of the type "< z" (when e.g. an API changes for a major version).

When it is known that a package breaks API compatibility, the distributions should take care to make them installable at the same time. For example GNU trove 2 and GNU trove 3 are not API compatible. Debian ships them as "trove.jar -> trove-2.x.y.jar" and "trove-3.jar -> trove-3.0.3.jar" symlinks. So far, the packages depending on trove 2 or trove 3 continue to work...
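
To make the layout concrete, here is a small shell sketch of such a symlink farm. In Debian the links are shipped by the library packages themselves, so this is only an illustration; the function name link_jar is hypothetical, and the file names are the ones quoted above.

```shell
#!/bin/sh
# link_jar DIR VERSIONED LINK
# Make DIR/LINK a symlink to the versioned jar DIR/VERSIONED.
link_jar() {
    dir=$1
    versioned=$2
    link=$3
    [ -f "$dir/$versioned" ] || { echo "missing: $dir/$versioned" >&2; return 1; }
    # A relative target keeps the link valid if DIR is moved as a whole.
    ln -sf "$versioned" "$dir/$link"
}
```

With this, link_jar "$dir" xml-apis-ext-1.4.01.jar xml-apis-ext.jar gives the unversioned name, while link_jar "$dir" trove-3.0.3.jar trove-3.jar gives a name that pins the major version, so two incompatible major versions can be installed side by side.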

bq. java-wrappers

I believe they allow apps to specify e.g. "java6", "java7" and the wrappers may choose a different java runtime than the system default.

A typical java wrappers script looks like this:
{noformat}
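#!/bin/sh
# Load the java-wrappers helper functions.
. /usr/lib/java-wrappers/java-wrappers.sh
# Pick a Java runtime from the acceptable flavors.
find_java_runtime openjdk6 sun6
# Add the named jars to the classpath.
find_jars app batik fop
# Start the main class, passing the script's arguments through.
run_java mainclass "$@"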
{noformat}
Where find_jars will take care of setting up the classpath. I haven't looked into the details of how you would specify a versioned requirement. With trove, you would use trove-3.jar. Furthermore, many jars may already pull in other jars via the Class-Path attribute in their manifest; this works when the jars sit in the system folder, so it actually works well with the jars installed by Debian. Ideally, jars in Debian are packaged with such dependencies. For example, fop.jar specifies commons-io.jar, xercesImpl.jar, xalan2.jar etc. The above example could even be simplified: batik is needed by fop, so we could leave it out.
Debian also ships some projects split into numerous smaller jar files. Batik is a good example. There is batik-all containing all of Batik, but there are also smaller jars containing e.g. only the parser, so that a project trying to reduce memory requirements can load just the part of Batik it needs into the classpath.
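
To check which jars a given jar pulls in this way, one can read the Class-Path attribute from its manifest. The shell function below is a minimal sketch that reads manifest text on standard input and prints the attribute value on one line, joining the continuation lines that the JAR manifest format uses for long values. The function name manifest_class_path is hypothetical.

```shell
#!/bin/sh
# manifest_class_path
# Read a JAR manifest on stdin and print the value of its Class-Path
# attribute on a single line (empty line if the attribute is absent).
manifest_class_path() {
    # Manifests may use CRLF line endings; drop the carriage returns first.
    tr -d '\r' | awk '
        /^Class-Path:/ { sub(/^Class-Path: ?/, ""); value = $0; in_cp = 1; next }
        # A line starting with a single space continues the previous value.
        in_cp && /^ /  { value = value substr($0, 2); next }
        { in_cp = 0 }
        END { print value }
    '
}
```

Assuming unzip is installed and fop.jar lives in /usr/share/java, a call could look like: unzip -p /usr/share/java/fop.jar META-INF/MANIFEST.MF | manifest_class_path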

It's probably not perfect - the debian java team seems to be a bit underpowered, as so often (they for sure currently do not have the power to do Hadoop packages) - but they do seem to work on a manageable java ecosystem. Often such infrastructure things need some users to spread across distributions. I don't know what redhat has for managing java, maybe more, maybe less. The "alternatives" thing is a good example of infrastructure utilities adopted across distributions over time. Quoting from an internet page:

bq. Fedora's implementation of alternatives is a rewrite and extension of the alternatives system used in Debian.

So if java-wrappers are useful for Bigtop, it may be very manageable to have them adopted by the Fedora ecosystem; while bigtop-utils is not yet adopted by either I guess.
                

[jira] [Assigned] (BIGTOP-713) use newer debhelper and source format 3.0 (quilt) for Debian and Ubuntu packaging

Posted by "Roman Shaposhnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/BIGTOP-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman Shaposhnik reassigned BIGTOP-713:
---------------------------------------

    Assignee: Roman Shaposhnik
    
> use newer debhelper and source format 3.0 (quilt) for Debian and Ubuntu packaging
> ---------------------------------------------------------------------------------
>
>                 Key: BIGTOP-713
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-713
>             Project: Bigtop
>          Issue Type: Improvement
>          Components: Debian
>    Affects Versions: 0.5.0
>            Reporter: Erich Schubert
>            Assignee: Roman Shaposhnik
>            Priority: Minor
>
> debhelper can automate a lot of common things in debian package creation.
> The current packages use an old style of debhelper, that often is unnecessarily complicated, making it harder to fix things.
> For example, current Hadoop (0.23.3) does not compile on Debian because of the new GCC version. The fix is a simple "include <unistd.h>" in the HadoopPipes.cc file.
> Modern Debian packaging with "quilt" has an excellent mechanism for managing such patches. However, in order to use this with the current Bigtop packaging, one has to 1. create debian/source/format to use "3.0 (quilt)" 2. manually add quilt patching to the debian/rules targets. 3. making sure the .debian.tar.gz is also copied instead of the old .diff.gz
> You will be surprised how many things debhelper does well on its own with a rules file consisting just of little more than the automagic:
> %:
>         dh $@
> Furthermore, "java-wrappers" is a Debian and Ubuntu package that helps with setting up classpaths and choosing the JVM. It can do all of bigtop-utils and more, and it is used by other Java packages. IMHO it should be preferred instead.
> If the packaging were more Debian-standard, it would be a lot easier to get the packages accepted into Debian mainline at some point. It may even be desirable to build the various hadoop components (-common, -yarn etc.) independently if they are isolated well enough upstream.
> Don't get me wrong. I think the packages are pretty good already. In particular, I like the split into namenode and datanode packages and the use of update-alternatives, for example. I just found it rather hard to get a grip on the process and to get my fixes into the package. For example, I had to manually set JAVA_HOME before building, some build dependencies were missing (cmake, but it probably is a new requirement), and some paths have changed (probably the yarn promotion to a top level project?).
> I understand that you want to have as much common code for all distributions as possible, as opposed to having per-distribution packaging. However, if every project uses its own specific version of java-wrappers and build process, things will not really be better than if it is at least consistent across the various distributions.
> But ideally, there should be very little packaging code needed anyway, and most things should be done by an appropriate installation process upstream.
> And seriously, /usr/lib/hadoop/lib is a **mess**. There even is a package in there with a "*" in the file name. Plus, a lot of these jars are available in Debian, and could be shared across packages if the packages accepted having them managed by the distribution instead of shipping their own...
> Even within the bigtop packages this leads to a totally unnecessary overlap:
> 995720 Sep 25 14:18 /usr/lib/hadoop-hdfs/lib/snappy-java-1.0.3.2.jar
> 995720 Sep 25 14:18 /usr/lib/hadoop-mapreduce/lib/snappy-java-1.0.3.2.jar
> 995720 Sep 25 14:18 /usr/lib/hadoop-yarn/lib/snappy-java-1.0.3.2.jar
> [...]
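
The packaging pieces named in the quoted description (debian/source/format, a minimal dh-style debian/rules, and a quilt patch series) can be sketched as below. This is only an illustration under assumptions, not the Bigtop layout: the scratch directory and the patch file name are hypothetical, and nothing here invokes debhelper or quilt themselves.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch tree; a real package would use its unpacked source directory.
pkgdir="$(mktemp -d)"
mkdir -p "$pkgdir/debian/source" "$pkgdir/debian/patches"

# Declare the source format, so that dpkg-source handles debian/patches.
echo '3.0 (quilt)' > "$pkgdir/debian/source/format"

# Minimal dh-style rules file; the recipe line must start with a tab.
printf '#!/usr/bin/make -f\n%%:\n\tdh $@\n' > "$pkgdir/debian/rules"
chmod +x "$pkgdir/debian/rules"

# Patches are listed in application order in debian/patches/series.
# The patch file name is an assumption made up for this sketch.
echo 'hadoop-pipes-unistd.patch' > "$pkgdir/debian/patches/series"

# Show what was written.
cat "$pkgdir/debian/source/format"
cat "$pkgdir/debian/rules"
```

With this source format, dpkg-source normally applies the patches listed in debian/patches/series when it unpacks the source package, which is why the rules file itself can stay this short in a standard Debian build.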


[jira] [Commented] (BIGTOP-713) use newer debhelper and source format 3.0 (quilt) for Debian and Ubuntu packaging

Posted by "James Page (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/BIGTOP-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462913#comment-13462913 ] 

James Page commented on BIGTOP-713:
-----------------------------------

Also, I don't think that the objective of Bigtop is, or should be, to get packages accepted into Debian mainline. Although, as you state, some of the dependencies are packaged (albeit at different versions), the amount of effort required to fill the gaps should not be underestimated. Hadoop has been in Debian before, but the time commitment was too much for the original maintainer and it was dropped.


                
