
Posted to dev@bigtop.apache.org by Andre Kelpe <ak...@concurrentinc.com> on 2015/04/28 17:51:07 UTC

general packaging questions

Hi,

I am currently learning the ins and outs of bigtop to work on the Cascading
integration (https://issues.apache.org/jira/browse/BIGTOP-1766). I have a
few questions around packaging in bigtop:

1) Most Linux distros have packaging guidelines that should be followed.
Does bigtop follow any set of rules in particular? Is there a linting tool
for spec files etc.?
2) Related to 1): Does bigtop require a certain directory layout? Our tools
are currently meant to be untarred and used as is. If bigtop requires them
to be split over the file system, we will have to work on that upstream
before they can be included.
3) I noticed that the packages are built from source instead of re-using
binary releases. Is that a strict requirement or does it just happen to be
that way? For the Cascading integration I was planning on downloading our
binary releases so that bigtop ships with the same bits as our SDK.
4) What is your take on packaging standalone libraries? I noticed that most
parts of bigtop are tools in the broader sense, something one can invoke on
the command line, but there is also a package for Apache Crunch, which is a
library. What is the reasoning here? Would it make sense to build packages
for libraries in the Cascading ecosystem?

Thanks for your answers!

- André

-- 
André Kelpe
andre@concurrentinc.com
http://concurrentinc.com

Re: general packaging questions

Posted by Andre Kelpe <ak...@concurrentinc.com>.

Aha,
On 30.04.2015 21:48, "Andrew Purtell" <ap...@apache.org> wrote:

> BOM means bill of materials.
>
> Usually when we use this term we are referring to the top level file
> 'bigtop.mk', which defines the component versions to use to assemble a
> given Bigtop release.
>
>
Ah, that makes perfect sense. Thanks for the explanation!

- André

Re: general packaging questions

Posted by Andrew Purtell <ap...@apache.org>.

BOM means bill of materials.

Usually when we use this term we are referring to the top level file
'bigtop.mk', which defines the component versions to use to assemble a given
Bigtop release.
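
To make the idea concrete, here is a minimal Python sketch of what
"normalizing dependency versions according to the BOM" could look like. It is
an illustration only: the dictionary BOM, the function normalize, and the
dependency name some-lib are hypothetical and do not mirror the actual syntax
or contents of bigtop.mk. Only the Hadoop version numbers are taken from this
thread.

```python
# Hypothetical, simplified stand-in for a bill of materials:
# component name -> version pinned for the release.
# The real bigtop.mk is a Makefile fragment; this dict is NOT its format.
BOM = {"hadoop": "2.6.0"}


def normalize(declared, bom):
    """Return the versions a build should use, plus what was overridden.

    declared: versions a component asks for upstream.
    bom:      versions pinned by the release.
    """
    resolved = dict(declared)
    overridden = {}
    for name, version in declared.items():
        pinned = bom.get(name)
        if pinned is not None and pinned != version:
            resolved[name] = pinned
            overridden[name] = (version, pinned)
    return resolved, overridden


# A component built against Hadoop 2.5.1 gets moved to the pinned 2.6.0;
# dependencies the BOM does not mention are left untouched.
resolved, overridden = normalize({"hadoop": "2.5.1", "some-lib": "1.0"}, BOM)
print(resolved)
print(overridden)
```

The point of the sketch is only that the release, not the individual
component, decides which version of a shared dependency ends up in the build.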


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: general packaging questions

Posted by Andre Kelpe <ak...@concurrentinc.com>.

On Wed, Apr 29, 2015 at 1:01 AM, Andrew Purtell <ap...@apache.org> wrote:

> Inline
>
>
Thanks for the answers. Some follow-up inline.


> On Tue, Apr 28, 2015 at 8:51 AM, Andre Kelpe <ak...@concurrentinc.com>
> wrote:
>
> > Hi,
> >
> > I am currently learning the ins and outs of bigtop to work on the
> > Cascading integration (https://issues.apache.org/jira/browse/BIGTOP-1766).
> > I have a few questions around packaging in bigtop:
> >
> > 1) Most Linux distros have packaging guidelines that should be followed.
> > Does bigtop follow any set of rules in particular? Is there a linting
> > tool for spec files etc.?
> >
>
> This is distro specific. Red Hat family distributions (RHEL, Fedora, CentOS,
> Amazon Linux) offer 'rpmlint'. You can install it and run it by hand. From
> personal experience, if you build deb packages on Ubuntu the package build
> will run the lintian tool automatically.
>
>
I know about the tools. I was wondering whether you follow a specific set of
rules like https://fedoraproject.org/wiki/Packaging:Guidelines or
https://en.opensuse.org/openSUSE:Packaging_guidelines

I guess you don't. I'll stick with whatever rpmlint reports then.


> > 2) Related to 1): Does bigtop require a certain directory layout?
> > Our tools are currently meant to be untarred and used as is. If bigtop
> > requires them to be split over the file system, we will have to work on
> > that upstream before they can be included.
> >
>
> Yes, broadly speaking we follow the Linux Standard Base (LSB). A typical
> package build happens in four steps. We move files around in the third step
> to make packages look more like LSB. Let me take you through one package as
> an example:
>
> Step 1. Download source tarball from the software release site and expand
> it.
>
> Step 2. do-component-build
>
> Here, for example, see what we do for ZooKeeper:
>
> https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/zookeeper/do-component-build
>
> We kick off a build of the component's binary artifacts while first
> normalizing dependency versions according to the release BOM.
>

Pardon my ignorance, but what does "BOM" stand for?



>
> Step 3. install_<component>.sh
>
> Again let's look at the ZK package:
>
> https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/zookeeper/install_zookeeper.sh
>
> Here we take the resulting tarball from the component build, expand it,
> and move the locations of various types of files around to be more
> LSB-like.
>
> Step 4. Native packager
>
> Finally we hand off the expanded and munged result from step 3 to the
> native packager. For ZK, the RPM specfile used is here:
>
> https://github.com/apache/bigtop/blob/master/bigtop-packages/src/rpm/zookeeper/SPECS/zookeeper.spec
>
> The Debian package control files are here:
>
> https://github.com/apache/bigtop/tree/master/bigtop-packages/src/deb/zookeeper
>
>
>
Thanks, that all makes sense. I am currently trying to get my feet wet by
building packages for lingual. I am following roughly what hive does.
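
As a rough illustration of the relocation in step 3, the following Python
sketch maps paths inside an expanded tarball to more LSB-like install
locations. Everything in it is an assumption made for illustration: the
LAYOUT table, the relocate function, and the chosen destinations are
hypothetical and are not copied from install_zookeeper.sh, which is a shell
script with its own logic.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from top-level tarball directories to install roots.
# These destinations are illustrative, not taken from install_zookeeper.sh.
LAYOUT = {
    "conf": "etc/{name}/conf",
    "docs": "usr/share/doc/{name}",
}
DEFAULT_ROOT = "usr/lib/{name}"  # everything else stays together here


def relocate(rel_path, name, buildroot="/"):
    """Map a path inside the expanded tarball to its packaged location."""
    path = PurePosixPath(rel_path)
    top, rest = path.parts[0], path.parts[1:]
    if top in LAYOUT and rest:
        # Files under a known directory move to that directory's new root.
        target = PurePosixPath(LAYOUT[top].format(name=name), *rest)
    else:
        # Anything else keeps its relative path under the component root.
        target = PurePosixPath(DEFAULT_ROOT.format(name=name), *path.parts)
    return str(PurePosixPath(buildroot) / target)


for p in ("conf/zoo.cfg", "bin/zkServer.sh", "lib/example.jar"):
    print(p, "->", relocate(p, "zookeeper"))
```

A tool that expects to run from a single untarred directory can often keep
working under such a layout if configuration and similar directories are
linked back into the component root, but whether that applies to a given
component has to be checked case by case.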


>
>
> > 3) I noticed that the packages are built from source instead of re-using
> > binary releases. Is that a strict requirement or does it just happen to
> > be that way? For the Cascading integration I was planning on downloading
> > our binary releases so that bigtop ships with the same bits as our SDK.
> >
>
> We typically build packages from source so we can normalize dependencies.
> For example, if a given Bigtop release ships with Hadoop 2.6.0 but the
> Cascading SDK includes 2.5.1 artifacts, this would be ugly at best and
> broken at worst.
>
>
We are strong believers in BYOH (bring your own hadoop) :-). Joking aside,
we are distribution agnostic and as long as the distro passes our
compatibility tests, it will work:
http://www.cascading.org/support/compatibility/

We mark the Hadoop dependencies as "provided" and expect that the
environment will satisfy them. This should be easy to express via rpm/deb
dependencies at the packaging level.
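
A small Python sketch of how a "provided" dependency could be turned into
package-level metadata follows. The helper names, the package name "hadoop",
and the version bound are assumptions for illustration and are not Bigtop's
actual package metadata. The output uses the standard "Requires:" syntax of
RPM spec files and the "Depends:" field syntax of Debian control files.

```python
# Hypothetical helpers: the package name "hadoop" and the version bound used
# below are assumptions, not Bigtop's real package names or constraints.

def rpm_requires(deps):
    """Render one RPM spec 'Requires:' line per dependency."""
    return [f"Requires: {name} >= {version}"
            for name, version in sorted(deps.items())]


def deb_depends(deps):
    """Render a single Debian control 'Depends:' field."""
    items = [f"{name} (>= {version})"
             for name, version in sorted(deps.items())]
    return "Depends: " + ", ".join(items)


provided = {"hadoop": "2.6.0"}
print("\n".join(rpm_requires(provided)))
print(deb_depends(provided))
```

With metadata like this, the package itself ships no Hadoop artifacts and the
package manager makes sure a suitable Hadoop is installed alongside it.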



>
> > 4) What is your take on packaging standalone libraries? I noticed that
> > most parts of bigtop are tools in the broader sense, something one can
> > invoke on the command line, but there is also a package for Apache Crunch,
> > which is a library. What is the reasoning here? Would it make sense to
> > build packages for libraries in the Cascading ecosystem?
> >
> >
> I'm not sure we have anything that amounts to a policy here. Crunch isn't
> the only case. We package the DataFu library of UDFs for Pig. We package
> the Phoenix SQL skin add-on for HBase. We also package Tez, which is a YARN
> application requiring Hadoop, and although it could be useful on its own
> it's meant to be picked up and used by the Hive and Pig packages.
>
> If a champion for a component shows up we will give it a look. We could
> absolutely build a core Cascading package and then a number of library or
> add-on packages, if that's how you would like to set things up as champion
> or maintainer of same.
>
>
Thanks for the clarification. I will try to work out a set of packages
that makes sense from our point of view.

- André


-- 
André Kelpe
andre@concurrentinc.com
http://concurrentinc.com

Re: general packaging questions

Posted by Andrew Purtell <an...@gmail.com>.

Sure Cos, I will put something up next week. 



> On Apr 30, 2015, at 3:30 PM, Konstantin Boudnik <co...@apache.org> wrote:
> 
> Andrew, 
> 
> do you mind putting this on our wiki? Such a great and well-put explanation!
> I am sure it will help a lot of new contributors to get up to speed much
> quicker!
> 
> Thanks!
>  Cos
> 

Re: general packaging questions

Posted by Konstantin Boudnik <co...@apache.org>.

Jay, could you post this link to the wiki as well? Thanks!

On Fri, May 01, 2015 at 09:41AM, jay vyas wrote:
> By the way, I did a hack session w/ the guys at BU on how to build/test
> rpms using our existing vagrant workflow
> 
> https://www.youtube.com/watch?v=4GfcKEjO6e8
> 
> You can get a good feel for some of the lower level details hard to capture
> in text by watching it.
> 
> On Thu, Apr 30, 2015 at 6:30 PM, Konstantin Boudnik <co...@apache.org> wrote:
> 
> > Andrew,
> >
> > do you mind putting this on our wiki? Such a great and well-put
> > explanation!
> > I am sure it will help a lot of new contributors to get up to speed much
> > quicker!
> >
> > Thanks!
> >   Cos
> >
> >
> 
> 
> 
> -- 
> jay vyas

Re: general packaging questions

Posted by Evans Ye <ev...@apache.org>.

Yup, I saw that before. (Nice looking, BTW.)
It's an awesome video for someone who'd like to know the details of the
Bigtop provisioner (the first half) as well as the packaging stuff (the
second half).

How about we just put the video on the Bigtop wiki so that we can enrich the
content?

2015-05-01 21:41 GMT+08:00 jay vyas <ja...@gmail.com>:

> By the way, I did a hack session w/ the guys at BU on how to build/test
> rpms using our existing vagrant workflow
>
> https://www.youtube.com/watch?v=4GfcKEjO6e8
>
> You can get a good feel for some of the lower level details hard to capture
> in text by watching it.
>
> On Thu, Apr 30, 2015 at 6:30 PM, Konstantin Boudnik <co...@apache.org>
> wrote:
>
> > Andrew,
> >
> > do you mind putting this on our wiki? Such a great and well-put
> > explanation!
> > I am sure it will help a lot of new contributors to get up to speed much
> > quicker!
> >
> > Thanks!
> >   Cos
> >
> >
>
>
>
> --
> jay vyas
>

Re: general packaging questions

Posted by jay vyas <ja...@gmail.com>.

Sure, we can add it to the README or wiki.
On May 2, 2015 3:56 PM, "Andre Kelpe" <ak...@concurrentinc.com> wrote:

> Thanks for the video! This should be in the README or wiki or on the
> website somewhere.
>
> - André
>
> On Fri, May 1, 2015 at 3:41 PM, jay vyas <ja...@gmail.com>
> wrote:
>
> > By the way, I did a hack session w/ the guys at BU on how to build/test
> > rpms using our existing vagrant workflow
> >
> > https://www.youtube.com/watch?v=4GfcKEjO6e8
> >
> > You can get a good feel for some of the lower level details hard to
> capture
> > in text by watching it.
> >
> > On Thu, Apr 30, 2015 at 6:30 PM, Konstantin Boudnik <co...@apache.org>
> > wrote:
> >
> > > Andrew,
> > >
> > > do you mind putting this on our wiki? Such a great and well-put
> > > explanation!
> > > I am sure it will help a lot of new contributors to get up to speed
> much
> > > quicker!
> > >
> > > Thanks!
> > >   Cos
> > >
> > > On Tue, Apr 28, 2015 at 04:01PM, Andrew Purtell wrote:
> > > > Inline
> > > >
> > > > On Tue, Apr 28, 2015 at 8:51 AM, Andre Kelpe <akelpe@concurrentinc.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am currently learning the ins and outs of bigtop to work on the
> > > Cascading
> > > > > integration (https://issues.apache.org/jira/browse/BIGTOP-1766). I
> > > have a
> > > > > few questions around packaging in bigtop:
> > > > >
> > > > > 1) most linux distros have packaging guidelines that should be
> > > followed.
> > > > > Does bigtop follow any set of rules in particular? Is there a
> linting
> > > tool
> > > > > for spec files etc?
> > > > >
> > > >
> > > > This is distro specific. RedHat family distributions (RHEL, Fedora,
> > > Centos,
> > > > Amazon Linux) offer 'rpmlint'. You can install it and run it by hand.
> > > From
> > > > personal experience if you build deb packages on Ubuntu the package
> > build
> > > > will run the lintian tool automatically.
> > > >
> > > >
> > > > > 2) Related to 1): Does bigtop require to follow a certain directory
> > > layout?
> > > > > Our tools are currently meant to be untarred and used as is, if
> > bigtop
> > > > > requires them to be split over the file-system, we will have to
> work
> > on
> > > > > that upstream before they can be included.
> > > > >
> > > >
> > > > Yes, broadly speaking we follow the Linux standard base (LSB). A
> > typical
> > > > package build happens in four steps. We move files around in the
> third
> > > step
> > > > to make packages look more like LSB. Let me take you through one
> > package
> > > as
> > > > an example:
> > > >
> > > > Step 1. Download source tarball from the software release site and
> > expand
> > > > it.
> > > >
> > > > Step 2. do-package-build
> > > >
> > > > Here, for example, see what we do for ZooKeeper:
> > > >
> > >
> >
> https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/zookeeper/do-component-build
> > > > . We kick off a build of the component's binary artifacts while first
> > > > normalizing dependency versions according to the release BOM.
> > > >
> > > > Step 3. install_<component>.sh
> > > >
> > > > Again let's look at the ZK package:
> > > >
> > >
> >
> https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/zookeeper/install_zookeeper.sh
> > > > . Here we take the resulting tarball from the component build, expand
> > it,
> > > > and move the locations of various types of files around to be more
> > > > LSB-like.
> > > >
> > > > Step 4. Native packager
> > > >
> > > > Finally we hand off the expanded and munged result from step 3 to the
> > > > native packager. For ZK, the RPM specfile used is here:
> > > >
> > >
> >
> https://github.com/apache/bigtop/blob/master/bigtop-packages/src/rpm/zookeeper/SPECS/zookeeper.spec
> > > > . The Debian package control files are here:
> > > >
> > >
> >
> https://github.com/apache/bigtop/tree/master/bigtop-packages/src/deb/zookeeper
> > > >
> > > >
> > > >
> > > >
> > > > > 3) I noticed that the packages are build from source instead of
> > > re-using
> > > > > binary releases. Is that a strict requirement or does it just
> happen
> > > to be
> > > > > that way? For the Cascading integration I was planning on
> downloading
> > > our
> > > > > binary releases so that bigtop ship with the same bits as our SDK.
> > > > >
> > > >
> > > > We typically build packages from source so we can normalize
> > > dependencies.
> > > > For example, if a given Bigtop release ships with Hadoop 2.6.0 but
> the
> > > > Cascading SDK includes 2.5.1 artifacts, this would be ugly at best
> and
> > > > broken at worst.
> > > >
> > > >
> > > > > 4) What is your take on packaging standalone libraries? I noticed
> > that
> > > most
> > > > > parts of bigtop are tools in the broader sense. Something one can
> > > invoke on
> > > > > the command line, but there is also a package for apache crunch,
> > which
> > > is a
> > > > > library. What is the reasoning here? Would it make sense to build
> > > packages
> > > > > for libraries in the Cascading eco-system?
> > > > >
> > > > >
> > > > I'm not sure we have anything that amounts to a policy here. Crunch
> > isn't
> > > > the only case. We package the DataFu library of UDFs for Pig. We
> > package
> > > > the Phoenix SQL skin add-on for HBase. We also package Tez, which is
> a
> > > YARN
> > > > application requiring Hadoop, and although it could be useful on its
> > own
> > > > it's meant to be picked up and used by the Hive and Pig packages.
> > > >
> > > > If a champion for a component shows up we will give it a look. We
> could
> > > > absolutely build a core Cascading package and then a number of
> library
> > or
> > > > add-on packages, if that's how you would like to set things up as
> > > champion
> > > > or maintainer of same.
> > > >
> > > >
> > > >
> > > > > Thanks for your answers!
> > > > >
> > > > > - André
> > > > >
> > > > > --
> > > > > André Kelpe
> > > > > andre@concurrentinc.com
> > > > > http://concurrentinc.com
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > >    - Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein
> > > > (via Tom White)
> > >
> >
> >
> >
> > --
> > jay vyas
> >
>
>
>
> --
> André Kelpe
> andre@concurrentinc.com
> http://concurrentinc.com
>

Re: general packaging questions

Posted by Andre Kelpe <ak...@concurrentinc.com>.

Thanks for the video! This should be in the README or wiki or on the
website somewhere.

- André

On Fri, May 1, 2015 at 3:41 PM, jay vyas <ja...@gmail.com>
wrote:

> By the way, I did a hack session w/ the guys at BU on how to build/test
> rpms using our existing vagrant workflow
>
> https://www.youtube.com/watch?v=4GfcKEjO6e8
>
> You can get a good feel for some of the lower level details hard to capture
> in text by watching it.



-- 
André Kelpe
andre@concurrentinc.com
http://concurrentinc.com

Re: general packaging questions

Posted by jay vyas <ja...@gmail.com>.

By the way, I did a hack session w/ the guys at BU on how to build/test
RPMs using our existing Vagrant workflow

https://www.youtube.com/watch?v=4GfcKEjO6e8

You can get a good feel for some of the lower-level details that are hard to
capture in text by watching it.

On Thu, Apr 30, 2015 at 6:30 PM, Konstantin Boudnik <co...@apache.org> wrote:

> Andrew,
>
> do you mind putting this on our wiki? Such a great and well-put
> explanation!
> I am sure it will help a lot of new contributors to get up to speed much
> quicker!
>
> Thanks!
>   Cos



-- 
jay vyas

Re: general packaging questions

Posted by Konstantin Boudnik <co...@apache.org>.

Andrew, 

do you mind putting this on our wiki? Such a great and well-put explanation!
I am sure it will help a lot of new contributors get up to speed much
quicker!

Thanks!
  Cos

On Tue, Apr 28, 2015 at 04:01PM, Andrew Purtell wrote:
> Inline
> 
> On Tue, Apr 28, 2015 at 8:51 AM, Andre Kelpe <ak...@concurrentinc.com>
> wrote:
> 
> > Hi,
> >
> > I am currently learning the ins and outs of bigtop to work on the Cascading
> > integration (https://issues.apache.org/jira/browse/BIGTOP-1766). I have a
> > few questions around packaging in bigtop:
> >
> > 1) most linux distros have packaging guidelines that should be followed.
> > Does bigtop follow any set of rules in particular? Is there a linting tool
> > for spec files etc?
> >
> 
> This is distro specific. RedHat family distributions (RHEL, Fedora, Centos,
> Amazon Linux) offer 'rpmlint'. You can install it and run it by hand. From
> personal experience if you build deb packages on Ubuntu the package build
> will run the lintian tool automatically.
> 
> 
> > 2) Related to 1): Does bigtop require to follow a certain directory layout?
> > Our tools are currently meant to be untarred and used as is, if bigtop
> > requires them to be split over the file-system, we will have to work on
> > that upstream before they can be included.
> >
> 
> Yes, broadly speaking we follow the Linux standard base (LSB). A typical
> package build happens in four steps. We move files around in the third step
> to make packages look more like LSB. Let me take you through one package as
> an example:
> 
> Step 1. Download source tarball from the software release site and expand
> it.
> 
> Step 2. do-component-build
> 
> Here, for example, see what we do for ZooKeeper:
> https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/zookeeper/do-component-build
> . We kick off a build of the component's binary artifacts while first
> normalizing dependency versions according to the release BOM.
> 
> Step 3. install_<component>.sh
> 
> Again let's look at the ZK package:
> https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/zookeeper/install_zookeeper.sh
> . Here we take the resulting tarball from the component build, expand it,
> and move the locations of various types of files around to be more
> LSB-like.
> 
> Step 4. Native packager
> 
> Finally we hand off the expanded and munged result from step 3 to the
> native packager. For ZK, the RPM specfile used is here:
> https://github.com/apache/bigtop/blob/master/bigtop-packages/src/rpm/zookeeper/SPECS/zookeeper.spec
> . The Debian package control files are here:
> https://github.com/apache/bigtop/tree/master/bigtop-packages/src/deb/zookeeper
> 
> 
> 
> 
> > 3) I noticed that the packages are build from source instead of re-using
> > binary releases. Is that a strict requirement or does it just happen to be
> > that way? For the Cascading integration I was planning on downloading our
> > binary releases so that bigtop ship with the same bits as our SDK.
> >
> 
> We typically build packages from source so we can normalize dependencies.
> For example, if a given Bigtop release ships with Hadoop 2.6.0 but the
> Cascading SDK includes 2.5.1 artifacts, this would be ugly at best and
> broken at worst.
> 
> 
> > 4) What is your take on packaging standalone libraries? I noticed that most
> > parts of bigtop are tools in the broader sense. Something one can invoke on
> > the command line, but there is also a package for apache crunch, which is a
> > library. What is the reasoning here? Would it make sense to build packages
> > for libraries in the Cascading eco-system?
> >
> >
> I'm not sure we have anything that amounts to a policy here. Crunch isn't
> the only case. We package the DataFu library of UDFs for Pig. We package
> the Phoenix SQL skin add-on for HBase. We also package Tez, which is a YARN
> application requiring Hadoop, and although it could be useful on its own
> it's meant to be picked up and used by the Hive and Pig packages.
> 
> If a champion for a component shows up we will give it a look. We could
> absolutely build a core Cascading package and then a number of library or
> add-on packages, if that's how you would like to set things up as champion
> or maintainer of same.
> 
> 
> 
> > Thanks for your answers!
> >
> > - André
> >
> > --
> > André Kelpe
> > andre@concurrentinc.com
> > http://concurrentinc.com
> >
> 
> 
> 
> -- 
> Best regards,
> 
>    - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

Re: general packaging questions

Posted by Andrew Purtell <ap...@apache.org>.

Inline

On Tue, Apr 28, 2015 at 8:51 AM, Andre Kelpe <ak...@concurrentinc.com>
wrote:

> Hi,
>
> I am currently learning the ins and outs of bigtop to work on the Cascading
> integration (https://issues.apache.org/jira/browse/BIGTOP-1766). I have a
> few questions around packaging in bigtop:
>
> 1) most linux distros have packaging guidelines that should be followed.
> Does bigtop follow any set of rules in particular? Is there a linting tool
> for spec files etc?
>

This is distro specific. Red Hat family distributions (RHEL, Fedora, CentOS,
Amazon Linux) offer 'rpmlint'. You can install it and run it by hand. In my
personal experience, if you build deb packages on Ubuntu, the package build
will run the lintian tool automatically.
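
As a rough illustration of the two linters mentioned above, the sketch below
picks a linter from a file's extension and builds the command line for it. The
helper names (lint_command, run_lint) and the extension table are assumptions
made for this example and are not part of Bigtop; rpmlint and lintian are the
real tools and have to be installed separately.

```python
import subprocess
from pathlib import Path

# Illustrative table: which linter to use for which kind of file.
LINTERS = {
    ".spec": "rpmlint",  # RPM spec files
    ".rpm": "rpmlint",   # built RPM packages
    ".deb": "lintian",   # built Debian packages
}


def lint_command(path):
    """Return the argument list that would lint the given file."""
    suffix = Path(path).suffix
    if suffix not in LINTERS:
        raise ValueError(f"no known linter for {path!r}")
    return [LINTERS[suffix], str(path)]


def run_lint(path):
    """Run the linter by hand and return its exit status."""
    return subprocess.run(lint_command(path), check=False).returncode
```

For example, lint_command("zookeeper.spec") yields ["rpmlint", "zookeeper.spec"],
which is the same thing you would type at a shell prompt.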

> 2) Related to 1): Does bigtop require to follow a certain directory layout?
> Our tools are currently meant to be untarred and used as is, if bigtop
> requires them to be split over the file-system, we will have to work on
> that upstream before they can be included.
>

Yes, broadly speaking we follow the Linux Standard Base (LSB). A typical
package build happens in four steps. We move files around in the third step
to make packages look more like LSB. Let me take you through one package as
an example:

Step 1. Download source tarball from the software release site and expand
it.

Step 2. do-component-build

Here, for example, is what we do for ZooKeeper:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/zookeeper/do-component-build
We kick off a build of the component's binary artifacts, first normalizing
dependency versions according to the release BOM.

Step 3. install_<component>.sh

Again, let's look at the ZK package:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/zookeeper/install_zookeeper.sh
Here we take the resulting tarball from the component build, expand it,
and move the various types of files to more LSB-like locations.

Step 4. Native packager

Finally we hand off the expanded and munged result from step 3 to the
native packager. For ZK, the RPM specfile is here:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/rpm/zookeeper/SPECS/zookeeper.spec
The Debian package control files are here:
https://github.com/apache/bigtop/tree/master/bigtop-packages/src/deb/zookeeper
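
The four steps above can be sketched roughly as follows. The first function
only restates where the per-component files live, using the paths from the
ZooKeeper links. The relocation table is purely hypothetical: the tarball
directory names and the destinations are assumptions chosen to show the idea
of step 3, and the real install_<component>.sh scripts decide the actual
layout.

```python
from pathlib import PurePosixPath

PACKAGES = PurePosixPath("bigtop-packages/src")


def packaging_files(component):
    """Map each packaging step to the file or directory that drives it."""
    common = PACKAGES / "common" / component
    return {
        "build": common / "do-component-build",
        "install": common / f"install_{component}.sh",
        "rpm": PACKAGES / "rpm" / component / "SPECS" / f"{component}.spec",
        "deb": PACKAGES / "deb" / component,
    }


# Hypothetical table: top-level tarball directory -> LSB-like destination.
RELOCATIONS = {
    "bin": "/usr/lib/{name}/bin",
    "lib": "/usr/lib/{name}/lib",
    "conf": "/etc/{name}/conf",
    "docs": "/usr/share/doc/{name}",
}


def relocate(name, relative_path):
    """Return the installed location for a path inside the expanded tarball."""
    top, _, rest = relative_path.partition("/")
    # Anything not listed in the table stays under the component's own directory.
    base = RELOCATIONS.get(top, "/usr/lib/{name}/" + top).format(name=name)
    return f"{base}/{rest}" if rest else base
```

With this table, relocate("zookeeper", "conf/zoo.cfg") gives
"/etc/zookeeper/conf/zoo.cfg", while a file the table does not mention stays
under /usr/lib/zookeeper.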

> 3) I noticed that the packages are build from source instead of re-using
> binary releases. Is that a strict requirement or does it just happen to be
> that way? For the Cascading integration I was planning on downloading our
> binary releases so that bigtop ship with the same bits as our SDK.
>

We typically build packages from source so we can normalize dependencies.
For example, if a given Bigtop release ships with Hadoop 2.6.0 but the
Cascading SDK includes 2.5.1 artifacts, the result would be ugly at best and
broken at worst.
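
To make the mismatch concrete, here is a minimal sketch of the kind of check
that normalizing dependencies implies. The dictionaries are a made-up
representation for this example only (the release BOM is not stored this way),
and version_mismatches is a hypothetical helper; the version numbers are the
ones from the example above.

```python
def version_mismatches(bom, bundled):
    """Return {dependency: (bundled_version, bom_version)} for every conflict."""
    return {
        name: (version, bom[name])
        for name, version in bundled.items()
        if name in bom and bom[name] != version
    }


# Versions pinned by the Bigtop release versus versions inside a binary release.
bom = {"hadoop": "2.6.0"}
sdk = {"hadoop": "2.5.1"}

for name, (found, wanted) in version_mismatches(bom, sdk).items():
    print(f"{name}: binary release bundles {found}, Bigtop release pins {wanted}")
```

Building from source lets the build pick up the pinned version instead, so a
check like this would come back empty.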

> 4) What is your take on packaging standalone libraries? I noticed that most
> parts of bigtop are tools in the broader sense. Something one can invoke on
> the command line, but there is also a package for apache crunch, which is a
> library. What is the reasoning here? Would it make sense to build packages
> for libraries in the Cascading eco-system?
>
>
I'm not sure we have anything that amounts to a policy here. Crunch isn't
the only case. We package the DataFu library of UDFs for Pig. We package
the Phoenix SQL skin add-on for HBase. We also package Tez, which is a YARN
application requiring Hadoop, and although it could be useful on its own,
it's meant to be picked up and used by the Hive and Pig packages.

If a champion for a component shows up we will give it a look. We could
absolutely build a core Cascading package and then a number of library or
add-on packages, if that's how you would like to set things up as champion
or maintainer of same.

> Thanks for your answers!
>
> - André
>
> --
> André Kelpe
> andre@concurrentinc.com
> http://concurrentinc.com
>

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)