Posted to dev@bigtop.apache.org by Roman Shaposhnik <rv...@apache.org> on 2011/08/13 02:51:49 UTC

Projects that bundle packaging infrastructure

Guys,

I've noticed that a few Apache projects that Bigtop is integrating seem to
have packaging infrastructure bundled in the trunk (Hadoop proper and
Pig come to mind right away, but there could be others, I guess).

So far, Bigtop has been solving the problem of providing a point of
integration and packaging for projects that didn't have any.
Now that some have started to solve that same problem in an incompatible
way, what policy would make the most sense for us going forward?

Thoughts?

Thanks,
Roman.

Re: Projects that bundle packaging infrastructure

Posted by Roman Shaposhnik <rv...@apache.org>.
On Mon, Aug 15, 2011 at 9:54 AM, Steve Loughran <st...@apache.org> wrote:
> I actually think hadoop-core's decision to put its packaging into core is the
> wrong one; it ends up creating a lot of cyclic links against the mapreduce and
> hdfs parts, though that is being corrected through SVN changes.

<developer-hat-on>
I believe part of the appeal of having it all in the same tree is being
able to run "ant package" and get a package right there that you can
install locally and test/debug. Perhaps (and I can only guess, since I'm
not a dedicated Hadoop hacker) the same goal can be achieved out of tree
if we have enough hooks in Bigtop.
</developer-hat-on>
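For illustration only, an out-of-tree hook could preserve that one-command
workflow. The target name, variables, and paths in this sketch are
hypothetical, not actual Bigtop build targets:

```make
# Hypothetical sketch: target name, variables, and paths are illustrative,
# not actual Bigtop build targets.
hadoop-local-pkg:
	# Drop the developer's freshly built source tarball where the
	# package build would normally download a release tarball.
	cp $(HADOOP_SRC_TARBALL) dl/hadoop-$(HADOOP_VERSION).tar.gz
	# Reuse the packaging spec file to produce an installable rpm
	# from that local tarball.
	rpmbuild -ba --define "_topdir $(PWD)/build/hadoop" \
	    hadoop/SPECS/hadoop.spec
```

A developer would then get an installable rpm from their working tree with
a single command, without the packaging living in the project's own trunk.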

Thanks,
Roman.

Re: Projects that bundle packaging infrastructure

Posted by Steve Loughran <st...@apache.org>.
I actually think hadoop-core's decision to put its packaging into core is
the wrong one; it ends up creating a lot of cyclic links against the
mapreduce and hdfs parts, though that is being corrected through SVN
changes.

Ideally, package releases should be downstream of the source projects, so
that anyone can build a complete stack as {rpm,deb} from the released
versions, though that requires the complete tar to be published
to the local m2/ivy repository (which is how we do it).
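As a sketch of the "publish the complete tar" step Steve describes, a Maven
build can attach the tarball as an extra artifact so that it lands in the m2
repository alongside the jars. The file path below is illustrative, not
taken from any project's actual pom:

```xml
<!-- Hypothetical sketch: the tarball path is illustrative. The
     build-helper-maven-plugin attaches the tarball as an extra artifact,
     so "mvn install" / "mvn deploy" publish it to the m2 repository. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>attach-tarball</id>
      <phase>package</phase>
      <goals>
        <goal>attach-artifact</goal>
      </goals>
      <configuration>
        <artifacts>
          <artifact>
            <file>${project.build.directory}/${project.artifactId}-${project.version}.tar.gz</file>
            <type>tar.gz</type>
          </artifact>
        </artifacts>
      </configuration>
    </execution>
  </executions>
</plugin>
```

A downstream rpm/deb build could then resolve the tarball by its Maven
coordinates instead of checking out and rebuilding the source tree.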

Functional testing can be isolated from the package/deploy step so that
you can run things like terasort against a full deployment.

Re: Projects that bundle packaging infrastructure

Posted by Andre Arcilla <ar...@apache.org>.
To add further:

Hadoop-based ecosystem deployments consist of many pieces working
together. There is Hadoop proper, and then there are Pig, Oozie, HBase,
proxies, proprietary components, etc. Even if the packaging for a
particular component is pitch-perfect (and none is, currently), the
components must still work together. That is something individual
projects occasionally forget.

> We also need to be able to pick and choose versions of each project to ensure they work well with each other.
> Bigtop is about the whole thing and not each individual part.

And that is what large deployments are about. Nobody deploys all the
latest components at the same time. It is always a matter of picking and
choosing which new versions go with which legacy ones, and the result is
only valid if the whole deployment works together. Therefore the feature
of Bigtop that allows unified selection and handling of different
components under one umbrella is very valuable.
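As a sketch, that unified version selection could live in a single file
consumed by every package build. The variable names and versions below are
illustrative (loosely modeled on Bigtop's bigtop.mk), not an actual Bigtop
configuration:

```make
# Hypothetical sketch: names and versions are illustrative.
# One file pins the set of component versions known to work together,
# so the whole stack is selected in one place.
HADOOP_BASE_VERSION=0.20.204.0
PIG_BASE_VERSION=0.8.1
HBASE_BASE_VERSION=0.90.3
OOZIE_BASE_VERSION=2.3.2
```

Bumping one component then means editing one line and rebuilding, rather
than chasing version choices through each project's own packaging.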

Another potential benefit of Bigtop handling packaging is that it
provides a single control point for building and deploying the Hadoop
stack. Some environments vary substantially from a vanilla Ubuntu/CentOS.
If one needs to build or deploy the Hadoop stack in a modified
environment, it is much easier to address that in Bigtop than by hacking
the packaging of every individual project. Bigtop does not have to
support every variation of Linux under the sun, of course, but it can
provide the flexibility to do so, if desired.

On Sat, Aug 13, 2011 at 3:39 PM, Bruno Mahé <bm...@apache.org> wrote:
> On 08/12/2011 05:51 PM, Roman Shaposhnik wrote:
>> Guys,
>>
>> I've noticed that a few Apache projects that Bigtop is integrating seem to
>> have packaging infrastructure bundled in the trunk (Hadoop proper and
>> Pig come to mind right away, but there could be others, I guess).
>>
>> So far, Bigtop has been solving the problem of providing a point of
>> integration and packaging for projects that didn't have any.
>> Now that some have started to solve that same problem in an incompatible
>> way, what policy would make the most sense for us going forward?
>>
>> Thoughts?
>>
>> Thanks,
>> Roman.
> OK, I'll bite:
>
> I think it is too early to think about that and we should wait for these
> packaging efforts to be more battle tested. Until then, we should focus
> on what is already in BigTop.
>
> From a technical point of view, last time I checked, the Hadoop and Pig
> packaging efforts only concern RPMs for RHEL/CentOS-like platforms (I
> remember seeing a few things that wouldn't work on anything other than
> RHEL/CentOS-like platforms; maybe that has changed since). Whereas Bigtop
> comes from CDH, and even though it carries some history and cruft, it
> works and has been deployed in a wide variety of production clusters
> across a broad range of environments and GNU/Linux distributions. So I
> would rather work on cleaning up, improving, or bringing new features to
> Bigtop instead of helping other packaging efforts catch up on what
> Bigtop has been doing since day one.
>
> Then there is the integration part. Each project has a different point
> of view on how to organize its layout, which OSes to support, and how to
> do things, and we need some room to adjust for that. I would even argue
> we should allow ourselves to patch build and security issues, but that
> is out of scope for this discussion.
> We also need to be able to pick and choose versions of each project to
> ensure they work well with each other. Bigtop is about the whole thing,
> not each individual part. Besides, nothing prevents having different
> packaging efforts with different objectives and focus.
> But we should also be able to quickly release fixes for packaging. For
> example, I was told that not being able to build on Fedora and openSUSE
> is not a blocker for Hadoop 0.20.204. If we were depending on its
> packaging, this would be disastrous from a packaging point of view (but
> it's fine in a Hadoop context, given that no one has really complained
> about it). So having our packaging work outside of each project allows
> every project to focus on its priorities while we focus on ours, without
> having to convince each project every time (*).
>
> But it is a good thing that these projects are starting to think about
> packaging issues and to become aware of them. This will improve their
> quality and make our lives easier as well. And nothing prevents us from
> collaborating, sending patches, or reusing good ideas/code.
>
>
> (*) I am also lobbying for building and testing our packaging against
> the trunk of each project so we can proactively ensure releases are of
> good enough quality for us, rather than waiting for a release to go out,
> filing bugs/patches, and waiting for the next release to be able to use
> it (if no other issues have been introduced). But all of this is
> pointless until we get some builds going.
>
>

Re: Projects that bundle packaging infrastructure

Posted by Roman Shaposhnik <rv...@apache.org>.
I'll provide some of my comments inline, but I really think we should
discuss it more at the upcoming meeting.

On Sat, Aug 13, 2011 at 3:39 PM, Bruno Mahé <bm...@apache.org> wrote:
> I think it is too early to think about that and we should wait for these
> packaging efforts to be more battle tested. Until then, we should focus
> on what is already in BigTop.

Agreed. It is definitely NOT something we need to address right away,
if for no other reason than that I don't think any of the projects
I was referring to has had a formal release with packaging
infrastructure. It is all in trunk at this point.

However, I would like Bigtop not just to passively react to what happens
when they release, but rather to work with those communities ahead
of time.

That's why I was really hoping for Owen/Alan/Eric Y. to chime in on this
discussion since they seem to be intimately involved in providing
packaging infrastructure for those projects.

> From a technical point of view, last time I checked, the Hadoop and Pig
> packaging efforts only concern RPMs for RHEL/CentOS-like platforms (I
> remember seeing a few things that wouldn't work on anything other than
> RHEL/CentOS-like platforms; maybe that has changed since). Whereas Bigtop
> comes from CDH, and even though it carries some history and cruft, it
> works and has been deployed in a wide variety of production clusters
> across a broad range of environments and GNU/Linux distributions. So I
> would rather work on cleaning up, improving, or bringing new features to
> Bigtop instead of helping other packaging efforts catch up on what
> Bigtop has been doing since day one.

I tend to agree with that assessment. That said, given the different
choices in packaging that were implemented in Hadoop/Pig, I think
we should really find out WHY they were implemented that way.
Perhaps there was some kind of customer feedback that we
really have to listen to. So again, it is rather crucial for us to get
Owen/Alan/Eric Y. into this discussion.

I'm ambivalent on whether packaging infrastructure should be kept
upstream or reside in Bigtop proper. After all, quite a few OS projects
still ship a top-level debian/ subdirectory in their tarballs.

That said, I'd really, really hate to see us diverge on packaging
without any good reason behind it. In my mind, the beauty of
Bigtop is exactly its goal of being the place where packaging
and integration happen for the benefit of all sorts of different
customers, ranging from individuals all the way to Linux packagers.
It simply would be a disservice to our customer base to have
Apache releases of Hadoop packaged differently from, let's
say, Ubuntu.

> (*) I am also lobbying for building and testing our packaging against
> the trunk of each project so we can proactively ensure releases are of
> good enough quality for us, rather than waiting for a release to go out,
> filing bugs/patches, and waiting for the next release to be able to use
> it (if no other issues have been introduced). But all of this is
> pointless until we get some builds going.

+1

Thanks,
Roman.

Re: Projects that bundle packaging infrastructure

Posted by Bruno Mahé <bm...@apache.org>.
On 08/12/2011 05:51 PM, Roman Shaposhnik wrote:
> Guys,
>
> I've noticed that a few Apache projects that Bigtop is integrating seem to
> have packaging infrastructure bundled in the trunk (Hadoop proper and
> Pig come to mind right away, but there could be others, I guess).
>
> So far, Bigtop has been solving the problem of providing a point of
> integration and packaging for projects that didn't have any.
> Now that some have started to solve that same problem in an incompatible
> way, what policy would make the most sense for us going forward?
>
> Thoughts?
>
> Thanks,
> Roman.
OK, I'll bite:

I think it is too early to think about that and we should wait for these
packaging efforts to be more battle tested. Until then, we should focus
on what is already in BigTop.

From a technical point of view, last time I checked, the Hadoop and Pig
packaging efforts only concern RPMs for RHEL/CentOS-like platforms (I
remember seeing a few things that wouldn't work on anything other than
RHEL/CentOS-like platforms; maybe that has changed since). Whereas Bigtop
comes from CDH, and even though it carries some history and cruft, it
works and has been deployed in a wide variety of production clusters
across a broad range of environments and GNU/Linux distributions. So I
would rather work on cleaning up, improving, or bringing new features to
Bigtop instead of helping other packaging efforts catch up on what
Bigtop has been doing since day one.

Then there is the integration part. Each project has a different point
of view on how to organize its layout, which OSes to support, and how to
do things, and we need some room to adjust for that. I would even argue
we should allow ourselves to patch build and security issues, but that
is out of scope for this discussion.
We also need to be able to pick and choose versions of each project to
ensure they work well with each other. Bigtop is about the whole thing,
not each individual part. Besides, nothing prevents having different
packaging efforts with different objectives and focus.
But we should also be able to quickly release fixes for packaging. For
example, I was told that not being able to build on Fedora and openSUSE
is not a blocker for Hadoop 0.20.204. If we were depending on its
packaging, this would be disastrous from a packaging point of view (but
it's fine in a Hadoop context, given that no one has really complained
about it). So having our packaging work outside of each project allows
every project to focus on its priorities while we focus on ours, without
having to convince each project every time (*).

But it is a good thing that these projects are starting to think about
packaging issues and to become aware of them. This will improve their
quality and make our lives easier as well. And nothing prevents us from
collaborating, sending patches, or reusing good ideas/code.


(*) I am also lobbying for building and testing our packaging against
the trunk of each project so we can proactively ensure releases are of
good enough quality for us, rather than waiting for a release to go out,
filing bugs/patches, and waiting for the next release to be able to use
it (if no other issues have been introduced). But all of this is
pointless until we get some builds going.