Posted to dev@bigtop.apache.org by Anatoli Fomenko <af...@yahoo.com> on 2012/09/20 03:00:36 UTC

Bigtop environment setup

I found that in order to avoid unnecessary build failures I need to quickly set up additional VMs for Bigtop supported platforms. From my experience with Precise, I would say that it's a task that may take time.

Any suggestions how it could be accelerated?

Thank you,
Anatoli

Re: Bigtop environment setup

Posted by Anatoli Fomenko <af...@yahoo.com>.
Thanks everybody for your input.
I've created a Jira https://issues.apache.org/jira/browse/BIGTOP-720, so we can move development and further discussion there.

Anatoli




Re: Bigtop environment setup

Posted by Bruno Mahé <bm...@apache.org>.
On 09/24/2012 11:26 AM, Roman Shaposhnik wrote:
> On Fri, Sep 21, 2012 at 8:55 PM, Bruno Mahé <bm...@apache.org> wrote:
>> There are already tools to extract dependencies out of spec files. I imagine
>> the same is probably true for debs.
>
> What's the tool name?

My bad, I was actually referring to tools based on SRPMs, not SPEC files 
directly.


> Also, as a matter of fact for our purposes the tool
> can be as simple as just grepping for them. In fact, when I did
>    $ git grep -E 'Build(Requires|-Depends):'
>
> I didn't find anything that would be too difficult to handle via sed/awk.
>
> Thanks,
> Roman.
>

But what about the parsing of the various macros? They may or may not be 
defined for specific distributions.

Thanks,
Bruno

Re: Bigtop environment setup

Posted by Roman Shaposhnik <rv...@apache.org>.
On Fri, Sep 21, 2012 at 8:55 PM, Bruno Mahé <bm...@apache.org> wrote:
> There are already tools to extract dependencies out of spec files. I imagine
> the same is probably true for debs.

What's the tool name? Also, as a matter of fact for our purposes the tool
can be as simple as just grepping for them. In fact, when I did
  $ git grep -E 'Build(Requires|-Depends):'

I didn't find anything that would be too difficult to handle via sed/awk.
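For the archive's benefit, here is a minimal, self-contained sketch of the grep/sed approach described above (editor's illustration: the file names and package names are invented; a real run would point at the actual spec/control files in the tree):

```shell
#!/bin/sh
# Harvest build dependencies from RPM spec and Debian control files
# using nothing fancier than grep, sed and tr.
mkdir -p /tmp/bigtop-dep-demo

# Made-up RPM spec fragment (illustrative only)
cat > /tmp/bigtop-dep-demo/demo.spec <<'EOF'
Name: demo
BuildRequires: gcc, make
BuildRequires: ant
EOF

# Made-up Debian control fragment (illustrative only)
cat > /tmp/bigtop-dep-demo/control <<'EOF'
Source: demo
Build-Depends: debhelper (>= 7), openjdk-6-jdk
EOF

# Strip the field names, split comma-separated lists, drop version
# constraints, trim whitespace, and de-duplicate.
grep -hE '^Build(Requires|-Depends):' \
    /tmp/bigtop-dep-demo/demo.spec /tmp/bigtop-dep-demo/control \
  | sed -e 's/^Build[^:]*:[[:space:]]*//' -e 's/([^)]*)//g' \
  | tr ',' '\n' \
  | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
  | sort -u
# Prints (one per line): ant debhelper gcc make openjdk-6-jdk
```

The resulting list could then be fed straight to yum/apt-get install.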

Thanks,
Roman.

Re: Bigtop environment setup

Posted by Bruno Mahé <bm...@apache.org>.
There are already tools to extract dependencies out of spec files. I
imagine the same is probably true for debs.

On 09/19/2012 06:55 PM, Konstantin Boudnik wrote:
> Good idea, guys.
>
> I envision something like 2) with a very thin layer of automation written in
> Groovy to generate dependency lists and run needed install commands.
>
> I argue for Groovy, because any kind of programming in shell ends up in an
> awful mess.
>
> Cos
>
> On Wed, Sep 19, 2012 at 06:32PM, Roman Shaposhnik wrote:
>> On Wed, Sep 19, 2012 at 6:00 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
>>> I found that in order to avoid unnecessary build failures I need to quickly set up additional
>>> VMs for Bigtop supported platforms. From my experience with Precise, I would say that it's a task that may take time.
>>>
>>> Any suggestions how it could be accelerated?
>>
>> You're raising a very good point, actually. In fact I've run into
>> this very issue while trying to configure an extra Jenkins
>> slave for bigtop01.
>>
>> Now, in the ideal world, all the build time dependencies
>> that we have would be packaged and we'd express the
>> fact that we depend on them via the very same packages
>> that we're maintaining. That's what RPM's BuildRequires:
>> and DEB's Build-Depends: fields are for -- to tell you
>> explicitly what's required to be installed on the system
>> before you can do the build of the package.
>>
>> Then you'd use the tools like:
>>      apt-get build-dep
>>      yum-builddep
>> to satisfy all the dependencies and you're done.
>>
>> Now, this works great in the environment where
>> you already have source packages which you
>> can give to apt-get build-dep/yum-builddep
>>
>> But Bigtop has to be bootstrapped from the source.
>> We can't assume existence of source packages.
>>
>> So here's the question to the bigger Bigtop
>> community -- how do we want to proceed to
>> manage repeatable build environments for
>> our packages?
>>
>> The options I see are:
>>     #1 maintain a parallel (very shallow) collection
>>          of puppet code that would, essentially,
>>          manage our "build slaves"
>>     #2 do #1 but automate it in such a way that
>>          the info actually gets harvested from
>>          spec/control files
>>
>> Thoughts?
>>
>> Thanks,
>> Roman.


Re: Bigtop environment setup

Posted by Konstantin Boudnik <co...@apache.org>.
Good idea, guys.

I envision something like 2) with a very thin layer of automation written in
Groovy to generate dependency lists and run needed install commands. 

I argue for Groovy, because any kind of programming in shell ends up in an
awful mess.

Cos

On Wed, Sep 19, 2012 at 06:32PM, Roman Shaposhnik wrote:
> On Wed, Sep 19, 2012 at 6:00 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
> > I found that in order to avoid unnecessary build failures I need to quickly set up additional
> > VMs for Bigtop supported platforms. From my experience with Precise, I would say that it's a task that may take time.
> >
> > Any suggestions how it could be accelerated?
> 
> You're raising a very good point, actually. In fact I've run into
> this very issue while trying to configure an extra Jenkins
> slave for bigtop01.
> 
> Now, in the ideal world, all the build time dependencies
> that we have would be packaged and we'd express the
> fact that we depend on them via the very same packages
> that we're maintaining. That's what RPM's BuildRequires:
> and DEB's Build-Depends: fields are for -- to tell you
> explicitly what's required to be installed on the system
> before you can do the build of the package.
> 
> Then you'd use the tools like:
>     apt-get build-dep
>     yum-builddep
> to satisfy all the dependencies and you're done.
> 
> Now, this works great in the environment where
> you already have source packages which you
> can give to apt-get build-dep/yum-builddep
> 
> But Bigtop has to be bootstrapped from the source.
> We can't assume existence of source packages.
> 
> So here's the question to the bigger Bigtop
> community -- how do we want to proceed to
> manage repeatable build environments for
> our packages?
> 
> The options I see are:
>    #1 maintain a parallel (very shallow) collection
>         of puppet code that would, essentially,
>         manage our "build slaves"
>    #2 do #1 but automate it in such a way that
>         the info actually gets harvested from
>         spec/control files
> 
> Thoughts?
> 
> Thanks,
> Roman.

Re: Bigtop environment setup

Posted by Anatoli Fomenko <af...@yahoo.com>.
Option #3 looks interesting.

It seems that neither Oz
(http://aeolusproject.org/oz.html) nor BoxGrinder
(http://boxgrinder.org/) fully supports Ubuntu automatically.


Do you think there is a way to use these solutions for all our supported OSes?

Thanks,
Anatoli



________________________________
 From: Bruno Mahé <bm...@apache.org>
To: bigtop-dev@incubator.apache.org 
Cc: Roman Shaposhnik <rv...@apache.org>; Anatoli Fomenko <af...@yahoo.com> 
Sent: Friday, September 21, 2012 9:20 PM
Subject: Re: Bigtop environment setup
 
On 09/19/2012 06:32 PM, Roman Shaposhnik wrote:
> On Wed, Sep 19, 2012 at 6:00 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
>> I found that in order to avoid unnecessary build failures I need to quickly set up additional
>> VMs for Bigtop supported platforms. From my experience with Precise, I would say that it's a task that may take time.
>>
>> Any suggestions how it could be accelerated?
>
> You're raising a very good point, actually. In fact I've run into
> this very issue while trying to configure an extra Jenkins
> slave for bigtop01.
>
> Now, in the ideal world, all the build time dependencies
> that we have would be packaged and we'd express the
> fact that we depend on them via the very same packages
> that we're maintaining. That's what RPM's BuildRequires:
> and DEB's Build-Depends: fields are for -- to tell you
> explicitly what's required to be installed on the system
> before you can do the build of the package.
>
> Then you'd use the tools like:
>      apt-get build-dep
>      yum-builddep
> to satisfy all the dependencies and you're done.
>
> Now, this works great in the environment where
> you already have source packages which you
> can give to apt-get build-dep/yum-builddep
>
> But Bigtop has to be bootstrapped from the source.
> We can't assume existence of source packages.
>
> So here's the question to the bigger Bigtop
> community -- how do we want to proceed to
> manage repeatable build environments for
> our packages?
>
> The options I see are:
>     #1 maintain a parallel (very shallow) collection
>          of puppet code that would, essentially,
>          manage our "build slaves"
>     #2 do #1 but automate it in such a way that
>          the info actually gets harvested from
>          spec/control files
>
> Thoughts?
>
> Thanks,
> Roman.
>

#1 is nice since it can deal with non-packaging issues. But it still
requires people to install and know how to deal with puppet. From a dev
point of view we also need to remember not to use the latest features
since some OSes lag significantly in terms of the versions of puppet available.

#2 is also nice since it can be handled with the usual set of tools. But
it still requires some effort from users. Also some dependencies are not
and will probably never be available as packages (ex: Oracle JDK).

I also don't think there is one and only one solution.
My setup at home is quite different from the bigtop01 one.
And once you are familiar enough with Apache Bigtop and know how to set
it up, you may find options #1 and #2 are probably not well adapted to
your situation.

So this leads me to think about option #3: VMs.
Tools like Boxgrinder and Oz can deal with multiple OSes and can create 
local images as well as push them to the cloud.
The build would be repeatable and would not require any effort from the
end user (apart maybe from providing the Oracle JDK, but that would be
the case whatever the solution). Future contributors would just need to
boot their VM to get started, which will hopefully ease contribution.
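For the record, a rough sketch of what the Oz route might look like (editor's illustration: the template name, distro version and install URL below are all placeholders, and the image build itself is left commented out since it needs a host with Oz and virtualization configured):

```shell
#!/bin/sh
# Write a minimal Oz TDL template describing a throwaway build VM.
# Every value below is a placeholder for illustration.
cat > /tmp/bigtop-build.tdl <<'EOF'
<template>
  <name>bigtop-build-centos6</name>
  <os>
    <name>CentOS-6</name>
    <version>3</version>
    <arch>x86_64</arch>
    <install type='url'>
      <url>http://example.com/centos/6/os/x86_64/</url>
    </install>
  </os>
  <description>Throwaway Apache Bigtop build VM</description>
</template>
EOF

# On a host with Oz installed, the image build itself would then be
# something along the lines of:
# oz-install /tmp/bigtop-build.tdl
```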

Thoughts?

Thanks,
Bruno

Re: Bigtop environment setup

Posted by Bruno Mahé <bm...@apache.org>.
On 09/23/2012 12:06 AM, Konstantin Boudnik wrote:
> On Fri, Sep 21, 2012 at 09:20PM, Bruno Mahé wrote:
>> #1 is nice since it can deal with non-packaging issue. But it still
>> require people to install and know how to deal with puppet. From a
>> dev point of view we also need to remember to not use the latest
>> features since some OS lag significantly in term of versions of
>> puppet available.
>>
>> #2 is also nice since it can be dealt with the usual set of tools.
>> But it still requires some effort on users. Also some dependencies
>> are not and will probably never be available as packages (ex: Oracle
>> JDK).
>>
>> I also don't think there is one and only one solution.
>> My setup at home is quite different from the bigtop01 one.
>> And once you are familiar enough with Apache Bigtop and know how to
>> set it up, you may find options #1 and #2 probably not well adapted
>> to your situation.
>>
>> So this leads me to think about option #3: VMs.
>> Tools like Boxgrinder and Oz can deal with multiple OSes and can
>> create local images as well as push them to the cloud.
>> The build would be repeatable and would not require any effort from
>> the end user (apart maybe providing Oracle JDK, but that would have
>> to be the case whatever the solution). Future contributors would
>> just need to boot their VM to get started and hopefully ease
>> contribution.
>>
>> Thoughts?
>
> (writing at the end of the last post feels totally unnatural, but for the
> benefit of the future readers I will comply :)
>
> I think boxed environments (like VMs) are overkill. One of the issues here
> (as Anatoli pointed out) is the uneven support of all OSes. Say, BoxGrinder has
> some issues with Ubuntu, etc.
>
> I suppose an ol' proven toolchain type of environment that can automatically
> bootstrap upon a fresh install (or update itself: think the Maven model) and pull
> whatever apps are required in whatever form they might exist. Say, the JDK (or
> Groovy or else) can be downloaded as a tarball, given that a more suitable
> packaging isn't available, etc. Such an approach would be more fluid than
> somewhat rigid VMs, which would have to be updated periodically, versioned,
> etc.
>
> Another benefit of a toolchain is that BigTop packages might have to
> redistribute/wrap some of the tools for later use once a package is installed
> on a customer's system.
>
> So, in other words, #2 above (or some modification of it) looks more appealing to me.
>
> --


I actually think boxed environments are simpler than a toolchain since
they are more constrained. And you avoid the nasty issues with natives
that any toolchain would have.

In any case, any solution, whether it is toolchain or VMs, would have 
similar issues regarding supporting multiple OSes.

Oz seems to partially support non-rpm distributions, so that may be a way
to solve it. But in the worst case scenario I would not see any issue with
having one set of VM recipes for centos6 and another one for ubuntu
(they can always share some common scripts).

Maintaining a toolchain with updates such as you describe also comes
with a bunch of upgrade issues that are non-existent for VMs.
Considering that we are talking about helping newcomers, who will
probably set up their own environment later on and do not necessarily
want to taint their system, a VM can easily be consumed and thrown away.
You also don't have to worry about combinations of versions and packages,
or about replacing system-provided packages in ways that may disrupt
other packages from the base OS.

I am also interested in VMs, since one of my working patterns is to have
a base build VM that I clone for large work (a new package or any
significant work with multiple iterations). So I can locally (from my
VM) build, install, test and remove packages without worrying that my
newly built package will trash my machine. This enables me to quickly
iterate through changes without having to move packages around.

Thanks,
Bruno

Re: Bigtop environment setup

Posted by Bruno Mahé <bm...@apache.org>.
On 09/24/2012 10:14 PM, Konstantin Boudnik wrote:
> On Mon, Sep 24, 2012 at 10:07PM, Roman Shaposhnik wrote:
>> On Sun, Sep 23, 2012 at 3:41 PM, Bruno Mahé <bm...@apache.org> wrote:
>>> Regarding the size, it depends on a lot of things (using raw vs qcow, how
>>> many packages installed...).
>>> For instance the Apache Bigtop VMs take quite some space because they
>>> reserve some free space so people can upload their datasets. But for a build
>>> VM, not that much space is needed.
>>>
>>> For reference, we could put Apache Bigtop distribution on a bootable USB key
>>> :)
>>
>> ------------------------------------------------------------------------------------------------
>> tl;dr: here's what I propose: a 'virtual' package that would
>> enable the following workflow:
>>    $ make build-[rpm|deb]
>>    $ sudo [apt-get|yum|zypper] install output/build/build*.[rpm|deb]
>> and you get EVERYTHING that is needed for bigtop builds
>> installed as packages plus you get /opt/bigtop-toolchain
>> bits.
>> -------------------------------------------------------------------------------------------------
>>
>> Ok, I can certainly see a reason to maintain VMs. Personally, I'd love it
>> if I could say
>>      $ vagrant box add / vagrant up
>> and get the environment up and running without much fuss.
>>
>> That said -- this is but a single use case. So let's enumerate
>> the ones I see:
>>      1. provisioning/maintaining a long running Jenkins slave
>>      2. provisioning a host dev environment
>>      3. provisioning a VM dev environment
>>
>> I specifically split #3 from #1 and #2 because a lot of the time you can't
>> (already running in a virtualized environment) or won't (personally
>> I do a LOT of development on my host OS) go the VM route.
>>
>> Thus, however appealing VMs are, I think we can't escape
>> the simple truth that we have to have a non-VM based solution
>> for at least #1 and #2.
>>
>> How about if we simply add a 'virtual' package called build
>> so that running:
>>     $ make build-[rpm|deb]
>> will create a package that downloads all the bits and
>> pieces of a toolchain AND also promotes BuildRequires:
>> to its own Requires:, so that installing this package will
>> give you all the toolchain bits and all the package deps
>> at the same time.
>>
>> I think I can prototype it in a couple of hours unless there's
>> a strong revulsion towards this type of solution (and even
>> then I think I might just do it for my own personal
>> gratification ;-)).
>
> I think having a package like this doesn't contradict the VM need. Even
> more, it dovetails into the concept very nicely, because with such a package
> in place the VM creation would be completely separated from the essence of the
> toolchain and would simply be a breeze.
>
> If you are looking for a blessing - you have mine for sure ;)
>
> Cos
>

Oh sure. It was never an either/or choice.

Code speaks louder than debates so you have my blessing as well.

Re: Bigtop environment setup

Posted by Konstantin Boudnik <co...@apache.org>.
On Mon, Sep 24, 2012 at 10:07PM, Roman Shaposhnik wrote:
> On Sun, Sep 23, 2012 at 3:41 PM, Bruno Mahé <bm...@apache.org> wrote:
> > Regarding the size, it depends on a lot of things (using raw vs qcow, how
> > many packages installed...).
> > For instance the Apache Bigtop VMs take quite some space because they
> > reserve some free space so people can upload their datasets. But for a build
> > VM, not that much space is needed.
> >
> > For reference, we could put Apache Bigtop distribution on a bootable USB key
> > :)
> 
> ------------------------------------------------------------------------------------------------
> tl;dr: here's what I propose: a 'virtual' package that would
> enable the following workflow:
>   $ make build-[rpm|deb]
>   $ sudo [apt-get|yum|zypper] install output/build/build*.[rpm|deb]
> and you get EVERYTHING that is needed for bigtop builds
> installed as packages plus you get /opt/bigtop-toolchain
> bits.
> -------------------------------------------------------------------------------------------------
> 
> Ok, I can certainly see a reason to maintain VMs. Personally, I'd love it
> if I could say
>     $ vagrant box add / vagrant up
> and get the environment up and running without much fuss.
> 
> That said -- this is but a single use case. So let's enumerate
> the ones I see:
>     1. provisioning/maintaining a long running Jenkins slave
>     2. provisioning a host dev environment
>     3. provisioning a VM dev environment
> 
> I specifically split #3 from #1 and #2 because a lot of the time you can't
> (already running in a virtualized environment) or won't (personally
> I do a LOT of development on my host OS) go the VM route.
> 
> Thus, however appealing VMs are, I think we can't escape
> the simple truth that we have to have a non-VM based solution
> for at least #1 and #2.
> 
> How about if we simply add a 'virtual' package called build
> so that running:
>    $ make build-[rpm|deb]
> will create a package that downloads all the bits and
> pieces of a toolchain AND also promotes BuildRequires:
> to its own Requires:, so that installing this package will
> give you all the toolchain bits and all the package deps
> at the same time.
> 
> I think I can prototype it in a couple of hours unless there's
> a strong revulsion towards this type of solution (and even
> then I think I might just do it for my own personal
> gratification ;-)).

I think having a package like this doesn't contradict the VM need. Even
more, it dovetails into the concept very nicely, because with such a package
in place the VM creation would be completely separated from the essence of the
toolchain and would simply be a breeze.

If you are looking for a blessing - you have mine for sure ;)

Cos

Re: Bigtop environment setup

Posted by Roman Shaposhnik <rv...@apache.org>.
On Sun, Sep 23, 2012 at 3:41 PM, Bruno Mahé <bm...@apache.org> wrote:
> Regarding the size, it depends on a lot of things (using raw vs qcow, how
> many packages installed...).
> For instance the Apache Bigtop VMs take quite some space because they
> reserve some free space so people can upload their datasets. But for a build
> VM, not that much space is needed.
>
> For reference, we could put Apache Bigtop distribution on a bootable USB key
> :)

------------------------------------------------------------------------------------------------
tl;dr: here's what I propose: a 'virtual' package that would
enable the following workflow:
  $ make build-[rpm|deb]
  $ sudo [apt-get|yum|zypper] install output/build/build*.[rpm|deb]
and you get EVERYTHING that is needed for bigtop builds
installed as packages plus you get /opt/bigtop-toolchain
bits.
-------------------------------------------------------------------------------------------------

Ok, I can certainly see a reason to maintain VMs. Personally, I'd love it
if I could say
    $ vagrant box add / vagrant up
and get the environment up and running without much fuss.

That said -- this is but a single use case. So let's enumerate
the ones I see:
    1. provisioning/maintaining a long running Jenkins slave
    2. provisioning a host dev environment
    3. provisioning a VM dev environment

I specifically split #3 from #1 and #2 because a lot of the time you can't
(already running in a virtualized environment) or won't (personally
I do a LOT of development on my host OS) go the VM route.

Thus, however appealing VMs are, I think we can't escape
the simple truth that we have to have a non-VM based solution
for at least #1 and #2.

How about if we simply add a 'virtual' package called build
so that running:
   $ make build-[rpm|deb]
will create a package that downloads all the bits and
pieces of a toolchain AND also promotes BuildRequires:
to its own Requires:, so that installing this package will
give you all the toolchain bits and all the package deps
at the same time.

I think I can prototype it in a couple of hours unless there's
a strong revulsion towards this type of solution (and even
then I think I might just do it for my own personal
gratification ;-)).
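As a back-of-the-envelope illustration of promoting BuildRequires: to Requires: on the RPM side (editor's sketch only: the package name, sample inputs and layout are invented, and the eventual prototype may look nothing like this):

```shell
#!/bin/sh
# Collect every BuildRequires: line from a tree of spec files and emit
# a trivial "virtual" spec whose Requires: is their union.
mkdir -p /tmp/bigtop-virtual && cd /tmp/bigtop-virtual

# Stand-ins for a checkout containing spec files (illustrative only)
cat > hadoop.spec <<'EOF'
BuildRequires: gcc, fuse-devel
EOF
cat > zookeeper.spec <<'EOF'
BuildRequires: ant
EOF

{
  cat <<'EOF'
Name: bigtop-build
Version: 0.1
Release: 1
Summary: Virtual package pulling in all Bigtop build dependencies
License: ASL 2.0
BuildArch: noarch
EOF
  # Promote every harvested BuildRequires: entry to a Requires: line.
  grep -h '^BuildRequires:' ./*.spec \
    | sed 's/^BuildRequires:[[:space:]]*//' \
    | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | sort -u \
    | sed 's/^/Requires: /'
  cat <<'EOF'
%description
Installing this package satisfies the build-time dependencies of
every Bigtop package on this system.
%files
EOF
} > bigtop-build.spec
```

Feeding the result to rpmbuild (and doing the deb analogue, perhaps with equivs or a hand-written control file) is left out here.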

Thanks,
Roman.

Re: Bigtop environment setup

Posted by Bruno Mahé <bm...@apache.org>.
Regarding the size, it depends on a lot of things (using raw vs qcow, 
how many packages installed...).
For instance the Apache Bigtop VMs take quite some space because they 
reserve some free space so people can upload their datasets. But for a 
build VM, not that much space is needed.

For reference, we could put Apache Bigtop distribution on a bootable USB 
key :)



On 09/23/2012 03:26 PM, Anatoli Fomenko wrote:
> While I very much like the tools that Bruno brought up, I tend to agree with Cos' comments. It looks like 2GB is not an unusual size for a BoxGrinder-based appliance, perhaps for specific use cases. At this point, I would think of something more agile for Bigtop.
>
>
> Thanks,
> Anatoli
>
>
>
> ________________________________
>   From: Konstantin Boudnik <co...@apache.org>
> To: bigtop-dev@incubator.apache.org
> Sent: Sunday, September 23, 2012 12:06 AM
> Subject: Re: Bigtop environment setup
>
> On Fri, Sep 21, 2012 at 09:20PM, Bruno Mahé wrote:
>> On 09/19/2012 06:32 PM, Roman Shaposhnik wrote:
>>> On Wed, Sep 19, 2012 at 6:00 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
>>>> I found that in order to avoid unnecessary build failures I need to quickly set up additional
>>>> VMs for Bigtop supported platforms. From my experience with Precise, I would say that it's a task that may take time.
>>>>
>>>> Any suggestions how it could be accelerated?
>>>
>>> You're raising a very good point, actually. In fact I've run into
>>> this very issue while trying to configure an extra Jenkins
>>> slave for bigtop01.
>>>
>>> Now, in the ideal world, all the build time dependencies
>>> that we have would be packaged and we'd express the
>>> fact that we depend on them via the very same packages
>>> that we're maintaining. That's what RPM's BuildRequires:
>>> and DEB's Build-Depends: fields are for -- to tell you
>>> explicitly what's required to be installed on the system
>>> before you can do the build of the package.
>>>
>>> Then you'd use the tools like:
>>>       apt-get build-dep
>>>       yum-builddep
>>> to satisfy all the dependencies and you're done.
>>>
>>> Now, this works great in the environment where
>>> you already have source packages which you
>>> can give to apt-get build-dep/yum-builddep
>>>
>>> But Bigtop has to be bootstrapped from the source.
>>> We can't assume existence of source packages.
>>>
>>> So here's the question to the bigger Bigtop
>>> community -- how do we want to proceed to
>>> manage repeatable build environments for
>>> our packages?
>>>
>>> The options I see are:
>>>      #1 maintain a parallel (very shallow) collection
>>>           of puppet code that would, essentially,
>>>           manage our "build slaves"
>>>      #2 do #1 but automate it in such a way that
>>>           the info actually gets harvested from
>>>           spec/control files
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>> Roman.
>>>
>>
>> #1 is nice since it can deal with non-packaging issue. But it still
>> require people to install and know how to deal with puppet. From a
>> dev point of view we also need to remember to not use the latest
>> features since some OS lag significantly in term of versions of
>> puppet available.
>>
>> #2 is also nice since it can be dealt with the usual set of tools.
>> But it still requires some effort on users. Also some dependencies
>> are not and will probably never be available as packages (ex: Oracle
>> JDK).
>>
>> I also don't think there is one and only one solution.
>> My setup at home is quite different from the bigtop01 one.
>> And once you are familiar enough with Apache Bigtop and know how to
>> set it up, you may find options #1 and #2 probably not well adapted
>> to your situation.
>>
>> So this leads me to think about option #3: VMs.
>> Tools like Boxgrinder and Oz can deal with multiple OSes and can
>> create local images as well as push them to the cloud.
>> The build would be repeatable and would not require any effort from
>> the end user (apart maybe providing Oracle JDK, but that would have
>> to be the case whatever the solution). Future contributors would
>> just need to boot their VM to get started and hopefully ease
>> contribution.
>>
>> Thoughts?
>
> (writing at the end of the last post feels totally unnatural, but for the
> benefit of the future readers I will comply :)
>
> I think boxed environments (like VMs) are overkill. One of the issues here
> (as Anatoli pointed out) is the uneven support of all OSes. Say, BoxGrinder has
> some issues with Ubuntu, etc.
>
> I suppose an ol' proven toolchain type of environment that can automatically
> bootstrap upon a fresh install (or update itself: think the Maven model) and pull
> whatever apps are required in whatever form they might exist. Say, the JDK (or
> Groovy or else) can be downloaded as a tarball, given that a more suitable
> packaging isn't available, etc. Such an approach would be more fluid than
> somewhat rigid VMs, which would have to be updated periodically, versioned,
> etc.
>
> Another benefit of a toolchain is that BigTop packages might have to
> redistribute/wrap some of the tools for later use once a package is installed
> on a customer's system.
>
> So, in other words, #2 above (or some modification of it) looks more appealing to me.
>
> --
>    Take care,
> Konstantin (Cos) Boudnik
> 2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
>
> Disclaimer: Opinions expressed in this email are those of the author, and do
> not necessarily represent the views of any company the author might be
> affiliated with at the moment of writing.
>


Re: Spark in-memory analytics in BigTop stack

Posted by Konstantin Boudnik <co...@apache.org>.
On Wed, Sep 26, 2012 at 04:06PM, Wing Yew Poon wrote:
> Hi,
> please correct me if I'm wrong, but I thought Spark runs on top of
> Mesos. Does it not require Mesos to run?

Yes, you are right. That's why the Mesos library is part of Spark's dependencies.

Now, there are 'on top of Mesos' and 'on Mesos cluster', which are quite
different apparently ;)

Cos

> - Wing Yew
> 
> On Wed, Sep 26, 2012 at 10:42 AM, MTG dev <de...@magnatempusgroup.net> wrote:
> > Thanks Bruno!
> >
> > I have created  BIGTOP-715
> >
> > Cheers,
> >   MTG dev
> >
> > On Wed, Sep 26, 2012 at 01:19AM, Bruno Mahé wrote:
> >> On 09/25/2012 10:46 AM, MTG dev wrote:
> >> >Hi there.
> >> >
> >> >Apparently, I am not in a position to say what role Spark can play in
> >> >Bigtop, for I am not speaking for either of those projects.
> >> >
> >> >However, I can tell that Spark provides a number of advantages compared to
> >> >a traditional MapReduce model: a stateful computational model without the need
> >> >to write everything back to the file system after each step, in-memory
> >> >calculations, a higher level of primitives expressed in a functional language,
> >> >etc. These advantages combined with the low-latency planner result in a very
> >> >significant performance improvement. I'd suggest going over spark-project.org
> >> >for more information.
> >> >
> >> >I am not an expert on Drill, but I'd say that Spark gives immediate benefits
> >> >over the former because it is already here and can be used by anyone ;)
> >> >
> >> >As for integration with Bigtop: Spark doesn't require any special integration
> >> >with the rest of the stack - it might use HDFS as the underlying storage, but
> >> >that's about it.
> >> >
> >> >Looks like there's an ongoing development to allow Spark to use Hive's SerDes,
> >> >but I am not completely sure about its status.
> >> >
> >> >On Mon, Sep 24, 2012 at 09:59PM, Roman Shaposhnik wrote:
> >> >>On Mon, Sep 24, 2012 at 8:52 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
> >> >>>Hi Alef,
> >> >>>
> >> >>>Great news!
> >> >>>
> >> >>>Spark developers are interested in developing Spark packages and
> >> >>>contributing them to open source. Since you already have them,
> >> >>>what would you think about contributing the source to BigTop?
> >> >
> >> >We don't have any plans of holding the sources of the packages back, but we
> >> >are working on rpm packaging right now. Once the work is over, we should be
> >> >able to contribute it back to the community. Shall there be a JIRA ticket for
> >> >that or something?
> >> >
> >> >With regards,
> >> >   Alef
> >> >   MTG dev team
> >> >
> >>
> >>
> >> Great news!
> >>
> >>
> >> And yes, there should be a ticket. It will be helpful to organize
> >> any work around it.
> >>
> >> Thanks,
> >> Bruno
> >>
> >>

Re: Spark in-memory analytics in BigTop stack

Posted by Wing Yew Poon <wy...@cloudera.com>.
Hi,
please correct me if I'm wrong, but I thought Spark runs on top of
Mesos. Does it not require Mesos to run?
- Wing Yew

On Wed, Sep 26, 2012 at 10:42 AM, MTG dev <de...@magnatempusgroup.net> wrote:
> Thanks Bruno!
>
> I have created  BIGTOP-715
>
> Cheers,
>   MTG dev
>
> On Wed, Sep 26, 2012 at 01:19AM, Bruno Mahé wrote:
>> On 09/25/2012 10:46 AM, MTG dev wrote:
>> >Hi there.
>> >
>> >Apparently, I am not in a position to say what role Spark can play in
>> >Bigtop, as I am not speaking for either of those projects.
>> >
>> >However, I can tell that Spark provides a number of advantages compared to
>> >the traditional MapReduce model, which has to write everything back to the
>> >file system after each step: a stateful computational model, in-memory
>> >calculations, a higher level of primitives expressed in a functional
>> >language, etc. These advantages, combined with a low-latency planner, result
>> >in a very significant performance improvement. I'd suggest going over
>> >spark-project.org for more information.
>> >
>> >I am not an expert on Drill, but I'd say that Spark gives immediate benefits
>> >over the former because it is already here and can be used by anyone ;)
>> >
>> >As for integration with Bigtop: Spark doesn't require any special integration
>> >with the rest of the stack - it might use HDFS as the underlying storage, but
>> >that's about it.
>> >
>> >Looks like there's an ongoing development to allow Spark to use Hive's SerDes,
>> >but I am not completely sure about its status.
>> >
>> >On Mon, Sep 24, 2012 at 09:59PM, Roman Shaposhnik wrote:
>> >>On Mon, Sep 24, 2012 at 8:52 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
>> >>>Hi Alef,
>> >>>
>> >>>Great news!
>> >>>
>> >>>Spark developers are interested in developing Spark packages and
>> >>>contributing them to open source. Since you already have them,
>> >>>what would you think about contributing the source to BigTop?
>> >
>> >We don't have any plans of holding the sources of the packages back, but we
>> >are working on rpm packaging right now. Once the work is over, we should be
>> >able to contribute it back to the community. Shall there be a JIRA ticket for
>> >that or something?
>> >
>> >With regards,
>> >   Alef
>> >   MTG dev team
>> >
>>
>>
>> Great news!
>>
>>
>> And yes, there should be a ticket. It will be helpful to organize
>> any work around it.
>>
>> Thanks,
>> Bruno
>>
>>

Re: Spark in-memory analytics in BigTop stack

Posted by MTG dev <de...@magnatempusgroup.net>.
Thanks Bruno!

I have created BIGTOP-715

Cheers,
  MTG dev
  
On Wed, Sep 26, 2012 at 01:19AM, Bruno Mahé wrote:
> On 09/25/2012 10:46 AM, MTG dev wrote:
> >Hi there.
> >
> >Apparently, I am not in a position to say what role Spark can play in
> >Bigtop, as I am not speaking for either of those projects.
> >
> >However, I can tell that Spark provides a number of advantages compared to
> >the traditional MapReduce model, which has to write everything back to the
> >file system after each step: a stateful computational model, in-memory
> >calculations, a higher level of primitives expressed in a functional
> >language, etc. These advantages, combined with a low-latency planner, result
> >in a very significant performance improvement. I'd suggest going over
> >spark-project.org for more information.
> >
> >I am not an expert on Drill, but I'd say that Spark gives immediate benefits
> >over the former because it is already here and can be used by anyone ;)
> >
> >As for integration with Bigtop: Spark doesn't require any special integration
> >with the rest of the stack - it might use HDFS as the underlying storage, but
> >that's about it.
> >
> >Looks like there's an ongoing development to allow Spark to use Hive's SerDes,
> >but I am not completely sure about its status.
> >
> >On Mon, Sep 24, 2012 at 09:59PM, Roman Shaposhnik wrote:
> >>On Mon, Sep 24, 2012 at 8:52 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
> >>>Hi Alef,
> >>>
> >>>Great news!
> >>>
> >>>Spark developers are interested in developing Spark packages and
> >>>contributing them to open source. Since you already have them,
> >>>what would you think about contributing the source to BigTop?
> >
> >We don't have any plans of holding the sources of the packages back, but we
> >are working on rpm packaging right now. Once the work is over, we should be
> >able to contribute it back to the community. Shall there be a JIRA ticket for
> >that or something?
> >
> >With regards,
> >   Alef
> >   MTG dev team
> >
> 
> 
> Great news!
> 
> 
> And yes, there should be a ticket. It will be helpful to organize
> any work around it.
> 
> Thanks,
> Bruno
> 
> 

Re: Spark in-memory analytics in BigTop stack

Posted by Bruno Mahé <bm...@apache.org>.
On 09/25/2012 10:46 AM, MTG dev wrote:
> Hi there.
>
> Apparently, I am not in a position to say what role Spark can play in
> Bigtop, as I am not speaking for either of those projects.
>
> However, I can tell that Spark provides a number of advantages compared to
> the traditional MapReduce model, which has to write everything back to the
> file system after each step: a stateful computational model, in-memory
> calculations, a higher level of primitives expressed in a functional
> language, etc. These advantages, combined with a low-latency planner, result
> in a very significant performance improvement. I'd suggest going over
> spark-project.org for more information.
>
> I am not an expert on Drill, but I'd say that Spark gives immediate benefits
> over the former because it is already here and can be used by anyone ;)
>
> As for integration with Bigtop: Spark doesn't require any special integration
> with the rest of the stack - it might use HDFS as the underlying storage, but
> that's about it.
>
> Looks like there's an ongoing development to allow Spark to use Hive's SerDes,
> but I am not completely sure about its status.
>
> On Mon, Sep 24, 2012 at 09:59PM, Roman Shaposhnik wrote:
>> On Mon, Sep 24, 2012 at 8:52 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
>>> Hi Alef,
>>>
>>> Great news!
>>>
>>> Spark developers are interested in developing Spark packages and
>>> contributing them to open source. Since you already have them,
>>> what would you think about contributing the source to BigTop?
>
> We don't have any plans of holding the sources of the packages back, but we
> are working on rpm packaging right now. Once the work is over, we should be
> able to contribute it back to the community. Shall there be a JIRA ticket for
> that or something?
>
> With regards,
>    Alef
>    MTG dev team
>


Great news!


And yes, there should be a ticket. It will be helpful to organize any 
work around it.

Thanks,
Bruno



Re: Spark in-memory analytics in BigTop stack

Posted by MTG dev <de...@magnatempusgroup.net>.
Hi there.

Apparently, I am not in a position to say what role Spark can play in
Bigtop, as I am not speaking for either of those projects.

However, I can tell that Spark provides a number of advantages compared to
the traditional MapReduce model, which has to write everything back to the
file system after each step: a stateful computational model, in-memory
calculations, a higher level of primitives expressed in a functional
language, etc. These advantages, combined with a low-latency planner, result
in a very significant performance improvement. I'd suggest going over
spark-project.org for more information.

I am not an expert on Drill, but I'd say that Spark gives immediate benefits
over the former because it is already here and can be used by anyone ;)

As for integration with Bigtop: Spark doesn't require any special integration
with the rest of the stack - it might use HDFS as the underlying storage, but
that's about it.

Looks like there's an ongoing development to allow Spark to use Hive's SerDes,
but I am not completely sure about its status.

On Mon, Sep 24, 2012 at 09:59PM, Roman Shaposhnik wrote:
> On Mon, Sep 24, 2012 at 8:52 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
> > Hi Alef,
> >
> > Great news!
> >
> > Spark developers are interested in developing Spark packages and
> > contributing them to open source. Since you already have them,
> > what would you think about contributing the source to BigTop?

We don't have any plans of holding the sources of the packages back, but we
are working on rpm packaging right now. Once the work is over, we should be
able to contribute it back to the community. Shall there be a JIRA ticket for
that or something?

With regards,
  Alef
  MTG dev team

> This is very, very interesting indeed! I'd also like to hear a bit
> more about what role Spark can play in Bigtop project -- from
> just skimming the web it feels like it can be seen as an
> alternative to Apache Drill (incubating) or am I completely off
> base here?
> 
> Also, what level of integration is required between Spark and
> the rest of Hadoop ecosystem components (Hive, Pig, etc.)?
> 
> Thanks,
> Roman.

Re: Spark in-memory analytics in BigTop stack

Posted by Roman Shaposhnik <rv...@apache.org>.
On Mon, Sep 24, 2012 at 8:52 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
> Hi Alef,
>
> Great news!
>
> Spark developers are interested in developing Spark packages and
> contributing them to open source. Since you already have them,
> what would you think about contributing the source to BigTop?

This is very, very interesting indeed! I'd also like to hear a bit
more about what role Spark can play in Bigtop project -- from
just skimming the web it feels like it can be seen as an
alternative to Apache Drill (incubating) or am I completely off
base here?

Also, what level of integration is required between Spark and
the rest of Hadoop ecosystem components (Hive, Pig, etc.)?

Thanks,
Roman.

Re: Spark in-memory analytics in BigTop stack

Posted by Anatoli Fomenko <af...@yahoo.com>.
Hi Alef,

Great news!

Spark developers are interested in developing Spark packages and contributing them to open source. Since you already have them, what would you think about contributing the source to BigTop?

Thank you,
Anatoli




________________________________
 From: MTG dev <de...@magnatempusgroup.net>
To: bigtop-dev@incubator.apache.org 
Sent: Monday, September 24, 2012 9:31 AM
Subject: Spark in-memory analytics in BigTop stack
 
Fellow BigTop'pers.

We have just rolled out a readily available Spark 0.5 (www.spark-project.org)
packaged for the Ubuntu distribution. This package is built against the current
official Apache Hadoop 1.0.3, so it should be compatible with everything from
0.20.205 up to the Hadoop 1.1 release candidate. A Red Hat/CentOS version is
coming in a few days (in case someone is interested).

You can find all related information at
    http://www.magnatempusgroup.net/blog/2012/09/24/incredibly-fast-in-memory-analytics-for-bigdata-technology-preview/
and download installable package from
    http://magnatempusgroup.net/ftphost/releases/Spark-0.5-1.0.3/

I am posting this here because the package is created to the exact standards of the BigTop stack. In other words, BigTop rules!

We would love to hear your feedback and comments!

-- 
With regards,
    Alef
        MTG development team

Spark in-memory analytics in BigTop stack

Posted by MTG dev <de...@magnatempusgroup.net>.
Fellow BigTop'pers.

We have just rolled out a readily available Spark 0.5 (www.spark-project.org)
packaged for the Ubuntu distribution. This package is built against the current
official Apache Hadoop 1.0.3, so it should be compatible with everything from
0.20.205 up to the Hadoop 1.1 release candidate. A Red Hat/CentOS version is
coming in a few days (in case someone is interested).

You can find all related information at
    http://www.magnatempusgroup.net/blog/2012/09/24/incredibly-fast-in-memory-analytics-for-bigdata-technology-preview/
and download installable package from
    http://magnatempusgroup.net/ftphost/releases/Spark-0.5-1.0.3/

I am posting this here because the package is created to the exact standards of the BigTop stack. In other words, BigTop rules!

We would love to hear your feedback and comments!

-- 
With regards,
	Alef
        MTG development team

Re: Bigtop environment setup

Posted by Anatoli Fomenko <af...@yahoo.com>.
While I very much like the tools that Bruno brought up, I tend to agree with Cos' comments. It looks like 2GB is not an unusual size for a BoxGrinder-based appliance, which may be fine for specific use cases. At this point, I would think of something more agile for Bigtop.


Thanks,
Anatoli



________________________________
 From: Konstantin Boudnik <co...@apache.org>
To: bigtop-dev@incubator.apache.org 
Sent: Sunday, September 23, 2012 12:06 AM
Subject: Re: Bigtop environment setup
 
On Fri, Sep 21, 2012 at 09:20PM, Bruno Mahé wrote:
> On 09/19/2012 06:32 PM, Roman Shaposhnik wrote:
> >On Wed, Sep 19, 2012 at 6:00 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
> >>I found that in order to avoid unnecessary build failures I need to quickly set up additional
> >>VMs for Bigtop supported platforms. From my experience with Precise, I would say that it's a task that may take time.
> >>
> >>Any suggestions how it could be accelerated?
> >
> >You're raising a very good point, actually. In fact I've run into
> >this very issue while trying to configure an extra Jenkins
> >slave for bigtop01.
> >
> >Now, in the ideal world, all the build time dependencies
> >that we have would be packaged and we'd express the
> >fact that we depend on them via the very same packages
> >that we're maintaining. That's what RPM's BuildRequires:
> >and DEB's Build-Depends: fields are for -- to tell you
> >explicitly what's required to be installed on the system
> >before you can do the build of the package.
> >
> >Then you'd use the tools like:
> >     apt-get build-dep
> >     yum-builddep
> >to satisfy all the dependencies and you're done.
> >
> >Now, this works great in the environment where
> >you already have source packages which you
> >can give to apt-get build-dep/yum-builddep
> >
> >But Bigtop has to be bootstrapped from the source.
> >We can't assume existence of source packages.
> >
> >So here's the question to the bigger Bigtop
> >community -- how do we want to proceed to
> >manage repeatable build environments for
> >our packages?
> >
> >The options I see are:
> >    #1 maintain a parallel (very shallow) collection
> >         of puppet code that would, essentially,
> >         manage our "build slaves"
> >    #2 do #1 but automate it in such a way that
> >         the info actually gets harvested from
> >         spec/control files
> >
> >Thoughts?
> >
> >Thanks,
> >Roman.
> >
> 
> #1 is nice since it can deal with non-packaging issues. But it still
> requires people to install and know how to deal with Puppet. From a
> dev point of view we also need to remember not to use the latest
> features, since some OSes lag significantly in terms of the versions
> of Puppet available.
> 
> #2 is also nice since it can be dealt with using the usual set of tools.
> But it still requires some effort from users. Also, some dependencies
> are not and will probably never be available as packages (ex: Oracle
> JDK).
> 
> I also don't think there is one and only one solution.
> My setup at home is quite different from the bigtop01 one.
> And once you are familiar enough with Apache Bigtop and know how to
> set it up, you may find options #1 and #2 probably not well adapted
> to your situation.
> 
> So this leads me to think about option #3: VMs.
> Tools like Boxgrinder and Oz can deal with multiple OSes and can
> create local images as well as push them to the cloud.
> The build would be repeatable and would not require any effort from
> the end user (apart maybe providing Oracle JDK, but that would have
> to be the case whatever the solution). Future contributors would
> just need to boot their VM to get started and hopefully ease
> contribution.
> 
> Thoughts?

(writing at the end of the last post feels totally unnatural, but for the
benefit of the future readers I will comply :)

I think boxed environments (like VMs) are overkill. One of the issues here
(as Anatoli pointed out) is uneven support across OSes. Say, BoxGrinder has
some issues with Ubuntu, etc.

I suppose an ol' proven toolchain type of environment that can automatically
bootstrap itself upon a fresh install (or update itself: think of the Maven
model) and pull in whatever apps are required in whatever form they might
exist. Say, the JDK (or Groovy, or whatever else) can be downloaded as a
tarball when a more suitable packaging isn't available, etc. Such an
approach would be more fluid than somewhat rigid VMs, which would have to be
updated periodically, versioned, etc.

Another benefit of a toolchain is that BigTop packages might have to
redistribute/wrap some of the tools for later use once a package is installed
on a customer's system.

So, in other words, #2 above (or some modification of it) looks more appealing to me.

--
  Take care,
Konstantin (Cos) Boudnik
2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622

Disclaimer: Opinions expressed in this email are those of the author, and do
not necessarily represent the views of any company the author might be
affiliated with at the moment of writing.

Re: Bigtop environment setup

Posted by Konstantin Boudnik <co...@apache.org>.
On Fri, Sep 21, 2012 at 09:20PM, Bruno Mahé wrote:
> On 09/19/2012 06:32 PM, Roman Shaposhnik wrote:
> >On Wed, Sep 19, 2012 at 6:00 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
> >>I found that in order to avoid unnecessary build failures I need to quickly set up additional
> >>VMs for Bigtop supported platforms. From my experience with Precise, I would say that it's a task that may take time.
> >>
> >>Any suggestions how it could be accelerated?
> >
> >You're raising a very good point, actually. In fact I've run into
> >this very issue while trying to configure an extra Jenkins
> >slave for bigtop01.
> >
> >Now, in the ideal world, all the build time dependencies
> >that we have would be packaged and we'd express the
> >fact that we depend on them via the very same packages
> >that we're maintaining. That's what RPM's BuildRequires:
> >and DEB's Build-Depends: fields are for -- to tell you
> >explicitly what's required to be installed on the system
> >before you can do the build of the package.
> >
> >Then you'd use the tools like:
> >     apt-get build-dep
> >     yum-builddep
> >to satisfy all the dependencies and you're done.
> >
> >Now, this works great in the environment where
> >you already have source packages which you
> >can give to apt-get build-dep/yum-builddep
> >
> >But Bigtop has to be bootstrapped from the source.
> >We can't assume existence of source packages.
> >
> >So here's the question to the bigger Bigtop
> >community -- how do we want to proceed to
> >manage repeatable build environments for
> >our packages?
> >
> >The options I see are:
> >    #1 maintain a parallel (very shallow) collection
> >         of puppet code that would, essentially,
> >         manage our "build slaves"
> >    #2 do #1 but automate it in such a way that
> >         the info actually gets harvested from
> >         spec/control files
> >
> >Thoughts?
> >
> >Thanks,
> >Roman.
> >
> 
> #1 is nice since it can deal with non-packaging issues. But it still
> requires people to install and know how to deal with Puppet. From a
> dev point of view we also need to remember not to use the latest
> features, since some OSes lag significantly in terms of the versions
> of Puppet available.
> 
> #2 is also nice since it can be dealt with using the usual set of tools.
> But it still requires some effort from users. Also, some dependencies
> are not and will probably never be available as packages (ex: Oracle
> JDK).
> 
> I also don't think there is one and only one solution.
> My setup at home is quite different from the bigtop01 one.
> And once you are familiar enough with Apache Bigtop and know how to
> set it up, you may find options #1 and #2 probably not well adapted
> to your situation.
> 
> So this leads me to think about option #3: VMs.
> Tools like Boxgrinder and Oz can deal with multiple OSes and can
> create local images as well as push them to the cloud.
> The build would be repeatable and would not require any effort from
> the end user (apart maybe providing Oracle JDK, but that would have
> to be the case whatever the solution). Future contributors would
> just need to boot their VM to get started and hopefully ease
> contribution.
> 
> Thoughts?

(writing at the end of the last post feels totally unnatural, but for the
benefit of the future readers I will comply :)

I think boxed environments (like VMs) are overkill. One of the issues here
(as Anatoli pointed out) is uneven support across OSes. Say, BoxGrinder has
some issues with Ubuntu, etc.

I suppose an ol' proven toolchain type of environment that can automatically
bootstrap itself upon a fresh install (or update itself: think of the Maven
model) and pull in whatever apps are required in whatever form they might
exist. Say, the JDK (or Groovy, or whatever else) can be downloaded as a
tarball when a more suitable packaging isn't available, etc. Such an
approach would be more fluid than somewhat rigid VMs, which would have to be
updated periodically, versioned, etc.

Another benefit of a toolchain is that BigTop packages might have to
redistribute/wrap some of the tools for later use once a package is installed
on a customer's system.

So, in other words, #2 above (or some modification of it) looks more appealing to me.

--
  Take care,
Konstantin (Cos) Boudnik
2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622

Disclaimer: Opinions expressed in this email are those of the author, and do
not necessarily represent the views of any company the author might be
affiliated with at the moment of writing.
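The self-bootstrapping toolchain described above could be sketched as follows. This is only an illustration of the decision order (package manager first, tarball fallback): the tool names and the tarball URL are placeholders, not actual Bigtop artifacts, and the script only reports which method it would use.

```shell
#!/bin/sh
# Sketch: prefer a real package when the platform has one, fall back to a
# tarball otherwise. Only reports the chosen method; a real bootstrap
# script would execute it.

install_tool() {
    tool="$1"
    tarball_url="$2"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: already present"
    elif command -v yum >/dev/null 2>&1; then
        echo "$tool: install via 'yum install -y $tool'"
    elif command -v apt-get >/dev/null 2>&1; then
        echo "$tool: install via 'apt-get install -y $tool'"
    elif [ -n "$tarball_url" ]; then
        echo "$tool: fetch tarball from $tarball_url"
    else
        echo "$tool: no installation method available" >&2
        return 1
    fi
}

install_tool sh ""   # part of every base system, so reported as present
install_tool hypothetical-build-tool "http://example.com/tool.tar.gz"
```

The tarball branch is what would cover things like the Oracle JDK, which will never ship as a distribution package.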


Re: Bigtop environment setup

Posted by Konstantin Boudnik <co...@apache.org>.
On Mon, Sep 24, 2012 at 07:38PM, Sean Mackrory wrote:
> Personally I'm in favor of the idea of extracting dependencies from the
> control and spec files and building a script that will install the
> necessary tool chain. I am sure that as we embark on this we will discover
> dependencies that have not been declared in the package scripts - but I

It should work, except for build-time dependencies which aren't declared in
the bigtop packages (e.g. libtool, etc.). 

So, I'd vote for a separate list of build-time dependencies being maintained.

Cos

> think that is only more of a reason to do this exercise. Maintaining VMs
> may also be desirable, but writing a system to build and install the tool
> chain as a first step would make maintaining VMs much easier in the long
> term too.
> 
> I don't have a terribly strong opinion against any of the proposed ideas,
> however - so I'll gladly volunteer to contribute towards whatever seems to
> be the most accepted solution.
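A separately maintained list of the undeclared build-time dependencies could be as simple as a text file plus a few lines of shell. The file name and its entries below are illustrative, not an actual Bigtop file:

```shell
#!/bin/sh
# Sketch: a hand-maintained list of build-time tools that no spec/control
# file declares, filtered down to what a bootstrap script would install.

cat > /tmp/toolchain.list <<'EOF'
# Build tools the packages assume but never declare:
libtool
autoconf
make
# The Oracle JDK cannot be listed here; it has to be fetched separately.
EOF

# Strip comments and blank lines; the remainder is what would be passed
# to yum or apt-get on the build slave.
deps=$(grep -v -e '^#' -e '^$' /tmp/toolchain.list)
echo "$deps"
```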

Re: Bigtop environment setup

Posted by Sean Mackrory <ma...@gmail.com>.
Personally I'm in favor of the idea of extracting dependencies from the
control and spec files and building a script that will install the
necessary tool chain. I am sure that as we embark on this we will discover
dependencies that have not been declared in the package scripts - but I
think that is only more of a reason to do this exercise. Maintaining VMs
may also be desirable, but writing a system to build and install the tool
chain as a first step would make maintaining VMs much easier in the long
term too.

I don't have a terribly strong opinion against any of the proposed ideas,
however - so I'll gladly volunteer to contribute towards whatever seems to
be the most accepted solution.

Re: Bigtop environment setup

Posted by Bruno Mahé <bm...@apache.org>.
On 09/19/2012 06:32 PM, Roman Shaposhnik wrote:
> On Wed, Sep 19, 2012 at 6:00 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
>> I found that in order to avoid unnecessary build failures I need to quickly set up additional
>> VMs for Bigtop supported platforms. From my experience with Precise, I would say that it's a task that may take time.
>>
>> Any suggestions how it could be accelerated?
>
> You're raising a very good point, actually. In fact I've run into
> this very issue while trying to configure an extra Jenkins
> slave for bigtop01.
>
> Now, in the ideal world, all the build time dependencies
> that we have would be packaged and we'd express the
> fact that we depend on them via the very same packages
> that we're maintaining. That's what RPM's BuildRequires:
> and DEB's Build-Depends: fields are for -- to tell you
> explicitly what's required to be installed on the system
> before you can do the build of the package.
>
> Then you'd use the tools like:
>      apt-get build-dep
>      yum-builddep
> to satisfy all the dependencies and you're done.
>
> Now, this works great in the environment where
> you already have source packages which you
> can give to apt-get build-dep/yum-builddep
>
> But Bigtop has to be bootstrapped from the source.
> We can't assume existence of source packages.
>
> So here's the question to the bigger Bigtop
> community -- how do we want to proceed to
> manage repeatable build environments for
> our packages?
>
> The options I see are:
>     #1 maintain a parallel (very shallow) collection
>          of puppet code that would, essentially,
>          manage our "build slaves"
>     #2 do #1 but automate it in such a way that
>          the info actually gets harvested from
>          spec/control files
>
> Thoughts?
>
> Thanks,
> Roman.
>

#1 is nice since it can deal with non-packaging issues. But it still 
requires people to install and know how to deal with Puppet. From a dev 
point of view we also need to remember not to use the latest features, 
since some OSes lag significantly in terms of the versions of Puppet available.

#2 is also nice since it can be dealt with using the usual set of tools. 
But it still requires some effort from users. Also, some dependencies are 
not and will probably never be available as packages (ex: Oracle JDK).

I also don't think there is one and only one solution.
My setup at home is quite different from the bigtop01 one.
And once you are familiar enough with Apache Bigtop and know how to set 
it up, you may find options #1 and #2 probably not well adapted to your 
situation.

So this leads me to think about option #3: VMs.
Tools like BoxGrinder and Oz can deal with multiple OSes and can create 
local images as well as push them to the cloud.
The build would be repeatable and would not require any effort from the 
end user (apart maybe providing Oracle JDK, but that would have to be 
the case whatever the solution). Future contributors would just need to 
boot their VM to get started and hopefully ease contribution.

Thoughts?

Thanks,
Bruno

Re: Bigtop environment setup

Posted by Roman Shaposhnik <rv...@apache.org>.
On Wed, Sep 19, 2012 at 6:00 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
> I found that in order to avoid unnecessary build failures I need to quickly set up additional
> VMs for Bigtop supported platforms. From my experience with Precise, I would say that it's a task that may take time.
>
> Any suggestions how it could be accelerated?

You're raising a very good point, actually. In fact I've run into
this very issue while trying to configure an extra Jenkins
slave for bigtop01.

Now, in the ideal world, all the build time dependencies
that we have would be packaged and we'd express the
fact that we depend on them via the very same packages
that we're maintaining. That's what RPM's BuildRequires:
and DEB's Build-Depends: fields are for -- to tell you
explicitly what's required to be installed on the system
before you can do the build of the package.

Then you'd use tools like:
    apt-get build-dep
    yum-builddep
to satisfy all the dependencies and you're done.

Now, this works great in an environment where
you already have source packages that you
can feed to apt-get build-dep/yum-builddep.

But Bigtop has to be bootstrapped from the source.
We can't assume existence of source packages.

So here's the question to the bigger Bigtop
community -- how do we want to proceed to
manage repeatable build environments for
our packages?

The options I see are:
   #1 maintain a parallel (very shallow) collection
        of puppet code that would, essentially,
        manage our "build slaves"
   #2 do #1 but automate it in such a way that
        the info actually gets harvested from
        spec/control files

Thoughts?

Thanks,
Roman.
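Harvesting the dependencies from the spec/control files (option #2) might look like the sketch below. The file contents and paths are made-up samples, not actual Bigtop packaging files, and the final install step is only described in a comment:

```shell
#!/bin/sh
# Sketch of option #2: harvest build-time dependencies directly from
# RPM spec and Debian control files.

# Stand-in for an RPM spec file (e.g. bigtop-packages/src/rpm/.../SPECS/*.spec)
cat > /tmp/example.spec <<'EOF'
Name: example
BuildRequires: autoconf, automake
BuildRequires: fuse-devel
EOF

# Stand-in for a Debian control file (e.g. bigtop-packages/src/deb/.../control)
cat > /tmp/example-control <<'EOF'
Source: example
Build-Depends: debhelper (>= 6), ant
EOF

# Pull out the field values, split comma-separated entries, and drop
# version constraints such as "(>= 6)".
deps=$(sed -n -e 's/^BuildRequires:[[:space:]]*//p' \
              -e 's/^Build-Depends:[[:space:]]*//p' \
              /tmp/example.spec /tmp/example-control |
       tr ',' '\n' |
       sed -e 's/([^)]*)//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
       grep -v '^$' | sort -u)

echo "$deps"

# On a build slave, the harvested list would then be handed to
# 'yum install -y' or 'apt-get install -y' as appropriate.
```

This is the same information `apt-get build-dep`/`yum-builddep` would consume, just extracted without requiring source packages to exist first.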