Posted to general@hadoop.apache.org by Konstantin Boudnik <co...@apache.org> on 2011/02/26 04:55:16 UTC

Build/test infrastructure

Looking at the recurring build/test-patch problems on the Hadoop build machines, I
thought of a way to make them:
  a) all the same (configuration- and installed-software-wise)
  b) have an effortless system to run upgrades/updates on all of them in a
  controlled fashion.

I would suggest creating Puppet configs (the exact content to be defined)
which will be checked into SCM (e.g. SVN). Whenever a build host's software
needs to be restored or updated, a simple run of Puppet across the machines
(or a config change followed by a Puppet run) will do the magic for us.

If there are no objections from the community, I can put together some
Puppet recipes which can evolve as we go.
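
For illustration only, a first cut of such a recipe might look like the
sketch below; package names, versions, and paths are placeholders, not a
proposed set. A working copy checked out of SVN can then be applied
masterless on each host:

    # buildhost.pp - hypothetical recipe kept in SVN; everything named
    # here is a placeholder, not an agreed package set.
    cat > buildhost.pp <<'EOF'
    package { 'gcc':   ensure => installed }
    package { 'patch': ensure => installed }
    package { 'ant':   ensure => '1.8.2' }
    file { '/etc/profile.d/java.sh':
      content => "export JAVA_HOME=/usr/lib/jvm/java-6-sun\n",
    }
    EOF
    # Masterless run - no puppet master needed on the build hosts:
    sudo puppet apply buildhost.pp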

-- 
Take care,
	Cos
2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622

After all, it is only the mediocre who are always at their best.
		 Jean Giraudoux

Re: Build/test infrastructure

Posted by Konstantin Boudnik <co...@apache.org>.
On Mon, Feb 28, 2011 at 12:03, Rajiv Chittajallu <ra...@yahoo-inc.com> wrote:
> --- On Sun, 2/27/11, Konstantin Boudnik <co...@apache.org> wrote:
>
>> And now jailed environments with
>> Hudson - oh yes, lovely yinst all over again.
>
> From where did 'yinst' come into this discussion? Since you know something about a company's
> internal tools, don't assume everyone posting from that company is talking about it.

True, it doesn't follow. It was my assumption, based on the
chrooting/jailing suggestion above. Of course, it doesn't mean yinst
or anything related to it. Sorry if I have offended anyone with this
reference.

Cos

Re: Build/test infrastructure

Posted by Rajiv Chittajallu <ra...@yahoo-inc.com>.
--- On Sun, 2/27/11, Konstantin Boudnik <co...@apache.org> wrote:

> And now jailed environments with
> Hudson - oh yes, lovely yinst all over again.

From where did 'yinst' come into this discussion? Since you know something about a company's internal tools, don't assume everyone posting from that company is talking about it.



Re: Build/test infrastructure

Posted by Konstantin Boudnik <co...@apache.org>.
And now jailed environments with Hudson - oh yes, lovely yinst all over again.

Eric, the glorious plans for a framework to deploy thousands of nodes
at the click of a button are awesome indeed. But you are apparently
ignoring the point:
  - this is about synchronization of 10 build machines at the level of
system packaging and operating system configuration.

With warmest regards,
  Cos

On Sun, Feb 27, 2011 at 01:32, Eric Yang <ey...@yahoo-inc.com> wrote:
> On 2/26/11 7:10 PM, "Konstantin Boudnik" <co...@apache.org> wrote:
>
>> On Sat, Feb 26, 2011 at 05:38PM, Eric Yang wrote:
>>
>> Furthermore, my Puppet knowledge is very limited and I am for sure no expert
>> in maven. I have some concern however:
>>   - how to provide privileged access
>>   - how and where to store host configurations (i.e. packages names, versions,
>>     which are gonna be different for difference OSes)
>>   - how to do native packages (see above example) and native dependency
>>     management from maven? With shell scripting?
>>   - how to maintain such a construct?
>> I can continue for a long time, but I'd rather want to solve an issue of
>> managing build host configurations/package sets in a most efficient and
>> sustainable manner.
>
> Hudson already supports a chroot jail environment.  It is easy to set up
> privileged access in the jailed environment by giving the user Hudson runs
> as sudo access to the jail.  The host configuration can be mirrored into
> the chroot environment with a minimal set of shell commands.
>
>> Sorry, but Maven + shell script can be called simplification only in a pipe
>> dream ;) Maven is a build tool. A relatively good one perhaps, but just a
>> build tool. Certainly everything can be done with a combination of a shell
>> scripting + tar balls and a little SSH sugar topping. But I'd rather use a
>> accurately designed and supported tool (Puppet, Chef, etc.).
>
> Maven supports various kinds of remote deployment plugins.  The Exec plugin
> with a shell script is the easiest one to implement.  There are also plugins
> like Cargo for more complex container deployments.  There is a plan to write
> a deployment framework for hadoop for large scale deployment.  This project
> is in the planning stage.  The scope is deploying the entire hadoop stack
> (hdfs, mr, zookeeper, hbase, pig, hive, and chukwa) to multiple large
> clusters.  Similar to what you are planning, except at a scale where it
> would make sense to use puppet+mcollective.  We did the evaluation and found
> that one puppet master would not scale well past 1800 nodes, and a
> multilayer puppet-master spanning tree to cover all our nodes is not ideal.
> We chose to use chef-solo for edge deployment.  The rest of the details are
> to be worked out.  This is the reason that I am interested in the test
> environment being planned here.  It will be possible to use the "to be
> invented" framework in Hudson.  This system is not going to be grown inside
> the ant/maven build script, hence it is better to keep the build system
> simple for now.
>
> Regards,
> Eric
>
>>
>> And BTW - Hadoop builds aren't maven'ized yet. Which renders most of the
>> argument a time waste until that problem is solved.
>>
>> At any rate, HADOOP-7157 is the JIRA for this. Please comment on it.
>>
>> Cos
>>
>>> Regards,
>>> Eric
>>>
>>>> You don't need to set up a puppet master in order to bounce a node.
>>>> Puppet works in a client-only mode just as well.
>>>>
>>>> Cos
>>>>
>>>
>>>>> packaging only, but express my opinions on improving build system and
>>>>> making
>>>>> the system easier to reproduce.
>>>>>
>>>>> Regards,
>>>>> Eric
>>>>>
>>>>> On 2/26/11 2:18 PM, "Konstantin Boudnik" <co...@apache.org> wrote:
>>>>>
>>>>> This discussion isn't about build of the product nor about packaging
>>>>> of it. We are discussing patch validation and snapshot build
>>>>> infrastructure.
>>>>>
>>>>> On Sat, Feb 26, 2011 at 12:43, Eric Yang <ey...@yahoo-inc.com> wrote:
>>>>>> We should be very careful about the approach that we chosen for
>>>>>> build/packaging.  The current state of hadoop is coupled together due to
>>>>>> lack of standardized RPC format.  Once this issue is cleared, the
>>>>>> community will want to split hdfs and m/r into separated projects at some
>>>>>> point.  It may be better to ensure project is modularized, and work from
>>>>>> the same svn repository.  Maven is great for doing this, and most of the
>>>>>> build and scripts can be defined in pom.xml.  Deployment/test server
>>>>>> configuration can be pass in from hudson.  We should ensure that build and
>>>>>> deployment script do not further couple the project.
>>>>>>
>>>>>> Regards,
>>>>>> Eric
>>>>>>
>>>>>> On 2/26/11 11:14 AM, "Konstantin Boudnik" <co...@apache.org> wrote:
>>>>>>
>>>>>> On Fri, Feb 25, 2011 at 23:47, Nigel Daley <nd...@mac.com> wrote:
>>>>>>> +1.
>>>>>>>
>>>>>>> Once HADOOP-7106 is committed, I'd like to propose we create a directory
>>>>>>> at
>>>>>>> the same level of common/hdfs/mapreduce to hold build (and deploy) type
>>>>>>> scripts and files.  These would then get branches/tagged with the rest of
>>>>>>> the release.
>>>>>>
>>>>>> That makes sense, although I don't see changes of the host
>>>>>> configurations to happen very often.
>>>>>>
>>>>>> Cos
>>>>>>
>>>>>>> Nige
>>>>>>>
>>>>>>> On Feb 25, 2011, at 7:55 PM, Konstantin Boudnik wrote:
>>>>>>>
>>>>>>>> Looking at re-occurring build/test-patch problems on hadoop build
>>>>>>>> machines I
>>>>>>>> thought of a way to make them:
>>>>>>>>  a) all the same (configuration, installed software wise)
>>>>>>>>  b) have an effortless system to run upgrades/updates on all of them in
>>>>>>>> a
>>>>>>>>  controlled fashion.
>>>>>>>>
>>>>>>>> I would suggest to create Puppet configs (the exact content to be
>>>>>>>> defined)
>>>>>>>> which we'll be checked in SCM (e.g. SVN), Whenever a build host's
>>>>>>>> software
>>>>>>>> is needed to be restored/updated a simple run of Puppet across the
>>>>>>>> machines
>>>>>>>> or change in config and run of Puppet will do the magic for us.
>>>>>>>>
>>>>>>>> If there are no objections from the community I can put together some
>>>>>>>> Puppet recipes which might be evolved as we go.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Take care,
>>>>>>>>       Cos
>>>>>>>> 2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
>>>>>>>>
>>>>>>>> After all, it is only the mediocre who are always at their best.
>>>>>>>>                Jean Giraudoux
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>
>

Re: Build/test infrastructure

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Feb 26, 2011, at 7:10 PM, Konstantin Boudnik wrote:
> BTW, Puppet and Chef recipes are very widely used by all sorts of Ops and
> cluster management companies. Perhaps Maven and shell are too - I'm not in a
> position to make a judgement call. I'll let Y! Grid Ops comment on it -
> they know everything about sizable cluster configuration management and the
> tools for the job.

	I'm not in Y! Grid Ops, but from where I sit, it sounds like you are solving the wrong problem.

	Getting the build machines to be the same should mostly be a one-time issue.  If non-ops types are messing with the package load, then that's a privilege and process problem, not an automation problem.  The success of a configuration management utility is directly correlated to the amount of work that people are willing to put into using it.


Re: Build/test infrastructure

Posted by Eric Yang <ey...@yahoo-inc.com>.
On 2/26/11 7:10 PM, "Konstantin Boudnik" <co...@apache.org> wrote:

> On Sat, Feb 26, 2011 at 05:38PM, Eric Yang wrote:
> 
> Furthermore, my Puppet knowledge is very limited and I am for sure no expert
> in maven. I have some concerns, however:
>   - how to provide privileged access
>   - how and where to store host configurations (i.e. package names and
>     versions, which are going to be different for different OSes)
>   - how to do native packaging (see the above example) and native dependency
>     management from maven? With shell scripting?
>   - how to maintain such a construct?
> I can continue for a long time, but I'd rather solve the issue of
> managing build host configurations/package sets in the most efficient and
> sustainable manner.

Hudson already supports a chroot jail environment.  It is easy to set up
privileged access in the jailed environment by giving the user Hudson runs
as sudo access to the jail.  The host configuration can be mirrored into
the chroot environment with a minimal set of shell commands.
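
Purely as a sketch of that setup (the distribution, paths, and 'hudson'
account below are assumptions, not a tested procedure):

    # Bootstrap a minimal jail on a Debian/Ubuntu host (paths assumed):
    sudo debootstrap lucid /srv/build-jail
    # Mirror the pieces of host configuration the build needs:
    sudo cp /etc/resolv.conf /srv/build-jail/etc/resolv.conf
    # sudoers entry (added via visudo) so the hudson user can enter the jail:
    #   hudson ALL=(root) NOPASSWD: /usr/sbin/chroot /srv/build-jail *
    # Run a build step inside the jail:
    sudo /usr/sbin/chroot /srv/build-jail /bin/sh -c 'cd /build && ant test-patch'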

> Sorry, but Maven + shell scripts can be called a simplification only in a
> pipe dream ;) Maven is a build tool. A relatively good one, perhaps, but
> just a build tool. Certainly everything can be done with a combination of
> shell scripting, tarballs, and a little SSH sugar on top. But I'd rather
> use an accurately designed and supported tool (Puppet, Chef, etc.).

Maven supports various kinds of remote deployment plugins.  The Exec plugin
with a shell script is the easiest one to implement.  There are also plugins
like Cargo for more complex container deployments.  There is a plan to write a
deployment framework for hadoop for large scale deployment.  This project is
in the planning stage.  The scope is deploying the entire hadoop stack (hdfs,
mr, zookeeper, hbase, pig, hive, and chukwa) to multiple large clusters.
Similar to what you are planning, except at a scale where it would make
sense to use puppet+mcollective.  We did the evaluation and found that one
puppet master would not scale well past 1800 nodes, and a multilayer
puppet-master spanning tree to cover all our nodes is not ideal.  We chose to
use chef-solo for edge deployment.  The rest of the details are to be worked
out.  This is the reason that I am interested in the test environment being
planned here.  It will be possible to use the "to be invented" framework in
Hudson.  This system is not going to be grown inside the ant/maven build
script, hence it is better to keep the build system simple for now.

Regards,
Eric

> 
> And BTW - Hadoop builds aren't maven'ized yet, which renders most of this
> argument a waste of time until that problem is solved.
> 
> At any rate, HADOOP-7157 is the JIRA for this. Please comment on it.
> 
> Cos
> 
>> Regards,
>> Eric
>> 
>>> You don't need to set up a puppet master in order to bounce a node.
>>> Puppet works in a client-only mode just as well.
>>> 
>>> Cos
>>> 
>> 
>>>> packaging only, but express my opinions on improving build system and
>>>> making
>>>> the system easier to reproduce.
>>>> 
>>>> Regards,
>>>> Eric
>>>> 
>>>> On 2/26/11 2:18 PM, "Konstantin Boudnik" <co...@apache.org> wrote:
>>>> 
>>>> This discussion isn't about build of the product nor about packaging
>>>> of it. We are discussing patch validation and snapshot build
>>>> infrastructure.
>>>> 
>>>> On Sat, Feb 26, 2011 at 12:43, Eric Yang <ey...@yahoo-inc.com> wrote:
>>>>> We should be very careful about the approach that we chosen for
>>>>> build/packaging.  The current state of hadoop is coupled together due to
>>>>> lack of standardized RPC format.  Once this issue is cleared, the
>>>>> community will want to split hdfs and m/r into separated projects at some
>>>>> point.  It may be better to ensure project is modularized, and work from
>>>>> the same svn repository.  Maven is great for doing this, and most of the
>>>>> build and scripts can be defined in pom.xml.  Deployment/test server
>>>>> configuration can be pass in from hudson.  We should ensure that build and
>>>>> deployment script do not further couple the project.
>>>>> 
>>>>> Regards,
>>>>> Eric
>>>>> 
>>>>> On 2/26/11 11:14 AM, "Konstantin Boudnik" <co...@apache.org> wrote:
>>>>> 
>>>>> On Fri, Feb 25, 2011 at 23:47, Nigel Daley <nd...@mac.com> wrote:
>>>>>> +1.
>>>>>> 
>>>>>> Once HADOOP-7106 is committed, I'd like to propose we create a directory
>>>>>> at
>>>>>> the same level of common/hdfs/mapreduce to hold build (and deploy) type
>>>>>> scripts and files.  These would then get branches/tagged with the rest of
>>>>>> the release.
>>>>> 
>>>>> That makes sense, although I don't see changes of the host
>>>>> configurations to happen very often.
>>>>> 
>>>>> Cos
>>>>> 
>>>>>> Nige
>>>>>> 
>>>>>> On Feb 25, 2011, at 7:55 PM, Konstantin Boudnik wrote:
>>>>>> 
>>>>>>> Looking at re-occurring build/test-patch problems on hadoop build
>>>>>>> machines I
>>>>>>> thought of a way to make them:
>>>>>>>  a) all the same (configuration, installed software wise)
>>>>>>>  b) have an effortless system to run upgrades/updates on all of them in
>>>>>>> a
>>>>>>>  controlled fashion.
>>>>>>> 
>>>>>>> I would suggest to create Puppet configs (the exact content to be
>>>>>>> defined)
>>>>>>> which we'll be checked in SCM (e.g. SVN), Whenever a build host's
>>>>>>> software
>>>>>>> is needed to be restored/updated a simple run of Puppet across the
>>>>>>> machines
>>>>>>> or change in config and run of Puppet will do the magic for us.
>>>>>>> 
>>>>>>> If there are no objections from the community I can put together some
>>>>>>> Puppet recipes which might be evolved as we go.
>>>>>>> 
>>>>>>> --
>>>>>>> Take care,
>>>>>>>       Cos
>>>>>>> 2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
>>>>>>> 
>>>>>>> After all, it is only the mediocre who are always at their best.
>>>>>>>                Jean Giraudoux
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 


Re: Build/test infrastructure

Posted by Konstantin Boudnik <co...@apache.org>.
On Sat, Feb 26, 2011 at 05:38PM, Eric Yang wrote:
> On 2/26/11 4:34 PM, "Konstantin Boudnik" <co...@apache.org> wrote:
> 
> > Apparently you are talking about something else, but I will bite...
> > 
> > On Sat, Feb 26, 2011 at 04:03PM, Eric Yang wrote:
> >> The proposed test automation process hasn't been thought through.   Apache
> >> Hudson has been setup to trigger patch builds, and setup pre-commit test
> >> environment.  Unfortunately, the current setup needs refinement with proper
> >> source code setup to make the builds working again.  Ideally, the test cycle
> >> have a commit build which runs simple unit tests, and a secondary build
> >> (every 24 hours) to run more through tests on multiple machine setup.  The
> >> test cluster should be cleansed after every secondary build, and ideally
> > 
> > We don't have a test cluster for Apache Hadoop validation. All I am focusing
> > on is build and patch validation infrastructure.
> 
> If the plan is to use the puppet agent without a puppet master to configure
> the system locally for testing patch builds, it is probably the wrong tool
> for the job.  The value of puppet is being able to configure heterogeneous
> services across machines in a consistent manner.  Is there a plan to deploy

This is simply not the only value of the tool. It allows one to maintain OS
configurations and system package installations as easily for 1 host as for
1000 of them. Here's one of many examples:
    http://hstack.org/hstack-automated-deployment-using-puppet/
BTW, Puppet and Chef recipes are very widely used by all sorts of Ops and
cluster management companies. Perhaps Maven and shell are too - I'm not in a
position to make a judgement call. I'll let Y! Grid Ops comment on it -
they know everything about sizable cluster configuration management and the
tools for the job.
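
As a rough sketch of what "as easy for 1 host as for 1000" means here - the
same SVN-controlled recipe pushed to every build machine (host names and
file names are placeholders):

    # Apply one recipe across all build hosts; no puppet master involved:
    for h in build1 build2 build3; do
      scp buildhost.pp "$h":/tmp/buildhost.pp &&
        ssh "$h" 'sudo puppet apply /tmp/buildhost.pp'
    done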

> multiple services across machines?  If the purpose is just config
> templates, ant or maven can do the job equally well.

Are you suggesting that it is easier to install patch and gcc packages of
version X.X.Z from a Maven build than from Puppet or Chef? If so - please cut
such a patch for the community to review. That would be a great
contribution!

Furthermore, my Puppet knowledge is very limited and I am for sure no expert
in maven. I have some concerns, however:
  - how to provide privileged access
  - how and where to store host configurations (i.e. package names and
    versions, which are going to be different for different OSes)
  - how to do native packaging (see the above example) and native dependency
    management from maven? With shell scripting?
  - how to maintain such a construct?
I can continue for a long time, but I'd rather solve the issue of
managing build host configurations/package sets in the most efficient and
sustainable manner.

In a properly designed CI system, the build shouldn't be responsible for
configuring its operating environment. It can and should check that
everything is in place (and crash/report accordingly). But if my Ant script
goes around downloading, installing, and God forbid compiling some chunks of
my OS, I will soon end up with the elegance of Python or some such.
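
A minimal sketch of that fail-fast check (the tool list is obviously a
placeholder):

    # Verify the host is configured as expected; never install anything.
    for tool in gcc patch ant java; do
      command -v "$tool" >/dev/null 2>&1 || {
        echo "FATAL: required tool '$tool' is missing" >&2
        exit 1
      }
    done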

> > Doing deployment from a build system is certainly possible, but is suboptimal
> > because it pollutes the build with HW/OS details, deployment scripts and such.
> > Besides, last time I've checked Hadoop was built by Ant.
> 
> Deploy to a remote machine can be as simple as: scp the tarball, extract
> it, apply a template, and run it.  None of this requires puppet.  Instead
> of the ant + puppet combination, the patch test build structure could be
> simplified by using maven + shell scripts.

Sorry, but Maven + shell scripts can be called a simplification only in a pipe
dream ;) Maven is a build tool. A relatively good one, perhaps, but just a
build tool. Certainly everything can be done with a combination of shell
scripting, tarballs, and a little SSH sugar on top. But I'd rather use an
accurately designed and supported tool (Puppet, Chef, etc.).

And BTW - Hadoop builds aren't maven'ized yet, which renders most of this
argument a waste of time until that problem is solved.

At any rate, HADOOP-7157 is the JIRA for this. Please comment on it.

Cos

> Regards,
> Eric
> 
> > You don't need to set up a puppet master in order to bounce a node.
> > Puppet works in a client-only mode just as well.
> > 
> > Cos
> > 
> 
> >> packaging only, but express my opinions on improving build system and making
> >> the system easier to reproduce.
> >> 
> >> Regards,
> >> Eric
> >> 
> >> On 2/26/11 2:18 PM, "Konstantin Boudnik" <co...@apache.org> wrote:
> >> 
> >> This discussion isn't about build of the product nor about packaging
> >> of it. We are discussing patch validation and snapshot build
> >> infrastructure.
> >> 
> >> On Sat, Feb 26, 2011 at 12:43, Eric Yang <ey...@yahoo-inc.com> wrote:
> >>> We should be very careful about the approach that we chosen for
> >>> build/packaging.  The current state of hadoop is coupled together due to
> >>> lack of standardized RPC format.  Once this issue is cleared, the
> >>> community will want to split hdfs and m/r into separated projects at some
> >>> point.  It may be better to ensure project is modularized, and work from
> >>> the same svn repository.  Maven is great for doing this, and most of the
> >>> build and scripts can be defined in pom.xml.  Deployment/test server
> >>> configuration can be pass in from hudson.  We should ensure that build and
> >>> deployment script do not further couple the project.
> >>> 
> >>> Regards,
> >>> Eric
> >>> 
> >>> On 2/26/11 11:14 AM, "Konstantin Boudnik" <co...@apache.org> wrote:
> >>> 
> >>> On Fri, Feb 25, 2011 at 23:47, Nigel Daley <nd...@mac.com> wrote:
> >>>> +1.
> >>>> 
> >>>> Once HADOOP-7106 is committed, I'd like to propose we create a directory at
> >>>> the same level of common/hdfs/mapreduce to hold build (and deploy) type
> >>>> scripts and files.  These would then get branches/tagged with the rest of
> >>>> the release.
> >>> 
> >>> That makes sense, although I don't see changes of the host
> >>> configurations to happen very often.
> >>> 
> >>> Cos
> >>> 
> >>>> Nige
> >>>> 
> >>>> On Feb 25, 2011, at 7:55 PM, Konstantin Boudnik wrote:
> >>>> 
> >>>>> Looking at re-occurring build/test-patch problems on hadoop build
> >>>>> machines I
> >>>>> thought of a way to make them:
> >>>>>  a) all the same (configuration, installed software wise)
> >>>>>  b) have an effortless system to run upgrades/updates on all of them in a
> >>>>>  controlled fashion.
> >>>>> 
> >>>>> I would suggest to create Puppet configs (the exact content to be defined)
> >>>>> which we'll be checked in SCM (e.g. SVN), Whenever a build host's software
> >>>>> is needed to be restored/updated a simple run of Puppet across the
> >>>>> machines
> >>>>> or change in config and run of Puppet will do the magic for us.
> >>>>> 
> >>>>> If there are no objections from the community I can put together some
> >>>>> Puppet recipes which might be evolved as we go.
> >>>>> 
> >>>>> --
> >>>>> Take care,
> >>>>>       Cos
> >>>>> 2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
> >>>>> 
> >>>>> After all, it is only the mediocre who are always at their best.
> >>>>>                Jean Giraudoux
> >>>> 
> >>>> 
> >>> 
> >>> 
> >> 
> 

Re: Build/test infrastructure

Posted by Steve Loughran <st...@apache.org>.
On 27/02/11 01:38, Eric Yang wrote:
>    Instead of ant +
> puppet combination, the patch test build structure could be simplified by
> using maven + shell scripts.


The trouble with any scripted approach vs state-driven CM tooling is
its inability to cope with a starting point that doesn't match the
script's assumptions.


This isn't me picking on maven (different topic), more the CM problem.
Lots of people do use shell scripts, and if your starting point is known
(such as a golden VM image) it mostly works. Mostly. Even with RPM
installation (remember, it's scripts underneath), if you kickstart-install
and then update a thousand servers, they will end up executing their
scripts in different orders, and so end up in different final states.

Of course, the weakness of CM tools is that they often end up being
given goals that can't be consistently satisfied, so they cycle between
a set of sub-optimal configurations.
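
A toy illustration of the difference (paths are placeholders): run the first
form twice and the line is duplicated; the state-driven form converges to
the same end state from any starting point.

    # Script style - assumes the line isn't there yet; re-runs drift:
    echo 'export JAVA_HOME=/usr/lib/jvm/java-6-sun' >> /etc/profile
    # State style - test before touching, as CM tools do internally:
    grep -q 'JAVA_HOME=/usr/lib/jvm/java-6-sun' /etc/profile ||
      echo 'export JAVA_HOME=/usr/lib/jvm/java-6-sun' >> /etc/profile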

Re: Build/test infrastructure

Posted by Eric Yang <ey...@yahoo-inc.com>.
On 2/26/11 4:34 PM, "Konstantin Boudnik" <co...@apache.org> wrote:

> Apparently you are talking about something else, but I will bite...
> 
> On Sat, Feb 26, 2011 at 04:03PM, Eric Yang wrote:
>> The proposed test automation process hasn't been thought through.   Apache
>> Hudson has been setup to trigger patch builds, and setup pre-commit test
>> environment.  Unfortunately, the current setup needs refinement with proper
>> source code setup to make the builds working again.  Ideally, the test cycle
>> have a commit build which runs simple unit tests, and a secondary build
>> (every 24 hours) to run more through tests on multiple machine setup.  The
>> test cluster should be cleansed after every secondary build, and ideally
> 
> We don't have a test cluster for Apache Hadoop validation. All I am focusing
> on is build and patch validation infrastructure.

If the plan is to use the puppet agent without a puppet master to configure
the system locally for testing patch builds, it is probably the wrong tool
for the job.  The value of puppet is being able to configure heterogeneous
services across machines in a consistent manner.  Is there a plan to deploy
multiple services across machines?  If the purpose is just config templates,
ant or maven can do the job equally well.

> Doing deployment from a build system is certainly possible, but it is
> suboptimal because it pollutes the build with HW/OS details, deployment
> scripts, and such.  Besides, last time I checked, Hadoop was built with Ant.

Deploy to a remote machine can be as simple as: scp the tarball, extract it,
apply a template, and run it.  None of this requires puppet.  Instead of the
ant + puppet combination, the patch test build structure could be simplified
by using maven + shell scripts.
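
Spelled out as a sketch (the host name, tarball, and template variable are
all placeholders, not an existing convention):

    # scp the tarball, extract it, apply a template, run:
    scp hadoop-X.Y.Z.tar.gz build-host:/tmp/
    ssh build-host 'tar xzf /tmp/hadoop-X.Y.Z.tar.gz -C /opt'
    ssh build-host 'sed "s|@NAMENODE@|nn1.example.com|" \
        /opt/hadoop-X.Y.Z/conf/core-site.xml.tmpl \
        > /opt/hadoop-X.Y.Z/conf/core-site.xml'
    ssh build-host '/opt/hadoop-X.Y.Z/bin/start-all.sh'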

Regards,
Eric

> You don't need to set up a puppet master in order to bounce a node.
> Puppet works in a client-only mode just as well.
> 
> Cos
> 

>> packaging only, but express my opinions on improving build system and making
>> the system easier to reproduce.
>> 
>> Regards,
>> Eric
>> 
>> On 2/26/11 2:18 PM, "Konstantin Boudnik" <co...@apache.org> wrote:
>> 
>> This discussion isn't about build of the product nor about packaging
>> of it. We are discussing patch validation and snapshot build
>> infrastructure.
>> 
>> On Sat, Feb 26, 2011 at 12:43, Eric Yang <ey...@yahoo-inc.com> wrote:
>>> We should be very careful about the approach that we chosen for
>>> build/packaging.  The current state of hadoop is coupled together due to
>>> lack of standardized RPC format.  Once this issue is cleared, the
>>> community will want to split hdfs and m/r into separated projects at some
>>> point.  It may be better to ensure project is modularized, and work from
>>> the same svn repository.  Maven is great for doing this, and most of the
>>> build and scripts can be defined in pom.xml.  Deployment/test server
>>> configuration can be pass in from hudson.  We should ensure that build and
>>> deployment script do not further couple the project.
>>> 
>>> Regards,
>>> Eric
>>> 
>>> On 2/26/11 11:14 AM, "Konstantin Boudnik" <co...@apache.org> wrote:
>>> 
>>> On Fri, Feb 25, 2011 at 23:47, Nigel Daley <nd...@mac.com> wrote:
>>>> +1.
>>>> 
>>>> Once HADOOP-7106 is committed, I'd like to propose we create a directory at
>>>> the same level of common/hdfs/mapreduce to hold build (and deploy) type
>>>> scripts and files.  These would then get branches/tagged with the rest of
>>>> the release.
>>> 
>>> That makes sense, although I don't see changes of the host
>>> configurations to happen very often.
>>> 
>>> Cos
>>> 
>>>> Nige
>>>> 
>>>> On Feb 25, 2011, at 7:55 PM, Konstantin Boudnik wrote:
>>>> 
>>>>> Looking at re-occurring build/test-patch problems on hadoop build
>>>>> machines I
>>>>> thought of a way to make them:
>>>>>  a) all the same (configuration, installed software wise)
>>>>>  b) have an effortless system to run upgrades/updates on all of them in a
>>>>>  controlled fashion.
>>>>> 
>>>>> I would suggest to create Puppet configs (the exact content to be defined)
>>>>> which we'll be checked in SCM (e.g. SVN), Whenever a build host's software
>>>>> is needed to be restored/updated a simple run of Puppet across the
>>>>> machines
>>>>> or change in config and run of Puppet will do the magic for us.
>>>>> 
>>>>> If there are no objections from the community I can put together some
>>>>> Puppet recipes which might be evolved as we go.
>>>>> 
>>>>> --
>>>>> Take care,
>>>>>       Cos
>>>>> 2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
>>>>> 
>>>>> After all, it is only the mediocre who are always at their best.
>>>>>                Jean Giraudoux
>>>> 
>>>> 
>>> 
>>> 
>> 


Re: Build/test infrastructure

Posted by Konstantin Boudnik <co...@apache.org>.
Apparently you are talking about something else, but I will bite...

On Sat, Feb 26, 2011 at 04:03PM, Eric Yang wrote:
> The proposed test automation process hasn't been thought through.  Apache
> Hudson has been set up to trigger patch builds and to set up the pre-commit
> test environment.  Unfortunately, the current setup needs refinement, with a
> proper source code setup, to make the builds work again.  Ideally, the test
> cycle would have a commit build which runs simple unit tests, and a
> secondary build (every 24 hours) to run more thorough tests on a
> multi-machine setup.  The test cluster should be cleansed after every
> secondary build, and ideally

We don't have a test cluster for Apache Hadoop validation. All I am focusing
on is build and patch validation infrastructure.

> this is done in a sandbox approach.  However, I don't think bringing in a
> puppet environment setup makes the test system reproducible.  Consequently, it

If a specialized and highly scalable host configuration system such as Puppet
doesn't guarantee configuration reproducibility, then I am not sure what else
will. Say, Y! uses its proprietary Igor environment exactly for these
purposes, but of course it is highly coupled with the yinst format and can't
be used anywhere else.

> may be better to have the cluster test setup as part of the scripts in the
> maven integration test phase. This will enable any hadoop developer to set
> up his

Doing deployment from a build system is certainly possible, but it is
suboptimal because it pollutes the build with HW/OS details, deployment
scripts, and such. Besides, last time I checked, Hadoop was built with Ant.

> own test cluster without setting up a puppet master. I am not fixated on build and

You don't need to set up a puppet master in order to bounce a node. Puppet
works in a client-only mode just as well.

Cos

> packaging only, but am expressing my opinions on improving the build system
> and making it easier to reproduce.
> 
> Regards,
> Eric
> 
> On 2/26/11 2:18 PM, "Konstantin Boudnik" <co...@apache.org> wrote:
> 
> This discussion isn't about build of the product nor about packaging
> of it. We are discussing patch validation and snapshot build
> infrastructure.
> 
> On Sat, Feb 26, 2011 at 12:43, Eric Yang <ey...@yahoo-inc.com> wrote:
> > We should be very careful about the approach that we chosen for
> > build/packaging.  The current state of hadoop is coupled together due to
> > lack of standardized RPC format.  Once this issue is cleared, the
> > community will want to split hdfs and m/r into separated projects at some
> > point.  It may be better to ensure project is modularized, and work from
> > the same svn repository.  Maven is great for doing this, and most of the
> > build and scripts can be defined in pom.xml.  Deployment/test server
> > configuration can be pass in from hudson.  We should ensure that build and
> > deployment script do not further couple the project.
> >
> > Regards,
> > Eric
> >
> > On 2/26/11 11:14 AM, "Konstantin Boudnik" <co...@apache.org> wrote:
> >
> > On Fri, Feb 25, 2011 at 23:47, Nigel Daley <nd...@mac.com> wrote:
> >> +1.
> >>
> >> Once HADOOP-7106 is committed, I'd like to propose we create a directory at the same level of common/hdfs/mapreduce to hold build (and deploy) type scripts and files.  These would then get branches/tagged with the rest of the release.
> >
> > That makes sense, although I don't see changes of the host
> > configurations to happen very often.
> >
> > Cos
> >
> >> Nige
> >>
> >> On Feb 25, 2011, at 7:55 PM, Konstantin Boudnik wrote:
> >>
> >>> Looking at re-occurring build/test-patch problems on hadoop build machines I
> >>> thought of a way to make them:
> >>>  a) all the same (configuration, installed software wise)
> >>>  b) have an effortless system to run upgrades/updates on all of them in a
> >>>  controlled fashion.
> >>>
> >>> I would suggest to create Puppet configs (the exact content to be defined)
> >>> which we'll be checked in SCM (e.g. SVN), Whenever a build host's software
> >>> is needed to be restored/updated a simple run of Puppet across the machines
> >>> or change in config and run of Puppet will do the magic for us.
> >>>
> >>> If there are no objections from the community I can put together some
> >>> Puppet recipes which might be evolved as we go.
> >>>
> >>> --
> >>> Take care,
> >>>       Cos
> >>> 2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
> >>>
> >>> After all, it is only the mediocre who are always at their best.
> >>>                Jean Giraudoux
> >>
> >>
> >
> >
> 

Re: Build/test infrastructure

Posted by Eric Yang <ey...@yahoo-inc.com>.
The proposed test automation process hasn't been thought through.  Apache
Hudson has been set up to trigger patch builds and to set up the pre-commit
test environment.  Unfortunately, the current setup needs refinement, with a
proper source code setup, to make the builds work again.  Ideally, the test
cycle would have a commit build which runs simple unit tests, and a secondary
build (every 24 hours) to run more thorough tests on a multi-machine setup.
The test cluster should be cleansed after every secondary build, and ideally
this is done in a sandbox approach.  However, I don't think bringing in a
puppet environment setup makes the test system reproducible.  Consequently,
it may be better to have the cluster test setup as part of the scripts in the
maven integration test phase.  This will enable any hadoop developer to set
up his own test cluster without setting up a puppet master.  I am not fixated
on build and packaging only, but am expressing my opinions on improving the
build system and making it easier to reproduce.

Regards,
Eric

On 2/26/11 2:18 PM, "Konstantin Boudnik" <co...@apache.org> wrote:

This discussion isn't about build of the product nor about packaging
of it. We are discussing patch validation and snapshot build
infrastructure.

On Sat, Feb 26, 2011 at 12:43, Eric Yang <ey...@yahoo-inc.com> wrote:
> We should be very careful about the approach that we chosen for build/packaging.  The current state of hadoop is coupled together due to lack of standardized RPC format.  Once this issue is cleared, the community will want to split hdfs and m/r into separated projects at some point.  It may be better to ensure project is modularized, and work from the same svn repository.  Maven is great for doing this, and most of the build and scripts can be defined in pom.xml.  Deployment/test server configuration can be pass in from hudson.  We should ensure that build and deployment script do not further couple the project.
>
> Regards,
> Eric
>
> On 2/26/11 11:14 AM, "Konstantin Boudnik" <co...@apache.org> wrote:
>
> On Fri, Feb 25, 2011 at 23:47, Nigel Daley <nd...@mac.com> wrote:
>> +1.
>>
>> Once HADOOP-7106 is committed, I'd like to propose we create a directory at the same level of common/hdfs/mapreduce to hold build (and deploy) type scripts and files.  These would then get branches/tagged with the rest of the release.
>
> That makes sense, although I don't see changes of the host
> configurations to happen very often.
>
> Cos
>
>> Nige
>>
>> On Feb 25, 2011, at 7:55 PM, Konstantin Boudnik wrote:
>>
>>> Looking at re-occurring build/test-patch problems on hadoop build machines I
>>> thought of a way to make them:
>>>  a) all the same (configuration, installed software wise)
>>>  b) have an effortless system to run upgrades/updates on all of them in a
>>>  controlled fashion.
>>>
>>> I would suggest to create Puppet configs (the exact content to be defined)
>>> which we'll be checked in SCM (e.g. SVN), Whenever a build host's software
>>> is needed to be restored/updated a simple run of Puppet across the machines
>>> or change in config and run of Puppet will do the magic for us.
>>>
>>> If there are no objections from the community I can put together some
>>> Puppet recipes which might be evolved as we go.
>>>
>>> --
>>> Take care,
>>>       Cos
>>> 2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
>>>
>>> After all, it is only the mediocre who are always at their best.
>>>                Jean Giraudoux
>>
>>
>
>


Re: Build/test infrastructure

Posted by Konstantin Boudnik <co...@apache.org>.
This discussion isn't about the build of the product, nor about packaging
it. We are discussing patch validation and snapshot build infrastructure.

On Sat, Feb 26, 2011 at 12:43, Eric Yang <ey...@yahoo-inc.com> wrote:
> We should be very careful about the approach that we choose for build/packaging.  The current state of hadoop is coupled together due to the lack of a standardized RPC format.  Once this issue is cleared, the community will want to split hdfs and m/r into separate projects at some point.  It may be better to ensure the project is modularized, and work from the same svn repository.  Maven is great for doing this, and most of the build and deployment scripts can be defined in pom.xml.  Deployment/test server configuration can be passed in from hudson.  We should ensure that build and deployment scripts do not further couple the project.
>
> Regards,
> Eric
>
> On 2/26/11 11:14 AM, "Konstantin Boudnik" <co...@apache.org> wrote:
>
> On Fri, Feb 25, 2011 at 23:47, Nigel Daley <nd...@mac.com> wrote:
>> +1.
>>
>> Once HADOOP-7106 is committed, I'd like to propose we create a directory at the same level of common/hdfs/mapreduce to hold build (and deploy) type scripts and files.  These would then get branches/tagged with the rest of the release.
>
> That makes sense, although I don't see changes of the host
> configurations to happen very often.
>
> Cos
>
>> Nige
>>
>> On Feb 25, 2011, at 7:55 PM, Konstantin Boudnik wrote:
>>
>>> Looking at re-occurring build/test-patch problems on hadoop build machines I
>>> thought of a way to make them:
>>>  a) all the same (configuration, installed software wise)
>>>  b) have an effortless system to run upgrades/updates on all of them in a
>>>  controlled fashion.
>>>
>>> I would suggest to create Puppet configs (the exact content to be defined)
>>> which we'll be checked in SCM (e.g. SVN), Whenever a build host's software
>>> is needed to be restored/updated a simple run of Puppet across the machines
>>> or change in config and run of Puppet will do the magic for us.
>>>
>>> If there are no objections from the community I can put together some
>>> Puppet recipes which might be evolved as we go.
>>>
>>> --
>>> Take care,
>>>       Cos
>>> 2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
>>>
>>> After all, it is only the mediocre who are always at their best.
>>>                Jean Giraudoux
>>
>>
>
>

Re: Build/test infrastructure

Posted by Eric Yang <ey...@yahoo-inc.com>.
We should be very careful about the approach that we choose for
build/packaging.  The current state of hadoop is coupled together due to the
lack of a standardized RPC format.  Once this issue is cleared, the community
will want to split hdfs and m/r into separate projects at some point.  It may
be better to ensure the project is modularized, and work from the same svn
repository.  Maven is great for doing this, and most of the build and
deployment scripts can be defined in pom.xml.  Deployment/test server
configuration can be passed in from hudson.  We should ensure that build and
deployment scripts do not further couple the project.

Regards,
Eric

On 2/26/11 11:14 AM, "Konstantin Boudnik" <co...@apache.org> wrote:

On Fri, Feb 25, 2011 at 23:47, Nigel Daley <nd...@mac.com> wrote:
> +1.
>
> Once HADOOP-7106 is committed, I'd like to propose we create a directory at the same level of common/hdfs/mapreduce to hold build (and deploy) type scripts and files.  These would then get branches/tagged with the rest of the release.

That makes sense, although I don't see changes of the host
configurations to happen very often.

Cos

> Nige
>
> On Feb 25, 2011, at 7:55 PM, Konstantin Boudnik wrote:
>
>> Looking at re-occurring build/test-patch problems on hadoop build machines I
>> thought of a way to make them:
>>  a) all the same (configuration, installed software wise)
>>  b) have an effortless system to run upgrades/updates on all of them in a
>>  controlled fashion.
>>
>> I would suggest to create Puppet configs (the exact content to be defined)
>> which we'll be checked in SCM (e.g. SVN), Whenever a build host's software
>> is needed to be restored/updated a simple run of Puppet across the machines
>> or change in config and run of Puppet will do the magic for us.
>>
>> If there are no objections from the community I can put together some
>> Puppet recipes which might be evolved as we go.
>>
>> --
>> Take care,
>>       Cos
>> 2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
>>
>> After all, it is only the mediocre who are always at their best.
>>                Jean Giraudoux
>
>


Re: Build/test infrastructure

Posted by Konstantin Boudnik <co...@apache.org>.
On Fri, Feb 25, 2011 at 23:47, Nigel Daley <nd...@mac.com> wrote:
> +1.
>
> Once HADOOP-7106 is committed, I'd like to propose we create a directory at the same level as common/hdfs/mapreduce to hold build (and deploy) type scripts and files.  These would then get branched/tagged with the rest of the release.

That makes sense, although I don't expect changes to the host
configurations to happen very often.

Cos

> Nige
>
> On Feb 25, 2011, at 7:55 PM, Konstantin Boudnik wrote:
>
>> Looking at re-occurring build/test-patch problems on hadoop build machines I
>> thought of a way to make them:
>>  a) all the same (configuration, installed software wise)
>>  b) have an effortless system to run upgrades/updates on all of them in a
>>  controlled fashion.
>>
>> I would suggest to create Puppet configs (the exact content to be defined)
>> which we'll be checked in SCM (e.g. SVN), Whenever a build host's software
>> is needed to be restored/updated a simple run of Puppet across the machines
>> or change in config and run of Puppet will do the magic for us.
>>
>> If there are no objections from the community I can put together some
>> Puppet recipes which might be evolved as we go.
>>
>> --
>> Take care,
>>       Cos
>> 2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
>>
>> After all, it is only the mediocre who are always at their best.
>>                Jean Giraudoux
>
>

Re: Build/test infrastructure

Posted by Nigel Daley <nd...@mac.com>.
+1.  

Once HADOOP-7106 is committed, I'd like to propose we create a directory at the same level as common/hdfs/mapreduce to hold build (and deploy) type scripts and files.  These would then get branched/tagged with the rest of the release.
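
One possible shape for it, purely illustrative (the directory name is
invented):

    hadoop/
      common/
      hdfs/
      mapreduce/
      build-infra/   <- puppet recipes and deploy scripts, branched/tagged
                        with each release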

Nige

On Feb 25, 2011, at 7:55 PM, Konstantin Boudnik wrote:

> Looking at re-occurring build/test-patch problems on hadoop build machines I
> thought of a way to make them:
>  a) all the same (configuration, installed software wise)
>  b) have an effortless system to run upgrades/updates on all of them in a
>  controlled fashion.
> 
> I would suggest to create Puppet configs (the exact content to be defined)
> which we'll be checked in SCM (e.g. SVN), Whenever a build host's software
> is needed to be restored/updated a simple run of Puppet across the machines
> or change in config and run of Puppet will do the magic for us.
> 
> If there are no objections from the community I can put together some
> Puppet recipes which might be evolved as we go.
> 
> -- 
> Take care,
> 	Cos
> 2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
> 
> After all, it is only the mediocre who are always at their best.
> 		 Jean Giraudoux