Posted to dev@hbase.apache.org by Sean Busbey <bu...@cloudera.com> on 2015/06/07 05:43:37 UTC

[DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Sorry for the resend. I figured this deserves a [DISCUSS] flag.



On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bu...@cloudera.com> wrote:

> Hi Folks!
>
> After working on test-patch with other folks for the last few months, I
> think we've reached the point where we can make the fastest progress
> towards the goal of a general use pre-commit patch tester by spinning
> things into a project focused on just that. I think we have a mature enough
> code base and a sufficient fledgling community, so I'm going to put
> together a tlp proposal.
>
> Thanks for the feedback thus far from use within Hadoop. I hope we can
> continue to make things more useful.
>
> -Sean
>
> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> HBase's dev-support folder is where the scripts and support files live.
>> We've only recently started adding anything to the maven builds that's
>> specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
>> add in more if we ran into the same permissions problems y'all are having.
>>
>> There's also our precommit job itself, though it isn't large[2]. AFAIK,
>> we don't properly back this up anywhere, we just notify each other of
>> changes on a particular mail thread[3].
>>
>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
>> red because I just finished fixing "mvn site" running out of permgen)
>> [3]: http://s.apache.org/NT0
>>
>>
>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cn...@hortonworks.com>
>> wrote:
>>
>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>> HBase
>>> repo?  Is there any additional context we need to be aware of?
>>>
>>> Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>>>
>>> >+dev@hbase
>>> >
>>> >HBase has recently been cleaning up our precommit jenkins jobs to make
>>> >them
>>> >more robust. From what I can tell our stuff started off as an earlier
>>> >version of what Hadoop uses for testing.
>>> >
>>> >Folks on either side open to an experiment of combining our precommit
>>> >check
>>> >tooling? In principle we should be looking for the same kinds of things.
>>> >
>>> >Naturally we'll still need different jenkins jobs to handle different
>>> >resource needs and we'd need to figure out where stuff eventually lives,
>>> >but that could come later.
>>> >
>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
>>> cnauroth@hortonworks.com>
>>> >wrote:
>>> >
>>> >> The only thing I'm aware of is the failOnError option:
>>> >>
>>> >>
>>> >>
>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>> >>
>>> >>
>>> >> I prefer that we don't disable this, because ignoring different kinds
>>> of
>>> >> failures could leave our build directories in an indeterminate state.
>>> >>For
>>> >> example, we could end up with an old class file on the classpath for
>>> >>test
>>> >> runs that was supposedly deleted.
>>> >>
>>> >> I think it's worth exploring Eddy's suggestion to try simulating
>>> failure
>>> >> by placing a file where the code expects to see a directory.  That
>>> might
>>> >> even let us enable some of these tests that are skipped on Windows,
>>> >> because Windows allows access for the owner even after permissions
>>> have
>>> >> been stripped.
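
A minimal sketch of that simulation, for illustration only (it assumes JUnit 4 plus Hadoop's DiskChecker and FileUtil on the classpath, inside a test method declared to throw Exception; baseDir is a placeholder for the test's data root):

    // Replace the real directory with a plain file so that
    // DiskChecker#checkDir fails its isDirectory() check. No permission
    // bits are touched, so nothing is left unreadable if the JVM dies.
    File dataDir = new File(baseDir, "data3");
    FileUtil.fullyDelete(dataDir);          // remove the real directory
    assertTrue(dataDir.createNewFile());    // put a regular file in its place
    try {
      DiskChecker.checkDir(dataDir);        // should now throw
      fail("expected DiskErrorException");
    } catch (DiskChecker.DiskErrorException expected) {
      // the simulated disk failure was detected
    } finally {
      assertTrue(dataDir.delete());         // a plain file deletes cleanly
    }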
>>> >>
>>> >> Chris Nauroth
>>> >> Hortonworks
>>> >> http://hortonworks.com/
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>>> >>
>>> >> >Is there a maven plugin or setting we can use to simply remove
>>> >> >directories that have no executable permissions on them?  Clearly we
>>> >> >have the permission to do this from a technical point of view (since
>>> >> >we created the directories as the jenkins user), it's simply that the
>>> >> >code refuses to do it.
>>> >> >
>>> >> >Otherwise I guess we can just fix those tests...
>>> >> >
>>> >> >Colin
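
"Fixing those tests" could start with a teardown helper along these lines; a rough sketch (plain java.io, with an invented method name):

    // Recursively give the owner back rwx so that "mvn clean", or the
    // test's own cleanup, can delete the tree even after a simulated
    // disk failure left directories non-executable.
    static void restoreOwnerPermissions(File f) {
      f.setReadable(true);
      f.setWritable(true);
      f.setExecutable(true);
      File[] children = f.listFiles();  // null unless f is a readable directory
      if (children != null) {
        for (File child : children) {
          restoreOwnerPermissions(child);
        }
      }
    }

Called from an @After method this covers the normal teardown path; the harder problem is a JVM that never reaches teardown at all.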
>>> >> >
>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>> >> >>
>>> >> >> In HDFS-7722:
>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>> >> >>TearDown().
>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>> >> >>
>>> >> >> Also I ran mvn test several times on my machine and all tests
>>> passed.
>>> >> >>
>>> >> >> However, since in DiskChecker#checkDirAccess():
>>> >> >>
>>> >> >> private static void checkDirAccess(File dir) throws
>>> >>DiskErrorException {
>>> >> >>   if (!dir.isDirectory()) {
>>> >> >>     throw new DiskErrorException("Not a directory: "
>>> >> >>                                  + dir.toString());
>>> >> >>   }
>>> >> >>
>>> >> >>   checkAccessByFileMethods(dir);
>>> >> >> }
>>> >> >>
>>> >> >> One potentially safer alternative is replacing data dir with a
>>> >>regular
>>> >> >> file to simulate disk failures.
>>> >> >>
>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>> >> >><cn...@hortonworks.com> wrote:
>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>> >> >>> TestDataNodeVolumeFailureReporting, and
>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>>> >>permissions
>>> >> >>>from
>>> >> >>> directories like the one Colin mentioned to simulate disk failures
>>> >>at
>>> >> >>>data
>>> >> >>> nodes.  I reviewed the code for all of those, and they all appear
>>> >>to be
>>> >> >>> doing the necessary work to restore executable permissions at the
>>> >>end
>>> >> >>>of
>>> >> >>> the test.  The only recent uncommitted patch I've seen that makes
>>> >> >>>changes
>>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
>>> >> >>>though.  I
>>> >> >>> don't know if there are other uncommitted patches that changed
>>> these
>>> >> >>>test
>>> >> >>> suites.
>>> >> >>>
>>> >> >>> I suppose it's also possible that the JUnit process unexpectedly
>>> >>died
>>> >> >>> after removing executable permissions but before restoring them.
>>> >>That
>>> >> >>> always would have been a weakness of these test suites, regardless
>>> >>of
>>> >> >>>any
>>> >> >>> recent changes.
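
One way to shrink that window, sketched here as a guess rather than anything proposed in the thread (runVolumeFailureScenario and baseDir are stand-ins), is to register the restore action before revoking the bits:

    // The finally block restores permissions on the normal path; the JVM
    // shutdown hook also covers most unexpected JUnit exits, though
    // nothing helps if the process is killed with SIGKILL.
    final File dataDir = new File(baseDir, "data3");
    Thread restore = new Thread(new Runnable() {
      public void run() {
        dataDir.setExecutable(true, false);  // restore for everyone
      }
    });
    Runtime.getRuntime().addShutdownHook(restore);
    try {
      assertTrue(dataDir.setExecutable(false, false));  // simulate the dead disk
      runVolumeFailureScenario(dataDir);                // hypothetical test body
    } finally {
      dataDir.setExecutable(true, false);
      Runtime.getRuntime().removeShutdownHook(restore);
    }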
>>> >> >>>
>>> >> >>> Chris Nauroth
>>> >> >>> Hortonworks
>>> >> >>> http://hortonworks.com/
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>> >> >>>
>>> >> >>>>Hey Colin,
>>> >> >>>>
>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>>> >>with
>>> >> >>>>these boxes. He took a look and concluded that some perms are
>>> being
>>> >> >>>>set in
>>> >> >>>>those directories by our unit tests which are precluding those
>>> files
>>> >> >>>>from
>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but we
>>> >>should
>>> >> >>>>expect this to keep happening until we can fix the test in
>>> question
>>> >>to
>>> >> >>>>properly clean up after itself.
>>> >> >>>>
>>> >> >>>>To help narrow down which commit it was that started this, Andrew
>>> >>sent
>>> >> >>>>me
>>> >> >>>>this info:
>>> >> >>>>
>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>> >>
>>>
>>> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>> >>>>>>/
>>> >> >>>>has
>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way since
>>> >>9:32
>>> >> >>>>UTC
>>> >> >>>>on March 5th."
>>> >> >>>>
>>> >> >>>>--
>>> >> >>>>Aaron T. Myers
>>> >> >>>>Software Engineer, Cloudera
>>> >> >>>>
>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>> >><cm...@apache.org>
>>> >> >>>>wrote:
>>> >> >>>>
>>> >> >>>>> Hi all,
>>> >> >>>>>
>>> >> >>>>> A very quick (and not thorough) survey shows that I can't find
>>> any
>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
>>> them
>>> >> >>>>>seem
>>> >> >>>>> to be failing with some variant of this message:
>>> >> >>>>>
>>> >> >>>>> [ERROR] Failed to execute goal
>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>> >>(default-clean)
>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>> delete
>>> >> >>>>>
>>> >> >>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>> >> >>>>> -> [Help 1]
>>> >> >>>>>
>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>> >> >>>>> permissions?
>>> >> >>>>>
>>> >> >>>>> Colin
>>> >> >>>>>
>>> >> >>>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> Lei (Eddy) Xu
>>> >> >> Software Engineer, Cloudera
>>> >>
>>> >>
>>> >
>>> >
>>> >--
>>> >Sean
>>>
>>>
>>
>>
>> --
>> Sean
>>
>
>
>
> --
> Sean
>



-- 
Sean

[DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Jonathan Hsieh <jo...@cloudera.com>.
Also +1 for the idea in general.  Sounds like a lot of projects could
benefit.

How would we go about adding testing to our pre-commit testing?

Each project has likely customized their own pre-commit scripts (multiple
JVM versions, checkstyle, javadoc exceptions, etc.). We should probably
solicit interest from other projects who already have fancy precommit
tests beyond HBase/Hadoop too.

Jon

On Tuesday, June 16, 2015, Jonathan Hsieh <jo...@cloudera.com> wrote:

> How about "harbinger" for a name :)
>
> On Sunday, June 7, 2015, Sean Busbey <bu...@cloudera.com> wrote:
>
>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>
>>
>>
>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>> > Hi Folks!
>> >
>> > After working on test-patch with other folks for the last few months, I
>> > think we've reached the point where we can make the fastest progress
>> > towards the goal of a general use pre-commit patch tester by spinning
>> > things into a project focused on just that. I think we have a mature
>> enough
>> > code base and a sufficient fledgling community, so I'm going to put
>> > together a tlp proposal.
>> >
>> > Thanks for the feedback thus far from use within Hadoop. I hope we can
>> > continue to make things more useful.
>> >
>> > -Sean
>> >
>> > On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com>
>> wrote:
>> >
>> >> HBase's dev-support folder is where the scripts and support files live.
>> >> We've only recently started adding anything to the maven builds that's
>> >> specific to jenkins[1]; so far it's diagnostic stuff, but that's where
>> I'd
>> >> add in more if we ran into the same permissions problems y'all are
>> having.
>> >>
>> >> There's also our precommit job itself, though it isn't large[2]. AFAIK,
>> >> we don't properly back this up anywhere, we just notify each other of
>> >> changes on a particular mail thread[3].
>> >>
>> >> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>> >> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
>> >> red because I just finished fixing "mvn site" running out of permgen)
>> >> [3]: http://s.apache.org/NT0
>> >>
>> >>
>> >> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <
>> cnauroth@hortonworks.com>
>> >> wrote:
>> >>
>> >>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>> >>> HBase
>> >>> repo?  Is there any additional context we need to be aware of?
>> >>>
>> >>> Chris Nauroth
>> >>> Hortonworks
>> >>> http://hortonworks.com/
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>> >>>
>> >>> >+dev@hbase
>> >>> >
>> >>> >HBase has recently been cleaning up our precommit jenkins jobs to
>> make
>> >>> >them
>> >>> >more robust. From what I can tell our stuff started off as an earlier
>> >>> >version of what Hadoop uses for testing.
>> >>> >
>> >>> >Folks on either side open to an experiment of combining our precommit
>> >>> >check
>> >>> >tooling? In principle we should be looking for the same kinds of
>> things.
>> >>> >
>> >>> >Naturally we'll still need different jenkins jobs to handle different
>> >>> >resource needs and we'd need to figure out where stuff eventually
>> lives,
>> >>> >but that could come later.
>> >>> >
>> >>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
>> >>> cnauroth@hortonworks.com>
>> >>> >wrote:
>> >>> >
>> >>> >> The only thing I'm aware of is the failOnError option:
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>> >>> >>
>> >>> >>
>> >>> >> I prefer that we don't disable this, because ignoring different
>> kinds
>> >>> of
>> >>> >> failures could leave our build directories in an indeterminate
>> state.
>> >>> >>For
>> >>> >> example, we could end up with an old class file on the classpath
>> for
>> >>> >>test
>> >>> >> runs that was supposedly deleted.
>> >>> >>
>> >>> >> I think it's worth exploring Eddy's suggestion to try simulating
>> >>> failure
>> >>> >> by placing a file where the code expects to see a directory.  That
>> >>> might
>> >>> >> even let us enable some of these tests that are skipped on Windows,
>> >>> >> because Windows allows access for the owner even after permissions
>> >>> have
>> >>> >> been stripped.
>> >>> >>
>> >>> >> Chris Nauroth
>> >>> >> Hortonworks
>> >>> >> http://hortonworks.com/
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu>
>> wrote:
>> >>> >>
>> >>> >> >Is there a maven plugin or setting we can use to simply remove
>> >>> >> >directories that have no executable permissions on them?  Clearly
>> we
>> >>> >> >have the permission to do this from a technical point of view
>> (since
>> >>> >> >we created the directories as the jenkins user), it's simply that
>> the
>> >>> >> >code refuses to do it.
>> >>> >> >
>> >>> >> >Otherwise I guess we can just fix those tests...
>> >>> >> >
>> >>> >> >Colin
>> >>> >> >
>> >>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>> >>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>> >>> >> >>
>> >>> >> >> In HDFS-7722:
>> >>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>> >>> >> >>TearDown().
>> >>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally
>> clause.
>> >>> >> >>
>> >>> >> >> Also I ran mvn test several times on my machine and all tests
>> >>> passed.
>> >>> >> >>
>> >>> >> >> However, since in DiskChecker#checkDirAccess():
>> >>> >> >>
>> >>> >> >> private static void checkDirAccess(File dir) throws
>> >>> >>DiskErrorException {
>> >>> >> >>   if (!dir.isDirectory()) {
>> >>> >> >>     throw new DiskErrorException("Not a directory: "
>> >>> >> >>                                  + dir.toString());
>> >>> >> >>   }
>> >>> >> >>
>> >>> >> >>   checkAccessByFileMethods(dir);
>> >>> >> >> }
>> >>> >> >>
>> >>> >> >> One potentially safer alternative is replacing data dir with a
>> >>> >>regular
>> >>> >> >> file to simulate disk failures.
>> >>> >> >>
>> >>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>> >>> >> >><cn...@hortonworks.com> wrote:
>> >>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> >>> >> >>> TestDataNodeVolumeFailureReporting, and
>> >>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>> >>> >>permissions
>> >>> >> >>>from
>> >>> >> >>> directories like the one Colin mentioned to simulate disk
>> failures
>> >>> >>at
>> >>> >> >>>data
>> >>> >> >>> nodes.  I reviewed the code for all of those, and they all
>> appear
>> >>> >>to be
>> >>> >> >>> doing the necessary work to restore executable permissions at
>> the
>> >>> >>end
>> >>> >> >>>of
>> >>> >> >>> the test.  The only recent uncommitted patch I've seen that
>> makes
>> >>> >> >>>changes
>> >>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
>> >>> >> >>>though.  I
>> >>> >> >>> don't know if there are other uncommitted patches that changed
>> >>> these
>> >>> >> >>>test
>> >>> >> >>> suites.
>> >>> >> >>>
>> >>> >> >>> I suppose it's also possible that the JUnit process
>> unexpectedly
>> >>> >>died
>> >>> >> >>> after removing executable permissions but before restoring
>> them.
>> >>> >>That
>> >>> >> >>> always would have been a weakness of these test suites,
>> regardless
>> >>> >>of
>> >>> >> >>>any
>> >>> >> >>> recent changes.
>> >>> >> >>>
>> >>> >> >>> Chris Nauroth
>> >>> >> >>> Hortonworks
>> >>> >> >>> http://hortonworks.com/
>> >>> >> >>>
>> >>> >> >>>
>> >>> >> >>>
>> >>> >> >>>
>> >>> >> >>>
>> >>> >> >>>
>> >>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com>
>> wrote:
>> >>> >> >>>
>> >>> >> >>>>Hey Colin,
>> >>> >> >>>>
>> >>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's
>> going on
>> >>> >>with
>> >>> >> >>>>these boxes. He took a look and concluded that some perms are
>> >>> being
>> >>> >> >>>>set in
>> >>> >> >>>>those directories by our unit tests which are precluding those
>> >>> files
>> >>> >> >>>>from
>> >>> >> >>>>getting deleted. He's going to clean up the boxes for us, but
>> we
>> >>> >>should
>> >>> >> >>>>expect this to keep happening until we can fix the test in
>> >>> question
>> >>> >>to
>> >>> >> >>>>properly clean up after itself.
>> >>> >> >>>>
>> >>> >> >>>>To help narrow down which commit it was that started this,
>> Andrew
>> >>> >>sent
>> >>> >> >>>>me
>> >>> >> >>>>this info:
>> >>> >> >>>>
>> >>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>> >>> >>
>> >>>
>> >>>
>> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>> >>> >>>>>>/
>> >>> >> >>>>has
>> >>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
>> since
>> >>> >>9:32
>> >>> >> >>>>UTC
>> >>> >> >>>>on March 5th."
>> >>> >> >>>>
>> >>> >> >>>>--
>> >>> >> >>>>Aaron T. Myers
>> >>> >> >>>>Software Engineer, Cloudera
>> >>> >> >>>>
>> >>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>> >>> >><cm...@apache.org>
>> >>> >> >>>>wrote:
>> >>> >> >>>>
>> >>> >> >>>>> Hi all,
>> >>> >> >>>>>
>> >>> >> >>>>> A very quick (and not thorough) survey shows that I can't
>> find
>> >>> any
>> >>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
>> >>> them
>> >>> >> >>>>>seem
>> >>> >> >>>>> to be failing with some variant of this message:
>> >>> >> >>>>>
>> >>> >> >>>>> [ERROR] Failed to execute goal
>> >>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>> >>> >>(default-clean)
>> >>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>> >>> delete
>> >>> >> >>>>>
>> >>> >> >>>>>
>> >>> >> >>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>> >>> >> >>>>> -> [Help 1]
>> >>> >> >>>>>
>> >>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting
>> wrong
>> >>> >> >>>>> permissions?
>> >>> >> >>>>>
>> >>> >> >>>>> Colin
>> >>> >> >>>>>
>> >>> >> >>>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> --
>> >>> >> >> Lei (Eddy) Xu
>> >>> >> >> Software Engineer, Cloudera
>> >>> >>
>> >>> >>
>> >>> >
>> >>> >
>> >>> >--
>> >>> >Sean
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> Sean
>> >>
>> >
>> >
>> >
>> > --
>> > Sean
>> >
>>
>>
>>
>> --
>> Sean
>>
>
>
> --
> // Jonathan Hsieh (shay)
> // HBase Tech Lead, Software Engineer, Cloudera
> // jon@cloudera.com // @jmhsieh
>
>
>

-- 
// Jonathan Hsieh (shay)
// HBase Tech Lead, Software Engineer, Cloudera
// jon@cloudera.com // @jmhsieh

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Jonathan Hsieh <jo...@cloudera.com>.
How about "harbinger" for a name :)

On Sunday, June 7, 2015, Sean Busbey <bu...@cloudera.com> wrote:

> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>
>
>
> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <busbey@cloudera.com> wrote:
>
> > Hi Folks!
> >
> > After working on test-patch with other folks for the last few months, I
> > think we've reached the point where we can make the fastest progress
> > towards the goal of a general use pre-commit patch tester by spinning
> > things into a project focused on just that. I think we have a mature
> enough
> > code base and a sufficient fledgling community, so I'm going to put
> > together a tlp proposal.
> >
> > Thanks for the feedback thus far from use within Hadoop. I hope we can
> > continue to make things more useful.
> >
> > -Sean
> >
> > On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <busbey@cloudera.com> wrote:
> >
> >> HBase's dev-support folder is where the scripts and support files live.
> >> We've only recently started adding anything to the maven builds that's
> >> specific to jenkins[1]; so far it's diagnostic stuff, but that's where
> I'd
> >> add in more if we ran into the same permissions problems y'all are
> having.
> >>
> >> There's also our precommit job itself, though it isn't large[2]. AFAIK,
> >> we don't properly back this up anywhere, we just notify each other of
> >> changes on a particular mail thread[3].
> >>
> >> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> >> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
> >> red because I just finished fixing "mvn site" running out of permgen)
> >> [3]: http://s.apache.org/NT0
> >>
> >>
> >> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnauroth@hortonworks.com>
> >> wrote:
> >>
> >>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
> >>> HBase
> >>> repo?  Is there any additional context we need to be aware of?
> >>>
> >>> Chris Nauroth
> >>> Hortonworks
> >>> http://hortonworks.com/
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On 3/11/15, 2:44 PM, "Sean Busbey" <busbey@cloudera.com> wrote:
> >>>
> >>> >+dev@hbase
> >>> >
> >>> >HBase has recently been cleaning up our precommit jenkins jobs to make
> >>> >them
> >>> >more robust. From what I can tell our stuff started off as an earlier
> >>> >version of what Hadoop uses for testing.
> >>> >
> >>> >Folks on either side open to an experiment of combining our precommit
> >>> >check
> >>> >tooling? In principle we should be looking for the same kinds of
> things.
> >>> >
> >>> >Naturally we'll still need different jenkins jobs to handle different
> >>> >resource needs and we'd need to figure out where stuff eventually
> lives,
> >>> >but that could come later.
> >>> >
> >>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnauroth@hortonworks.com>
> >>> >wrote:
> >>> >
> >>> >> The only thing I'm aware of is the failOnError option:
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
> >>> >>
> >>> >>
> >>> >> I prefer that we don't disable this, because ignoring different
> kinds
> >>> of
> >>> >> failures could leave our build directories in an indeterminate
> state.
> >>> >>For
> >>> >> example, we could end up with an old class file on the classpath for
> >>> >>test
> >>> >> runs that was supposedly deleted.
> >>> >>
> >>> >> I think it's worth exploring Eddy's suggestion to try simulating
> >>> failure
> >>> >> by placing a file where the code expects to see a directory.  That
> >>> might
> >>> >> even let us enable some of these tests that are skipped on Windows,
> >>> >> because Windows allows access for the owner even after permissions
> >>> have
> >>> >> been stripped.
> >>> >>
> >>> >> Chris Nauroth
> >>> >> Hortonworks
> >>> >> http://hortonworks.com/
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:
> >>> >>
> >>> >> >Is there a maven plugin or setting we can use to simply remove
> >>> >> >directories that have no executable permissions on them?  Clearly
> we
> >>> >> >have the permission to do this from a technical point of view
> (since
> >>> >> >we created the directories as the jenkins user), it's simply that
> the
> >>> >> >code refuses to do it.
> >>> >> >
> >>> >> >Otherwise I guess we can just fix those tests...
> >>> >> >
> >>> >> >Colin
> >>> >> >
> >>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
> >>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
> >>> >> >>
> >>> >> >> In HDFS-7722:
> >>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >>> >> >>TearDown().
> >>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
> >>> >> >>
> >>> >> >> Also I ran mvn test several times on my machine and all tests
> >>> passed.
> >>> >> >>
> >>> >> >> However, since in DiskChecker#checkDirAccess():
> >>> >> >>
> >>> >> >> private static void checkDirAccess(File dir) throws
> >>> >>DiskErrorException {
> >>> >> >>   if (!dir.isDirectory()) {
> >>> >> >>     throw new DiskErrorException("Not a directory: "
> >>> >> >>                                  + dir.toString());
> >>> >> >>   }
> >>> >> >>
> >>> >> >>   checkAccessByFileMethods(dir);
> >>> >> >> }
> >>> >> >>
> >>> >> >> One potentially safer alternative is replacing data dir with a
> >>> >>regular
> >>> >> >> file to simulate disk failures.
> >>> >> >>
> >>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
> >>> >> >><cnauroth@hortonworks.com> wrote:
> >>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >>> >> >>> TestDataNodeVolumeFailureReporting, and
> >>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
> >>> >>permissions
> >>> >> >>>from
> >>> >> >>> directories like the one Colin mentioned to simulate disk
> failures
> >>> >>at
> >>> >> >>>data
> >>> >> >>> nodes.  I reviewed the code for all of those, and they all
> appear
> >>> >>to be
> >>> >> >>> doing the necessary work to restore executable permissions at
> the
> >>> >>end
> >>> >> >>>of
> >>> >> >>> the test.  The only recent uncommitted patch I've seen that
> makes
> >>> >> >>>changes
> >>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
> >>> >> >>>though.  I
> >>> >> >>> don't know if there are other uncommitted patches that changed
> >>> these
> >>> >> >>>test
> >>> >> >>> suites.
> >>> >> >>>
> >>> >> >>> I suppose it's also possible that the JUnit process unexpectedly
> >>> >>died
> >>> >> >>> after removing executable permissions but before restoring them.
> >>> >>That
> >>> >> >>> always would have been a weakness of these test suites,
> regardless
> >>> >>of
> >>> >> >>>any
> >>> >> >>> recent changes.
> >>> >> >>>
> >>> >> >>> Chris Nauroth
> >>> >> >>> Hortonworks
> >>> >> >>> http://hortonworks.com/
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
> >>> >> >>>
> >>> >> >>>>Hey Colin,
> >>> >> >>>>
> >>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going
> on
> >>> >>with
> >>> >> >>>>these boxes. He took a look and concluded that some perms are
> >>> being
> >>> >> >>>>set in
> >>> >> >>>>those directories by our unit tests which are precluding those
> >>> files
> >>> >> >>>>from
> >>> >> >>>>getting deleted. He's going to clean up the boxes for us, but we
> >>> >>should
> >>> >> >>>>expect this to keep happening until we can fix the test in
> >>> question
> >>> >>to
> >>> >> >>>>properly clean up after itself.
> >>> >> >>>>
> >>> >> >>>>To help narrow down which commit it was that started this,
> Andrew
> >>> >>sent
> >>> >> >>>>me
> >>> >> >>>>this info:
> >>> >> >>>>
> >>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> >>> >>
> >>>
> >>>
> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>> >>>>>>/
> >>> >> >>>>has
> >>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
> since
> >>> >>9:32
> >>> >> >>>>UTC
> >>> >> >>>>on March 5th."
> >>> >> >>>>
> >>> >> >>>>--
> >>> >> >>>>Aaron T. Myers
> >>> >> >>>>Software Engineer, Cloudera
> >>> >> >>>>
> >>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
> >>> >><cmccabe@apache.org>
> >>> >> >>>>wrote:
> >>> >> >>>>
> >>> >> >>>>> Hi all,
> >>> >> >>>>>
> >>> >> >>>>> A very quick (and not thorough) survey shows that I can't find
> >>> any
> >>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
> >>> them
> >>> >> >>>>>seem
> >>> >> >>>>> to be failing with some variant of this message:
> >>> >> >>>>>
> >>> >> >>>>> [ERROR] Failed to execute goal
> >>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> >>> >>(default-clean)
> >>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
> >>> delete
> >>> >> >>>>>
> >>> >> >>>>>
> >>> >> >>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>> >> >>>>> -> [Help 1]
> >>> >> >>>>>
> >>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting
> wrong
> >>> >> >>>>> permissions?
> >>> >> >>>>>
> >>> >> >>>>> Colin
> >>> >> >>>>>
> >>> >> >>>
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >> --
> >>> >> >> Lei (Eddy) Xu
> >>> >> >> Software Engineer, Cloudera
> >>> >>
> >>> >>
> >>> >
> >>> >
> >>> >--
> >>> >Sean
> >>>
> >>>
> >>
> >>
> >> --
> >> Sean
> >>
> >
> >
> >
> > --
> > Sean
> >
>
>
>
> --
> Sean
>


-- 
// Jonathan Hsieh (shay)
// HBase Tech Lead, Software Engineer, Cloudera
// jon@cloudera.com // @jmhsieh

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Nigel Daley <ni...@gmail.com>.
+1 for a separate project and going directly to TLP if possible (as Hadoop itself did when split out of Nutch)

+1 for having language discussions once it's a TLP :-)

Cheers,
Nigel

> On Jun 22, 2015, at 1:55 PM, Andrew Purtell <ap...@apache.org> wrote:
> 
>> On Mon, Jun 22, 2015 at 1:03 PM, Nick Dimiduk <nd...@gmail.com> wrote:
>> 
>> On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe <cm...@apache.org>
>> wrote:
>> 
>>> You mentioned that "most of our project will be focused on shell
>>> scripts" I guess based on the existing test-patch code.  Allen did a
>>> lot of good work in this area recently.  I am curious if you evaluated
>>> languages such as Python or Node.js for this use-case.  Shell scripts
>>> can get a little... tricky beyond a certain size.  On the other hand,
>>> if we are standardizing on shell, which shell and which version?
>>> Perhaps bash 3.5+?
>> 
>> I'll also add that shell is not helpful for a cross-platform set of
>> tooling. I recently added a daemon to Apache Phoenix; an explicit
>> requirement was Windows support. I ended up implementing a solution in
>> python because that environment is platform-agnostic and still systems-y
>> enough. I think this is something this project should seriously consider.
> 
> In my opinion, historically, test-patch hasn't needed to be cross platform
> because the only first class development environment for Hadoop has been
> Linux. Growing beyond this could absolutely be one focus of Yetus should
> that be a consensus goal of the community. The seed of the project, though,
> is today's test-patch, which is implemented in bash. That's where we are
> today. Language "discussions" (smile) can and should be forward looking.
> 
> 
>> On Mon, Jun 22, 2015 at 1:03 PM, Nick Dimiduk <nd...@gmail.com> wrote:
>> 
>> On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe <cm...@apache.org>
>> wrote:
>> 
>>> You mentioned that "most of our project will be focused on shell
>>> scripts" I guess based on the existing test-patch code.  Allen did a
>>> lot of good work in this area recently.  I am curious if you evaluated
>>> languages such as Python or Node.js for this use-case.  Shell scripts
>>> can get a little... tricky beyond a certain size.  On the other hand,
>>> if we are standardizing on shell, which shell and which version?
>>> Perhaps bash 3.5+?
>> 
>> I'll also add that shell is not helpful for a cross-platform set of
>> tooling. I recently added a daemon to Apache Phoenix; an explicit
>> requirement was Windows support. I ended up implementing a solution in
>> python because that environment is platform-agnostic and still systems-y
>> enough. I think this is something this project should seriously consider.
>> 
>> -n
>> 
>> On Tue, Jun 16, 2015 at 7:55 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>>> I'm going to try responding to several things at once here, so
>> apologies
>>> if
>>>> I miss anyone and sorry for the long email. :)
>>>> 
>>>> 
>>>> On Tue, Jun 16, 2015 at 3:44 PM, Steve Loughran <
>> stevel@hortonworks.com>
>>>> wrote:
>>>> 
>>>>> I think it's good to have a general build/test process projects can
>>> share,
>>>>> so +1 to pulling it out. You should get help from others.
>>>>> 
>>>>> regarding incubation, it is a lot of work, especially for something
>>> that's
>>>>> more of an in-house tool than an artifact to release and redistribute.
>>>>> 
>>>>> You can't just use apache labs or the build project's repo to work on
>>> this?
>>>>> 
>>>>> if you do want to incubate, we may want to nominate the hadoop project
>>> as
>>>>> the monitoring PMC, rather than incubator@.
>>>>> 
>>>>> -steve
>>>> Important note: we're proposing a board resolution that would directly
>>> pull
>>>> this code base out into a new TLP; there'd be no incubator, we'd just
>>>> continue building community and start making releases.
>>>> 
>>>> The proposed PMC believes the tooling we're talking about has direct
>>>> applicability to projects well outside of the ASF. Lots of other open
>>>> source projects run on community contributions and have a general need
>>> for
>>>> better QA tools. Given that problem set and the presence of a community
>>>> working to solve it, there's no reason this needs to be treated as an
>>>> in-house build project. We certainly want to be useful to ASF projects
>>> and
>>>> getting them on-board given our current optimization for ASF infra will
>>>> certainly be easier, but we're not limited to that (and our current
>>>> prerequisites, a CI tool and jira or github, are pretty broadly
>>> available).
>>>> 
>>>> 
>>>>> On Tue, Jun 16, 2015 at 10:13 AM, Nick Dimiduk <nd...@apache.org>
>>>> wrote:
>>>> 
>>>>> 
>>>>> Since we're tossing out names, how about Apache Bootstrap? It's a
>>>>> meta-project to help other projects get off the ground, after all.
>>>> 
>>>> 
>>>> There's already a web development framework named Bootstrap[1]. It's
>> also
>>>> used by several ASF projects, so I think it best to avoid the
>> confusion.
>>>> 
>>>> The name is, of course, up to the proposed PMC. As a bit of background,
>>> the
>>>> current name Yetus fulfills Allen's desire to have something shell
>>> related
>>>> and my desire to have a project that starts with Y (there are currently
>>> no
>>>> ASF projects that start with Y). The universe of names that fill in
>> these
>>>> two is very small, AFAICT. I did a brief suitability search and didn't
>>> find
>>>> any blockers.
>>>> 
>>>> 
>>>> On Tue, Jun 16, 2015 at 11:59 AM, Allen Wittenauer <aw...@altiscale.com>
>>>> wrote:
>>>> 
>>>>> 
>>>>> Since a couple of people have brought it up:
>>>>> 
>>>>>        I think the release question is probably one of the big
>> question
>>>>> marks.  Other than tar balls, how does something like this actually
>> get
>>>>> used downstream?
>>>>> 
>>>>>        For test-patch, in particular, I have a few thoughts on this:
>>>>> 
>>>>> Short term:
>>>>> 
>>>>>        * Projects that want to move RIGHT NOW would modify their
>>> Jenkins
>>>>> jobs to checkout from the Yetus repo (preferably at a well known tag
>> or
>>>>> branch) in one directory and their project repo in another directory.
>>> Then
>>>>> it’s just a matter of passing the correct flags to test-patch.  This
>> is
>>>>> pretty much how I’ve been personally running test-patch for about 6
>>> months
>>>>> now. Under Jenkins, we’ve seen this work with NiFi (incubating)
>> already.
>>>>> 
>>>>>        * Create a stub version of test-patch that projects could
>> check
>>>>> into their repo, replacing the existing test-patch.  This stub version
>>>>> would git clone from either ASF or github and then execute test-patch
>>>>> accordingly on demand.  With the correct smarts, it could make sure it
>>> has
>>>>> a cached version to prevent continual clones.
>>>>> 
>>>>> Longer term:
>>>>> 
>>>>>        * I’ve been toying with the idea of (ab)using Java repos and
>>>>> packaging as a transportation layer, either in addition or in
>>> combination
>>>>> with something like a maven plugin.  Something like this would clearly
>>> be
>>>>> better for offline usage and/or to lower the network traffic.
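
The "stub" idea above might look roughly like the following. This is purely illustrative: the repository URL, release tag, and script path are assumptions, and it presumes git is available on the PATH.

    import java.io.File;

    public class TestPatchStub {
      public static void main(String[] args) throws Exception {
        // Clone once into a local cache, then reuse the checkout on later runs.
        File cache = new File(System.getProperty("user.home"), ".yetus-cache");
        if (!cache.isDirectory()) {
          exec("git", "clone", "--depth", "1", "--branch", "SOME-RELEASE-TAG",
               "https://github.com/apache/yetus.git", cache.getAbsolutePath());
        }
        // Pass every flag straight through to the cached test-patch script.
        String[] cmd = new String[args.length + 1];
        cmd[0] = new File(cache, "precommit/test-patch.sh").getAbsolutePath();
        System.arraycopy(args, 0, cmd, 1, args.length);
        System.exit(exec(cmd));
      }

      private static int exec(String... cmd) throws Exception {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
      }
    }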
>>>> 
>>>> It's important that the project follow ASF guidelines on publishing
>>>> releases[2]. So long as we publish releases to the distribution
>>> directory I
>>>> think we'd be fine having folks work off of the corresponding tag. I'm
>>> not
>>>> sure there's much reason to do that, however. A Jenkins job can just as
>>>> easily grab a release tarball as a git tag and we're not talking about
>> a
>>>> large amount of stuff. The kind of build setup that Chris N mentioned
>> is
>>>> also totally doable now that there's a build description DSL for
>>> Jenkins[3].
>>>> 
>>>> For individual developers, I don't see any reason we can't package
>> things
>>>> up as a tool, similar to how findbugs or shellcheck work. We can make
>> OS
>>>> packages (or homebrew for OS X) if we want to make stand alone
>>> installation
>>>> on developer machines real easy. Those same packages could be installed
>>> on
>>>> the ASF build machines, provided some ASF project wanted to make use of
>>>> Yetus.
>>>> 
>>>> Having releases will incur some turnaround time for when folks want to
>>> see
>>>> fixes, but that's a trade-off around release cadence we can work out
>>> longer
>>>> term.
>>>> 
>>>> I would like to have one or two projects that can work off of the
>>> bleeding
>>>> edge repo, but we'd have to get that to mesh with foundation policy. My
>>> gut
>>>> tells me we should be able to come up with an agreement that makes
>> such a
>>>> project "part of the development community" but the specifics will have
>>> to
>>>> be worked out.
>>>> 
>>>> 
>>>>> On Tue, Jun 16, 2015 at 4:41 AM, Jonathan Hsieh <jo...@cloudera.com>
>>>> wrote:
>>>> 
>>>>> How would we go about adding testing to our pre-commit testing?
>>>>> 
>>>>> Each project has likely customized their own pre-commit scripts
>>> (multiple
>>>>> JVM versions, checkstyle, javadoc exceptions, etc.). We should
>> probably
>>>>> solicit interest from other projects who already have fancy precommit
>>>>> tests beyond HBase/Hadoop too.
>>>> 
>>>> I'm not sure if Allen's response above answered this for you. The
>> current
>>>> state of test-patch has a plugin system for adding new tests. It is
>> also
>>>> customizable to handle project idiosyncrasies (at least the ones we've
>>> seen
>>>> so far, like unspecified module dependencies, multiple projects per
>> repo,
>>>> or use of ant). Releases will naturally include docs on how to leverage
>>>> those to customize what a particular project needs tested pre-commit.
>>>> 
>>>> Given the nature of the work Yetus is hoping to enable, I think it's
>> safe
>>>> to assume the project community is going to be doing a fair bit of
>>> outreach
>>>> to help show projects outside of HBase/Hadoop how things can work for
>>> them.
>>>> 
>>>> 
>>>> 
>>>> [1]: http://getbootstrap.com/
>>>> [2]: http://www.apache.org/dev/release.html
>>>> [3]: https://wiki.jenkins-ci.org/display/JENKINS/Job+DSL+Plugin
>>>> 
>>>> --
>>>> Sean
> 
> 
> 
> -- 
> Best regards,
> 
>   - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Andrew Purtell <ap...@apache.org>.
On Mon, Jun 22, 2015 at 1:03 PM, Nick Dimiduk <nd...@gmail.com> wrote:

> On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe <cm...@apache.org>
> wrote:
>
> > You mentioned that "most of our project will be focused on shell
> > scripts" I guess based on the existing test-patch code.  Allen did a
> > lot of good work in this area recently.  I am curious if you evaluated
> > languages such as Python or Node.js for this use-case.  Shell scripts
> > can get a little... tricky beyond a certain size.  On the other hand,
> > if we are standardizing on shell, which shell and which version?
> > Perhaps bash 3.5+?
> >
>
> I'll also add that shell is not helpful for a cross-platform set of
> tooling. I recently added a daemon to Apache Phoenix; an explicit
> requirement was Windows support. I ended up implementing a solution in
> python because that environment is platform-agnostic and still systems-y
> enough. I think this is something this project should seriously consider.
>

In my opinion, historically, test-patch hasn't needed to be cross platform
because the only first class development environment for Hadoop has been
Linux. Growing beyond this could absolutely be one focus of Yetus should
that be a consensus goal of the community. The seed of the project, though,
is today's test-patch, which is implemented in bash. That's where we are
today. Language "discussions" (smile) can and should be forward looking.


On Mon, Jun 22, 2015 at 1:03 PM, Nick Dimiduk <nd...@gmail.com> wrote:

> On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe <cm...@apache.org>
> wrote:
>
> > You mentioned that "most of our project will be focused on shell
> > scripts" I guess based on the existing test-patch code.  Allen did a
> > lot of good work in this area recently.  I am curious if you evaluated
> > languages such as Python or Node.js for this use-case.  Shell scripts
> > can get a little... tricky beyond a certain size.  On the other hand,
> > if we are standardizing on shell, which shell and which version?
> > Perhaps bash 3.5+?
> >
>
> I'll also add that shell is not helpful for a cross-platform set of
> tooling. I recently added a daemon to Apache Phoenix; an explicit
> requirement was Windows support. I ended up implementing a solution in
> python because that environment is platform-agnostic and still systems-y
> enough. I think this is something this project should seriously consider.
>
> -n
>
> On Tue, Jun 16, 2015 at 7:55 PM, Sean Busbey <bu...@cloudera.com> wrote:
> > > I'm going to try responding to several things at once here, so
> apologies
> > if
> > > I miss anyone and sorry for the long email. :)
> > >
> > >
> > > On Tue, Jun 16, 2015 at 3:44 PM, Steve Loughran <
> stevel@hortonworks.com>
> > > wrote:
> > >
> > >> I think it's good to have a general build/test process projects can
> > share,
> > >> so +1 to pulling it out. You should get help from others.
> > >>
> > >> regarding incubation, it is a lot of work, especially for something
> > that's
> > >> more of an in-house tool than an artifact to release and redistribute.
> > >>
> > >> You can't just use apache labs or the build project's repo to work on
> > this?
> > >>
> > >> if you do want to incubate, we may want to nominate the hadoop project
> > as
> > >> the monitoring PMC, rather than incubator@.
> > >>
> > >> -steve
> > >>
> > >>
> > > Important note: we're proposing a board resolution that would directly
> > pull
> > > this code base out into a new TLP; there'd be no incubator, we'd just
> > > continue building community and start making releases.
> > >
> > > The proposed PMC believes the tooling we're talking about has direct
> > > applicability to projects well outside of the ASF. Lots of other open
> > > source projects run on community contributions and have a general need
> > for
> > > better QA tools. Given that problem set and the presence of a community
> > > working to solve it, there's no reason this needs to be treated as an
> > > in-house build project. We certainly want to be useful to ASF projects
> > and
> > > getting them on-board given our current optimization for ASF infra will
> > > certainly be easier, but we're not limited to that (and our current
> > > prerequisites, a CI tool and jira or github, are pretty broadly
> > available).
> > >
> > >
> > > On Tue, Jun 16, 2015 at 10:13 AM, Nick Dimiduk <nd...@apache.org>
> > wrote:
> > >
> > >>
> > >> Since we're tossing out names, how about Apache Bootstrap? It's a
> > >> meta-project to help other projects get off the ground, after all.
> > >>
> > >
> > >
> > > There's already a web development framework named Bootstrap[1]. It's
> also
> > > used by several ASF projects, so I think it best to avoid the
> confusion.
> > >
> > > The name is, of course, up to the proposed PMC. As a bit of background,
> > the
> > > current name Yetus fulfills Allen's desire to have something shell
> > related
> > > and my desire to have a project that starts with Y (there are currently
> > no
> > > ASF projects that start with Y). The universe of names that fill in
> these
> > > two is very small, AFAICT. I did a brief suitability search and didn't
> > find
> > > any blockers.
> > >
> > >
> > >  On Tue, Jun 16, 2015 at 11:59 AM, Allen Wittenauer <aw...@altiscale.com>
> > >  wrote:
> > >
> > >>
> > >> Since a couple of people have brought it up:
> > >>
> > >>         I think the release question is probably one of the big
> question
> > >> marks.  Other than tar balls, how does something like this actually
> get
> > >> used downstream?
> > >>
> > >>         For test-patch, in particular, I have a few thoughts on this:
> > >>
> > >> Short term:
> > >>
> > >>         * Projects that want to move RIGHT NOW would modify their
> > Jenkins
> > >> jobs to checkout from the Yetus repo (preferably at a well known tag
> or
> > >> branch) in one directory and their project repo in another directory.
> > Then
> > >> it’s just a matter of passing the correct flags to test-patch.  This
> is
> > >> pretty much how I’ve been personally running test-patch for about 6
> > months
> > >> now. Under Jenkins, we’ve seen this work with NiFi (incubating)
> already.
> > >>
> > >>         * Create a stub version of test-patch that projects could
> check
> > >> into their repo, replacing the existing test-patch.  This stub version
> > >> would git clone from either ASF or github and then execute test-patch
> > >> accordingly on demand.  With the correct smarts, it could make sure it
> > has
> > >> a cached version to prevent continual clones.
> > >>
> > >> Longer term:
> > >>
> > >>         * I’ve been toying with the idea of (ab)using Java repos and
> > >> packaging as a transportation layer, either in addition or in
> > combination
> > >> with something like a maven plugin.  Something like this would clearly
> > be
> > >> better for offline usage and/or to lower the network traffic.
> > >>
> > >
> > > It's important that the project follow ASF guidelines on publishing
> > > releases[2]. So long as we publish releases to the distribution
> > directory I
> > > think we'd be fine having folks work off of the corresponding tag. I'm
> > not
> > > sure there's much reason to do that, however. A Jenkins job can just as
> > > easily grab a release tarball as a git tag and we're not talking about
> a
> > > large amount of stuff. The kind of build setup that Chris N mentioned
> is
> > > also totally doable now that there's a build description DSL for
> > Jenkins[3].
> > >
> > > For individual developers, I don't see any reason we can't package
> things
> > > up as a tool, similar to how findbugs or shellcheck work. We can make
> OS
> > > packages (or homebrew for OS X) if we want to make stand alone
> > installation
> > > on developer machines real easy. Those same packages could be installed
> > on
> > > the ASF build machines, provided some ASF project wanted to make use of
> > > Yetus.
> > >
> > > Having releases will incur some turn around time for when folks want to
> > see
> > > fixes, but that's a trade off around release cadence we can work out
> > longer
> > > term.
> > >
> > > I would like to have one or two projects that can work off of the
> > bleeding
> > > edge repo, but we'd have to get that to mesh with foundation policy. My
> > gut
> > > tells me we should be able to come up with an agreement that makes
> such a
> > > project "part of the development community" but the specifics will have
> > to
> > > be worked out.
> > >
> > >
> > > On Tue, Jun 16, 2015 at 4:41 AM, Jonathan Hsieh <jo...@cloudera.com>
> > wrote:
> > >
> > >> How would we have to add testing to our pre-commit testing?
> > >>
> > >> Each project has likely customized their own core commit scripts
> > (multiple
> > >> JVM versions, checkstyle, javadoc exceptions etc.). We should
> probably
> > >> solicit interest from other projects who already have fancy precommit
> > >> tests beyond HBase/Hadoop too.
> > >>
> > >
> > > I'm not sure if Allen's response above answered this for you. The
> current
> > > state of test-patch has a plugin system for adding new tests. It is
> also
> > > customizable to handle project idiosyncrasies (at least the ones we've
> > seen
> > > so far, like unspecified module dependencies, multiple projects per
> repo,
> > > or use of ant). Releases will naturally include docs on how to leverage
> > > those to customize what a particular project needs tested pre-commit.
> > >
> > > Given the nature of the work Yetus is hoping to enable, I think it's
> safe
> > > to assume the project community is going to be doing a fair bit of
> > outreach
> > > to help show projects outside of HBase/Hadoop how things can work for
> > them.
> > >
> > >
> > >
> > > [1]: http://getbootstrap.com/
> > > [2]: http://www.apache.org/dev/release.html
> > > [3]: https://wiki.jenkins-ci.org/display/JENKINS/Job+DSL+Plugin
> > >
> > > --
> > > Sean
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Andrew Purtell <ap...@apache.org>.
On Mon, Jun 22, 2015 at 1:03 PM, Nick Dimiduk <nd...@gmail.com> wrote:

> On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe <cm...@apache.org>
> wrote:
>
> > You mentioned that "most of our project will be focused on shell
> > scripts" I guess based on the existing test-patch code.  Allen did a
> > lot of good work in this area recently.  I am curious if you evaluated
> > languages such as Python or Node.js for this use-case.  Shell scripts
> > can get a little... tricky beyond a certain size.  On the other hand,
> > if we are standardizing on shell, which shell and which version?
> > Perhaps bash 3.5+?
> >
>
> I'll also add that shell is not helpful for a cross-platform set of
> tooling. I recently added a daemon to Apache Phoenix; an explicit
> requirement was Windows support. I ended up implementing a solution in
> python because that environment is platform-agnostic and still systems-y
> enough. I think this is something this project should seriously consider.
>

In my opinion, historically, test-patch hasn't needed to be cross-platform
because the only first-class development environment for Hadoop has been
Linux. Growing beyond this could absolutely be one focus of Yetus, should
that be a consensus goal of the community. The seed of the project, though,
is today's test-patch, which is implemented in bash. That's where we are
today. Language "discussions" (smile) can and should be forward-looking.





-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Nick Dimiduk <nd...@gmail.com>.
On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe <cm...@apache.org>
wrote:

> You mentioned that "most of our project will be focused on shell
> scripts" I guess based on the existing test-patch code.  Allen did a
> lot of good work in this area recently.  I am curious if you evaluated
> languages such as Python or Node.js for this use-case.  Shell scripts
> can get a little... tricky beyond a certain size.  On the other hand,
> if we are standardizing on shell, which shell and which version?
> Perhaps bash 3.5+?
>

I'll also add that shell is not helpful for a cross-platform set of
tooling. I recently added a daemon to Apache Phoenix; an explicit
requirement was Windows support. I ended up implementing a solution in
python because that environment is platform-agnostic and still systems-y
enough. I think this is something this project should seriously consider.

-n

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by "Colin P. McCabe" <cm...@apache.org>.
+1 for making this a separate project.  We've always struggled with a
lot of forks of the test-patch code and perhaps this project can help
create something that works well for multiple projects.

Bypassing the incubator seems kind of weird (I didn't know that was an
option) but I will let other people with more experience in the ASF
comment on that.

You mentioned that "most of our project will be focused on shell
scripts" I guess based on the existing test-patch code.  Allen did a
lot of good work in this area recently.  I am curious if you evaluated
languages such as Python or Node.js for this use-case.  Shell scripts
can get a little... tricky beyond a certain size.  On the other hand,
if we are standardizing on shell, which shell and which version?
Perhaps bash 3.5+?
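
(An aside on that last question: the bash 3.x line actually topped out at
3.2, so in practice the floor would be something like 3.2 or 4.x. As a
minimal sketch of how a script could enforce whatever floor gets picked,
using the BASH_VERSINFO builtin; the exact minimum here is an assumption:

    # Hedged sketch: refuse to run under anything older than bash 3.2.
    # BASH_VERSINFO is a bash builtin array: major, minor, patch, ...
    if [[ ${BASH_VERSINFO[0]} -lt 3 ]] \
       || [[ ${BASH_VERSINFO[0]} -eq 3 && ${BASH_VERSINFO[1]} -lt 2 ]]; then
      echo "ERROR: bash 3.2 or later is required (found ${BASH_VERSION})." >&2
      exit 1
    fi
)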

Also, what will be the mechanism for customizing this for each
project?  Ideally the customizations needed would be small so we could
share the most code.

cheers,
Colin


Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Ray Chiang <rc...@cloudera.com>.
For words beginning with "Y"

Yale: Mythical animal elephant/boar.  After some quick Googling, it's
apparently originally documented by "Pliny the Elder", so it's got a
beer-related connotation too.  Downside, might get confused with the
university of the same name.
Yare: adj for quick/agile/lively
Yucca: noun for a plant that an elephant *could* eat
Yair: Scottish term for a fish trap.  Also a Hebrew name for "He will
enlighten"

-Ray



Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Steve Loughran <st...@hortonworks.com>.
> On 17 Jun 2015, at 03:55, Sean Busbey <bu...@cloudera.com> wrote:
> 
> The name is, of course, up to the proposed PMC. As a bit of background, the
> current name Yetus fulfills Allen's desire to have something shell related
> and my desire to have a project that starts with Y (there are currently no
> ASF projects that start with Y). The universe of names that fill in these
> two is very small, AFAICT. I did a brief suitability search and didn't find
> any blockers.


Apache YouBrokeTheBuild?

I'd thought of "yeti", but there's a couple of software projects/products called that already.

Here's a complete list of things that live alongside elephants in Tanzania; nothing beginning with Y

http://www.serengeti.org/animals.html

if you pick one from that list I may have a photo for your slides

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Sean Busbey <bu...@cloudera.com>.
I'm going to try responding to several things at once here, so apologies if
I miss anyone and sorry for the long email. :)


On Tue, Jun 16, 2015 at 3:44 PM, Steve Loughran <st...@hortonworks.com>
wrote:

> I think it's good to have a general build/test process projects can share,
> so +1 to pulling it out. You should get help from others.
>
> regarding incubation, it is a lot of work, especially for something that's
> more of an in-house tool than an artifact to release and redistribute.
>
> You can't just use apache labs or the build project's repo to work on this?
>
> if you do want to incubate, we may want to nominate the hadoop project as
> the monitoring PMC, rather than incubator@.
>
> -steve
>
>
Important note: we're proposing a board resolution that would directly pull
this code base out into a new TLP; there'd be no incubator, we'd just
continue building community and start making releases.

The proposed PMC believes the tooling we're talking about has direct
applicability to projects well outside of the ASF. Lots of other open
source projects run on community contributions and have a general need for
better QA tools. Given that problem set and the presence of a community
working to solve it, there's no reason this needs to be treated as an
in-house build project. We certainly want to be useful to ASF projects and
getting them on-board given our current optimization for ASF infra will
certainly be easier, but we're not limited to that (and our current
prerequisites, a CI tool and JIRA or GitHub, are pretty broadly available).


On Tue, Jun 16, 2015 at 10:13 AM, Nick Dimiduk <nd...@apache.org> wrote:

>
> Since we're tossing out names, how about Apache Bootstrap? It's a
> meta-project to help other projects get off the ground, after all.
>


There's already a web development framework named Bootstrap[1]. It's also
used by several ASF projects, so I think it best to avoid the confusion.

The name is, of course, up to the proposed PMC. As a bit of background, the
current name Yetus fulfills Allen's desire to have something shell related
and my desire to have a project that starts with Y (there are currently no
ASF projects that start with Y). The universe of names that fill in these
two is very small, AFAICT. I did a brief suitability search and didn't find
any blockers.


 On Tue, Jun 16, 2015 at 11:59 AM, Allen Wittenauer <aw...@altiscale.com>
 wrote:

>
> Since a couple of people have brought it up:
>
>         I think the release question is probably one of the big question
> marks.  Other than tar balls, how does something like this actually get
> used downstream?
>
>         For test-patch, in particular, I have a few thoughts on this:
>
> Short term:
>
>         * Projects that want to move RIGHT NOW would modify their Jenkins
> jobs to checkout from the Yetus repo (preferably at a well known tag or
> branch) in one directory and their project repo in another directory.  Then
> it’s just a matter of passing the correct flags to test-patch.  This is
> pretty much how I’ve been personally running test-patch for about 6 months
> now. Under Jenkins, we’ve seen this work with NiFi (incubating) already.
>
>         * Create a stub version of test-patch that projects could check
> into their repo, replacing the existing test-patch.  This stub version
> would git clone from either ASF or github and then execute test-patch
> accordingly on demand.  With the correct smarts, it could make sure it has
> a cached version to prevent continual clones.
>
> Longer term:
>
>         * I’ve been toying with the idea of (ab)using Java repos and
> packaging as a transportation layer, either in addition or in combination
> with something like a maven plugin.  Something like this would clearly be
> better for offline usage and/or to lower the network traffic.
>
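
(To make the stub idea above concrete, a rough sketch of what such a
checked-in script might look like; the cache path, repo URL, and script
location are assumptions rather than a settled design:

    #!/usr/bin/env bash
    # Hedged sketch of a checked-in stub: keep a cached clone of Yetus and
    # delegate to the real test-patch so machines don't re-clone every run.
    YETUS_CACHE="${HOME}/.cache/yetus"                # assumed cache location
    YETUS_REPO="https://github.com/apache/yetus.git"  # assumed repo URL
    if [[ ! -d "${YETUS_CACHE}" ]]; then
      git clone "${YETUS_REPO}" "${YETUS_CACHE}"
    else
      git -C "${YETUS_CACHE}" pull --ff-only
    fi
    exec "${YETUS_CACHE}/test-patch.sh" "$@"          # assumed script path
)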

It's important that the project follow ASF guidelines on publishing
releases[2]. So long as we publish releases to the distribution directory I
think we'd be fine having folks work off of the corresponding tag. I'm not
sure there's much reason to do that, however. A Jenkins job can just as
easily grab a release tarball as a git tag and we're not talking about a
large amount of stuff. The kind of build setup that Chris N mentioned is
also totally doable now that there's a build description DSL for Jenkins[3].
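
(As an illustration, the short-term arrangement Allen describes could be a
single shell build step in such a Jenkins job; the tag, URL, and flag names
below are hypothetical, not a documented interface:

    # Hedged sketch of a Jenkins "Execute shell" step: check out Yetus at a
    # known tag next to the project repo, then point test-patch at the
    # project's checkout.
    git clone --depth 1 --branch rel/0.1.0 \
        https://github.com/apache/yetus.git "${WORKSPACE}/yetus"
    "${WORKSPACE}/yetus/test-patch.sh" \
        --basedir="${WORKSPACE}/project" \
        --patch-dir="${WORKSPACE}/patchprocess" \
        "${ISSUE_NUM}"
)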

For individual developers, I don't see any reason we can't package things
up as a tool, similar to how findbugs or shellcheck work. We can make OS
packages (or homebrew for OS X) if we want to make stand alone installation
on developer machines real easy. Those same packages could be installed on
the ASF build machines, provided some ASF project wanted to make use of
Yetus.

Having releases will incur some turn around time for when folks want to see
fixes, but that's a trade off around release cadence we can work out longer
term.

I would like to have one or two projects that can work off of the bleeding
edge repo, but we'd have to get that to mesh with foundation policy. My gut
tells me we should be able to come up with an agreement that makes such a
project "part of the development community" but the specifics will have to
be worked out.


On Tue, Jun 16, 2015 at 4:41 AM, Jonathan Hsieh <jo...@cloudera.com> wrote:

> How would we have to add testing to our pre-commit testing?
>
> Each project has likely customized their own core commit scripts (multiple
> JVM versions, checkstyle, javadoc exceptions etc.). We should probably
> solicit interest from other projects who already have fancy precommit
> tests beyond HBase/Hadoop too.
>

I'm not sure if Allen's response above answered this for you. The current
state of test-patch has a plugin system for adding new tests. It is also
customizable to handle project idiosyncrasies (at least the ones we've seen
so far, like unspecified module dependencies, multiple projects per repo,
or use of ant). Releases will naturally include docs on how to leverage
those to customize what a particular project needs tested pre-commit.
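
(For a flavor of the plugin system: a plugin is just a shell file sourced
by test-patch that registers itself and defines hook functions. The hook
and helper names below are illustrative of the style, not a verbatim copy
of the API:

    # Hedged sketch of a test-patch plugin.
    add_plugin example

    # Run only when the patch touches files this plugin cares about.
    function example_filefilter
    {
      declare filename=$1
      if [[ ${filename} =~ \.sh$ ]]; then
        add_test example
      fi
    }

    # Called after the patch is applied; report a vote on the result.
    function example_postapply
    {
      if ! bash -n "${BASEDIR}/scripts/example.sh"; then  # hypothetical check
        add_vote_table -1 example "patch breaks a shell syntax check"
        return 1
      fi
      add_vote_table +1 example "shell syntax checks pass"
      return 0
    }
)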

Given the nature of the work Yetus is hoping to enable, I think it's safe
to assume the project community is going to be doing a fair bit of outreach
to help show projects outside of HBase/Hadoop how things can work for them.



[1]: http://getbootstrap.com/
[2]: http://www.apache.org/dev/release.html
[3]: https://wiki.jenkins-ci.org/display/JENKINS/Job+DSL+Plugin

-- 
Sean

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Steve Loughran <st...@hortonworks.com>.
I think it's good to have a general build/test process projects can share, so +1 to pulling it out. You should get help from others. 

regarding incubation, it is a lot of work, especially for something that's more of an in-house tool than an artifact to release and redistribute.

You can't just use apache labs or the build project's repo to work on this? 

if you do want to incubate, we may want to nominate the hadoop project as the monitoring PMC, rather than incubator@. 

-steve

> On 16 Jun 2015, at 17:59, Allen Wittenauer <aw...@altiscale.com> wrote:
> 
> 
> Since a couple of people have brought it up:
> 
> 	I think the release question is probably one of the big question marks.  Other than tar balls, how does something like this actually get used downstream?
> 
> 	For test-patch, in particular, I have a few thoughts on this:
> 
> Short term:
> 
> 	* Projects that want to move RIGHT NOW would modify their Jenkins jobs to checkout from the Yetus repo (preferably at a well known tag or branch) in one directory and their project repo in another directory.  Then it’s just a matter of passing the correct flags to test-patch.  This is pretty much how I’ve been personally running test-patch for about 6 months now. Under Jenkins, we’ve seen this work with NiFi (incubating) already.
> 
> 	* Create a stub version of test-patch that projects could check into their repo, replacing the existing test-patch.  This stub version would git clone from either ASF or github and then execute test-patch accordingly on demand.  With the correct smarts, it could make sure it has a cached version to prevent continual clones.
> 
> Longer term:
> 
> 	* I’ve been toying with the idea of (ab)using Java repos and packaging as a transportation layer, either in addition or in combination with something like a maven plugin.  Something like this would clearly be better for offline usage and/or to lower the network traffic.
> 
> 
> 	It’s probably worth pointing out that plugins can get sucked in from outside the Yetus dir structure, so project specific bits can remain in those projects.  This would mean that, e.g., if ambari decides they want to change the dependency ordering such that ambari-metrics always gets built first, that’s completely doable without the Yetus project getting involved.  This is particularly relevant for things like the Dockerfile where projects would almost certainly want to dictate their build and test time dependencies.  


Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Allen Wittenauer <aw...@altiscale.com>.
Since a couple of people have brought it up:

	I think the release question is probably one of the big question marks.  Other than tar balls, how does something like this actually get used downstream?

	For test-patch, in particular, I have a few thoughts on this:

Short term:

	* Projects that want to move RIGHT NOW would modify their Jenkins jobs to checkout from the Yetus repo (preferably at a well known tag or branch) in one directory and their project repo in another directory.  Then it’s just a matter of passing the correct flags to test-patch.  This is pretty much how I’ve been personally running test-patch for about 6 months now. Under Jenkins, we’ve seen this work with NiFi (incubating) already.

	* Create a stub version of test-patch that projects could check into their repo, replacing the existing test-patch.  This stub version would git clone from either ASF or github and then execute test-patch accordingly on demand.  With the correct smarts, it could make sure it has a cached version to prevent continual clones.

Longer term:

	* I’ve been toying with the idea of (ab)using Java repos and packaging as a transportation layer, either in addition or in combination with something like a maven plugin.  Something like this would clearly be better for offline usage and/or to lower the network traffic.


	It’s probably worth pointing out that plugins can get sucked in from outside the Yetus dir structure, so project specific bits can remain in those projects.  This would mean that, e.g., if ambari decides they want to change the dependency ordering such that ambari-metrics always gets built first, that’s completely doable without the Yetus project getting involved.  This is particularly relevant for things like the Dockerfile where projects would almost certainly want to dictate their build and test time dependencies.  
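
	For the stub idea above, a minimal sketch (repo URL, tag name, and paths are all hypothetical at this point) might look like:

    #!/usr/bin/env bash
    # Hypothetical stub a project could check in as dev-support/test-patch.sh.
    # It keeps a cached clone of the tester, pins a well-known tag, and
    # delegates to the real script, so projects never carry the tooling itself.
    CACHE_DIR="${HOME}/.cache/yetus"
    REPO_URL="https://git-wip-us.apache.org/repos/asf/yetus.git"  # hypothetical
    PINNED_REF="rel/0.1.0"                                        # hypothetical tag

    if [[ ! -d "${CACHE_DIR}/.git" ]]; then
      git clone "${REPO_URL}" "${CACHE_DIR}"
    fi
    git -C "${CACHE_DIR}" fetch --tags --quiet
    git -C "${CACHE_DIR}" checkout --quiet "${PINNED_REF}"

    exec "${CACHE_DIR}/dev-support/test-patch.sh" "$@"

	Pinning to a tag keeps runs reproducible while still letting a project bump versions with a one-line change.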

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Tsuyoshi Ozawa <oz...@apache.org>.
+1 on the idea.

It would be great if tests about dependency management, multiple
branches, and distributed environments could be done in the project. One
discussion point is how Hadoop depends on Yetus, including the
development cycles. It's a good time to rethink what can be done for
making Hadoop better.

Thanks,
- Tsuyoshi

On Tue, Jun 16, 2015 at 8:47 AM, Sean Busbey <bu...@cloudera.com> wrote:
> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> in-coming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so often the tooling
> is worked on ad hoc and few shared improvements happen across
> projects. Since
> the tooling itself is never a primary concern, any improvement made is rarely reused
> outside of ASF projects.
>
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow the new
> contributors we have to handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could easily also take over maintenance of
> things like shelldocs, since test-patch is the primary consumer of that
> currently but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talking about the
> tools to build QA processes and not a particular set of tested components.
> Specifically, I think the ChaosMonkey work that's in HBase should be
> generalizable as a fault injection framework (either based on that code or
> something like it). Doing this for arbitrary software is obviously very
> difficult, and a part of easing that will be to make (and then favor)
> tooling to allow projects to have operational glue that looks the same.
> Namely, the shell work that's been done in hadoop-functions.sh would be a
> great foundational layer that could bring good daemon handling practices to
> a whole slew of software projects. In the event that these frameworks and
> tools get adopted by parts of the Hadoop ecosystem, that could make the job
> of i.e. Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current
> test-patch work or expressed interest in helping out on getting it used in
> other projects. Right now, the proposed PMC would be (alphabetical by last
> name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> pmc, sqoop pmc, all around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several members and a bunch of folks familiar with the
> ASF. Combined with the code already existing in Apache spaces, I think that
> gives us sufficient justification for a direct board proposal.
>
> The planned project name is "Apache Yetus". It's an archaic genus of sea
> snail and most of our project will be focused on shell scripts.
>
> N.b.: this does not mean that the Hadoop community would _have_ to rely on
> the new TLP, but I hope that once we have a release that can be evaluated
> there'd be enough benefit to strongly encourage it.
>
> This has mostly been focused on scope and community issues, and I'd love to
> talk through any feedback on that. Additionally, are there any other points
> folks want to make sure are covered before we have a resolution?
>
> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>
>>
>>
>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>>> Hi Folks!
>>>
>>> After working on test-patch with other folks for the last few months, I
>>> think we've reached the point where we can make the fastest progress
>>> towards the goal of a general use pre-commit patch tester by spinning
>>> things into a project focused on just that. I think we have a mature enough
>>> code base and a sufficient fledgling community, so I'm going to put
>>> together a tlp proposal.
>>>
>>> Thanks for the feedback thus far from use within Hadoop. I hope we can
>>> continue to make things more useful.
>>>
>>> -Sean
>>>
>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>>
>>>> HBase's dev-support folder is where the scripts and support files live.
>>>> We've only recently started adding anything to the maven builds that's
>>>> specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
>>>> add in more if we ran into the same permissions problems y'all are having.
>>>>
>>>> There's also our precommit job itself, though it isn't large[2]. AFAIK,
>>>> we don't properly back this up anywhere, we just notify each other of
>>>> changes on a particular mail thread[3].
>>>>
>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
>>>> read because I just finished fixing "mvn site" running out of permgen)
>>>> [3]: http://s.apache.org/NT0
>>>>
>>>>
>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnauroth@hortonworks.com
>>>> > wrote:
>>>>
>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>>>> HBase
>>>>> repo?  Is there any additional context we need to be aware of?
>>>>>
>>>>> Chris Nauroth
>>>>> Hortonworks
>>>>> http://hortonworks.com/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>>>>>
>>>>> >+dev@hbase
>>>>> >
>>>>> >HBase has recently been cleaning up our precommit jenkins jobs to make
>>>>> >them
>>>>> >more robust. From what I can tell our stuff started off as an earlier
>>>>> >version of what Hadoop uses for testing.
>>>>> >
>>>>> >Folks on either side open to an experiment of combining our precommit
>>>>> >check
>>>>> >tooling? In principle we should be looking for the same kinds of
>>>>> things.
>>>>> >
>>>>> >Naturally we'll still need different jenkins jobs to handle different
>>>>> >resource needs and we'd need to figure out where stuff eventually
>>>>> lives,
>>>>> >but that could come later.
>>>>> >
>>>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
>>>>> cnauroth@hortonworks.com>
>>>>> >wrote:
>>>>> >
>>>>> >> The only thing I'm aware of is the failOnError option:
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>>>> >>
>>>>> >>
>>>>> >> I prefer that we don't disable this, because ignoring different
>>>>> kinds of
>>>>> >> failures could leave our build directories in an indeterminate state.
>>>>> >>For
>>>>> >> example, we could end up with an old class file on the classpath for
>>>>> >>test
>>>>> >> runs that was supposedly deleted.
>>>>> >>
>>>>> >> I think it's worth exploring Eddy's suggestion to try simulating
>>>>> failure
>>>>> >> by placing a file where the code expects to see a directory.  That
>>>>> might
>>>>> >> even let us enable some of these tests that are skipped on Windows,
>>>>> >> because Windows allows access for the owner even after permissions
>>>>> have
>>>>> >> been stripped.
>>>>> >>
>>>>> >> Chris Nauroth
>>>>> >> Hortonworks
>>>>> >> http://hortonworks.com/
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>>>>> >>
>>>>> >> >Is there a maven plugin or setting we can use to simply remove
>>>>> >> >directories that have no executable permissions on them?  Clearly we
>>>>> >> >have the permission to do this from a technical point of view (since
>>>>> >> >we created the directories as the jenkins user), it's simply that
>>>>> the
>>>>> >> >code refuses to do it.
>>>>> >> >
>>>>> >> >Otherwise I guess we can just fix those tests...
>>>>> >> >
>>>>> >> >Colin
>>>>> >> >
>>>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>>>> >> >>
>>>>> >> >> In HDFS-7722:
>>>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>> >> >>TearDown().
>>>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>>>> >> >>
>>>>> >> >> Also I ran mvn test several times on my machine and all tests
>>>>> passed.
>>>>> >> >>
>>>>> >> >> However, since in DiskChecker#checkDirAccess():
>>>>> >> >>
>>>>> >> >> private static void checkDirAccess(File dir) throws
>>>>> >>DiskErrorException {
>>>>> >> >>   if (!dir.isDirectory()) {
>>>>> >> >>     throw new DiskErrorException("Not a directory: "
>>>>> >> >>                                  + dir.toString());
>>>>> >> >>   }
>>>>> >> >>
>>>>> >> >>   checkAccessByFileMethods(dir);
>>>>> >> >> }
>>>>> >> >>
>>>>> >> >> One potentially safer alternative is replacing data dir with a
>>>>> >>regular
>>>>> >> >> file to simulate disk failures.
>>>>> >> >>
>>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>> >> >><cn...@hortonworks.com> wrote:
>>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>> >> >>> TestDataNodeVolumeFailureReporting, and
>>>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>>>>> >>permissions
>>>>> >> >>>from
>>>>> >> >>> directories like the one Colin mentioned to simulate disk
>>>>> failures
>>>>> >>at
>>>>> >> >>>data
>>>>> >> >>> nodes.  I reviewed the code for all of those, and they all appear
>>>>> >>to be
>>>>> >> >>> doing the necessary work to restore executable permissions at the
>>>>> >>end
>>>>> >> >>>of
>>>>> >> >>> the test.  The only recent uncommitted patch I've seen that makes
>>>>> >> >>>changes
>>>>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
>>>>> >> >>>though.  I
>>>>> >> >>> don't know if there are other uncommitted patches that changed
>>>>> these
>>>>> >> >>>test
>>>>> >> >>> suites.
>>>>> >> >>>
>>>>> >> >>> I suppose it's also possible that the JUnit process unexpectedly
>>>>> >>died
>>>>> >> >>> after removing executable permissions but before restoring them.
>>>>> >>That
>>>>> >> >>> always would have been a weakness of these test suites,
>>>>> regardless
>>>>> >>of
>>>>> >> >>>any
>>>>> >> >>> recent changes.
>>>>> >> >>>
>>>>> >> >>> Chris Nauroth
>>>>> >> >>> Hortonworks
>>>>> >> >>> http://hortonworks.com/
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>>>> >> >>>
>>>>> >> >>>>Hey Colin,
>>>>> >> >>>>
>>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going
>>>>> on
>>>>> >>with
>>>>> >> >>>>these boxes. He took a look and concluded that some perms are
>>>>> being
>>>>> >> >>>>set in
>>>>> >> >>>>those directories by our unit tests which are precluding those
>>>>> files
>>>>> >> >>>>from
>>>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but we
>>>>> >>should
>>>>> >> >>>>expect this to keep happening until we can fix the test in
>>>>> question
>>>>> >>to
>>>>> >> >>>>properly clean up after itself.
>>>>> >> >>>>
>>>>> >> >>>>To help narrow down which commit it was that started this, Andrew
>>>>> >>sent
>>>>> >> >>>>me
>>>>> >> >>>>this info:
>>>>> >> >>>>
>>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>> >>
>>>>>
>>>>> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> >>>>>>/
>>>>> >> >>>>has
>>>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
>>>>> since
>>>>> >>9:32
>>>>> >> >>>>UTC
>>>>> >> >>>>on March 5th."
>>>>> >> >>>>
>>>>> >> >>>>--
>>>>> >> >>>>Aaron T. Myers
>>>>> >> >>>>Software Engineer, Cloudera
>>>>> >> >>>>
>>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>> >><cm...@apache.org>
>>>>> >> >>>>wrote:
>>>>> >> >>>>
>>>>> >> >>>>> Hi all,
>>>>> >> >>>>>
>>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't find
>>>>> any
>>>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
>>>>> them
>>>>> >> >>>>>seem
>>>>> >> >>>>> to be failing with some variant of this message:
>>>>> >> >>>>>
>>>>> >> >>>>> [ERROR] Failed to execute goal
>>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>> >>(default-clean)
>>>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>>>> delete
>>>>> >> >>>>>
>>>>> >> >>>>>
>>>>> >> >>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> >> >>>>> -> [Help 1]
>>>>> >> >>>>>
>>>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>> >> >>>>> permissions?
>>>>> >> >>>>>
>>>>> >> >>>>> Colin
>>>>> >> >>>>>
>>>>> >> >>>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> --
>>>>> >> >> Lei (Eddy) Xu
>>>>> >> >> Software Engineer, Cloudera
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> >--
>>>>> >Sean
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>>
>>
>> --
>> Sean
>>
>
>
>
> --
> Sean

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Chris Nauroth <cn...@hortonworks.com>.
+1

ZooKeeper is another project that has expressed interest in improving its
pre-commit process lately.  I understand Allen has had some success
applying this to the ZooKeeper build too, with some small caveats around
quirks in the build.xml that I think we can resolve.

I'm interested in defining how the release model works for a project like
this.  The current model of forking and checking it in directly to
multiple projects leads to the fragmentation and bugs described earlier in
the thread.  Another possible model is something more dynamic, like a
bootstrap script capable of checking out a release from a git tag before
launching pre-commit.  I'm interested to hear from various projects on how
they'd like to integrate.
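
As a strawman for that dynamic model (version number, URL, and flags are
hypothetical, since nothing has been released yet), a project's pre-commit
job could begin with something like:

    # Hypothetical Jenkins shell step: fetch a pinned Yetus release,
    # unpack it, and hand off to test-patch for the triggering issue.
    # ISSUE_NUM would be a parameter supplied by the job trigger.
    YETUS_VERSION=0.1.0
    curl -fSL "https://www.apache.org/dist/yetus/yetus-${YETUS_VERSION}.tar.gz" \
      | tar xz -C "${WORKSPACE}"
    "${WORKSPACE}/yetus-${YETUS_VERSION}/test-patch.sh" \
      --basedir="${WORKSPACE}/sourcedir" \
      --patch-dir="${WORKSPACE}/patchdir" \
      "${ISSUE_NUM}"

That keeps the version pinning in one place per project instead of forking
the scripts wholesale.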

--Chris Nauroth




On 6/15/15, 8:57 PM, "Josh Elser" <el...@apache.org> wrote:

>+1
>
>(Have been talking to Sean in private on the subject -- seems
>appropriate to voice some public support)
>
>I'd be interested in this for Accumulo and Slider. For Accumulo, we've
>come a long way without a pre-commit build, primarily due to a CTR
>process. We have seen the repeated questions of "how do I run the tests"
>which a more automated workflow would help with, IMO. I think Slider
>could benefit with the same reasons.
>
>I'd also be giddy to see the recent improvements in Hadoop trickle down
>into the other projects that Allen already mentioned.
>
>Take this as record that I'd be happy to try to help out where possible.
>
>Sean Busbey wrote:
>> thank you for making a more digestible version Allen. :)
>>
>> If you're interested in soliciting feedback from other projects, I
>>created
>> ASF short links to this thread in common-dev and hbase:
>>
>>
>> * http://s.apache.org/yetus-discuss-hadoop
>> * http://s.apache.org/yetus-discuss-hbase
>>
>> While I agree that it's important to get feedback from ASF projects that
>> might find this useful, I can say that recently I've been involved in
>>the
>> non-ASF project YCSB and both the pretest and better shell stuff would
>>be
>> immensely useful over there.
>>
>> On Mon, Jun 15, 2015 at 10:36 PM, Allen Wittenauer<aw...@altiscale.com>
>>wrote:
>>
>>>          I'm clearly +1 on this idea.  As part of the rewrite in
>>>Hadoop of
>>> test-patch, it was amazing to see how far and wide this bit of code has
>>> spread.  So I see consolidating everyone's efforts as a huge win for a
>>> large number of projects.  (esp considering how many I saw suffering
>>>from a
>>> variety of identified bugs! )
>>>
>>>          But….
>>>
>>>          I think it's important for people involved in those other
>>>projects
>>> to speak up and voice an opinion as to whether this is useful.
>>>
>>> To summarize:
>>>
>>>          In the short term, a single location to get/use a precommit
>>>patch
>>> tester rather than everyone building/supporting their own in their
>>>spare
>>> time.
>>>
>>>           FWIW, we've already got the code base modified to be
>>>pluggable.
>>> We've written some basic/simple plugins that support Hadoop, HBase,
>>>Tajo,
>>> Tez, Pig, and Flink.  For HBase and Flink, this does include their
>>>custom
>>> checks.  Adding support for other projects shouldn't be hard.  Simple
>>> projects take almost no time after seeing the basic pattern.
>>>
>>>          I think it's worthwhile highlighting that means support for
>>>both
>>> JIRA and GitHub as well as Ant and Maven from the same code base.
>>>
>>> Longer term:
>>>
>>>          Well, we clearly have ideas of things that we want to do.
>>>Adding
>>> more features to test-patch (review board? gradle?) is obvious. But
>>>what
>>> about teasing apart and generalizing some of the other shell bits from
>>> projects? A common library for building CLI tools to fault injection to
>>> release documentation creation tools to …  I'd even like to see us get
>>>as
>>> advanced as a "run this program to auto-generate daemon stop/start
>>>bits".
>>>
>>>          I had a few chats with people about this idea at Hadoop
>>>Summit.
>>> What's truly exciting are the ideas that people had once they realized
>>>what
>>> kinds of problems we're trying to solve.  It's always amazing the
>>>problems
>>> that projects have that could be solved by these types of solutions.
>>>Let's
>>> stop hiding our cool toys in this area.
>>>
>>>          So, what feedback and ideas do you have in this area?  Are
>>>you a
>>> yay or a nay?
>>>
>>>
>>> On Jun 15, 2015, at 4:47 PM, Sean Busbey<bu...@cloudera.com>  wrote:
>>>
>>>> Oof. I had meant to push on this again but life got in the way and now
>>> the
>>>> June board meeting is upon us. Sorry everyone. In the event that this
>>> ends
>>>> up contentious, hopefully one of the copied communities can give us a
>>>> branch to work in.
>>>>
>>>> I know everyone is busy, so here's the short version of this email:
>>>>I'd
>>>> like to move some of the code currently in Hadoop (test-patch) into a
>>>>new
>>>> TLP focused on QA tooling. I'm not sure what the best format for
>>>>priming
>>>> this conversation is. ORC filled in the incubator project proposal
>>>> template, but I'm not sure how much that confused the issue. So to
>>>>start,
>>>> I'll just write what I'm hoping we can accomplish in general terms
>>>>here.
>>>>
>>>> All software development projects that are community based (that is,
>>>> accepting outside contributions) face a common QA problem for vetting
>>>> in-coming contributions. Hadoop is fortunate enough to be sufficiently
>>>> popular that the weight of the problem drove tool development (i.e.
>>>> test-patch). That tool is generalizable enough that a bunch of other
>>>>TLPs
>>>> have adopted their own forks. Unfortunately, in most projects this
>>>>kind
>>> of
>>>> QA work is an enabler rather than a primary concern, so often the
>>>>tooling
>>>> is worked on ad hoc and few shared improvements happen across
>>>> projects. Since
>>>> the tooling itself is never a primary concern, any improvement made is rarely
>>>>reused
>>>> outside of ASF projects.
>>>>
>>>> Over the last couple months a few of us have been working on
>>>>generalizing
>>>> the tooling present in the Hadoop code base (because it was the most
>>> mature
>>>> out of all those in the various projects) and it's reached a point
>>>>where
>>> we
>>>> think we can start bringing on other downstream users. This means we
>>>>need
>>>> to start establishing things like a release cadence and to grow the
>>>>new
>>>> contributors we have to handle more project responsibility.
>>>>Personally, I
>>>> think that means it's time to move out from under Hadoop to drive
>>>>things
>>> as
>>>> our own community. Eventually, I hope the community can help draw in a
>>>> group of folks traditionally underrepresented in ASF projects, namely
>>>>QA
>>>> and operations folks.
>>>>
>>>> I think test-patch by itself has enough scope to justify a project.
>>> Having
>>>> a solid set of build tools that are customizable to fit the norms of
>>>> different software communities is a bunch of work. Making it work
>>>>well in
>>>> both the context of automated test systems like Jenkins and for
>>> individual
>>>> developers is even more work. We could easily also take over
>>>>maintenance
>>> of
>>>> things like shelldocs, since test-patch is the primary consumer of
>>>>that
>>>> currently but it's generally useful tooling.
>>>>
>>>> In addition to test-patch, I think the proposed project has some
>>>>future
>>>> growth potential. Given some adoption of test-patch to prove utility,
>>>>the
>>>> project could build on the ties it makes to start building tools to
>>>>help
>>>> projects do their own longer-run testing. Note that I'm talking about
>>>>the
>>>> tools to build QA processes and not a particular set of tested
>>> components.
>>>> Specifically, I think the ChaosMonkey work that's in HBase should be
>>>> generalizable as a fault injection framework (either based on that
>>>>code
>>> or
>>>> something like it). Doing this for arbitrary software is obviously
>>>>very
>>>> difficult, and a part of easing that will be to make (and then favor)
>>>> tooling to allow projects to have operational glue that looks the
>>>>same.
>>>> Namely, the shell work that's been done in hadoop-functions.sh would
>>>>be a
>>>> great foundational layer that could bring good daemon handling
>>>>practices
>>> to
>>>> a whole slew of software projects. In the event that these frameworks
>>>>and
>>>> tools get adopted by parts of the Hadoop ecosystem, that could make
>>>>the
>>> job
>>>> of i.e. Bigtop substantially easier.
>>>>
>>>> I've reached out to a few folks who have been involved in the current
>>>> test-patch work or expressed interest in helping out on getting it
>>>>used
>>> in
>>>> other projects. Right now, the proposed PMC would be (alphabetical by
>>> last
>>>> name):
>>>>
>>>> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc,
>>>>jclouds
>>>> pmc, sqoop pmc, all around Jenkins expert)
>>>> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
>>>> * Nick Dimiduk (hbase pmc, phoenix pmc)
>>>> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
>>>> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
>>>> phoenix pmc)
>>>> * Allen Wittenauer (hadoop committer)
>>>>
>>>> That PMC gives us several members and a bunch of folks familiar with
>>>>the
>>>> ASF. Combined with the code already existing in Apache spaces, I think
>>> that
>>>> gives us sufficient justification for a direct board proposal.
>>>>
>>>> The planned project name is "Apache Yetus". It's an archaic genus of
>>>>sea
>>>> snail and most of our project will be focused on shell scripts.
>>>>
>>>> N.b.: this does not mean that the Hadoop community would _have_ to
>>>>rely
>>> on
>>>> the new TLP, but I hope that once we have a release that can be
>>>>evaluated
>>>> there'd be enough benefit to strongly encourage it.
>>>>
>>>> This has mostly been focused on scope and community issues, and I'd
>>>>love
>>> to
>>>> talk through any feedback on that. Additionally, are there any other
>>> points
>>>> folks want to make sure are covered before we have a resolution?
>>>>
>>>> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey<bu...@cloudera.com>
>>> wrote:
>>>>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey<bu...@cloudera.com>
>>> wrote:
>>>>>> Hi Folks!
>>>>>>
>>>>>> After working on test-patch with other folks for the last few
>>>>>>months, I
>>>>>> think we've reached the point where we can make the fastest progress
>>>>>> towards the goal of a general use pre-commit patch tester by
>>>>>>spinning
>>>>>> things into a project focused on just that. I think we have a mature
>>> enough
>>>>>> code base and a sufficient fledgling community, so I'm going to put
>>>>>> together a tlp proposal.
>>>>>>
>>>>>> Thanks for the feedback thus far from use within Hadoop. I hope we
>>>>>>can
>>>>>> continue to make things more useful.
>>>>>>
>>>>>> -Sean
>>>>>>
>>>>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey<bu...@cloudera.com>
>>> wrote:
>>>>>>> HBase's dev-support folder is where the scripts and support files
>>> live.
>>>>>>> We've only recently started adding anything to the maven builds
>>>>>>>that's
>>>>>>> specific to jenkins[1]; so far it's diagnostic stuff, but that's
>>> where I'd
>>>>>>> add in more if we ran into the same permissions problems y'all are
>>> having.
>>>>>>> There's also our precommit job itself, though it isn't large[2].
>>> AFAIK,
>>>>>>> we don't properly back this up anywhere, we just notify each other
>>>>>>>of
>>>>>>> changes on a particular mail thread[3].
>>>>>>>
>>>>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're
>>> all
>>>>>>> read because I just finished fixing "mvn site" running out of
>>>>>>>permgen)
>>>>>>> [3]: http://s.apache.org/NT0
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth<
>>> cnauroth@hortonworks.com
>>>>>>>> wrote:
>>>>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in
>>>>>>>>the
>>>>>>>> HBase
>>>>>>>> repo?  Is there any additional context we need to be aware of?
>>>>>>>>
>>>>>>>> Chris Nauroth
>>>>>>>> Hortonworks
>>>>>>>> http://hortonworks.com/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/11/15, 2:44 PM, "Sean Busbey"<bu...@cloudera.com>  wrote:
>>>>>>>>
>>>>>>>>> +dev@hbase
>>>>>>>>>
>>>>>>>>> HBase has recently been cleaning up our precommit jenkins jobs to
>>> make
>>>>>>>>> them
>>>>>>>>> more robust. From what I can tell our stuff started off as an
>>> earlier
>>>>>>>>> version of what Hadoop uses for testing.
>>>>>>>>>
>>>>>>>>> Folks on either side open to an experiment of combining our
>>> precommit
>>>>>>>>> check
>>>>>>>>> tooling? In principle we should be looking for the same kinds of
>>>>>>>> things.
>>>>>>>>> Naturally we'll still need different jenkins jobs to handle
>>> different
>>>>>>>>> resource needs and we'd need to figure out where stuff eventually
>>>>>>>> lives,
>>>>>>>>> but that could come later.
>>>>>>>>>
>>>>>>>>> On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth<
>>>>>>>> cnauroth@hortonworks.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> The only thing I'm aware of is the failOnError option:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I prefer that we don't disable this, because ignoring different
>>>>>>>> kinds of
>>>>>>>>>> failures could leave our build directories in an indeterminate
>>> state.
>>>>>>>>>> For
>>>>>>>>>> example, we could end up with an old class file on the classpath
>>> for
>>>>>>>>>> test
>>>>>>>>>> runs that was supposedly deleted.
>>>>>>>>>>
>>>>>>>>>> I think it's worth exploring Eddy's suggestion to try simulating
>>>>>>>> failure
>>>>>>>>>> by placing a file where the code expects to see a directory.
>>>>>>>>>>That
>>>>>>>> might
>>>>>>>>>> even let us enable some of these tests that are skipped on
>>>>>>>>>>Windows,
>>>>>>>>>> because Windows allows access for the owner even after
>>>>>>>>>>permissions
>>>>>>>> have
>>>>>>>>>> been stripped.
>>>>>>>>>>
>>>>>>>>>> Chris Nauroth
>>>>>>>>>> Hortonworks
>>>>>>>>>> http://hortonworks.com/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/11/15, 2:10 PM, "Colin McCabe"<cm...@alumni.cmu.edu>
>>> wrote:
>>>>>>>>>>> Is there a maven plugin or setting we can use to simply remove
>>>>>>>>>>> directories that have no executable permissions on them?
>>>>>>>>>>>Clearly
>>> we
>>>>>>>>>>> have the permission to do this from a technical point of view
>>> (since
>>>>>>>>>>> we created the directories as the jenkins user), it's simply
>>>>>>>>>>>that
>>>>>>>> the
>>>>>>>>>>> code refuses to do it.
>>>>>>>>>>>
>>>>>>>>>>> Otherwise I guess we can just fix those tests...
>>>>>>>>>>>
>>>>>>>>>>> Colin
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu<le...@cloudera.com>
>>>>>>>>>>>wrote:
>>>>>>>>>>>> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>>>>>>>>
>>>>>>>>>>>> In HDFS-7722:
>>>>>>>>>>>> TestDataNodeVolumeFailureXXX tests reset data dir permissions
>>>>>>>>>>>>in
>>>>>>>>>>>> TearDown().
>>>>>>>>>>>> TestDataNodeHotSwapVolumes reset permissions in a finally
>>>>>>>>>>>>clause.
>>>>>>>>>>>>
>>>>>>>>>>>> Also I ran mvn test several times on my machine and all tests
>>>>>>>> passed.
>>>>>>>>>>>> However, since in DiskChecker#checkDirAccess():
>>>>>>>>>>>>
>>>>>>>>>>>> private static void checkDirAccess(File dir) throws
>>>>>>>>>> DiskErrorException {
>>>>>>>>>>>>   if (!dir.isDirectory()) {
>>>>>>>>>>>>     throw new DiskErrorException("Not a directory: "
>>>>>>>>>>>>                                  + dir.toString());
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>>   checkAccessByFileMethods(dir);
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> One potentially safer alternative is replacing data dir with a
>>>>>>>>>> regular
>>>>>>>>>>>> file to simulate disk failures.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>>>>>>>>> <cn...@hortonworks.com>  wrote:
>>>>>>>>>>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>>>>>>>>> TestDataNodeVolumeFailureReporting, and
>>>>>>>>>>>>> TestDataNodeVolumeFailureToleration all remove executable
>>>>>>>>>> permissions
>>>>>>>>>>>>> from
>>>>>>>>>>>>> directories like the one Colin mentioned to simulate disk
>>>>>>>> failures
>>>>>>>>>> at
>>>>>>>>>>>>> data
>>>>>>>>>>>>> nodes.  I reviewed the code for all of those, and they all
>>> appear
>>>>>>>>>> to be
>>>>>>>>>>>>> doing the necessary work to restore executable permissions at
>>> the
>>>>>>>>>> end
>>>>>>>>>>>>> of
>>>>>>>>>>>>> the test.  The only recent uncommitted patch I've seen that
>>> makes
>>>>>>>>>>>>> changes
>>>>>>>>>>>>> in these test suites is HDFS-7722.  That patch still looks
>>>>>>>>>>>>>fine
>>>>>>>>>>>>> though.  I
>>>>>>>>>>>>> don't know if there are other uncommitted patches that
>>>>>>>>>>>>>changed
>>>>>>>> these
>>>>>>>>>>>>> test
>>>>>>>>>>>>> suites.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I suppose it's also possible that the JUnit process
>>>>>>>>>>>>>unexpectedly
>>>>>>>>>> died
>>>>>>>>>>>>> after removing executable permissions but before restoring
>>>>>>>>>>>>>them.
>>>>>>>>>> That
>>>>>>>>>>>>> always would have been a weakness of these test suites,
>>>>>>>> regardless
>>>>>>>>>> of
>>>>>>>>>>>>> any
>>>>>>>>>>>>> recent changes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chris Nauroth
>>>>>>>>>>>>> Hortonworks
>>>>>>>>>>>>> http://hortonworks.com/
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers"<at...@cloudera.com>
>>>>>>>>>>>>>wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Colin,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I asked Andrew Bayer, who works with Apache Infra, what's
>>>>>>>>>>>>>>going
>>>>>>>> on
>>>>>>>>>> with
>>>>>>>>>>>>>> these boxes. He took a look and concluded that some perms
>>>>>>>>>>>>>>are
>>>>>>>> being
>>>>>>>>>>>>>> set in
>>>>>>>>>>>>>> those directories by our unit tests which are precluding
>>>>>>>>>>>>>>those
>>>>>>>> files
>>>>>>>>>>>>>> from
>>>>>>>>>>>>>> getting deleted. He's going to clean up the boxes for us,
>>>>>>>>>>>>>>but
>>> we
>>>>>>>>>> should
>>>>>>>>>>>>>> expect this to keep happening until we can fix the test in
>>>>>>>> question
>>>>>>>>>> to
>>>>>>>>>>>>>> properly clean up after itself.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To help narrow down which commit it was that started this,
>>> Andrew
>>>>>>>>>> sent
>>>>>>>>>>>>>> me
>>>>>>>>>>>>>> this info:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>> Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>>>>>>>>> /
>>>>>>>>>>>>>> has
>>>>>>>>>>>>>> 500 perms, so I'm guessing that's the problem. Been that way
>>>>>>>> since
>>>>>>>>>> 9:32
>>>>>>>>>>>>>> UTC
>>>>>>>>>>>>>> on March 5th."
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Aaron T. Myers
>>>>>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>>>>>>> <cm...@apache.org>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A very quick (and not thorough) survey shows that I can't
>>>>>>>>>>>>>>>find
>>>>>>>> any
>>>>>>>>>>>>>>> jenkins jobs that succeeded from the last 24 hours.  Most
>>>>>>>>>>>>>>>of
>>>>>>>> them
>>>>>>>>>>>>>>> seem
>>>>>>>>>>>>>>> to be failing with some variant of this message:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [ERROR] Failed to execute goal
>>>>>>>>>>>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>>>>>>> (default-clean)
>>>>>>>>>>>>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>>>>>>> delete
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>>>>>>>>>> ->  [Help 1]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any ideas how this happened?  Bad disk, unit test setting
>>> wrong
>>>>>>>>>>>>>>> permissions?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Colin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Lei (Eddy) Xu
>>>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sean
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sean
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sean
>>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>
>>
>>
>


>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
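
For reference, a minimal sketch of toggling that option for a one-off
diagnostic run; this uses the plugin's documented failOnError user
property, and the module selection is illustrative:

    # let "mvn clean" continue past undeletable files, for diagnosis only
    mvn clean -Dmaven.clean.failOnError=false -pl hadoop-hdfs-project/hadoop-hdfs
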
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I prefer that we don't disable this, because ignoring different
>>>>>>>> kinds of
>>>>>>>>>> failures could leave our build directories in an indeterminate
>>> state.
>>>>>>>>>> For
>>>>>>>>>> example, we could end up with an old class file on the classpath
>>> for
>>>>>>>>>> test
>>>>>>>>>> runs that was supposedly deleted.
>>>>>>>>>>
>>>>>>>>>> I think it's worth exploring Eddy's suggestion to try simulating
>>>>>>>> failure
>>>>>>>>>> by placing a file where the code expects to see a directory.
>>>>>>>>>>That
>>>>>>>> might
>>>>>>>>>> even let us enable some of these tests that are skipped on
>>>>>>>>>>Windows,
>>>>>>>>>> because Windows allows access for the owner even after
>>>>>>>>>>permissions
>>>>>>>> have
>>>>>>>>>> been stripped.
>>>>>>>>>>
>>>>>>>>>> Chris Nauroth
>>>>>>>>>> Hortonworks
>>>>>>>>>> http://hortonworks.com/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/11/15, 2:10 PM, "Colin McCabe"<cm...@alumni.cmu.edu>
>>> wrote:
>>>>>>>>>>> Is there a maven plugin or setting we can use to simply remove
>>>>>>>>>>> directories that have no executable permissions on them?
>>>>>>>>>>>Clearly
>>> we
>>>>>>>>>>> have the permission to do this from a technical point of view
>>> (since
>>>>>>>>>>> we created the directories as the jenkins user), it's simply
>>>>>>>>>>>that
>>>>>>>> the
>>>>>>>>>>> code refuses to do it.
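
A rough sketch of the manual workaround implied here -- put the execute
bit back first, then the delete goes through (the path is illustrative):

    # find directories under target/ missing the owner execute bit,
    # restore full owner access, then clean as usual
    find hadoop-hdfs-project/hadoop-hdfs/target -type d ! -perm -u+x \
      -exec chmod u+rwx {} +
    mvn clean
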
>>>>>>>>>>>
>>>>>>>>>>> Otherwise I guess we can just fix those tests...
>>>>>>>>>>>
>>>>>>>>>>> Colin
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu<le...@cloudera.com>
>>>>>>>>>>>wrote:
>>>>>>>>>>>> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>>>>>>>>
>>>>>>>>>>>> In HDFS-7722:
>>>>>>>>>>>> TestDataNodeVolumeFailureXXX tests reset data dir permissions
>>>>>>>>>>>>in
>>>>>>>>>>>> TearDown().
>>>>>>>>>>>> TestDataNodeHotSwapVolumes reset permissions in a finally
>>>>>>>>>>>>clause.
>>>>>>>>>>>>
>>>>>>>>>>>> Also I ran mvn test several times on my machine and all tests
>>>>>>>> passed.
>>>>>>>>>>>> However, since in DiskChecker#checkDirAccess():
>>>>>>>>>>>>
>>>>>>>>>>>> private static void checkDirAccess(File dir) throws
>>>>>>>>>> DiskErrorException {
>>>>>>>>>>>>   if (!dir.isDirectory()) {
>>>>>>>>>>>>     throw new DiskErrorException("Not a directory: "
>>>>>>>>>>>>                                  + dir.toString());
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>>   checkAccessByFileMethods(dir);
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> One potentially safer alternative is replacing data dir with a
>>>>>>>>>> regular
>>>>>>>>>>>> file to simulate disk failures.
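
To make that alternative concrete, a small sketch: a regular file where
a directory belongs makes DiskChecker's dir.isDirectory() check fail
without any permission changes, so a crashed test run leaves nothing
undeletable behind (the data3 path is illustrative):

    DATADIR=target/test/data/dfs/data/data3

    # simulate a failed volume: swap the directory for a regular file
    mv "${DATADIR}" "${DATADIR}.bak"
    touch "${DATADIR}"

    # ... exercise the volume-failure code paths here ...

    # restore the real directory afterwards
    rm -f "${DATADIR}"
    mv "${DATADIR}.bak" "${DATADIR}"
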
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>>>>>>>>> <cn...@hortonworks.com>  wrote:
>>>>>>>>>>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>>>>>>>>> TestDataNodeVolumeFailureReporting, and
>>>>>>>>>>>>> TestDataNodeVolumeFailureToleration all remove executable
>>>>>>>>>> permissions
>>>>>>>>>>>>> from
>>>>>>>>>>>>> directories like the one Colin mentioned to simulate disk
>>>>>>>> failures
>>>>>>>>>> at
>>>>>>>>>>>>> data
>>>>>>>>>>>>> nodes.  I reviewed the code for all of those, and they all
>>> appear
>>>>>>>>>> to be
>>>>>>>>>>>>> doing the necessary work to restore executable permissions at
>>> the
>>>>>>>>>> end
>>>>>>>>>>>>> of
>>>>>>>>>>>>> the test.  The only recent uncommitted patch I've seen that
>>> makes
>>>>>>>>>>>>> changes
>>>>>>>>>>>>> in these test suites is HDFS-7722.  That patch still looks
>>>>>>>>>>>>>fine
>>>>>>>>>>>>> though.  I
>>>>>>>>>>>>> don't know if there are other uncommitted patches that
>>>>>>>>>>>>>changed
>>>>>>>> these
>>>>>>>>>>>>> test
>>>>>>>>>>>>> suites.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I suppose it's also possible that the JUnit process
>>>>>>>>>>>>>unexpectedly
>>>>>>>>>> died
>>>>>>>>>>>>> after removing executable permissions but before restoring
>>>>>>>>>>>>>them.
>>>>>>>>>> That
>>>>>>>>>>>>> always would have been a weakness of these test suites,
>>>>>>>> regardless
>>>>>>>>>> of
>>>>>>>>>>>>> any
>>>>>>>>>>>>> recent changes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chris Nauroth
>>>>>>>>>>>>> Hortonworks
>>>>>>>>>>>>> http://hortonworks.com/
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers"<at...@cloudera.com>
>>>>>>>>>>>>>wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Colin,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I asked Andrew Bayer, who works with Apache Infra, what's
>>>>>>>>>>>>>>going
>>>>>>>> on
>>>>>>>>>> with
>>>>>>>>>>>>>> these boxes. He took a look and concluded that some perms
>>>>>>>>>>>>>>are
>>>>>>>> being
>>>>>>>>>>>>>> set in
>>>>>>>>>>>>>> those directories by our unit tests which are precluding
>>>>>>>>>>>>>>those
>>>>>>>> files
>>>>>>>>>>>>>> from
>>>>>>>>>>>>>> getting deleted. He's going to clean up the boxes for us,
>>>>>>>>>>>>>>but
>>> we
>>>>>>>>>> should
>>>>>>>>>>>>>> expect this to keep happening until we can fix the test in
>>>>>>>> question
>>>>>>>>>> to
>>>>>>>>>>>>>> properly clean up after itself.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To help narrow down which commit it was that started this,
>>> Andrew
>>>>>>>>>> sent
>>>>>>>>>>>>>> me
>>>>>>>>>>>>>> this info:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>> Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>>>>>>>>> /
>>>>>>>>>>>>>> has
>>>>>>>>>>>>>> 500 perms, so I'm guessing that's the problem. Been that way
>>>>>>>> since
>>>>>>>>>> 9:32
>>>>>>>>>>>>>> UTC
>>>>>>>>>>>>>> on March 5th."
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Aaron T. Myers
>>>>>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>>>>>>> <cm...@apache.org>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A very quick (and not thorough) survey shows that I can't
>>>>>>>>>>>>>>>find
>>>>>>>> any
>>>>>>>>>>>>>>> jenkins jobs that succeeded from the last 24 hours.  Most
>>>>>>>>>>>>>>>of
>>>>>>>> them
>>>>>>>>>>>>>>> seem
>>>>>>>>>>>>>>> to be failing with some variant of this message:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [ERROR] Failed to execute goal
>>>>>>>>>>>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>>>>>>> (default-clean)
>>>>>>>>>>>>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>>>>>>> delete
>>>>>>>>>>>>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>>>>>>>>>> ->  [Help 1]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any ideas how this happened?  Bad disk, unit test setting
>>> wrong
>>>>>>>>>>>>>>> permissions?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Colin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Lei (Eddy) Xu
>>>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sean
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sean
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sean
>>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>
>>
>>
>


Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Josh Elser <el...@apache.org>.
+1

(Have been talking to Sean in private on the subject -- seems 
appropriate to voice some public support)

I'd be interested in this for Accumulo and Slider. For Accumulo, we've
come a long way without a pre-commit build, primarily due to a CTR
process. We have seen the repeated questions of "how do I run the tests"
which a more automated workflow would help with, IMO. I think Slider
could benefit for the same reasons.

I'd also be giddy to see the recent improvements in Hadoop trickle down 
into the other projects that Allen already mentioned.

Take this as record that I'd be happy to try to help out where possible.

Sean Busbey wrote:
> thank you for making a more digestible version Allen. :)
>
> If you're interested in soliciting feedback from other projects, I created
> ASF short links to this thread in common-dev and hbase:
>
>
> * http://s.apache.org/yetus-discuss-hadoop
> * http://s.apache.org/yetus-discuss-hbase
>
> While I agree that it's important to get feedback from ASF projects that
> might find this useful, I can say that recently I've been involved in the
> non-ASF project YCSB and both the pretest and better shell stuff would be
> immensely useful over there.
>
> On Mon, Jun 15, 2015 at 10:36 PM, Allen Wittenauer<aw...@altiscale.com>  wrote:
>
>>          I'm clearly +1 on this idea.  As part of the rewrite in Hadoop of
>> test-patch, it was amazing to see how far and wide this bit of code has
>> spread.  So I see consolidating everyone's efforts as a huge win for a
>> large number of projects.  (esp considering how many I saw suffering from a
>> variety of identified bugs! )
>>
>>          But….
>>
>>          I think it's important for people involved in those other projects
>> to speak up and voice an opinion as to whether this is useful.
>>
>> To summarize:
>>
>>          In the short term, a single location to get/use a precommit patch
>> tester rather than everyone building/supporting their own in their spare
>> time.
>>
>>           FWIW, we've already got the code base modified to be pluggable.
>> We've written some basic/simple plugins that support Hadoop, HBase, Tajo,
>> Tez, Pig, and Flink.  For HBase and Flink, this does include their custom
>> checks.  Adding support for other projects shouldn't be hard.  Simple
>> projects take almost no time after seeing the basic pattern.
>>
>>          I think it's worthwhile highlighting that means support for both
>> JIRA and GitHub as well as Ant and Maven from the same code base.
>>
>> Longer term:
>>
>>          Well, we clearly have ideas of things that we want to do. Adding
>> more features to test-patch (review board? gradle?) is obvious. But what
>> about teasing apart and generalizing some of the other shell bits from
>> projects? A common library for building CLI tools to fault injection to
>> release documentation creation tools to …  I'd even like to see us get as
>> advanced as a "run this program to auto-generate daemon stop/start bits".
>>
>>          I had a few chats with people about this idea at Hadoop Summit.
>> What's truly exciting are the ideas that people had once they realized what
>> kinds of problems we're trying to solve.  It's always amazing the problems
>> that projects have that could be solved by these types of solutions.  Let's
>> stop hiding our cool toys in this area.
>>
>>          So, what feedback and ideas do you have in this area?  Are you a
>> yay or a nay?
>>
>>
>> On Jun 15, 2015, at 4:47 PM, Sean Busbey<bu...@cloudera.com>  wrote:
>>
>>> Oof. I had meant to push on this again but life got in the way and now
>> the
>>> June board meeting is upon us. Sorry everyone. In the event that this
>> ends
>>> up contentious, hopefully one of the copied communities can give us a
>>> branch to work in.
>>>
>>> I know everyone is busy, so here's the short version of this email: I'd
>>> like to move some of the code currently in Hadoop (test-patch) into a new
>>> TLP focused on QA tooling. I'm not sure what the best format for priming
>>> this conversation is. ORC filled in the incubator project proposal
>>> template, but I'm not sure how much that confused the issue. So to start,
>>> I'll just write what I'm hoping we can accomplish in general terms here.
>>>
>>> All software development projects that are community based (that is,
>>> accepting outside contributions) face a common QA problem for vetting
>>> in-coming contributions. Hadoop is fortunate enough to be sufficiently
>>> popular that the weight of the problem drove tool development (i.e.
>>> test-patch). That tool is generalizable enough that a bunch of other TLPs
>>> have adopted their own forks. Unfortunately, in most projects this kind
>> of
>>> QA work is an enabler rather than a primary concern, so often the tooling
>>> is worked on ad-hoc and few shared improvements happen across
>>> projects. Since
>>> the tooling itself is never a primary concern, any improvement made is rarely reused
>>> outside of ASF projects.
>>>
>>> Over the last couple months a few of us have been working on generalizing
>>> the tooling present in the Hadoop code base (because it was the most
>> mature
>>> out of all those in the various projects) and it's reached a point where
>> we
>>> think we can start bringing on other downstream users. This means we need
>>> to start establishing things like a release cadence and to grow the new
>>> contributors we have to handle more project responsibility. Personally, I
>>> think that means it's time to move out from under Hadoop to drive things
>> as
>>> our own community. Eventually, I hope the community can help draw in a
>>> group of folks traditionally underrepresented in ASF projects, namely QA
>>> and operations folks.
>>>
>>> I think test-patch by itself has enough scope to justify a project.
>> Having
>>> a solid set of build tools that are customizable to fit the norms of
>>> different software communities is a bunch of work. Making it work well in
>>> both the context of automated test systems like Jenkins and for
>> individual
>>> developers is even more work. We could easily also take over maintenance
>> of
>>> things like shelldocs, since test-patch is the primary consumer of that
>>> currently but it's generally useful tooling.
>>>
>>> In addition to test-patch, I think the proposed project has some future
>>> growth potential. Given some adoption of test-patch to prove utility, the
>>> project could build on the ties it makes to start building tools to help
>>> projects do their own longer-run testing. Note that I'm talking about the
>>> tools to build QA processes and not a particular set of tested
>> components.
>>> Specifically, I think the ChaosMonkey work that's in HBase should be
>>> generalizable as a fault injection framework (either based on that code
>> or
>>> something like it). Doing this for arbitrary software is obviously very
>>> difficult, and a part of easing that will be to make (and then favor)
>>> tooling to allow projects to have operational glue that looks the same.
>>> Namely, the shell work that's been done in hadoop-functions.sh would be a
>>> great foundational layer that could bring good daemon handling practices
>> to
>>> a whole slew of software projects. In the event that these frameworks and
>>> tools get adopted by parts of the Hadoop ecosystem, that could make the
>> job
>>> of e.g. Bigtop substantially easier.
>>>
>>> I've reached out to a few folks who have been involved in the current
>>> test-patch work or expressed interest in helping out on getting it used
>> in
>>> other projects. Right now, the proposed PMC would be (alphabetical by
>> last
>>> name):
>>>
>>> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
>>> pmc, sqoop pmc, all around Jenkins expert)
>>> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
>>> * Nick Dimiduk (hbase pmc, phoenix pmc)
>>> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
>>> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
>>> phoenix pmc)
>>> * Allen Wittenauer (hadoop committer)
>>>
>>> That PMC gives us several members and a bunch of folks familiar with the
>>> ASF. Combined with the code already existing in Apache spaces, I think
>> that
>>> gives us sufficient justification for a direct board proposal.
>>>
>>> The planned project name is "Apache Yetus". It's an archaic genus of sea
>>> snail and most of our project will be focused on shell scripts.
>>>
>>> N.b.: this does not mean that the Hadoop community would _have_ to rely
>> on
>>> the new TLP, but I hope that once we have a release that can be evaluated
>>> there'd be enough benefit to strongly encourage it.
>>>
>>> This has mostly been focused on scope and community issues, and I'd love
>> to
>>> talk through any feedback on that. Additionally, are there any other
>> points
>>> folks want to make sure are covered before we have a resolution?
>>>
>>> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey<bu...@cloudera.com>
>> wrote:
>>>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>>>
>>>>
>>>>
>>>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey<bu...@cloudera.com>
>> wrote:
>>>>> Hi Folks!
>>>>>
>>>>> After working on test-patch with other folks for the last few months, I
>>>>> think we've reached the point where we can make the fastest progress
>>>>> towards the goal of a general use pre-commit patch tester by spinning
>>>>> things into a project focused on just that. I think we have a mature
>> enough
>>>>> code base and a sufficient fledgling community, so I'm going to put
>>>>> together a tlp proposal.
>>>>>
>>>>> Thanks for the feedback thus far from use within Hadoop. I hope we can
>>>>> continue to make things more useful.
>>>>>
>>>>> -Sean
>>>>>
>>>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey<bu...@cloudera.com>
>> wrote:
>>>>>> HBase's dev-support folder is where the scripts and support files
>> live.
>>>>>> We've only recently started adding anything to the maven builds that's
>>>>>> specific to jenkins[1]; so far it's diagnostic stuff, but that's
>> where I'd
>>>>>> add in more if we ran into the same permissions problems y'all are
>> having.
>>>>>> There's also our precommit job itself, though it isn't large[2].
>> AFAIK,
>>>>>> we don't properly back this up anywhere, we just notify each other of
>>>>>> changes on a particular mail thread[3].
>>>>>>
>>>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're
>> all
>>>>>> read because I just finished fixing "mvn site" running out of permgen)
>>>>>> [3]: http://s.apache.org/NT0
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth<
>> cnauroth@hortonworks.com
>>>>>>> wrote:
>>>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>>>>>> HBase
>>>>>>> repo?  Is there any additional context we need to be aware of?
>>>>>>>
>>>>>>> Chris Nauroth
>>>>>>> Hortonworks
>>>>>>> http://hortonworks.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 3/11/15, 2:44 PM, "Sean Busbey"<bu...@cloudera.com>  wrote:
>>>>>>>
>>>>>>>> +dev@hbase
>>>>>>>>
>>>>>>>> HBase has recently been cleaning up our precommit jenkins jobs to
>> make
>>>>>>>> them
>>>>>>>> more robust. From what I can tell our stuff started off as an
>> earlier
>>>>>>>> version of what Hadoop uses for testing.
>>>>>>>>
>>>>>>>> Folks on either side open to an experiment of combining our
>> precommit
>>>>>>>> check
>>>>>>>> tooling? In principle we should be looking for the same kinds of
>>>>>>> things.
>>>>>>>> Naturally we'll still need different jenkins jobs to handle
>> different
>>>>>>>> resource needs and we'd need to figure out where stuff eventually
>>>>>>> lives,
>>>>>>>> but that could come later.
>>>>>>>>
>>>>>>>> On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth<
>>>>>>> cnauroth@hortonworks.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The only thing I'm aware of is the failOnError option:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I prefer that we don't disable this, because ignoring different
>>>>>>> kinds of
>>>>>>>>> failures could leave our build directories in an indeterminate
>> state.
>>>>>>>>> For
>>>>>>>>> example, we could end up with an old class file on the classpath
>> for
>>>>>>>>> test
>>>>>>>>> runs that was supposedly deleted.
>>>>>>>>>
>>>>>>>>> I think it's worth exploring Eddy's suggestion to try simulating
>>>>>>> failure
>>>>>>>>> by placing a file where the code expects to see a directory.  That
>>>>>>> might
>>>>>>>>> even let us enable some of these tests that are skipped on Windows,
>>>>>>>>> because Windows allows access for the owner even after permissions
>>>>>>> have
>>>>>>>>> been stripped.
>>>>>>>>>
>>>>>>>>> Chris Nauroth
>>>>>>>>> Hortonworks
>>>>>>>>> http://hortonworks.com/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/11/15, 2:10 PM, "Colin McCabe"<cm...@alumni.cmu.edu>
>> wrote:
>>>>>>>>>> Is there a maven plugin or setting we can use to simply remove
>>>>>>>>>> directories that have no executable permissions on them?  Clearly
>> we
>>>>>>>>>> have the permission to do this from a technical point of view
>> (since
>>>>>>>>>> we created the directories as the jenkins user), it's simply that
>>>>>>> the
>>>>>>>>>> code refuses to do it.
>>>>>>>>>>
>>>>>>>>>> Otherwise I guess we can just fix those tests...
>>>>>>>>>>
>>>>>>>>>> Colin
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu<le...@cloudera.com>  wrote:
>>>>>>>>>>> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>>>>>>>
>>>>>>>>>>> In HDFS-7722:
>>>>>>>>>>> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>>>>>>>> TearDown().
>>>>>>>>>>> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>>>>>>>>>>
>>>>>>>>>>> Also I ran mvn test several times on my machine and all tests
>>>>>>> passed.
>>>>>>>>>>> However, since in DiskChecker#checkDirAccess():
>>>>>>>>>>>
>>>>>>>>>>> private static void checkDirAccess(File dir) throws
>>>>>>>>> DiskErrorException {
>>>>>>>>>>>   if (!dir.isDirectory()) {
>>>>>>>>>>>     throw new DiskErrorException("Not a directory: "
>>>>>>>>>>>                                  + dir.toString());
>>>>>>>>>>>   }
>>>>>>>>>>>
>>>>>>>>>>>   checkAccessByFileMethods(dir);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> One potentially safer alternative is replacing data dir with a
>>>>>>>>> regular
>>>>>>>>>>> file to simulate disk failures.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>>>>>>>> <cn...@hortonworks.com>  wrote:
>>>>>>>>>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>>>>>>>> TestDataNodeVolumeFailureReporting, and
>>>>>>>>>>>> TestDataNodeVolumeFailureToleration all remove executable
>>>>>>>>> permissions
>>>>>>>>>>>> from
>>>>>>>>>>>> directories like the one Colin mentioned to simulate disk
>>>>>>> failures
>>>>>>>>> at
>>>>>>>>>>>> data
>>>>>>>>>>>> nodes.  I reviewed the code for all of those, and they all
>> appear
>>>>>>>>> to be
>>>>>>>>>>>> doing the necessary work to restore executable permissions at
>> the
>>>>>>>>> end
>>>>>>>>>>>> of
>>>>>>>>>>>> the test.  The only recent uncommitted patch I've seen that
>> makes
>>>>>>>>>>>> changes
>>>>>>>>>>>> in these test suites is HDFS-7722.  That patch still looks fine
>>>>>>>>>>>> though.  I
>>>>>>>>>>>> don't know if there are other uncommitted patches that changed
>>>>>>> these
>>>>>>>>>>>> test
>>>>>>>>>>>> suites.
>>>>>>>>>>>>
>>>>>>>>>>>> I suppose it's also possible that the JUnit process unexpectedly
>>>>>>>>> died
>>>>>>>>>>>> after removing executable permissions but before restoring them.
>>>>>>>>> That
>>>>>>>>>>>> always would have been a weakness of these test suites,
>>>>>>> regardless
>>>>>>>>> of
>>>>>>>>>>>> any
>>>>>>>>>>>> recent changes.
>>>>>>>>>>>>
>>>>>>>>>>>> Chris Nauroth
>>>>>>>>>>>> Hortonworks
>>>>>>>>>>>> http://hortonworks.com/
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers"<at...@cloudera.com>  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Colin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I asked Andrew Bayer, who works with Apache Infra, what's going
>>>>>>> on
>>>>>>>>> with
>>>>>>>>>>>>> these boxes. He took a look and concluded that some perms are
>>>>>>> being
>>>>>>>>>>>>> set in
>>>>>>>>>>>>> those directories by our unit tests which are precluding those
>>>>>>> files
>>>>>>>>>>>>> from
>>>>>>>>>>>>> getting deleted. He's going to clean up the boxes for us, but
>> we
>>>>>>>>> should
>>>>>>>>>>>>> expect this to keep happening until we can fix the test in
>>>>>>> question
>>>>>>>>> to
>>>>>>>>>>>>> properly clean up after itself.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To help narrow down which commit it was that started this,
>> Andrew
>>>>>>>>> sent
>>>>>>>>>>>>> me
>>>>>>>>>>>>> this info:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>> Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>>>>>>>> /
>>>>>>>>>>>>> has
>>>>>>>>>>>>> 500 perms, so I'm guessing that's the problem. Been that way
>>>>>>> since
>>>>>>>>> 9:32
>>>>>>>>>>>>> UTC
>>>>>>>>>>>>> on March 5th."
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Aaron T. Myers
>>>>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>>>>>> <cm...@apache.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A very quick (and not thorough) survey shows that I can't find
>>>>>>> any
>>>>>>>>>>>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
>>>>>>> them
>>>>>>>>>>>>>> seem
>>>>>>>>>>>>>> to be failing with some variant of this message:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [ERROR] Failed to execute goal
>>>>>>>>>>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>>>>>> (default-clean)
>>>>>>>>>>>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>>>>>> delete
>>>>>>>>>>>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>>>>>>>>> ->  [Help 1]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any ideas how this happened?  Bad disk, unit test setting
>> wrong
>>>>>>>>>>>>>> permissions?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Colin
>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Lei (Eddy) Xu
>>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sean
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sean
>>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>> --
>>> Sean
>>
>
>


Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Sean Busbey <bu...@cloudera.com>.
thank you for making a more digestible version Allen. :)

If you're interested in soliciting feedback from other projects, I created
ASF short links to this thread in common-dev and hbase:


* http://s.apache.org/yetus-discuss-hadoop
* http://s.apache.org/yetus-discuss-hbase

While I agree that it's important to get feedback from ASF projects that
might find this useful, I can say that recently I've been involved in the
non-ASF project YCSB and both the pretest and better shell stuff would be
immensely useful over there.

On Mon, Jun 15, 2015 at 10:36 PM, Allen Wittenauer <aw...@altiscale.com> wrote:

>
>         I'm clearly +1 on this idea.  As part of the rewrite in Hadoop of
> test-patch, it was amazing to see how far and wide this bit of code has
> spread.  So I see consolidating everyone's efforts as a huge win for a
> large number of projects.  (esp considering how many I saw suffering from a
> variety of identified bugs! )
>
>         But….
>
>         I think it's important for people involved in those other projects
> to speak up and voice an opinion as to whether this is useful.
>
> To summarize:
>
>         In the short term, a single location to get/use a precommit patch
> tester rather than everyone building/supporting their own in their spare
> time.
>
>          FWIW, we've already got the code base modified to be pluggable.
> We've written some basic/simple plugins that support Hadoop, HBase, Tajo,
> Tez, Pig, and Flink.  For HBase and Flink, this does include their custom
> checks.  Adding support for other projects shouldn't be hard.  Simple
> projects take almost no time after seeing the basic pattern.
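
To give a flavor of that basic pattern, a rough sketch of a project
plugin; the file name and helper functions below are illustrative of
the shape, not a definitive rendering of the plugin API:

    # myproject.sh -- hypothetical test-patch personality/plugin
    add_plugin myproject

    # called per changed file: queue only the checks that make sense
    function myproject_filefilter
    {
      local filename=$1

      if [[ ${filename} =~ \.java$ ]]; then
        add_test javac
        add_test checkstyle
      fi
    }
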
>
>         I think it's worthwhile highlighting that means support for both
> JIRA and GitHub as well as Ant and Maven from the same code base.
>
> Longer term:
>
>         Well, we clearly have ideas of things that we want to do. Adding
> more features to test-patch (review board? gradle?) is obvious. But what
> about teasing apart and generalizing some of the other shell bits from
> projects? A common library for building CLI tools to fault injection to
> release documentation creation tools to …  I'd even like to see us get as
> advanced as a "run this program to auto-generate daemon stop/start bits".
>
>         I had a few chats with people about this idea at Hadoop Summit.
> What's truly exciting are the ideas that people had once they realized what
> kinds of problems we're trying to solve.  It's always amazing the problems
> that projects have that could be solved by these types of solutions.  Let's
> stop hiding our cool toys in this area.
>
>         So, what feedback and ideas do you have in this area?  Are you a
> yay or a nay?
>
>
> On Jun 15, 2015, at 4:47 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
> > Oof. I had meant to push on this again but life got in the way and now
> the
> > June board meeting is upon us. Sorry everyone. In the event that this
> ends
> > up contentious, hopefully one of the copied communities can give us a
> > branch to work in.
> >
> > I know everyone is busy, so here's the short version of this email: I'd
> > like to move some of the code currently in Hadoop (test-patch) into a new
> > TLP focused on QA tooling. I'm not sure what the best format for priming
> > this conversation is. ORC filled in the incubator project proposal
> > template, but I'm not sure how much that confused the issue. So to start,
> > I'll just write what I'm hoping we can accomplish in general terms here.
> >
> > All software development projects that are community based (that is,
> > accepting outside contributions) face a common QA problem for vetting
> > in-coming contributions. Hadoop is fortunate enough to be sufficiently
> > popular that the weight of the problem drove tool development (i.e.
> > test-patch). That tool is generalizable enough that a bunch of other TLPs
> > have adopted their own forks. Unfortunately, in most projects this kind
> of
> > QA work is an enabler rather than a primary concern, so often the tooling
> > is worked on ad-hoc and few shared improvements happen across
> > projects. Since
> > the tooling itself is never a primary concern, any improvement made is rarely reused
> > outside of ASF projects.
> >
> > Over the last couple months a few of us have been working on generalizing
> > the tooling present in the Hadoop code base (because it was the most
> mature
> > out of all those in the various projects) and it's reached a point where
> we
> > think we can start bringing on other downstream users. This means we need
> > to start establishing things like a release cadence and to grow the new
> > contributors we have to handle more project responsibility. Personally, I
> > think that means it's time to move out from under Hadoop to drive things
> as
> > our own community. Eventually, I hope the community can help draw in a
> > group of folks traditionally underrepresented in ASF projects, namely QA
> > and operations folks.
> >
> > I think test-patch by itself has enough scope to justify a project.
> Having
> > a solid set of build tools that are customizable to fit the norms of
> > different software communities is a bunch of work. Making it work well in
> > both the context of automated test systems like Jenkins and for
> individual
> > developers is even more work. We could easily also take over maintenance
> of
> > things like shelldocs, since test-patch is the primary consumer of that
> > currently but it's generally useful tooling.
> >
> > In addition to test-patch, I think the proposed project has some future
> > growth potential. Given some adoption of test-patch to prove utility, the
> > project could build on the ties it makes to start building tools to help
> > projects do their own longer-run testing. Note that I'm talking about the
> > tools to build QA processes and not a particular set of tested
> components.
> > Specifically, I think the ChaosMonkey work that's in HBase should be
> > generalizable as a fault injection framework (either based on that code
> or
> > something like it). Doing this for arbitrary software is obviously very
> > difficult, and a part of easing that will be to make (and then favor)
> > tooling to allow projects to have operational glue that looks the same.
> > Namely, the shell work that's been done in hadoop-functions.sh would be a
> > great foundational layer that could bring good daemon handling practices
> to
> > a whole slew of software projects. In the event that these frameworks and
> > tools get adopted by parts of the Hadoop ecosystem, that could make the
> job
> > of e.g. Bigtop substantially easier.
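
As a toy illustration of the fault-injection idea in the ChaosMonkey
spirit -- assuming daemons track themselves with pidfiles as in the
daemon sketch above; everything here is hypothetical:

    # kill one randomly chosen daemon to exercise recovery paths
    function chaos_kill_random_daemon
    {
      local victim
      victim=$(ls "${PID_DIR:-/tmp}"/*.pid 2>/dev/null | shuf -n 1)

      [[ -n "${victim}" ]] || return 0
      kill -9 "$(cat "${victim}")"
      rm -f "${victim}"
    }
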
> >
> > I've reached out to a few folks who have been involved in the current
> > test-patch work or expressed interest in helping out on getting it used
> in
> > other projects. Right now, the proposed PMC would be (alphabetical by
> last
> > name):
> >
> > * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> > pmc, sqoop pmc, all around Jenkins expert)
> > * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> > * Nick Dimiduk (hbase pmc, phoenix pmc)
> > * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> > * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> > phoenix pmc)
> > * Allen Wittenauer (hadoop committer)
> >
> > That PMC gives us several members and a bunch of folks familiar with the
> > ASF. Combined with the code already existing in Apache spaces, I think
> that
> > gives us sufficient justification for a direct board proposal.
> >
> > The planned project name is "Apache Yetus". It's an archaic genus of sea
> > snail and most of our project will be focused on shell scripts.
> >
> > N.b.: this does not mean that the Hadoop community would _have_ to rely
> on
> > the new TLP, but I hope that once we have a release that can be evaluated
> > there'd be enough benefit to strongly encourage it.
> >
> > This has mostly been focused on scope and community issues, and I'd love
> to
> > talk through any feedback on that. Additionally, are there any other
> points
> > folks want to make sure are covered before we have a resolution?
> >
> > On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bu...@cloudera.com>
> wrote:
> >
> >> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
> >>
> >>
> >>
> >> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bu...@cloudera.com>
> wrote:
> >>
> >>> Hi Folks!
> >>>
> >>> After working on test-patch with other folks for the last few months, I
> >>> think we've reached the point where we can make the fastest progress
> >>> towards the goal of a general use pre-commit patch tester by spinning
> >>> things into a project focused on just that. I think we have a mature
> enough
> >>> code base and a sufficient fledgling community, so I'm going to put
> >>> together a tlp proposal.
> >>>
> >>> Thanks for the feedback thus far from use within Hadoop. I hope we can
> >>> continue to make things more useful.
> >>>
> >>> -Sean
> >>>
> >>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com>
> wrote:
> >>>
> >>>> HBase's dev-support folder is where the scripts and support files
> live.
> >>>> We've only recently started adding anything to the maven builds that's
> >>>> specific to jenkins[1]; so far it's diagnostic stuff, but that's
> where I'd
> >>>> add in more if we ran into the same permissions problems y'all are
> having.
> >>>>
> >>>> There's also our precommit job itself, though it isn't large[2].
> AFAIK,
> >>>> we don't properly back this up anywhere, we just notify each other of
> >>>> changes on a particular mail thread[3].
> >>>>
> >>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> >>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're
> all
> >>>> red because I just finished fixing "mvn site" running out of permgen)
> >>>> [3]: http://s.apache.org/NT0
> >>>>
> >>>>
> >>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <
> cnauroth@hortonworks.com
> >>>>> wrote:
> >>>>
> >>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
> >>>>> HBase
> >>>>> repo?  Is there any additional context we need to be aware of?
> >>>>>
> >>>>> Chris Nauroth
> >>>>> Hortonworks
> >>>>> http://hortonworks.com/
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
> >>>>>
> >>>>>> +dev@hbase
> >>>>>>
> >>>>>> HBase has recently been cleaning up our precommit jenkins jobs to
> make
> >>>>>> them
> >>>>>> more robust. From what I can tell our stuff started off as an
> earlier
> >>>>>> version of what Hadoop uses for testing.
> >>>>>>
> >>>>>> Folks on either side open to an experiment of combining our
> precommit
> >>>>>> check
> >>>>>> tooling? In principle we should be looking for the same kinds of
> >>>>> things.
> >>>>>>
> >>>>>> Naturally we'll still need different jenkins jobs to handle
> different
> >>>>>> resource needs and we'd need to figure out where stuff eventually
> >>>>> lives,
> >>>>>> but that could come later.
> >>>>>>
> >>>>>> On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
> >>>>> cnauroth@hortonworks.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> The only thing I'm aware of is the failOnError option:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-erro
> >>>>>>> rs
> >>>>>>> .html
> >>>>>>>
> >>>>>>>
> >>>>>>> I prefer that we don't disable this, because ignoring different
> >>>>> kinds of
> >>>>>>> failures could leave our build directories in an indeterminate
> state.
> >>>>>>> For
> >>>>>>> example, we could end up with an old class file on the classpath
> for
> >>>>>>> test
> >>>>>>> runs that was supposedly deleted.
> >>>>>>>
> >>>>>>> I think it's worth exploring Eddy's suggestion to try simulating
> >>>>> failure
> >>>>>>> by placing a file where the code expects to see a directory.  That
> >>>>> might
> >>>>>>> even let us enable some of these tests that are skipped on Windows,
> >>>>>>> because Windows allows access for the owner even after permissions
> >>>>> have
> >>>>>>> been stripped.
> >>>>>>>
> >>>>>>> Chris Nauroth
> >>>>>>> Hortonworks
> >>>>>>> http://hortonworks.com/
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu>
> wrote:
> >>>>>>>
> >>>>>>>> Is there a maven plugin or setting we can use to simply remove
> >>>>>>>> directories that have no executable permissions on them?  Clearly
> we
> >>>>>>>> have the permission to do this from a technical point of view
> (since
> >>>>>>>> we created the directories as the jenkins user), it's simply that
> >>>>> the
> >>>>>>>> code refuses to do it.
> >>>>>>>>
> >>>>>>>> Otherwise I guess we can just fix those tests...
> >>>>>>>>
> >>>>>>>> Colin
> >>>>>>>>
> >>>>>>>> On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
> >>>>>>>>> Thanks a lot for looking into HDFS-7722, Chris.
> >>>>>>>>>
> >>>>>>>>> In HDFS-7722:
> >>>>>>>>> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >>>>>>>>> TearDown().
> >>>>>>>>> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
> >>>>>>>>>
> >>>>>>>>> Also I ran mvn test several times on my machine and all tests
> >>>>> passed.
> >>>>>>>>>
> >>>>>>>>> However, since in DiskChecker#checkDirAccess():
> >>>>>>>>>
> >>>>>>>>> private static void checkDirAccess(File dir) throws
> >>>>>>> DiskErrorException {
> >>>>>>>>>  if (!dir.isDirectory()) {
> >>>>>>>>>    throw new DiskErrorException("Not a directory: "
> >>>>>>>>>                                 + dir.toString());
> >>>>>>>>>  }
> >>>>>>>>>
> >>>>>>>>>  checkAccessByFileMethods(dir);
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> One potentially safer alternative is replacing data dir with a
> >>>>>>> regular
> >>>>>>>>> file to simulate disk failures.
> >>>>>>>>>
> >>>>>>>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
> >>>>>>>>> <cn...@hortonworks.com> wrote:
> >>>>>>>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >>>>>>>>>> TestDataNodeVolumeFailureReporting, and
> >>>>>>>>>> TestDataNodeVolumeFailureToleration all remove executable
> >>>>>>> permissions
> >>>>>>>>>> from
> >>>>>>>>>> directories like the one Colin mentioned to simulate disk
> >>>>> failures
> >>>>>>> at
> >>>>>>>>>> data
> >>>>>>>>>> nodes.  I reviewed the code for all of those, and they all
> appear
> >>>>>>> to be
> >>>>>>>>>> doing the necessary work to restore executable permissions at
> the
> >>>>>>> end
> >>>>>>>>>> of
> >>>>>>>>>> the test.  The only recent uncommitted patch I've seen that
> makes
> >>>>>>>>>> changes
> >>>>>>>>>> in these test suites is HDFS-7722.  That patch still looks fine
> >>>>>>>>>> though.  I
> >>>>>>>>>> don't know if there are other uncommitted patches that changed
> >>>>> these
> >>>>>>>>>> test
> >>>>>>>>>> suites.
> >>>>>>>>>>
> >>>>>>>>>> I suppose it's also possible that the JUnit process unexpectedly
> >>>>>>> died
> >>>>>>>>>> after removing executable permissions but before restoring them.
> >>>>>>> That
> >>>>>>>>>> always would have been a weakness of these test suites,
> >>>>> regardless
> >>>>>>> of
> >>>>>>>>>> any
> >>>>>>>>>> recent changes.
> >>>>>>>>>>
> >>>>>>>>>> Chris Nauroth
> >>>>>>>>>> Hortonworks
> >>>>>>>>>> http://hortonworks.com/
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hey Colin,
> >>>>>>>>>>>
> >>>>>>>>>>> I asked Andrew Bayer, who works with Apache Infra, what's going
> >>>>> on
> >>>>>>> with
> >>>>>>>>>>> these boxes. He took a look and concluded that some perms are
> >>>>> being
> >>>>>>>>>>> set in
> >>>>>>>>>>> those directories by our unit tests which are precluding those
> >>>>> files
> >>>>>>>>>>> from
> >>>>>>>>>>> getting deleted. He's going to clean up the boxes for us, but
> we
> >>>>>>> should
> >>>>>>>>>>> expect this to keep happening until we can fix the test in
> >>>>> question
> >>>>>>> to
> >>>>>>>>>>> properly clean up after itself.
> >>>>>>>>>>>
> >>>>>>>>>>> To help narrow down which commit it was that started this,
> Andrew
> >>>>>>> sent
> >>>>>>>>>>> me
> >>>>>>>>>>> this info:
> >>>>>>>>>>>
> >>>>>>>>>>> "/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> >>>>>>>
> >>>>>
> >>>>>>>>>>>
> Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>>>>>>>>> /
> >>>>>>>>>>> has
> >>>>>>>>>>> 500 perms, so I'm guessing that's the problem. Been that way
> >>>>> since
> >>>>>>> 9:32
> >>>>>>>>>>> UTC
> >>>>>>>>>>> on March 5th."
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Aaron T. Myers
> >>>>>>>>>>> Software Engineer, Cloudera
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
> >>>>>>> <cm...@apache.org>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> A very quick (and not thorough) survey shows that I can't find
> >>>>> any
> >>>>>>>>>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
> >>>>> them
> >>>>>>>>>>>> seem
> >>>>>>>>>>>> to be failing with some variant of this message:
> >>>>>>>>>>>>
> >>>>>>>>>>>> [ERROR] Failed to execute goal
> >>>>>>>>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> >>>>>>> (default-clean)
> >>>>>>>>>>>> on project hadoop-hdfs: Failed to clean project: Failed to
> >>>>> delete
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>>>>>>>>>> -> [Help 1]
> >>>>>>>>>>>>
> >>>>>>>>>>>> Any ideas how this happened?  Bad disk, unit test setting
> wrong
> >>>>>>>>>>>> permissions?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Colin
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Lei (Eddy) Xu
> >>>>>>>>> Software Engineer, Cloudera
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Sean
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Sean
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Sean
> >>>
> >>
> >>
> >>
> >> --
> >> Sean
> >>
> >
> >
> >
> > --
> > Sean
>
>


-- 
Sean
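
The maven-clean failure quoted above — test directories left with 500 perms that the jenkins user can't delete — suggests a defensive workaround in the job itself. A hedged sketch, assuming a standard Jenkins ${WORKSPACE} and POSIX find/chmod; the job wiring is illustrative, not the actual PreCommit-HDFS-Build configuration:

  # Restore owner rwx on any directory a crashed test left locked down
  # (e.g. the 500-perm data dirs above) so "mvn clean" can remove it
  # instead of failing the whole build.
  find "${WORKSPACE}" -type d ! -perm -700 -exec chmod u+rwx {} +
  mvn clean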

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Sean Busbey <bu...@cloudera.com>.
Thank you for making a more digestible version, Allen. :)

If you're interested in soliciting feedback from other projects, I created
ASF short links to this thread in common-dev and hbase:


* http://s.apache.org/yetus-discuss-hadoop
* http://s.apache.org/yetus-discuss-hbase

While I agree that it's important to get feedback from ASF projects that
might find this useful, I can say that recently I've been involved in the
non-ASF project YCSB and both the precommit tooling and better shell stuff would be
immensely useful over there.

On Mon, Jun 15, 2015 at 10:36 PM, Allen Wittenauer <aw...@altiscale.com> wrote:

>
>         I'm clearly +1 on this idea.  As part of the rewrite in Hadoop of
> test-patch, it was amazing to see how far and wide this bit of code has
> spread.  So I see consolidating everyone's efforts as a huge win for a
> large number of projects.  (esp. considering how many I saw suffering from a
> variety of identified bugs!)
>
>         But….
>
>         I think it's important for people involved in those other projects
> to speak up and voice an opinion as to whether this is useful.
>
> To summarize:
>
>         In the short term, a single location to get/use a precommit patch
> tester rather than everyone building/supporting their own in their spare
> time.
>
>          FWIW, we've already got the code base modified to be pluggable.
> We've written some basic/simple plugins that support Hadoop, HBase, Tajo,
> Tez, Pig, and Flink.  For HBase and Flink, this does include their custom
> checks.  Adding support for other projects shouldn't be hard.  Simple
> projects take almost no time after seeing the basic pattern.
>
>         I think it's worthwhile highlighting that means support for both
> JIRA and GitHub as well as Ant and Maven from the same code base.
>
> Longer term:
>
>         Well, we clearly have ideas of things that we want to do. Adding
> more features to test-patch (review board? gradle?) is obvious. But what
> about teasing apart and generalizing some of the other shell bits from
> projects? A common library for everything from CLI tools to fault injection to
> release documentation creation tools to …  I'd even like to see us get as
> advanced as a "run this program to auto-generate daemon stop/start bits".
>
>         I had a few chats with people about this idea at Hadoop Summit.
> What's truly exciting are the ideas that people had once they realized what
> kinds of problems we're trying to solve.  It's always amazing the problems
> that projects have that could be solved by these types of solutions.  Let's
> stop hiding our cool toys in this area.
>
>         So, what feedback and ideas do you have in this area?  Are you a
> yay or a nay?
>


-- 
Sean

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Allen Wittenauer <aw...@altiscale.com>.
	I'm clearly +1 on this idea.  As part of the rewrite in Hadoop of test-patch, it was amazing to see how far and wide this bit of code has spread.  So I see consolidating everyone's efforts as a huge win for a large number of projects.  (esp. considering how many I saw suffering from a variety of identified bugs!)

	But….

	I think it's important for people involved in those other projects to speak up and voice an opinion as to whether this is useful. 

To summarize:

	In the short term, a single location to get/use a precommit patch tester rather than everyone building/supporting their own in their spare time. 

	 FWIW, we've already got the code base modified to be pluggable.  We've written some basic/simple plugins that support Hadoop, HBase, Tajo, Tez, Pig, and Flink.  For HBase and Flink, this does include their custom checks.  Adding support for other projects shouldn't be hard.  Simple projects take almost no time after seeing the basic pattern.

	I think it's worthwhile highlighting that means support for both JIRA and GitHub as well as Ant and Maven from the same code base.

Longer term:

	Well, we clearly have ideas of things that we want to do. Adding more features to test-patch (review board? gradle?) is obvious. But what about teasing apart and generalizing some of the other shell bits from projects? A common library for everything from CLI tools to fault injection to release documentation creation tools to …  I'd even like to see us get as advanced as a "run this program to auto-generate daemon stop/start bits".

	I had a few chats with people about this idea at Hadoop Summit.  What's truly exciting are the ideas that people had once they realized what kinds of problems we're trying to solve.  It's always amazing the problems that projects have that could be solved by these types of solutions.  Let's stop hiding our cool toys in this area.

	So, what feedback and ideas do you have in this area?  Are you a yay or a nay?
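
To make "pluggable" concrete, here's a rough sketch of what a per-project check might look like as a shell plugin. The hook names (add_plugin, add_vote_table) and the shelldocs-style comment are assumptions in the test-patch spirit, not its documented API:

  #!/usr/bin/env bash
  # Hypothetical test-patch plugin; the framework hooks used here
  # (add_plugin, add_vote_table) are illustrative stand-ins.

  add_plugin "whitespace"

  ## @description  Vote -1 if the patch adds lines with trailing whitespace.
  function whitespace_patchfile
  {
    local patchfile=$1
    local count

    # Count added lines ('+' prefix) that end in a space or tab.
    count=$(grep -c -E '^\+.*[[:space:]]$' "${patchfile}")

    if [[ ${count} -gt 0 ]]; then
      add_vote_table -1 whitespace "Patch adds ${count} line(s) with trailing whitespace."
      return 1
    fi
    add_vote_table +1 whitespace "Patch adds no trailing whitespace."
    return 0
  }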


Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Nick Dimiduk <nd...@apache.org>.
I think this is a great idea! Having just gone through the process of
getting Phoenix up to speed with precommits, I'd find it really nice to
have a place to go other than "fork/hack someone else's work". For the same
project, I recently integrated its first daemon service. This meant adding
a bunch of servicy Python code (multi-platform support is required) which I
only sort of trust. Again, it would be great to have an explicit resource for
this kind of thing in the ecosystem. I expect Calcite and Kylin will be
following along shortly.

Since we're tossing out names, how about Apache Bootstrap? It's a
meta-project to help other projects get off the ground, after all.

-n
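
The daemon glue Nick mentions is exactly the kind of operational code a generalized hadoop-functions.sh layer could absorb. A minimal sketch of the shape such shared start/stop helpers might take, with invented names (daemon_start, PID_DIR, LOG_DIR) rather than the actual library's:

  # Illustrative daemon helpers in the hadoop-functions.sh spirit;
  # names and layout are invented, not the real library.
  function daemon_start
  {
    local name=$1; shift
    local pidfile="${PID_DIR:-/var/run}/${name}.pid"

    # Refuse to double-start if the recorded pid is still alive.
    if [[ -f "${pidfile}" ]] && kill -0 "$(cat "${pidfile}")" 2>/dev/null; then
      echo "${name} is already running."
      return 1
    fi
    nohup "$@" >> "${LOG_DIR:-/var/log}/${name}.out" 2>&1 &
    echo $! > "${pidfile}"
    echo "Started ${name} (pid $(cat "${pidfile}"))."
  }

  function daemon_stop
  {
    local name=$1
    local pidfile="${PID_DIR:-/var/run}/${name}.pid"

    [[ -f "${pidfile}" ]] || { echo "${name} is not running."; return 1; }
    kill "$(cat "${pidfile}")" && rm -f "${pidfile}"
  }

A project would then wire up its own service with something like "daemon_start mydaemon /opt/mydaemon/bin/mydaemon --config /etc/mydaemon" and get consistent pidfile and log handling for free.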

On Monday, June 15, 2015, Sean Busbey <bu...@cloudera.com> wrote:

> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> in-coming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so often the tooling
> is worked on ad-hoc and little shared improvements happen across
> projects. Since
> the tooling itself is never a primary concern, any made is rarely reused
> outside of ASF projects.
>
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow the new
> contributors we have to handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could easily also take over maintenance of
> things like shelldocs, since test-patch is the primary consumer of that
> currently but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talking about the
> tools to build QA processes and not a particular set of tested components.
> Specifically, I think the ChaosMonkey work that's in HBase should be
> generalizable as a fault injection framework (either based on that code or
> something like it). Doing this for arbitrary software is obviously very
> difficult, and a part of easing that will be to make (and then favor)
> tooling to allow projects to have operational glue that looks the same.
> Namely, the shell work that's been done in hadoop-functions.sh would be a
> great foundational layer that could bring good daemon handling practices to
> a whole slew of software projects. In the event that these frameworks and
> tools get adopted by parts of the Hadoop ecosystem, that could make the job
> of i.e. Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current
> test-patch work or expressed interest in helping out on getting it used in
> other projects. Right now, the proposed PMC would be (alphabetical by last
> name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> pmc, sqoop pmc, all around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several members and a bunch of folks familiar with the
> ASF. Combined with the code already existing in Apache spaces, I think that
> gives us sufficient justification for a direct board proposal.
>
> The planned project name is "Apache Yetus". It's an archaic genus of sea
> snail and most of our project will be focused on shell scripts.
>
> N.b.: this does not mean that the Hadoop community would _have_ to rely on
> the new TLP, but I hope that once we have a release that can be evaluated
> there'd be enough benefit to strongly encourage it.
>
> This has mostly been focused on scope and community issues, and I'd love to
> talk through any feedback on that. Additionally, are there any other points
> folks want to make sure are covered before we have a resolution?
>
> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <busbey@cloudera.com
> <javascript:;>> wrote:
>
> > Sorry for the resend. I figured this deserves a [DISCUSS] flag.
> >
> >
> >
> > On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <busbey@cloudera.com
> <javascript:;>> wrote:
> >
> >> Hi Folks!
> >>
> >> After working on test-patch with other folks for the last few months, I
> >> think we've reached the point where we can make the fastest progress
> >> towards the goal of a general use pre-commit patch tester by spinning
> >> things into a project focused on just that. I think we have a mature
> enough
> >> code base and a sufficient fledgling community, so I'm going to put
> >> together a tlp proposal.
> >>
> >> Thanks for the feedback thus far from use within Hadoop. I hope we can
> >> continue to make things more useful.
> >>
> >> -Sean
> >>
> >> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <busbey@cloudera.com
> <javascript:;>> wrote:
> >>
> >>> HBase's dev-support folder is where the scripts and support files live.
> >>> We've only recently started adding anything to the maven builds that's
> >>> specific to jenkins[1]; so far it's diagnostic stuff, but that's where
> I'd
> >>> add in more if we ran into the same permissions problems y'all are
> having.
> >>>
> >>> There's also our precommit job itself, though it isn't large[2]. AFAIK,
> >>> we don't properly back this up anywhere, we just notify each other of
> >>> changes on a particular mail thread[3].
> >>>
> >>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> >>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
> >>> read because I just finished fixing "mvn site" running out of permgen)
> >>> [3]: http://s.apache.org/NT0
> >>>
> >>>
> >>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <
> cnauroth@hortonworks.com <javascript:;>
> >>> > wrote:
> >>>
> >>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
> >>>> HBase
> >>>> repo?  Is there any additional context we need to be aware of?
> >>>>
> >>>> Chris Nauroth
> >>>> Hortonworks
> >>>> http://hortonworks.com/
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 3/11/15, 2:44 PM, "Sean Busbey" <busbey@cloudera.com
> <javascript:;>> wrote:
> >>>>
> >>>> >+dev@hbase
> >>>> >
> >>>> >HBase has recently been cleaning up our precommit jenkins jobs to
> make
> >>>> >them
> >>>> >more robust. From what I can tell our stuff started off as an earlier
> >>>> >version of what Hadoop uses for testing.
> >>>> >
> >>>> >Folks on either side open to an experiment of combining our precommit
> >>>> >check
> >>>> >tooling? In principle we should be looking for the same kinds of
> >>>> things.
> >>>> >
> >>>> >Naturally we'll still need different jenkins jobs to handle different
> >>>> >resource needs and we'd need to figure out where stuff eventually
> >>>> lives,
> >>>> >but that could come later.
> >>>> >
> >>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
> >>>> cnauroth@hortonworks.com <javascript:;>>
> >>>> >wrote:
> >>>> >
> >>>> >> The only thing I'm aware of is the failOnError option:
> >>>> >>
> >>>> >>
> >>>> >>
> >>>>
> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-erro
> >>>> >>rs
> >>>> >> .html
> >>>> >>
> >>>> >>
> >>>> >> I prefer that we don't disable this, because ignoring different
> >>>> kinds of
> >>>> >> failures could leave our build directories in an indeterminate
> state.
> >>>> >>For
> >>>> >> example, we could end up with an old class file on the classpath
> for
> >>>> >>test
> >>>> >> runs that was supposedly deleted.
> >>>> >>
> >>>> >> I think it's worth exploring Eddy's suggestion to try simulating
> >>>> failure
> >>>> >> by placing a file where the code expects to see a directory.  That
> >>>> might
> >>>> >> even let us enable some of these tests that are skipped on Windows,
> >>>> >> because Windows allows access for the owner even after permissions
> >>>> have
> >>>> >> been stripped.
> >>>> >>
> >>>> >> Chris Nauroth
> >>>> >> Hortonworks
> >>>> >> http://hortonworks.com/
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu
> <javascript:;>> wrote:
> >>>> >>
> >>>> >> >Is there a maven plugin or setting we can use to simply remove
> >>>> >> >directories that have no executable permissions on them?  Clearly
> we
> >>>> >> >have the permission to do this from a technical point of view
> (since
> >>>> >> >we created the directories as the jenkins user), it's simply that
> >>>> the
> >>>> >> >code refuses to do it.
> >>>> >> >
> >>>> >> >Otherwise I guess we can just fix those tests...
> >>>> >> >
> >>>> >> >Colin
> >>>> >> >
> >>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com
> <javascript:;>> wrote:
> >>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
> >>>> >> >>
> >>>> >> >> In HDFS-7722:
> >>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >>>> >> >>TearDown().
> >>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally
> clause.
> >>>> >> >>
> >>>> >> >> Also I ran mvn test several times on my machine and all tests
> >>>> passed.
> >>>> >> >>
> >>>> >> >> However, since in DiskChecker#checkDirAccess():
> >>>> >> >>
> >>>> >> >> private static void checkDirAccess(File dir) throws
> >>>> >>DiskErrorException {
> >>>> >> >>   if (!dir.isDirectory()) {
> >>>> >> >>     throw new DiskErrorException("Not a directory: "
> >>>> >> >>                                  + dir.toString());
> >>>> >> >>   }
> >>>> >> >>
> >>>> >> >>   checkAccessByFileMethods(dir);
> >>>> >> >> }
> >>>> >> >>
> >>>> >> >> One potentially safer alternative is replacing data dir with a
> >>>> >>regular
> >>>> >> >> file to simulate disk failures.
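
Concretely, that file-for-directory idea could look something like the
fragment below, assuming a JUnit test method with a test-local baseDir and
DiskChecker.checkDir as the public entry point:

    import java.io.File;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.util.DiskChecker;
    import org.apache.hadoop.util.DiskChecker.DiskErrorException;
    import static org.junit.Assert.assertTrue;
    import static org.junit.Assert.fail;

    // Swap the data directory for a regular file of the same name; the
    // directory check then fails with DiskErrorException, no chmod needed.
    File dataDir = new File(baseDir, "data3");
    FileUtil.fullyDelete(dataDir);            // remove the real directory
    assertTrue(dataDir.createNewFile());      // same path, now a plain file
    try {
      DiskChecker.checkDir(dataDir);
      fail("expected DiskErrorException for a non-directory");
    } catch (DiskErrorException expected) {
      // the simulated "disk failure"
    } finally {
      assertTrue(dataDir.delete());           // plain file: nothing to restore
    }

A file at the path fails the isDirectory() check shown above, and cleanup is
an ordinary delete, so a crashed test run cannot poison the workspace.
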
> >>>> >> >>
> >>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
> >>>> >> >><cnauroth@hortonworks.com> wrote:
> >>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >>>> >> >>> TestDataNodeVolumeFailureReporting, and
> >>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
> >>>> >>permissions
> >>>> >> >>>from
> >>>> >> >>> directories like the one Colin mentioned to simulate disk
> >>>> failures
> >>>> >>at
> >>>> >> >>>data
> >>>> >> >>> nodes.  I reviewed the code for all of those, and they all
> appear
> >>>> >>to be
> >>>> >> >>> doing the necessary work to restore executable permissions at
> the
> >>>> >>end
> >>>> >> >>>of
> >>>> >> >>> the test.  The only recent uncommitted patch I've seen that
> makes
> >>>> >> >>>changes
> >>>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
> >>>> >> >>>though.  I
> >>>> >> >>> don't know if there are other uncommitted patches that changed
> >>>> these
> >>>> >> >>>test
> >>>> >> >>> suites.
> >>>> >> >>>
> >>>> >> >>> I suppose it's also possible that the JUnit process
> unexpectedly
> >>>> >>died
> >>>> >> >>> after removing executable permissions but before restoring
> them.
> >>>> >>That
> >>>> >> >>> always would have been a weakness of these test suites,
> >>>> regardless
> >>>> >>of
> >>>> >> >>>any
> >>>> >> >>> recent changes.
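
The pattern Chris describes, and its weak spot, in miniature (a sketch only,
with an illustrative path under a test-local baseDir; the real suites do
considerably more):

    // Fragment of a hypothetical test; dataDir stands in for a DataNode
    // storage directory under the build's test data dir.
    File dataDir = new File(baseDir, "dfs/data/data3");
    assertTrue(dataDir.setExecutable(false));  // simulate the failed disk
    try {
      // ... exercise the volume-failure handling under test ...
    } finally {
      // If the JUnit JVM dies before this line runs, the workspace keeps
      // an untraversable directory and the next "mvn clean" fails --
      // exactly the weakness noted above.
      assertTrue(dataDir.setExecutable(true));
    }
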
> >>>> >> >>>
> >>>> >> >>> Chris Nauroth
> >>>> >> >>> Hortonworks
> >>>> >> >>> http://hortonworks.com/
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
> >>>> >> >>>
> >>>> >> >>>>Hey Colin,
> >>>> >> >>>>
> >>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going
> >>>> on
> >>>> >>with
> >>>> >> >>>>these boxes. He took a look and concluded that some perms are
> >>>> being
> >>>> >> >>>>set in
> >>>> >> >>>>those directories by our unit tests which are precluding those
> >>>> files
> >>>> >> >>>>from
> >>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but
> we
> >>>> >>should
> >>>> >> >>>>expect this to keep happening until we can fix the test in
> >>>> question
> >>>> >>to
> >>>> >> >>>>properly clean up after itself.
> >>>> >> >>>>
> >>>> >> >>>>To help narrow down which commit it was that started this,
> Andrew
> >>>> >>sent
> >>>> >> >>>>me
> >>>> >> >>>>this info:
> >>>> >> >>>>
> >>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> >>>> >>
> >>>>
> >>>>
> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>> >>>>>>/
> >>>> >> >>>>has
> >>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
> >>>> since
> >>>> >>9:32
> >>>> >> >>>>UTC
> >>>> >> >>>>on March 5th."
> >>>> >> >>>>
> >>>> >> >>>>--
> >>>> >> >>>>Aaron T. Myers
> >>>> >> >>>>Software Engineer, Cloudera
> >>>> >> >>>>
> >>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
> >>>> >><cmccabe@apache.org>
> >>>> >> >>>>wrote:
> >>>> >> >>>>
> >>>> >> >>>>> Hi all,
> >>>> >> >>>>>
> >>>> >> >>>>> A very quick (and not thorough) survey shows that I can't
> find
> >>>> any
> >>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
> >>>> them
> >>>> >> >>>>>seem
> >>>> >> >>>>> to be failing with some variant of this message:
> >>>> >> >>>>>
> >>>> >> >>>>> [ERROR] Failed to execute goal
> >>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> >>>> >>(default-clean)
> >>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
> >>>> delete
> >>>> >> >>>>>
> >>>> >> >>>>>
> >>>> >>
> >>>>
> >>>>
> >>>>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hd
> >>>> >>>>>>>fs
> >>>> >> >>>>>-pr
> >>>> >> >>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>> >> >>>>> -> [Help 1]
> >>>> >> >>>>>
> >>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting
> wrong
> >>>> >> >>>>> permissions?
> >>>> >> >>>>>
> >>>> >> >>>>> Colin
> >>>> >> >>>>>
> >>>> >> >>>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >> --
> >>>> >> >> Lei (Eddy) Xu
> >>>> >> >> Software Engineer, Cloudera
> >>>> >>
> >>>> >>
> >>>> >
> >>>> >
> >>>> >--
> >>>> >Sean
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Sean
> >>>
> >>
> >>
> >>
> >> --
> >> Sean
> >>
> >
> >
> >
> > --
> > Sean
> >
>
>
>
> --
> Sean
>

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Sean Busbey <bu...@cloudera.com>.
As mentioned on HADOOP-12111, there is now an incubator-style proposal:
http://wiki.apache.org/incubator/YetusProposal

On Wed, Jun 24, 2015 at 9:41 AM, Sean Busbey <bu...@cloudera.com> wrote:

> Hi Folks!
>
> Work in a feature branch is now being tracked by HADOOP-12111.
>
> On Thu, Jun 18, 2015 at 10:07 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> It looks like we have consensus.
>>
>> I'll start drafting up a proposal for the next board meeting (July 15th).
>> Once we work out the name I'll submit a PODLINGNAMESEARCH jira to track
>> that we did due diligence on whatever we pick.
>>
>> In the meantime, Hadoop PMC, would y'all be willing to host us in a
>> branch so that we can start prepping things now? We would want branch
>> commit rights for the proposed new PMC.
>>
>>
>> -Sean
>>
>>
>> On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>>> Oof. I had meant to push on this again but life got in the way and now
>>> the June board meeting is upon us. Sorry everyone. In the event that this
>>> ends up contentious, hopefully one of the copied communities can give us a
>>> branch to work in.
>>>
>>> I know everyone is busy, so here's the short version of this email: I'd
>>> like to move some of the code currently in Hadoop (test-patch) into a new
>>> TLP focused on QA tooling. I'm not sure what the best format for priming
>>> this conversation is. ORC filled in the incubator project proposal
>>> template, but I'm not sure how much that confused the issue. So to start,
>>> I'll just write what I'm hoping we can accomplish in general terms here.
>>>
>>> All software development projects that are community based (that is,
>>> accepting outside contributions) face a common QA problem for vetting
>>> in-coming contributions. Hadoop is fortunate enough to be sufficiently
>>> popular that the weight of the problem drove tool development (i.e.
>>> test-patch). That tool is generalizable enough that a bunch of other TLPs
>>> have adopted their own forks. Unfortunately, in most projects this kind of
>>> QA work is an enabler rather than a primary concern, so often the tooling
>>> is worked on ad hoc and few improvements get shared across projects. Since
>>> the tooling itself is never a primary concern, anything that does get built
>>> is rarely reused outside of ASF projects.
>>>
>>> Over the last couple months a few of us have been working on
>>> generalizing the tooling present in the Hadoop code base (because it was
>>> the most mature out of all those in the various projects) and it's reached
>>> a point where we think we can start bringing on other downstream users.
>>> This means we need to start establishing things like a release cadence and
>>> to grow the new contributors we have so they can handle more project responsibility.
>>> Personally, I think that means it's time to move out from under Hadoop to
>>> drive things as our own community. Eventually, I hope the community can
>>> help draw in a group of folks traditionally underrepresented in ASF
>>> projects, namely QA and operations folks.
>>>
>>> I think test-patch by itself has enough scope to justify a project.
>>> Having a solid set of build tools that are customizable to fit the norms of
>>> different software communities is a bunch of work. Making it work well both
>>> for automated test systems like Jenkins and for individual
>>> developers is even more work. We could easily also take over maintenance of
>>> things like shelldocs, since test-patch is the primary consumer of that
>>> currently but it's generally useful tooling.
>>>
>>> In addition to test-patch, I think the proposed project has some future
>>> growth potential. Given some adoption of test-patch to prove utility, the
>>> project could build on the ties it makes to start building tools to help
>>> projects do their own longer-run testing. Note that I'm talking about the
>>> tools to build QA processes and not a particular set of tested components.
>>> Specifically, I think the ChaosMonkey work that's in HBase should be
>>> generalizable as a fault injection framework (either based on that code or
>>> something like it). Doing this for arbitrary software is obviously very
>>> difficult, and a part of easing that will be to make (and then favor)
>>> tooling to allow projects to have operational glue that looks the same.
>>> Namely, the shell work that's been done in hadoop-functions.sh would be a
>>> great foundational layer that could bring good daemon handling practices to
>>> a whole slew of software projects. In the event that these frameworks and
>>> tools get adopted by parts of the Hadoop ecosystem, that could make the job
>>> of projects like Bigtop substantially easier.
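
To make the fault-injection idea concrete, the generalized framework might
center on something as small as the following pair of interfaces (names
invented for illustration, not taken from the HBase ChaosMonkey code):

    /** Illustrative only: one disruptive action a "monkey" can take. */
    public interface FaultAction {
      void perform() throws Exception;  // e.g. kill a daemon, sever a link
      void restore() throws Exception;  // undo the damage afterwards
    }

    /** Illustrative only: a policy decides which action runs, and when. */
    public interface FaultPolicy {
      FaultAction nextAction();
      long millisUntilNext();
    }

Keeping the action/policy split this small is what would let different
projects plug in their own operational glue underneath.
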
>>>
>>> I've reached out to a few folks who have been involved in the current
>>> test-patch work or expressed interest in helping out on getting it used in
>>> other projects. Right now, the proposed PMC would be (alphabetical by last
>>> name):
>>>
>>> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc,
>>> jclouds pmc, sqoop pmc, all around Jenkins expert)
>>> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
>>> * Nick Dimiduk (hbase pmc, phoenix pmc)
>>> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
>>> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
>>> phoenix pmc)
>>> * Allen Wittenauer (hadoop committer)
>>>
>>> That PMC gives us several members and a bunch of folks familiar with the
>>> ASF. Combined with the code already existing in Apache spaces, I think that
>>> gives us sufficient justification for a direct board proposal.
>>>
>>> The planned project name is "Apache Yetus". It's an archaic genus of sea
>>> snail and most of our project will be focused on shell scripts.
>>>
>>> N.b.: this does not mean that the Hadoop community would _have_ to rely
>>> on the new TLP, but I hope that once we have a release that can be
>>> evaluated there'd be enough benefit to strongly encourage it.
>>>
>>> This has mostly been focused on scope and community issues, and I'd love
>>> to talk through any feedback on that. Additionally, are there any other
>>> points folks want to make sure are covered before we have a resolution?
>>>
>>> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bu...@cloudera.com>
>>> wrote:
>>>
>>>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>>>
>>>>
>>>>
>>>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bu...@cloudera.com>
>>>> wrote:
>>>>
>>>>> Hi Folks!
>>>>>
>>>>> After working on test-patch with other folks for the last few months,
>>>>> I think we've reached the point where we can make the fastest progress
>>>>> towards the goal of a general use pre-commit patch tester by spinning
>>>>> things into a project focused on just that. I think we have a mature enough
>>>>> code base and a sufficient fledgling community, so I'm going to put
>>>>> together a tlp proposal.
>>>>>
>>>>> Thanks for the feedback thus far from use within Hadoop. I hope we can
>>>>> continue to make things more useful.
>>>>>
>>>>> -Sean
>>>>>
>>>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> HBase's dev-support folder is where the scripts and support files
>>>>>> live. We've only recently started adding anything to the maven builds
>>>>>> that's specific to jenkins[1]; so far it's diagnostic stuff, but that's
>>>>>> where I'd add in more if we ran into the same permissions problems y'all
>>>>>> are having.
>>>>>>
>>>>>> There's also our precommit job itself, though it isn't large[2].
>>>>>> AFAIK, we don't properly back this up anywhere, we just notify each other
>>>>>> of changes on a particular mail thread[3].
>>>>>>
>>>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're
>>>>>> all read because I just finished fixing "mvn site" running out of permgen)
>>>>>> [3]: http://s.apache.org/NT0
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <
>>>>>> cnauroth@hortonworks.com> wrote:
>>>>>>
>>>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>>>>>> HBase
>>>>>>> repo?  Is there any additional context we need to be aware of?
>>>>>>>
>>>>>>> Chris Nauroth
>>>>>>> Hortonworks
>>>>>>> http://hortonworks.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>>>>>>>
>>>>>>> >+dev@hbase
>>>>>>> >
>>>>>>> >HBase has recently been cleaning up our precommit jenkins jobs to
>>>>>>> make
>>>>>>> >them
>>>>>>> >more robust. From what I can tell our stuff started off as an
>>>>>>> earlier
>>>>>>> >version of what Hadoop uses for testing.
>>>>>>> >
>>>>>>> >Folks on either side open to an experiment of combining our
>>>>>>> precommit
>>>>>>> >check
>>>>>>> >tooling? In principle we should be looking for the same kinds of
>>>>>>> things.
>>>>>>> >
>>>>>>> >Naturally we'll still need different jenkins jobs to handle
>>>>>>> different
>>>>>>> >resource needs and we'd need to figure out where stuff eventually
>>>>>>> lives,
>>>>>>> >but that could come later.
>>>>>>> >
>>>>>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
>>>>>>> cnauroth@hortonworks.com>
>>>>>>> >wrote:
>>>>>>> >
>>>>>>> >> The only thing I'm aware of is the failOnError option:
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-erro
>>>>>>> >>rs
>>>>>>> >> .html
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> I prefer that we don't disable this, because ignoring different
>>>>>>> kinds of
>>>>>>> >> failures could leave our build directories in an indeterminate
>>>>>>> state.
>>>>>>> >>For
>>>>>>> >> example, we could end up with an old class file on the classpath
>>>>>>> for
>>>>>>> >>test
>>>>>>> >> runs that was supposedly deleted.
>>>>>>> >>
>>>>>>> >> I think it's worth exploring Eddy's suggestion to try simulating
>>>>>>> failure
>>>>>>> >> by placing a file where the code expects to see a directory.
>>>>>>> That might
>>>>>>> >> even let us enable some of these tests that are skipped on
>>>>>>> Windows,
>>>>>>> >> because Windows allows access for the owner even after
>>>>>>> permissions have
>>>>>>> >> been stripped.
>>>>>>> >>
>>>>>>> >> Chris Nauroth
>>>>>>> >> Hortonworks
>>>>>>> >> http://hortonworks.com/
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> >Is there a maven plugin or setting we can use to simply remove
>>>>>>> >> >directories that have no executable permissions on them?
>>>>>>> Clearly we
>>>>>>> >> >have the permission to do this from a technical point of view
>>>>>>> (since
>>>>>>> >> >we created the directories as the jenkins user), it's simply
>>>>>>> that the
>>>>>>> >> >code refuses to do it.
>>>>>>> >> >
>>>>>>> >> >Otherwise I guess we can just fix those tests...
>>>>>>> >> >
>>>>>>> >> >Colin
>>>>>>> >> >
>>>>>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com>
>>>>>>> wrote:
>>>>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>>> >> >>
>>>>>>> >> >> In HDFS-7722:
>>>>>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions
>>>>>>> in
>>>>>>> >> >>TearDown().
>>>>>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally
>>>>>>> clause.
>>>>>>> >> >>
>>>>>>> >> >> Also I ran mvn test several times on my machine and all tests
>>>>>>> passed.
>>>>>>> >> >>
>>>>>>> >> >> However, since in DiskChecker#checkDirAccess():
>>>>>>> >> >>
>>>>>>> >> >> private static void checkDirAccess(File dir) throws
>>>>>>> >>DiskErrorException {
>>>>>>> >> >>   if (!dir.isDirectory()) {
>>>>>>> >> >>     throw new DiskErrorException("Not a directory: "
>>>>>>> >> >>                                  + dir.toString());
>>>>>>> >> >>   }
>>>>>>> >> >>
>>>>>>> >> >>   checkAccessByFileMethods(dir);
>>>>>>> >> >> }
>>>>>>> >> >>
>>>>>>> >> >> One potentially safer alternative is replacing data dir with a
>>>>>>> >>regular
>>>>>>> >> >> file to simulate disk failures.
>>>>>>> >> >>
>>>>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>>>> >> >><cn...@hortonworks.com> wrote:
>>>>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>>> >> >>> TestDataNodeVolumeFailureReporting, and
>>>>>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>>>>>>> >>permissions
>>>>>>> >> >>>from
>>>>>>> >> >>> directories like the one Colin mentioned to simulate disk
>>>>>>> failures
>>>>>>> >>at
>>>>>>> >> >>>data
>>>>>>> >> >>> nodes.  I reviewed the code for all of those, and they all
>>>>>>> appear
>>>>>>> >>to be
>>>>>>> >> >>> doing the necessary work to restore executable permissions at
>>>>>>> the
>>>>>>> >>end
>>>>>>> >> >>>of
>>>>>>> >> >>> the test.  The only recent uncommitted patch I've seen that
>>>>>>> makes
>>>>>>> >> >>>changes
>>>>>>> >> >>> in these test suites is HDFS-7722.  That patch still looks
>>>>>>> fine
>>>>>>> >> >>>though.  I
>>>>>>> >> >>> don't know if there are other uncommitted patches that
>>>>>>> changed these
>>>>>>> >> >>>test
>>>>>>> >> >>> suites.
>>>>>>> >> >>>
>>>>>>> >> >>> I suppose it's also possible that the JUnit process
>>>>>>> unexpectedly
>>>>>>> >>died
>>>>>>> >> >>> after removing executable permissions but before restoring
>>>>>>> them.
>>>>>>> >>That
>>>>>>> >> >>> always would have been a weakness of these test suites,
>>>>>>> regardless
>>>>>>> >>of
>>>>>>> >> >>>any
>>>>>>> >> >>> recent changes.
>>>>>>> >> >>>
>>>>>>> >> >>> Chris Nauroth
>>>>>>> >> >>> Hortonworks
>>>>>>> >> >>> http://hortonworks.com/
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com>
>>>>>>> wrote:
>>>>>>> >> >>>
>>>>>>> >> >>>>Hey Colin,
>>>>>>> >> >>>>
>>>>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's
>>>>>>> going on
>>>>>>> >>with
>>>>>>> >> >>>>these boxes. He took a look and concluded that some perms are
>>>>>>> being
>>>>>>> >> >>>>set in
>>>>>>> >> >>>>those directories by our unit tests which are precluding
>>>>>>> those files
>>>>>>> >> >>>>from
>>>>>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but
>>>>>>> we
>>>>>>> >>should
>>>>>>> >> >>>>expect this to keep happening until we can fix the test in
>>>>>>> question
>>>>>>> >>to
>>>>>>> >> >>>>properly clean up after itself.
>>>>>>> >> >>>>
>>>>>>> >> >>>>To help narrow down which commit it was that started this,
>>>>>>> Andrew
>>>>>>> >>sent
>>>>>>> >> >>>>me
>>>>>>> >> >>>>this info:
>>>>>>> >> >>>>
>>>>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>>>> >>
>>>>>>>
>>>>>>> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>> >>>>>>/
>>>>>>> >> >>>>has
>>>>>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
>>>>>>> since
>>>>>>> >>9:32
>>>>>>> >> >>>>UTC
>>>>>>> >> >>>>on March 5th."
>>>>>>> >> >>>>
>>>>>>> >> >>>>--
>>>>>>> >> >>>>Aaron T. Myers
>>>>>>> >> >>>>Software Engineer, Cloudera
>>>>>>> >> >>>>
>>>>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>>>> >><cm...@apache.org>
>>>>>>> >> >>>>wrote:
>>>>>>> >> >>>>
>>>>>>> >> >>>>> Hi all,
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't
>>>>>>> find any
>>>>>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most
>>>>>>> of them
>>>>>>> >> >>>>>seem
>>>>>>> >> >>>>> to be failing with some variant of this message:
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> [ERROR] Failed to execute goal
>>>>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>>>> >>(default-clean)
>>>>>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>>>>>> delete
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>
>>>>>>> >>
>>>>>>>
>>>>>>> >>>>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hd
>>>>>>> >>>>>>>fs
>>>>>>> >> >>>>>-pr
>>>>>>> >> >>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>> >> >>>>> -> [Help 1]
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting
>>>>>>> wrong
>>>>>>> >> >>>>> permissions?
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> Colin
>>>>>>> >> >>>>>
>>>>>>> >> >>>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >> --
>>>>>>> >> >> Lei (Eddy) Xu
>>>>>>> >> >> Software Engineer, Cloudera
>>>>>>> >>
>>>>>>> >>
>>>>>>> >
>>>>>>> >
>>>>>>> >--
>>>>>>> >Sean
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sean
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>>
>>
>> --
>> Sean
>>
>
>
>
> --
> Sean
>



-- 
Sean

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Sean Busbey <bu...@cloudera.com>.
Hi Folks!

Work in a feature branch is now being tracked by HADOOP-12111.

On Thu, Jun 18, 2015 at 10:07 PM, Sean Busbey <bu...@cloudera.com> wrote:

> It looks like we have consensus.
>
> I'll start drafting up a proposal for the next board meeting (July 15th).
> Once we work out the name I'll submit a PODLINGNAMESEARCH jira to track
> that we did due diligence on whatever we pick.
>
> In the meantime, Hadoop PMC, would y'all be willing to host us in a branch
> so that we can start prepping things now? We would want branch commit
> rights for the proposed new PMC.
>
>
> -Sean
>
>
> On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> Oof. I had meant to push on this again but life got in the way and now
>> the June board meeting is upon us. Sorry everyone. In the event that this
>> ends up contentious, hopefully one of the copied communities can give us a
>> branch to work in.
>>
>> I know everyone is busy, so here's the short version of this email: I'd
>> like to move some of the code currently in Hadoop (test-patch) into a new
>> TLP focused on QA tooling. I'm not sure what the best format for priming
>> this conversation is. ORC filled in the incubator project proposal
>> template, but I'm not sure how much that confused the issue. So to start,
>> I'll just write what I'm hoping we can accomplish in general terms here.
>>
>> All software development projects that are community based (that is,
>> accepting outside contributions) face a common QA problem for vetting
>> in-coming contributions. Hadoop is fortunate enough to be sufficiently
>> popular that the weight of the problem drove tool development (i.e.
>> test-patch). That tool is generalizable enough that a bunch of other TLPs
>> have adopted their own forks. Unfortunately, in most projects this kind of
>> QA work is an enabler rather than a primary concern, so often the tooling
>> is worked on ad hoc and few improvements get shared across projects. Since
>> the tooling itself is never a primary concern, anything that does get built
>> is rarely reused outside of ASF projects.
>>
>> Over the last couple months a few of us have been working on generalizing
>> the tooling present in the Hadoop code base (because it was the most mature
>> out of all those in the various projects) and it's reached a point where we
>> think we can start bringing on other downstream users. This means we need
>> to start establishing things like a release cadence and to grow the new
>> contributors we have so they can handle more project responsibility. Personally, I
>> think that means it's time to move out from under Hadoop to drive things as
>> our own community. Eventually, I hope the community can help draw in a
>> group of folks traditionally underrepresented in ASF projects, namely QA
>> and operations folks.
>>
>> I think test-patch by itself has enough scope to justify a project.
>> Having a solid set of build tools that are customizable to fit the norms of
>> different software communities is a bunch of work. Making it work well both
>> for automated test systems like Jenkins and for individual
>> developers is even more work. We could easily also take over maintenance of
>> things like shelldocs, since test-patch is the primary consumer of that
>> currently but it's generally useful tooling.
>>
>> In addition to test-patch, I think the proposed project has some future
>> growth potential. Given some adoption of test-patch to prove utility, the
>> project could build on the ties it makes to start building tools to help
>> projects do their own longer-run testing. Note that I'm talking about the
>> tools to build QA processes and not a particular set of tested components.
>> Specifically, I think the ChaosMonkey work that's in HBase should be
>> generalizable as a fault injection framework (either based on that code or
>> something like it). Doing this for arbitrary software is obviously very
>> difficult, and a part of easing that will be to make (and then favor)
>> tooling to allow projects to have operational glue that looks the same.
>> Namely, the shell work that's been done in hadoop-functions.sh would be a
>> great foundational layer that could bring good daemon handling practices to
>> a whole slew of software projects. In the event that these frameworks and
>> tools get adopted by parts of the Hadoop ecosystem, that could make the job
>> of projects like Bigtop substantially easier.
>>
>> I've reached out to a few folks who have been involved in the current
>> test-patch work or expressed interest in helping out on getting it used in
>> other projects. Right now, the proposed PMC would be (alphabetical by last
>> name):
>>
>> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
>> pmc, sqoop pmc, all around Jenkins expert)
>> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
>> * Nick Dimiduk (hbase pmc, phoenix pmc)
>> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
>> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
>> phoenix pmc)
>> * Allen Wittenauer (hadoop committer)
>>
>> That PMC gives us several members and a bunch of folks familiar with the
>> ASF. Combined with the code already existing in Apache spaces, I think that
>> gives us sufficient justification for a direct board proposal.
>>
>> The planned project name is "Apache Yetus". It's an archaic genus of sea
>> snail and most of our project will be focused on shell scripts.
>>
>> N.b.: this does not mean that the Hadoop community would _have_ to rely
>> on the new TLP, but I hope that once we have a release that can be
>> evaluated there'd be enough benefit to strongly encourage it.
>>
>> This has mostly been focused on scope and community issues, and I'd love
>> to talk through any feedback on that. Additionally, are there any other
>> points folks want to make sure are covered before we have a resolution?
>>
>> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>>
>>>
>>>
>>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bu...@cloudera.com>
>>> wrote:
>>>
>>>> Hi Folks!
>>>>
>>>> After working on test-patch with other folks for the last few months, I
>>>> think we've reached the point where we can make the fastest progress
>>>> towards the goal of a general use pre-commit patch tester by spinning
>>>> things into a project focused on just that. I think we have a mature enough
>>>> code base and a sufficient fledgling community, so I'm going to put
>>>> together a tlp proposal.
>>>>
>>>> Thanks for the feedback thus far from use within Hadoop. I hope we can
>>>> continue to make things more useful.
>>>>
>>>> -Sean
>>>>
>>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com>
>>>> wrote:
>>>>
>>>>> HBase's dev-support folder is where the scripts and support files
>>>>> live. We've only recently started adding anything to the maven builds
>>>>> that's specific to jenkins[1]; so far it's diagnostic stuff, but that's
>>>>> where I'd add in more if we ran into the same permissions problems y'all
>>>>> are having.
>>>>>
>>>>> There's also our precommit job itself, though it isn't large[2].
>>>>> AFAIK, we don't properly back this up anywhere, we just notify each other
>>>>> of changes on a particular mail thread[3].
>>>>>
>>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're
>>>>> all read because I just finished fixing "mvn site" running out of permgen)
>>>>> [3]: http://s.apache.org/NT0
>>>>>
>>>>>
>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <
>>>>> cnauroth@hortonworks.com> wrote:
>>>>>
>>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>>>>> HBase
>>>>>> repo?  Is there any additional context we need to be aware of?
>>>>>>
>>>>>> Chris Nauroth
>>>>>> Hortonworks
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>>>>>>
>>>>>> >+dev@hbase
>>>>>> >
>>>>>> >HBase has recently been cleaning up our precommit jenkins jobs to
>>>>>> make
>>>>>> >them
>>>>>> >more robust. From what I can tell our stuff started off as an earlier
>>>>>> >version of what Hadoop uses for testing.
>>>>>> >
>>>>>> >Folks on either side open to an experiment of combining our precommit
>>>>>> >check
>>>>>> >tooling? In principle we should be looking for the same kinds of
>>>>>> things.
>>>>>> >
>>>>>> >Naturally we'll still need different jenkins jobs to handle different
>>>>>> >resource needs and we'd need to figure out where stuff eventually
>>>>>> lives,
>>>>>> >but that could come later.
>>>>>> >
>>>>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
>>>>>> cnauroth@hortonworks.com>
>>>>>> >wrote:
>>>>>> >
>>>>>> >> The only thing I'm aware of is the failOnError option:
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-erro
>>>>>> >>rs
>>>>>> >> .html
>>>>>> >>
>>>>>> >>
>>>>>> >> I prefer that we don't disable this, because ignoring different
>>>>>> kinds of
>>>>>> >> failures could leave our build directories in an indeterminate
>>>>>> state.
>>>>>> >>For
>>>>>> >> example, we could end up with an old class file on the classpath
>>>>>> for
>>>>>> >>test
>>>>>> >> runs that was supposedly deleted.
>>>>>> >>
>>>>>> >> I think it's worth exploring Eddy's suggestion to try simulating
>>>>>> failure
>>>>>> >> by placing a file where the code expects to see a directory.  That
>>>>>> might
>>>>>> >> even let us enable some of these tests that are skipped on Windows,
>>>>>> >> because Windows allows access for the owner even after permissions
>>>>>> have
>>>>>> >> been stripped.
>>>>>> >>
>>>>>> >> Chris Nauroth
>>>>>> >> Hortonworks
>>>>>> >> http://hortonworks.com/
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> >Is there a maven plugin or setting we can use to simply remove
>>>>>> >> >directories that have no executable permissions on them?  Clearly
>>>>>> we
>>>>>> >> >have the permission to do this from a technical point of view
>>>>>> (since
>>>>>> >> >we created the directories as the jenkins user), it's simply that
>>>>>> the
>>>>>> >> >code refuses to do it.
>>>>>> >> >
>>>>>> >> >Otherwise I guess we can just fix those tests...
>>>>>> >> >
>>>>>> >> >Colin
>>>>>> >> >
>>>>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>> >> >>
>>>>>> >> >> In HDFS-7722:
>>>>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>>> >> >>TearDown().
>>>>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally
>>>>>> clause.
>>>>>> >> >>
>>>>>> >> >> Also I ran mvn test several times on my machine and all tests
>>>>>> passed.
>>>>>> >> >>
>>>>>> >> >> However, since in DiskChecker#checkDirAccess():
>>>>>> >> >>
>>>>>> >> >> private static void checkDirAccess(File dir) throws
>>>>>> >>DiskErrorException {
>>>>>> >> >>   if (!dir.isDirectory()) {
>>>>>> >> >>     throw new DiskErrorException("Not a directory: "
>>>>>> >> >>                                  + dir.toString());
>>>>>> >> >>   }
>>>>>> >> >>
>>>>>> >> >>   checkAccessByFileMethods(dir);
>>>>>> >> >> }
>>>>>> >> >>
>>>>>> >> >> One potentially safer alternative is replacing data dir with a
>>>>>> >>regular
>>>>>> >> >> file to simulate disk failures.
>>>>>> >> >>
>>>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>>> >> >><cn...@hortonworks.com> wrote:
>>>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>> >> >>> TestDataNodeVolumeFailureReporting, and
>>>>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>>>>>> >>permissions
>>>>>> >> >>>from
>>>>>> >> >>> directories like the one Colin mentioned to simulate disk
>>>>>> failures
>>>>>> >>at
>>>>>> >> >>>data
>>>>>> >> >>> nodes.  I reviewed the code for all of those, and they all
>>>>>> appear
>>>>>> >>to be
>>>>>> >> >>> doing the necessary work to restore executable permissions at
>>>>>> the
>>>>>> >>end
>>>>>> >> >>>of
>>>>>> >> >>> the test.  The only recent uncommitted patch I've seen that
>>>>>> makes
>>>>>> >> >>>changes
>>>>>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
>>>>>> >> >>>though.  I
>>>>>> >> >>> don't know if there are other uncommitted patches that changed
>>>>>> these
>>>>>> >> >>>test
>>>>>> >> >>> suites.
>>>>>> >> >>>
>>>>>> >> >>> I suppose it's also possible that the JUnit process
>>>>>> unexpectedly
>>>>>> >>died
>>>>>> >> >>> after removing executable permissions but before restoring
>>>>>> them.
>>>>>> >>That
>>>>>> >> >>> always would have been a weakness of these test suites,
>>>>>> regardless
>>>>>> >>of
>>>>>> >> >>>any
>>>>>> >> >>> recent changes.
>>>>>> >> >>>
>>>>>> >> >>> Chris Nauroth
>>>>>> >> >>> Hortonworks
>>>>>> >> >>> http://hortonworks.com/
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com>
>>>>>> wrote:
>>>>>> >> >>>
>>>>>> >> >>>>Hey Colin,
>>>>>> >> >>>>
>>>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's
>>>>>> going on
>>>>>> >>with
>>>>>> >> >>>>these boxes. He took a look and concluded that some perms are
>>>>>> being
>>>>>> >> >>>>set in
>>>>>> >> >>>>those directories by our unit tests which are precluding those
>>>>>> files
>>>>>> >> >>>>from
>>>>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but
>>>>>> we
>>>>>> >>should
>>>>>> >> >>>>expect this to keep happening until we can fix the test in
>>>>>> question
>>>>>> >>to
>>>>>> >> >>>>properly clean up after itself.
>>>>>> >> >>>>
>>>>>> >> >>>>To help narrow down which commit it was that started this,
>>>>>> Andrew
>>>>>> >>sent
>>>>>> >> >>>>me
>>>>>> >> >>>>this info:
>>>>>> >> >>>>
>>>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>>> >>
>>>>>>
>>>>>> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>> >>>>>>/
>>>>>> >> >>>>has
>>>>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
>>>>>> since
>>>>>> >>9:32
>>>>>> >> >>>>UTC
>>>>>> >> >>>>on March 5th."
>>>>>> >> >>>>
>>>>>> >> >>>>--
>>>>>> >> >>>>Aaron T. Myers
>>>>>> >> >>>>Software Engineer, Cloudera
>>>>>> >> >>>>
>>>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>>> >><cm...@apache.org>
>>>>>> >> >>>>wrote:
>>>>>> >> >>>>
>>>>>> >> >>>>> Hi all,
>>>>>> >> >>>>>
>>>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't
>>>>>> find any
>>>>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
>>>>>> them
>>>>>> >> >>>>>seem
>>>>>> >> >>>>> to be failing with some variant of this message:
>>>>>> >> >>>>>
>>>>>> >> >>>>> [ERROR] Failed to execute goal
>>>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>>> >>(default-clean)
>>>>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>>>>> delete
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >>
>>>>>>
>>>>>> >>>>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hd
>>>>>> >>>>>>>fs
>>>>>> >> >>>>>-pr
>>>>>> >> >>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>> >> >>>>> -> [Help 1]
>>>>>> >> >>>>>
>>>>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting
>>>>>> wrong
>>>>>> >> >>>>> permissions?
>>>>>> >> >>>>>
>>>>>> >> >>>>> Colin
>>>>>> >> >>>>>
>>>>>> >> >>>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> --
>>>>>> >> >> Lei (Eddy) Xu
>>>>>> >> >> Software Engineer, Cloudera
>>>>>> >>
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> >--
>>>>>> >Sean
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sean
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>>
>>
>> --
>> Sean
>>
>
>
>
> --
> Sean
>



-- 
Sean

>>
>>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>>
>>>
>>>
>>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bu...@cloudera.com>
>>> wrote:
>>>
>>>> Hi Folks!
>>>>
>>>> After working on test-patch with other folks for the last few months, I
>>>> think we've reached the point where we can make the fastest progress
>>>> towards the goal of a general use pre-commit patch tester by spinning
>>>> things into a project focused on just that. I think we have a mature enough
>>>> code base and a sufficient fledgling community, so I'm going to put
>>>> together a tlp proposal.
>>>>
>>>> Thanks for the feedback thus far from use within Hadoop. I hope we can
>>>> continue to make things more useful.
>>>>
>>>> -Sean
>>>>
>>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com>
>>>> wrote:
>>>>
>>>>> HBase's dev-support folder is where the scripts and support files
>>>>> live. We've only recently started adding anything to the maven builds
>>>>> that's specific to jenkins[1]; so far it's diagnostic stuff, but that's
>>>>> where I'd add in more if we ran into the same permissions problems y'all
>>>>> are having.
>>>>>
>>>>> There's also our precommit job itself, though it isn't large[2].
>>>>> AFAIK, we don't properly back this up anywhere, we just notify each other
>>>>> of changes on a particular mail thread[3].
>>>>>
>>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're
>>>>> all read because I just finished fixing "mvn site" running out of permgen)
>>>>> [3]: http://s.apache.org/NT0
>>>>>
>>>>>
>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <
>>>>> cnauroth@hortonworks.com> wrote:
>>>>>
>>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>>>>> HBase
>>>>>> repo?  Is there any additional context we need to be aware of?
>>>>>>
>>>>>> Chris Nauroth
>>>>>> Hortonworks
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>>>>>>
>>>>>> >+dev@hbase
>>>>>> >
>>>>>> >HBase has recently been cleaning up our precommit jenkins jobs to
>>>>>> make
>>>>>> >them
>>>>>> >more robust. From what I can tell our stuff started off as an earlier
>>>>>> >version of what Hadoop uses for testing.
>>>>>> >
>>>>>> >Folks on either side open to an experiment of combining our precommit
>>>>>> >check
>>>>>> >tooling? In principle we should be looking for the same kinds of
>>>>>> things.
>>>>>> >
>>>>>> >Naturally we'll still need different jenkins jobs to handle different
>>>>>> >resource needs and we'd need to figure out where stuff eventually
>>>>>> lives,
>>>>>> >but that could come later.
>>>>>> >
>>>>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
>>>>>> cnauroth@hortonworks.com>
>>>>>> >wrote:
>>>>>> >
>>>>>> >> The only thing I'm aware of is the failOnError option:
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>>>>> >>
>>>>>> >>
>>>>>> >> I prefer that we don't disable this, because ignoring different
>>>>>> kinds of
>>>>>> >> failures could leave our build directories in an indeterminate
>>>>>> state.
>>>>>> >>For
>>>>>> >> example, we could end up with an old class file on the classpath
>>>>>> for
>>>>>> >>test
>>>>>> >> runs that was supposedly deleted.
>>>>>> >>
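
For context on why the delete fails at all: on the POSIX build slaves,
unlinking an entry requires write and execute (search) permission on its
parent directory, so a permission-stripped directory defeats any recursive
delete, maven-clean's included. A hypothetical demonstration (the path is
made up):

    import java.io.File;

    public class PermDemo {
      public static void main(String[] args) {
        File parent = new File("/tmp/perm-demo");  // hypothetical scratch path
        File child = new File(parent, "data3");
        child.mkdirs();
        parent.setWritable(false);
        parent.setExecutable(false);
        // false: the parent is no longer writable/searchable, which is
        // the same wall the clean plugin hits on the jenkins slaves.
        System.out.println(child.delete());
        parent.setWritable(true);
        parent.setExecutable(true);
        // true once permissions are restored (child is empty).
        System.out.println(child.delete());
      }
    }
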
>>>>>> >> I think it's worth exploring Eddy's suggestion to try simulating
>>>>>> failure
>>>>>> >> by placing a file where the code expects to see a directory.  That
>>>>>> might
>>>>>> >> even let us enable some of these tests that are skipped on Windows,
>>>>>> >> because Windows allows access for the owner even after permissions
>>>>>> have
>>>>>> >> been stripped.
>>>>>> >>
>>>>>> >> Chris Nauroth
>>>>>> >> Hortonworks
>>>>>> >> http://hortonworks.com/
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> >Is there a maven plugin or setting we can use to simply remove
>>>>>> >> >directories that have no executable permissions on them?  Clearly
>>>>>> we
>>>>>> >> >have the permission to do this from a technical point of view
>>>>>> (since
>>>>>> >> >we created the directories as the jenkins user), it's simply that
>>>>>> the
>>>>>> >> >code refuses to do it.
>>>>>> >> >
>>>>>> >> >Otherwise I guess we can just fix those tests...
>>>>>> >> >
>>>>>> >> >Colin
>>>>>> >> >
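
Short of a maven setting, one option is a pre-clean sweep that restores
owner permissions top-down before the clean plugin runs. A minimal sketch
(class name and default path are hypothetical): it repairs each directory
before descending, so even an unreadable directory, which the jenkins user
still owns, can be listed and fixed.

    import java.io.File;

    public final class RestorePerms {
      static void restore(File dir) {
        // Repair this directory first so listFiles() below can succeed.
        dir.setReadable(true);
        dir.setWritable(true);
        dir.setExecutable(true);
        File[] children = dir.listFiles();
        if (children == null) {
          return;  // not a directory (or still unreadable)
        }
        for (File child : children) {
          if (child.isDirectory()) {
            restore(child);
          }
        }
      }

      public static void main(String[] args) {
        restore(new File(args.length > 0 ? args[0] : "target"));
      }
    }
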
>>>>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>> >> >>
>>>>>> >> >> In HDFS-7722:
>>>>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>>> >> >>TearDown().
>>>>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally
>>>>>> clause.
>>>>>> >> >>
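
For illustration, the finally-clause pattern described above reduces to
stripping permissions inside a try block and unconditionally restoring them
afterwards. A minimal sketch, with a hypothetical helper name:

    import java.io.File;
    import java.io.IOException;

    // Fake a failed volume for the duration of testBody, then restore
    // permissions so a failing test cannot leave the workspace undeletable.
    static void withSimulatedVolumeFailure(File dataDir, Runnable testBody)
        throws IOException {
      try {
        if (!dataDir.setWritable(false) || !dataDir.setExecutable(false)) {
          throw new IOException("could not strip permissions on " + dataDir);
        }
        testBody.run();
      } finally {
        // Runs even when the test body throws or an assertion fails.
        dataDir.setWritable(true);
        dataDir.setExecutable(true);
      }
    }
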
>>>>>> >> >> Also I ran mvn test several times on my machine and all tests
>>>>>> passed.
>>>>>> >> >>
>>>>>> >> >> However, since in DiskChecker#checkDirAccess():
>>>>>> >> >>
>>>>>> >> >> private static void checkDirAccess(File dir) throws
>>>>>> >>DiskErrorException {
>>>>>> >> >>   if (!dir.isDirectory()) {
>>>>>> >> >>     throw new DiskErrorException("Not a directory: "
>>>>>> >> >>                                  + dir.toString());
>>>>>> >> >>   }
>>>>>> >> >>
>>>>>> >> >>   checkAccessByFileMethods(dir);
>>>>>> >> >> }
>>>>>> >> >>
>>>>>> >> >> One potentially safer alternative is replacing data dir with a
>>>>>> >>regular
>>>>>> >> >> file to simulate disk failures.
>>>>>> >> >>
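
A rough sketch of that safer alternative, assuming JUnit 4 and the
DiskChecker/FileUtil APIs quoted above (baseDir stands in for the test's
scratch directory):

    import static org.junit.Assert.assertTrue;
    import static org.junit.Assert.fail;

    import java.io.File;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.util.DiskChecker;
    import org.apache.hadoop.util.DiskChecker.DiskErrorException;
    import org.junit.Test;

    @Test
    public void volumeFailsWhenDataDirIsRegularFile() throws Exception {
      File dataDir = new File(baseDir, "data3");  // hypothetical data dir
      FileUtil.fullyDelete(dataDir);
      assertTrue(dataDir.createNewFile());  // now a plain file, not a directory
      try {
        DiskChecker.checkDir(dataDir);
        fail("expected DiskErrorException for a non-directory");
      } catch (DiskErrorException expected) {
        // checkDirAccess() rejects non-directories, so the volume reads
        // as failed, and there are no permission bits to restore.
      }
    }

Since no permissions ever change, an aborted run leaves nothing behind that
"mvn clean" cannot delete.
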
>>>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>>> >> >><cn...@hortonworks.com> wrote:
>>>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>> >> >>> TestDataNodeVolumeFailureReporting, and
>>>>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>>>>>> >>permissions
>>>>>> >> >>>from
>>>>>> >> >>> directories like the one Colin mentioned to simulate disk
>>>>>> failures
>>>>>> >>at
>>>>>> >> >>>data
>>>>>> >> >>> nodes.  I reviewed the code for all of those, and they all
>>>>>> appear
>>>>>> >>to be
>>>>>> >> >>> doing the necessary work to restore executable permissions at
>>>>>> the
>>>>>> >>end
>>>>>> >> >>>of
>>>>>> >> >>> the test.  The only recent uncommitted patch I've seen that
>>>>>> makes
>>>>>> >> >>>changes
>>>>>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
>>>>>> >> >>>though.  I
>>>>>> >> >>> don't know if there are other uncommitted patches that changed
>>>>>> these
>>>>>> >> >>>test
>>>>>> >> >>> suites.
>>>>>> >> >>>
>>>>>> >> >>> I suppose it's also possible that the JUnit process
>>>>>> unexpectedly
>>>>>> >>died
>>>>>> >> >>> after removing executable permissions but before restoring
>>>>>> them.
>>>>>> >>That
>>>>>> >> >>> always would have been a weakness of these test suites,
>>>>>> regardless
>>>>>> >>of
>>>>>> >> >>>any
>>>>>> >> >>> recent changes.
>>>>>> >> >>>
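
That window can be narrowed, though not fully closed (nothing runs on
SIGKILL), by registering a JVM shutdown hook at the moment a test strips
permissions. A hypothetical sketch, where strippedDir refers to the
directory the test modified:

    final File strippedDir = dataDir;  // dataDir: hypothetical test directory
    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
      @Override
      public void run() {
        // Best-effort restore on abnormal JVM exit.
        strippedDir.setWritable(true);
        strippedDir.setExecutable(true);
      }
    }));
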
>>>>>> >> >>> Chris Nauroth
>>>>>> >> >>> Hortonworks
>>>>>> >> >>> http://hortonworks.com/
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com>
>>>>>> wrote:
>>>>>> >> >>>
>>>>>> >> >>>>Hey Colin,
>>>>>> >> >>>>
>>>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's
>>>>>> going on
>>>>>> >>with
>>>>>> >> >>>>these boxes. He took a look and concluded that some perms are
>>>>>> being
>>>>>> >> >>>>set in
>>>>>> >> >>>>those directories by our unit tests which are precluding those
>>>>>> files
>>>>>> >> >>>>from
>>>>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but
>>>>>> we
>>>>>> >>should
>>>>>> >> >>>>expect this to keep happening until we can fix the test in
>>>>>> question
>>>>>> >>to
>>>>>> >> >>>>properly clean up after itself.
>>>>>> >> >>>>
>>>>>> >> >>>>To help narrow down which commit it was that started this,
>>>>>> Andrew
>>>>>> >>sent
>>>>>> >> >>>>me
>>>>>> >> >>>>this info:
>>>>>> >> >>>>
>>>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>>> >>
>>>>>>
>>>>>> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>> >>>>>>/
>>>>>> >> >>>>has
>>>>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
>>>>>> since
>>>>>> >>9:32
>>>>>> >> >>>>UTC
>>>>>> >> >>>>on March 5th."
>>>>>> >> >>>>
>>>>>> >> >>>>--
>>>>>> >> >>>>Aaron T. Myers
>>>>>> >> >>>>Software Engineer, Cloudera
>>>>>> >> >>>>
>>>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>>> >><cm...@apache.org>
>>>>>> >> >>>>wrote:
>>>>>> >> >>>>
>>>>>> >> >>>>> Hi all,
>>>>>> >> >>>>>
>>>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't
>>>>>> find any
>>>>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
>>>>>> them
>>>>>> >> >>>>>seem
>>>>>> >> >>>>> to be failing with some variant of this message:
>>>>>> >> >>>>>
>>>>>> >> >>>>> [ERROR] Failed to execute goal
>>>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>>> >>(default-clean)
>>>>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>>>>> delete
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >>
>>>>>>
>>>>>> >> >>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>> >> >>>>> -> [Help 1]
>>>>>> >> >>>>>
>>>>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting
>>>>>> wrong
>>>>>> >> >>>>> permissions?
>>>>>> >> >>>>>
>>>>>> >> >>>>> Colin
>>>>>> >> >>>>>
>>>>>> >> >>>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> --
>>>>>> >> >> Lei (Eddy) Xu
>>>>>> >> >> Software Engineer, Cloudera
>>>>>> >>
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> >--
>>>>>> >Sean
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sean
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>>
>>
>> --
>> Sean
>>
>
>
>
> --
> Sean
>



-- 
Sean

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Sean Busbey <bu...@cloudera.com>.
It looks like we have consensus.

I'll start drafting up a proposal for the next board meeting (July 15th).
Once we work out the name I'll submit a PODLINGNAMESEARCH jira to track
that we did due diligence on whatever we pick.

In the mean time, Hadoop PMC would y'all be willing to host us in a branch
so that we can start prepping things now? We would want branch commit
rights for the proposed new PMC.


-Sean


On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey <bu...@cloudera.com> wrote:

> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> in-coming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so often the tooling
> is worked on ad-hoc, and few shared improvements happen across projects. Since
> the tooling itself is never a primary concern, any improvements made are
> rarely reused outside of ASF projects.
>
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow the new
> contributors we have to handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could easily also take over maintenance of
> things like shelldocs, since test-patch is the primary consumer of that
> currently but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talking about the
> tools to build QA processes and not a particular set of tested components.
> Specifically, I think the ChaosMonkey work that's in HBase should be
> generalizable as a fault injection framework (either based on that code or
> something like it). Doing this for arbitrary software is obviously very
> difficult, and a part of easing that will be to make (and then favor)
> tooling to allow projects to have operational glue that looks the same.
> Namely, the shell work that's been done in hadoop-functions.sh would be a
> great foundational layer that could bring good daemon handling practices to
> a whole slew of software projects. In the event that these frameworks and
> tools get adopted by parts of the Hadoop ecosystem, that could make the job
> of i.e. Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current
> test-patch work or expressed interest in helping out on getting it used in
> other projects. Right now, the proposed PMC would be (alphabetical by last
> name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> pmc, sqoop pmc, all around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several members and a bunch of folks familiar with the
> ASF. Combined with the code already existing in Apache spaces, I think that
> gives us sufficient justification for a direct board proposal.
>
> The planned project name is "Apache Yetus". It's an archaic genus of sea
> snail and most of our project will be focused on shell scripts.
>
> N.b.: this does not mean that the Hadoop community would _have_ to rely on
> the new TLP, but I hope that once we have a release that can be evaluated
> there'd be enough benefit to strongly encourage it.
>
> This has mostly been focused on scope and community issues, and I'd love
> to talk through any feedback on that. Additionally, are there any other
> points folks want to make sure are covered before we have a resolution?
>
> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>
>>
>>
>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>>> Hi Folks!
>>>
>>> After working on test-patch with other folks for the last few months, I
>>> think we've reached the point where we can make the fastest progress
>>> towards the goal of a general use pre-commit patch tester by spinning
>>> things into a project focused on just that. I think we have a mature enough
>>> code base and a sufficient fledgling community, so I'm going to put
>>> together a tlp proposal.
>>>
>>> Thanks for the feedback thus far from use within Hadoop. I hope we can
>>> continue to make things more useful.
>>>
>>> -Sean
>>>
>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com>
>>> wrote:
>>>
>>>> HBase's dev-support folder is where the scripts and support files live.
>>>> We've only recently started adding anything to the maven builds that's
>>>> specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
>>>> add in more if we ran into the same permissions problems y'all are having.
>>>>
>>>> There's also our precommit job itself, though it isn't large[2]. AFAIK,
>>>> we don't properly back this up anywhere, we just notify each other of
>>>> changes on a particular mail thread[3].
>>>>
>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
>>>> read because I just finished fixing "mvn site" running out of permgen)
>>>> [3]: http://s.apache.org/NT0
>>>>
>>>>
>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <
>>>> cnauroth@hortonworks.com> wrote:
>>>>
>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>>>> HBase
>>>>> repo?  Is there any additional context we need to be aware of?
>>>>>
>>>>> Chris Nauroth
>>>>> Hortonworks
>>>>> http://hortonworks.com/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>>>>>
>>>>> >+dev@hbase
>>>>> >
>>>>> >HBase has recently been cleaning up our precommit jenkins jobs to make
>>>>> >them
>>>>> >more robust. From what I can tell our stuff started off as an earlier
>>>>> >version of what Hadoop uses for testing.
>>>>> >
>>>>> >Folks on either side open to an experiment of combining our precommit
>>>>> >check
>>>>> >tooling? In principle we should be looking for the same kinds of
>>>>> things.
>>>>> >
>>>>> >Naturally we'll still need different jenkins jobs to handle different
>>>>> >resource needs and we'd need to figure out where stuff eventually
>>>>> lives,
>>>>> >but that could come later.
>>>>> >
>>>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
>>>>> cnauroth@hortonworks.com>
>>>>> >wrote:
>>>>> >
>>>>> >> The only thing I'm aware of is the failOnError option:
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>>>> >>
>>>>> >>
>>>>> >> I prefer that we don't disable this, because ignoring different
>>>>> kinds of
>>>>> >> failures could leave our build directories in an indeterminate
>>>>> state.
>>>>> >>For
>>>>> >> example, we could end up with an old class file on the classpath for
>>>>> >>test
>>>>> >> runs that was supposedly deleted.
>>>>> >>
>>>>> >> I think it's worth exploring Eddy's suggestion to try simulating
>>>>> failure
>>>>> >> by placing a file where the code expects to see a directory.  That
>>>>> might
>>>>> >> even let us enable some of these tests that are skipped on Windows,
>>>>> >> because Windows allows access for the owner even after permissions
>>>>> have
>>>>> >> been stripped.
>>>>> >>
>>>>> >> Chris Nauroth
>>>>> >> Hortonworks
>>>>> >> http://hortonworks.com/
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>>>>> >>
>>>>> >> >Is there a maven plugin or setting we can use to simply remove
>>>>> >> >directories that have no executable permissions on them?  Clearly
>>>>> we
>>>>> >> >have the permission to do this from a technical point of view
>>>>> (since
>>>>> >> >we created the directories as the jenkins user), it's simply that
>>>>> the
>>>>> >> >code refuses to do it.
>>>>> >> >
>>>>> >> >Otherwise I guess we can just fix those tests...
>>>>> >> >
>>>>> >> >Colin
>>>>> >> >
>>>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>>>> >> >>
>>>>> >> >> In HDFS-7722:
>>>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>> >> >>TearDown().
>>>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>>>> >> >>
>>>>> >> >> Also I ran mvn test several times on my machine and all tests
>>>>> passed.
>>>>> >> >>
>>>>> >> >> However, since in DiskChecker#checkDirAccess():
>>>>> >> >>
>>>>> >> >> private static void checkDirAccess(File dir) throws
>>>>> >>DiskErrorException {
>>>>> >> >>   if (!dir.isDirectory()) {
>>>>> >> >>     throw new DiskErrorException("Not a directory: "
>>>>> >> >>                                  + dir.toString());
>>>>> >> >>   }
>>>>> >> >>
>>>>> >> >>   checkAccessByFileMethods(dir);
>>>>> >> >> }
>>>>> >> >>
>>>>> >> >> One potentially safer alternative is replacing data dir with a
>>>>> >>regular
>>>>> >> >> file to simulate disk failures.
>>>>> >> >>
>>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>> >> >><cn...@hortonworks.com> wrote:
>>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>> >> >>> TestDataNodeVolumeFailureReporting, and
>>>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>>>>> >>permissions
>>>>> >> >>>from
>>>>> >> >>> directories like the one Colin mentioned to simulate disk
>>>>> failures
>>>>> >>at
>>>>> >> >>>data
>>>>> >> >>> nodes.  I reviewed the code for all of those, and they all
>>>>> appear
>>>>> >>to be
>>>>> >> >>> doing the necessary work to restore executable permissions at
>>>>> the
>>>>> >>end
>>>>> >> >>>of
>>>>> >> >>> the test.  The only recent uncommitted patch I've seen that
>>>>> makes
>>>>> >> >>>changes
>>>>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
>>>>> >> >>>though.  I
>>>>> >> >>> don't know if there are other uncommitted patches that changed
>>>>> these
>>>>> >> >>>test
>>>>> >> >>> suites.
>>>>> >> >>>
>>>>> >> >>> I suppose it's also possible that the JUnit process unexpectedly
>>>>> >>died
>>>>> >> >>> after removing executable permissions but before restoring them.
>>>>> >>That
>>>>> >> >>> always would have been a weakness of these test suites,
>>>>> regardless
>>>>> >>of
>>>>> >> >>>any
>>>>> >> >>> recent changes.
>>>>> >> >>>
>>>>> >> >>> Chris Nauroth
>>>>> >> >>> Hortonworks
>>>>> >> >>> http://hortonworks.com/
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>>>> >> >>>
>>>>> >> >>>>Hey Colin,
>>>>> >> >>>>
>>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going
>>>>> on
>>>>> >>with
>>>>> >> >>>>these boxes. He took a look and concluded that some perms are
>>>>> being
>>>>> >> >>>>set in
>>>>> >> >>>>those directories by our unit tests which are precluding those
>>>>> files
>>>>> >> >>>>from
>>>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but we
>>>>> >>should
>>>>> >> >>>>expect this to keep happening until we can fix the test in
>>>>> question
>>>>> >>to
>>>>> >> >>>>properly clean up after itself.
>>>>> >> >>>>
>>>>> >> >>>>To help narrow down which commit it was that started this,
>>>>> Andrew
>>>>> >>sent
>>>>> >> >>>>me
>>>>> >> >>>>this info:
>>>>> >> >>>>
>>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>> >>
>>>>>
>>>>> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> >>>>>>/
>>>>> >> >>>>has
>>>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
>>>>> since
>>>>> >>9:32
>>>>> >> >>>>UTC
>>>>> >> >>>>on March 5th."
>>>>> >> >>>>
>>>>> >> >>>>--
>>>>> >> >>>>Aaron T. Myers
>>>>> >> >>>>Software Engineer, Cloudera
>>>>> >> >>>>
>>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>> >><cm...@apache.org>
>>>>> >> >>>>wrote:
>>>>> >> >>>>
>>>>> >> >>>>> Hi all,
>>>>> >> >>>>>
>>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't
>>>>> find any
>>>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
>>>>> them
>>>>> >> >>>>>seem
>>>>> >> >>>>> to be failing with some variant of this message:
>>>>> >> >>>>>
>>>>> >> >>>>> [ERROR] Failed to execute goal
>>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>> >>(default-clean)
>>>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>>>> delete
>>>>> >> >>>>>
>>>>> >> >>>>>
>>>>> >>
>>>>>
>>>>> >> >>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> >> >>>>> -> [Help 1]
>>>>> >> >>>>>
>>>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting
>>>>> wrong
>>>>> >> >>>>> permissions?
>>>>> >> >>>>>
>>>>> >> >>>>> Colin
>>>>> >> >>>>>
>>>>> >> >>>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> --
>>>>> >> >> Lei (Eddy) Xu
>>>>> >> >> Software Engineer, Cloudera
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> >--
>>>>> >Sean
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>>
>>
>> --
>> Sean
>>
>
>
>
> --
> Sean
>



-- 
Sean

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Nick Dimiduk <nd...@apache.org>.
I think this is a great idea! Having just gone through the process of
getting Phoenix up to speed with precommits, it would be really nice to
have a place to go other than "fork/hack someone else's work". For the same
project, I recently integrated its first daemon service. This meant adding
a bunch of servicy Python code (multi-platform support is required) which I
only sort of trust. Again, would be great to have an explicit resource for
this kind of thing in the ecosystem. I expect Calcite and Kylin will be
following along shortly.

Since we're tossing out names, how about Apache Bootstrap? It's a
meta-project to help other projects get off the ground, after all.

-n

On Monday, June 15, 2015, Sean Busbey <bu...@cloudera.com> wrote:

> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> in-coming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so often the tooling
> is worked on ad-hoc, and few shared improvements happen across
> projects. Since
> the tooling itself is never a primary concern, any improvements made are
> rarely reused outside of ASF projects.
>
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow the new
> contributors we have to handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could easily also take over maintenance of
> things like shelldocs, since test-patch is the primary consumer of that
> currently but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talking about the
> tools to build QA processes and not a particular set of tested components.
> Specifically, I think the ChaosMonkey work that's in HBase should be
> generalizable as a fault injection framework (either based on that code or
> something like it). Doing this for arbitrary software is obviously very
> difficult, and a part of easing that will be to make (and then favor)
> tooling to allow projects to have operational glue that looks the same.
> Namely, the shell work that's been done in hadoop-functions.sh would be a
> great foundational layer that could bring good daemon handling practices to
> a whole slew of software projects. In the event that these frameworks and
> tools get adopted by parts of the Hadoop ecosystem, that could make the job
> of i.e. Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current
> test-patch work or expressed interest in helping out on getting it used in
> other projects. Right now, the proposed PMC would be (alphabetical by last
> name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> pmc, sqoop pmc, all around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several members and a bunch of folks familiar with the
> ASF. Combined with the code already existing in Apache spaces, I think that
> gives us sufficient justification for a direct board proposal.
>
> The planned project name is "Apache Yetus". It's an archaic genus of sea
> snail and most of our project will be focused on shell scripts.
>
> N.b.: this does not mean that the Hadoop community would _have_ to rely on
> the new TLP, but I hope that once we have a release that can be evaluated
> there'd be enough benefit to strongly encourage it.
>
> This has mostly been focused on scope and community issues, and I'd love to
> talk through any feedback on that. Additionally, are there any other points
> folks want to make sure are covered before we have a resolution?
>
> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <busbey@cloudera.com> wrote:
>
> > Sorry for the resend. I figured this deserves a [DISCUSS] flag.
> >
> >
> >
> > On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <busbey@cloudera.com> wrote:
> >
> >> Hi Folks!
> >>
> >> After working on test-patch with other folks for the last few months, I
> >> think we've reached the point where we can make the fastest progress
> >> towards the goal of a general use pre-commit patch tester by spinning
> >> things into a project focused on just that. I think we have a mature
> enough
> >> code base and a sufficient fledgling community, so I'm going to put
> >> together a tlp proposal.
> >>
> >> Thanks for the feedback thus far from use within Hadoop. I hope we can
> >> continue to make things more useful.
> >>
> >> -Sean
> >>
> >> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <busbey@cloudera.com> wrote:
> >>
> >>> HBase's dev-support folder is where the scripts and support files live.
> >>> We've only recently started adding anything to the maven builds that's
> >>> specific to jenkins[1]; so far it's diagnostic stuff, but that's where
> I'd
> >>> add in more if we ran into the same permissions problems y'all are
> having.
> >>>
> >>> There's also our precommit job itself, though it isn't large[2]. AFAIK,
> >>> we don't properly back this up anywhere, we just notify each other of
> >>> changes on a particular mail thread[3].
> >>>
> >>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> >>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
> >>> read because I just finished fixing "mvn site" running out of permgen)
> >>> [3]: http://s.apache.org/NT0
> >>>
> >>>
> >>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
> >>>
> >>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
> >>>> HBase
> >>>> repo?  Is there any additional context we need to be aware of?
> >>>>
> >>>> Chris Nauroth
> >>>> Hortonworks
> >>>> http://hortonworks.com/
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 3/11/15, 2:44 PM, "Sean Busbey" <busbey@cloudera.com> wrote:
> >>>>
> >>>> >+dev@hbase
> >>>> >
> >>>> >HBase has recently been cleaning up our precommit jenkins jobs to
> make
> >>>> >them
> >>>> >more robust. From what I can tell our stuff started off as an earlier
> >>>> >version of what Hadoop uses for testing.
> >>>> >
> >>>> >Folks on either side open to an experiment of combining our precommit
> >>>> >check
> >>>> >tooling? In principle we should be looking for the same kinds of
> >>>> things.
> >>>> >
> >>>> >Naturally we'll still need different jenkins jobs to handle different
> >>>> >resource needs and we'd need to figure out where stuff eventually
> >>>> lives,
> >>>> >but that could come later.
> >>>> >
> >>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
> >>>> cnauroth@hortonworks.com>
> >>>> >wrote:
> >>>> >
> >>>> >> The only thing I'm aware of is the failOnError option:
> >>>> >>
> >>>> >>
> >>>> >>
> >>>>
> >>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
> >>>> >>
> >>>> >>
> >>>> >> I prefer that we don't disable this, because ignoring different
> >>>> kinds of
> >>>> >> failures could leave our build directories in an indeterminate
> state.
> >>>> >>For
> >>>> >> example, we could end up with an old class file on the classpath
> for
> >>>> >>test
> >>>> >> runs that was supposedly deleted.
> >>>> >>
> >>>> >> I think it's worth exploring Eddy's suggestion to try simulating
> >>>> failure
> >>>> >> by placing a file where the code expects to see a directory.  That
> >>>> might
> >>>> >> even let us enable some of these tests that are skipped on Windows,
> >>>> >> because Windows allows access for the owner even after permissions
> >>>> have
> >>>> >> been stripped.
> >>>> >>
> >>>> >> Chris Nauroth
> >>>> >> Hortonworks
> >>>> >> http://hortonworks.com/
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:
> >>>> >>
> >>>> >> >Is there a maven plugin or setting we can use to simply remove
> >>>> >> >directories that have no executable permissions on them?  Clearly
> we
> >>>> >> >have the permission to do this from a technical point of view
> (since
> >>>> >> >we created the directories as the jenkins user), it's simply that
> >>>> the
> >>>> >> >code refuses to do it.
> >>>> >> >
> >>>> >> >Otherwise I guess we can just fix those tests...
> >>>> >> >
> >>>> >> >Colin
> >>>> >> >
> >>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
> >>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
> >>>> >> >>
> >>>> >> >> In HDFS-7722:
> >>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >>>> >> >>TearDown().
> >>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally
> clause.
> >>>> >> >>
> >>>> >> >> Also I ran mvn test several times on my machine and all tests
> >>>> passed.
> >>>> >> >>
> >>>> >> >> However, since in DiskChecker#checkDirAccess():
> >>>> >> >>
> >>>> >> >> private static void checkDirAccess(File dir) throws
> >>>> >>DiskErrorException {
> >>>> >> >>   if (!dir.isDirectory()) {
> >>>> >> >>     throw new DiskErrorException("Not a directory: "
> >>>> >> >>                                  + dir.toString());
> >>>> >> >>   }
> >>>> >> >>
> >>>> >> >>   checkAccessByFileMethods(dir);
> >>>> >> >> }
> >>>> >> >>
> >>>> >> >> One potentially safer alternative is replacing data dir with a
> >>>> >>regular
> >>>> >> >> file to simulate disk failures.
> >>>> >> >>
> >>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
> >>>> >> >><cnauroth@hortonworks.com> wrote:
> >>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >>>> >> >>> TestDataNodeVolumeFailureReporting, and
> >>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
> >>>> >>permissions
> >>>> >> >>>from
> >>>> >> >>> directories like the one Colin mentioned to simulate disk
> >>>> failures
> >>>> >>at
> >>>> >> >>>data
> >>>> >> >>> nodes.  I reviewed the code for all of those, and they all
> appear
> >>>> >>to be
> >>>> >> >>> doing the necessary work to restore executable permissions at
> the
> >>>> >>end
> >>>> >> >>>of
> >>>> >> >>> the test.  The only recent uncommitted patch I've seen that
> makes
> >>>> >> >>>changes
> >>>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
> >>>> >> >>>though.  I
> >>>> >> >>> don't know if there are other uncommitted patches that changed
> >>>> these
> >>>> >> >>>test
> >>>> >> >>> suites.
> >>>> >> >>>
> >>>> >> >>> I suppose it's also possible that the JUnit process
> unexpectedly
> >>>> >>died
> >>>> >> >>> after removing executable permissions but before restoring
> them.
> >>>> >>That
> >>>> >> >>> always would have been a weakness of these test suites,
> >>>> regardless
> >>>> >>of
> >>>> >> >>>any
> >>>> >> >>> recent changes.
> >>>> >> >>>
> >>>> >> >>> Chris Nauroth
> >>>> >> >>> Hortonworks
> >>>> >> >>> http://hortonworks.com/
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
> >>>> >> >>>
> >>>> >> >>>>Hey Colin,
> >>>> >> >>>>
> >>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going
> >>>> on
> >>>> >>with
> >>>> >> >>>>these boxes. He took a look and concluded that some perms are
> >>>> being
> >>>> >> >>>>set in
> >>>> >> >>>>those directories by our unit tests which are precluding those
> >>>> files
> >>>> >> >>>>from
> >>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but
> we
> >>>> >>should
> >>>> >> >>>>expect this to keep happening until we can fix the test in
> >>>> question
> >>>> >>to
> >>>> >> >>>>properly clean up after itself.
> >>>> >> >>>>
> >>>> >> >>>>To help narrow down which commit it was that started this,
> Andrew
> >>>> >>sent
> >>>> >> >>>>me
> >>>> >> >>>>this info:
> >>>> >> >>>>
> >>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> >>>> >>
> >>>>
> >>>>
> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>> >>>>>>/
> >>>> >> >>>>has
> >>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
> >>>> since
> >>>> >>9:32
> >>>> >> >>>>UTC
> >>>> >> >>>>on March 5th."
> >>>> >> >>>>
> >>>> >> >>>>--
> >>>> >> >>>>Aaron T. Myers
> >>>> >> >>>>Software Engineer, Cloudera
> >>>> >> >>>>
> >>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
> >>>> >><cmccabe@apache.org>
> >>>> >> >>>>wrote:
> >>>> >> >>>>
> >>>> >> >>>>> Hi all,
> >>>> >> >>>>>
> >>>> >> >>>>> A very quick (and not thorough) survey shows that I can't
> find
> >>>> any
> >>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
> >>>> them
> >>>> >> >>>>>seem
> >>>> >> >>>>> to be failing with some variant of this message:
> >>>> >> >>>>>
> >>>> >> >>>>> [ERROR] Failed to execute goal
> >>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> >>>> >>(default-clean)
> >>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
> >>>> delete
> >>>> >> >>>>>
> >>>> >> >>>>>
> >>>> >>
> >>>>
> >>>>
> >>>> >> >>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>> >> >>>>> -> [Help 1]
> >>>> >> >>>>>
> >>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting
> wrong
> >>>> >> >>>>> permissions?
> >>>> >> >>>>>
> >>>> >> >>>>> Colin
> >>>> >> >>>>>
> >>>> >> >>>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >> --
> >>>> >> >> Lei (Eddy) Xu
> >>>> >> >> Software Engineer, Cloudera
> >>>> >>
> >>>> >>
> >>>> >
> >>>> >
> >>>> >--
> >>>> >Sean
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Sean
> >>>
> >>
> >>
> >>
> >> --
> >> Sean
> >>
> >
> >
> >
> > --
> > Sean
> >
>
>
>
> --
> Sean
>

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Sean Busbey <bu...@cloudera.com>.
It looks like we have consensus.

I'll start drafting up a proposal for the next board meeting (July 15th).
Once we work out the name I'll submit a PODLINGNAMESEARCH jira to track
that we did due diligence on whatever we pick.

In the mean time, Hadoop PMC would y'all be willing to host us in a branch
so that we can start prepping things now? We would want branch commit
rights for the proposed new PMC.


-Sean


On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey <bu...@cloudera.com> wrote:

> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> in-coming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so often the tooling
> is worked on ad-hoc, and few shared improvements happen across projects. Since
> the tooling itself is never a primary concern, any improvements made are
> rarely reused outside of ASF projects.
>
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow the new
> contributors we have to handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could easily also take over maintenance of
> things like shelldocs, since test-patch is the primary consumer of that
> currently but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talking about the
> tools to build QA processes and not a particular set of tested components.
> Specifically, I think the ChaosMonkey work that's in HBase should be
> generalizable as a fault injection framework (either based on that code or
> something like it). Doing this for arbitrary software is obviously very
> difficult, and a part of easing that will be to make (and then favor)
> tooling to allow projects to have operational glue that looks the same.
> Namely, the shell work that's been done in hadoop-functions.sh would be a
> great foundational layer that could bring good daemon handling practices to
> a whole slew of software projects. In the event that these frameworks and
> tools get adopted by parts of the Hadoop ecosystem, that could make the job
> of i.e. Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current
> test-patch work or expressed interest in helping out on getting it used in
> other projects. Right now, the proposed PMC would be (alphabetical by last
> name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> pmc, sqoop pmc, all around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several members and a bunch of folks familiar with the
> ASF. Combined with the code already existing in Apache spaces, I think that
> gives us sufficient justification for a direct board proposal.
>
> The planned project name is "Apache Yetus". It's an archaic genus of sea
> snail and most of our project will be focused on shell scripts.
>
> N.b.: this does not mean that the Hadoop community would _have_ to rely on
> the new TLP, but I hope that once we have a release that can be evaluated
> there'd be enough benefit to strongly encourage it.
>
> This has mostly been focused on scope and community issues, and I'd love
> to talk through any feedback on that. Additionally, are there any other
> points folks want to make sure are covered before we have a resolution?
>
> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>
>>
>>
>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>>> Hi Folks!
>>>
>>> After working on test-patch with other folks for the last few months, I
>>> think we've reached the point where we can make the fastest progress
>>> towards the goal of a general use pre-commit patch tester by spinning
>>> things into a project focused on just that. I think we have a mature enough
>>> code base and a sufficient fledgling community, so I'm going to put
>>> together a tlp proposal.
>>>
>>> Thanks for the feedback thus far from use within Hadoop. I hope we can
>>> continue to make things more useful.
>>>
>>> -Sean
>>>
>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com>
>>> wrote:
>>>
>>>> HBase's dev-support folder is where the scripts and support files live.
>>>> We've only recently started adding anything to the maven builds that's
>>>> specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
>>>> add in more if we ran into the same permissions problems y'all are having.
>>>>
>>>> There's also our precommit job itself, though it isn't large[2]. AFAIK,
>>>> we don't properly back this up anywhere, we just notify each other of
>>>> changes on a particular mail thread[3].
>>>>
>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
>>>> read because I just finished fixing "mvn site" running out of permgen)
>>>> [3]: http://s.apache.org/NT0
>>>>
>>>>
>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <
>>>> cnauroth@hortonworks.com> wrote:
>>>>
>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>>>> HBase
>>>>> repo?  Is there any additional context we need to be aware of?
>>>>>
>>>>> Chris Nauroth
>>>>> Hortonworks
>>>>> http://hortonworks.com/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>>>>>
>>>>> >+dev@hbase
>>>>> >
>>>>> >HBase has recently been cleaning up our precommit jenkins jobs to make
>>>>> >them
>>>>> >more robust. From what I can tell our stuff started off as an earlier
>>>>> >version of what Hadoop uses for testing.
>>>>> >
>>>>> >Folks on either side open to an experiment of combining our precommit
>>>>> >check
>>>>> >tooling? In principle we should be looking for the same kinds of
>>>>> things.
>>>>> >
>>>>> >Naturally we'll still need different jenkins jobs to handle different
>>>>> >resource needs and we'd need to figure out where stuff eventually
>>>>> lives,
>>>>> >but that could come later.
>>>>> >
>>>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
>>>>> cnauroth@hortonworks.com>
>>>>> >wrote:
>>>>> >
>>>>> >> The only thing I'm aware of is the failOnError option:
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>>>> >>
>>>>> >>
>>>>> >> I prefer that we don't disable this, because ignoring different
>>>>> kinds of
>>>>> >> failures could leave our build directories in an indeterminate
>>>>> state.
>>>>> >>For
>>>>> >> example, we could end up with an old class file on the classpath for
>>>>> >>test
>>>>> >> runs that was supposedly deleted.
>>>>> >>
>>>>> >> I think it's worth exploring Eddy's suggestion to try simulating
>>>>> failure
>>>>> >> by placing a file where the code expects to see a directory.  That
>>>>> might
>>>>> >> even let us enable some of these tests that are skipped on Windows,
>>>>> >> because Windows allows access for the owner even after permissions
>>>>> have
>>>>> >> been stripped.
>>>>> >>
>>>>> >> Chris Nauroth
>>>>> >> Hortonworks
>>>>> >> http://hortonworks.com/
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>>>>> >>
>>>>> >> >Is there a maven plugin or setting we can use to simply remove
>>>>> >> >directories that have no executable permissions on them?  Clearly
>>>>> we
>>>>> >> >have the permission to do this from a technical point of view
>>>>> (since
>>>>> >> >we created the directories as the jenkins user), it's simply that
>>>>> the
>>>>> >> >code refuses to do it.
>>>>> >> >
>>>>> >> >Otherwise I guess we can just fix those tests...
>>>>> >> >
>>>>> >> >Colin
>>>>> >> >
>>>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>>>> >> >>
>>>>> >> >> In HDFS-7722:
>>>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>> >> >>TearDown().
>>>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>>>> >> >>
>>>>> >> >> Also I ran mvn test several times on my machine and all tests
>>>>> passed.
>>>>> >> >>
>>>>> >> >> However, since in DiskChecker#checkDirAccess():
>>>>> >> >>
>>>>> >> >> private static void checkDirAccess(File dir) throws
>>>>> >>DiskErrorException {
>>>>> >> >>   if (!dir.isDirectory()) {
>>>>> >> >>     throw new DiskErrorException("Not a directory: "
>>>>> >> >>                                  + dir.toString());
>>>>> >> >>   }
>>>>> >> >>
>>>>> >> >>   checkAccessByFileMethods(dir);
>>>>> >> >> }
>>>>> >> >>
>>>>> >> >> One potentially safer alternative is replacing data dir with a
>>>>> >>regular
>>>>> >> >> file to simulate disk failures.
>>>>> >> >>
>>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>> >> >><cn...@hortonworks.com> wrote:
>>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>> >> >>> TestDataNodeVolumeFailureReporting, and
>>>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>>>>> >>permissions
>>>>> >> >>>from
>>>>> >> >>> directories like the one Colin mentioned to simulate disk
>>>>> failures
>>>>> >>at
>>>>> >> >>>data
>>>>> >> >>> nodes.  I reviewed the code for all of those, and they all
>>>>> appear
>>>>> >>to be
>>>>> >> >>> doing the necessary work to restore executable permissions at
>>>>> the
>>>>> >>end
>>>>> >> >>>of
>>>>> >> >>> the test.  The only recent uncommitted patch I've seen that
>>>>> makes
>>>>> >> >>>changes
>>>>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
>>>>> >> >>>though.  I
>>>>> >> >>> don't know if there are other uncommitted patches that changed
>>>>> these
>>>>> >> >>>test
>>>>> >> >>> suites.
>>>>> >> >>>
>>>>> >> >>> I suppose it's also possible that the JUnit process unexpectedly
>>>>> >>died
>>>>> >> >>> after removing executable permissions but before restoring them.
>>>>> >>That
>>>>> >> >>> always would have been a weakness of these test suites,
>>>>> regardless
>>>>> >>of
>>>>> >> >>>any
>>>>> >> >>> recent changes.
>>>>> >> >>>
>>>>> >> >>> Chris Nauroth
>>>>> >> >>> Hortonworks
>>>>> >> >>> http://hortonworks.com/
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>>>> >> >>>
>>>>> >> >>>>Hey Colin,
>>>>> >> >>>>
>>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going
>>>>> on
>>>>> >>with
>>>>> >> >>>>these boxes. He took a look and concluded that some perms are
>>>>> being
>>>>> >> >>>>set in
>>>>> >> >>>>those directories by our unit tests which are precluding those
>>>>> files
>>>>> >> >>>>from
>>>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but we
>>>>> >>should
>>>>> >> >>>>expect this to keep happening until we can fix the test in
>>>>> question
>>>>> >>to
>>>>> >> >>>>properly clean up after itself.
>>>>> >> >>>>
>>>>> >> >>>>To help narrow down which commit it was that started this,
>>>>> Andrew
>>>>> >>sent
>>>>> >> >>>>me
>>>>> >> >>>>this info:
>>>>> >> >>>>
>>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>> >>
>>>>>
>>>>> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> >>>>>>/
>>>>> >> >>>>has
>>>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
>>>>> since
>>>>> >>9:32
>>>>> >> >>>>UTC
>>>>> >> >>>>on March 5th."
>>>>> >> >>>>
>>>>> >> >>>>--
>>>>> >> >>>>Aaron T. Myers
>>>>> >> >>>>Software Engineer, Cloudera
>>>>> >> >>>>
>>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>> >><cm...@apache.org>
>>>>> >> >>>>wrote:
>>>>> >> >>>>
>>>>> >> >>>>> Hi all,
>>>>> >> >>>>>
>>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't
>>>>> find any
>>>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
>>>>> them
>>>>> >> >>>>>seem
>>>>> >> >>>>> to be failing with some variant of this message:
>>>>> >> >>>>>
>>>>> >> >>>>> [ERROR] Failed to execute goal
>>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>> >>(default-clean)
>>>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>>>> delete
>>>>> >> >>>>>
>>>>> >> >>>>>
>>>>> >>
>>>>>
>>>>> >>>>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hd
>>>>> >>>>>>>fs
>>>>> >> >>>>>-pr
>>>>> >> >>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> >> >>>>> -> [Help 1]
>>>>> >> >>>>>
>>>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting
>>>>> wrong
>>>>> >> >>>>> permissions?
>>>>> >> >>>>>
>>>>> >> >>>>> Colin
>>>>> >> >>>>>
>>>>> >> >>>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> --
>>>>> >> >> Lei (Eddy) Xu
>>>>> >> >> Software Engineer, Cloudera
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> >--
>>>>> >Sean
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>>
>>
>> --
>> Sean
>>
>
>
>
> --
> Sean
>



-- 
Sean

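For illustration, here is a minimal sketch of Eddy's suggestion quoted above
(plain JDK; the class and method names are hypothetical, invented for this
example rather than taken from Hadoop code): swap the data directory for an
ordinary file, so DiskChecker#checkDirAccess fails its isDirectory() check
without any permission changes that a crashed test could fail to undo.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical test helper: simulate a failed disk without chmod. */
public class SimulatedDiskFailure {
  private Path saved;

  /** Move the real directory aside and put a plain file in its place. */
  public void inject(Path dataDir) throws IOException {
    saved = dataDir.resolveSibling(dataDir.getFileName() + ".saved");
    Files.move(dataDir, saved);   // rename within the same parent directory
    Files.createFile(dataDir);    // DiskChecker now sees "Not a directory"
  }

  /** Undo the swap so cleanup always succeeds. */
  public void restore(Path dataDir) throws IOException {
    Files.delete(dataDir);        // remove the placeholder file
    Files.move(saved, dataDir);   // put the real directory back
  }
}

Even if the JUnit process dies between inject() and restore(), the workspace
is left with normal permissions everywhere, so "mvn clean" can still delete it.
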
Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Tsuyoshi Ozawa <oz...@apache.org>.
+1 on the idea.

It would be great if tests covering dependency management, multiple
branches, and distributed environments could be done in the project. One
discussion point is how Hadoop depends on Yetus, including the
development cycles. It's a good time to rethink what can be done to make
Hadoop better.

Thanks,
- Tsuyoshi

On Tue, Jun 16, 2015 at 8:47 AM, Sean Busbey <bu...@cloudera.com> wrote:
> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> in-coming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so often the tooling
> is worked on ad-hoc and few shared improvements happen across
> projects. Since the tooling itself is never a primary concern, any
> improvement made is rarely reused outside of ASF projects.
>
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow the new
> contributors we have so they can handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could easily also take over maintenance of
> things like shelldocs, since test-patch is the primary consumer of that
> currently but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talking about the
> tools to build QA processes and not a particular set of tested components.
> Specifically, I think the ChaosMonkey work that's in HBase should be
> generalizable as a fault injection framework (either based on that code or
> something like it). Doing this for arbitrary software is obviously very
> difficult, and a part of easing that will be to make (and then favor)
> tooling to allow projects to have operational glue that looks the same.
> Namely, the shell work that's been done in hadoop-functions.sh would be a
> great foundational layer that could bring good daemon handling practices to
> a whole slew of software projects. In the event that these frameworks and
> tools get adopted by parts of the Hadoop ecosystem, that could make the job
> of, e.g., Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current
> test-patch work or expressed interest in helping out on getting it used in
> other projects. Right now, the proposed PMC would be (alphabetical by last
> name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> pmc, sqoop pmc, all around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several members and a bunch of folks familiar with the
> ASF. Combined with the code already existing in Apache spaces, I think that
> gives us sufficient justification for a direct board proposal.
>
> The planned project name is "Apache Yetus". It's an archaic genus of sea
> snail and most of our project will be focused on shell scripts.
>
> N.b.: this does not mean that the Hadoop community would _have_ to rely on
> the new TLP, but I hope that once we have a release that can be evaluated
> there'd be enough benefit to strongly encourage it.
>
> This has mostly been focused on scope and community issues, and I'd love to
> talk through any feedback on that. Additionally, are there any other points
> folks want to make sure are covered before we have a resolution?
>
> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>
>>
>>
>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>>> Hi Folks!
>>>
>>> After working on test-patch with other folks for the last few months, I
>>> think we've reached the point where we can make the fastest progress
>>> towards the goal of a general use pre-commit patch tester by spinning
>>> things into a project focused on just that. I think we have a mature enough
>>> code base and a sufficient fledgling community, so I'm going to put
>>> together a tlp proposal.
>>>
>>> Thanks for the feedback thus far from use within Hadoop. I hope we can
>>> continue to make things more useful.
>>>
>>> -Sean
>>>
>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>>
>>>> HBase's dev-support folder is where the scripts and support files live.
>>>> We've only recently started adding anything to the maven builds that's
>>>> specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
>>>> add in more if we ran into the same permissions problems y'all are having.
>>>>
>>>> There's also our precommit job itself, though it isn't large[2]. AFAIK,
>>>> we don't properly back this up anywhere, we just notify each other of
>>>> changes on a particular mail thread[3].
>>>>
>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
>>>> red because I just finished fixing "mvn site" running out of permgen)
>>>> [3]: http://s.apache.org/NT0
>>>>
>>>>
>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnauroth@hortonworks.com
>>>> > wrote:
>>>>
>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>>>> HBase
>>>>> repo?  Is there any additional context we need to be aware of?
>>>>>
>>>>> Chris Nauroth
>>>>> Hortonworks
>>>>> http://hortonworks.com/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>>>>>
>>>>> >+dev@hbase
>>>>> >
>>>>> >HBase has recently been cleaning up our precommit jenkins jobs to make
>>>>> >them
>>>>> >more robust. From what I can tell our stuff started off as an earlier
>>>>> >version of what Hadoop uses for testing.
>>>>> >
>>>>> >Folks on either side open to an experiment of combining our precommit
>>>>> >check
>>>>> >tooling? In principle we should be looking for the same kinds of
>>>>> things.
>>>>> >
>>>>> >Naturally we'll still need different jenkins jobs to handle different
>>>>> >resource needs and we'd need to figure out where stuff eventually
>>>>> lives,
>>>>> >but that could come later.
>>>>> >
>>>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
>>>>> cnauroth@hortonworks.com>
>>>>> >wrote:
>>>>> >
>>>>> >> The only thing I'm aware of is the failOnError option:
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-erro
>>>>> >>rs
>>>>> >> .html
>>>>> >>
>>>>> >>
>>>>> >> I prefer that we don't disable this, because ignoring different
>>>>> kinds of
>>>>> >> failures could leave our build directories in an indeterminate state.
>>>>> >>For
>>>>> >> example, we could end up with an old class file on the classpath for
>>>>> >>test
>>>>> >> runs that was supposedly deleted.
>>>>> >>
>>>>> >> I think it's worth exploring Eddy's suggestion to try simulating
>>>>> failure
>>>>> >> by placing a file where the code expects to see a directory.  That
>>>>> might
>>>>> >> even let us enable some of these tests that are skipped on Windows,
>>>>> >> because Windows allows access for the owner even after permissions
>>>>> have
>>>>> >> been stripped.
>>>>> >>
>>>>> >> Chris Nauroth
>>>>> >> Hortonworks
>>>>> >> http://hortonworks.com/
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>>>>> >>
>>>>> >> >Is there a maven plugin or setting we can use to simply remove
>>>>> >> >directories that have no executable permissions on them?  Clearly we
>>>>> >> >have the permission to do this from a technical point of view (since
>>>>> >> >we created the directories as the jenkins user); it's simply that the
>>>>> >> >code refuses to do it. (One workaround is sketched after this message.)
>>>>> >> >
>>>>> >> >Otherwise I guess we can just fix those tests...
>>>>> >> >
>>>>> >> >Colin
>>>>> >> >
>>>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>>>> >> >>
>>>>> >> >> In HDFS-7722:
>>>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>> >> >>TearDown().
>>>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>>>> >> >>
>>>>> >> >> Also I ran mvn test several times on my machine and all tests
>>>>> passed.
>>>>> >> >>
>>>>> >> >> However, since in DiskChecker#checkDirAccess():
>>>>> >> >>
>>>>> >> >> private static void checkDirAccess(File dir) throws
>>>>> >>DiskErrorException {
>>>>> >> >>   if (!dir.isDirectory()) {
>>>>> >> >>     throw new DiskErrorException("Not a directory: "
>>>>> >> >>                                  + dir.toString());
>>>>> >> >>   }
>>>>> >> >>
>>>>> >> >>   checkAccessByFileMethods(dir);
>>>>> >> >> }
>>>>> >> >>
>>>>> >> >> One potentially safer alternative is replacing data dir with a regular
>>>>> >> >> file to simulate disk failures.
>>>>> >> >>
>>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>> >> >><cn...@hortonworks.com> wrote:
>>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>> >> >>> TestDataNodeVolumeFailureReporting, and
>>>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>>>>> >>permissions
>>>>> >> >>>from
>>>>> >> >>> directories like the one Colin mentioned to simulate disk
>>>>> failures
>>>>> >>at
>>>>> >> >>>data
>>>>> >> >>> nodes.  I reviewed the code for all of those, and they all appear
>>>>> >>to be
>>>>> >> >>> doing the necessary work to restore executable permissions at the
>>>>> >>end
>>>>> >> >>>of
>>>>> >> >>> the test.  The only recent uncommitted patch I've seen that makes
>>>>> >> >>>changes
>>>>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
>>>>> >> >>>though.  I
>>>>> >> >>> don't know if there are other uncommitted patches that changed
>>>>> these
>>>>> >> >>>test
>>>>> >> >>> suites.
>>>>> >> >>>
>>>>> >> >>> I suppose it's also possible that the JUnit process unexpectedly
>>>>> >>died
>>>>> >> >>> after removing executable permissions but before restoring them.
>>>>> >>That
>>>>> >> >>> always would have been a weakness of these test suites,
>>>>> regardless
>>>>> >>of
>>>>> >> >>>any
>>>>> >> >>> recent changes.
>>>>> >> >>>
>>>>> >> >>> Chris Nauroth
>>>>> >> >>> Hortonworks
>>>>> >> >>> http://hortonworks.com/
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>>>> >> >>>
>>>>> >> >>>>Hey Colin,
>>>>> >> >>>>
>>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going
>>>>> on
>>>>> >>with
>>>>> >> >>>>these boxes. He took a look and concluded that some perms are
>>>>> being
>>>>> >> >>>>set in
>>>>> >> >>>>those directories by our unit tests which are precluding those
>>>>> files
>>>>> >> >>>>from
>>>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but we
>>>>> >>should
>>>>> >> >>>>expect this to keep happening until we can fix the test in
>>>>> question
>>>>> >>to
>>>>> >> >>>>properly clean up after itself.
>>>>> >> >>>>
>>>>> >> >>>>To help narrow down which commit it was that started this, Andrew
>>>>> >>sent
>>>>> >> >>>>me
>>>>> >> >>>>this info:
>>>>> >> >>>>
>>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>> >>
>>>>>
>>>>> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> >>>>>>/
>>>>> >> >>>>has
>>>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
>>>>> since
>>>>> >>9:32
>>>>> >> >>>>UTC
>>>>> >> >>>>on March 5th."
>>>>> >> >>>>
>>>>> >> >>>>--
>>>>> >> >>>>Aaron T. Myers
>>>>> >> >>>>Software Engineer, Cloudera
>>>>> >> >>>>
>>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>> >><cm...@apache.org>
>>>>> >> >>>>wrote:
>>>>> >> >>>>
>>>>> >> >>>>> Hi all,
>>>>> >> >>>>>
>>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't find
>>>>> any
>>>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
>>>>> them
>>>>> >> >>>>>seem
>>>>> >> >>>>> to be failing with some variant of this message:
>>>>> >> >>>>>
>>>>> >> >>>>> [ERROR] Failed to execute goal
>>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>> >>(default-clean)
>>>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>>>> delete
>>>>> >> >>>>>
>>>>> >> >>>>>
>>>>> >>
>>>>>
>>>>> >>>>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hd
>>>>> >>>>>>>fs
>>>>> >> >>>>>-pr
>>>>> >> >>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> >> >>>>> -> [Help 1]
>>>>> >> >>>>>
>>>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>> >> >>>>> permissions?
>>>>> >> >>>>>
>>>>> >> >>>>> Colin
>>>>> >> >>>>>
>>>>> >> >>>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> --
>>>>> >> >> Lei (Eddy) Xu
>>>>> >> >> Software Engineer, Cloudera
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> >--
>>>>> >Sean
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>>
>>
>> --
>> Sean
>>
>
>
>
> --
> Sean

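On Colin's question quoted above, here is a hypothetical, JDK-only sketch
(not an existing Maven plugin or Hadoop utility; the class and method names
are invented for illustration): since the jenkins user owns everything its
builds create, a cleanup step can always chmod the tree back before deleting
it, even when a crashed test stripped the permissions first.

import java.io.File;

/** Hypothetical helper: re-grant owner rwx top-down, delete bottom-up. */
public final class ForceCleanup {

  public static void forceDelete(File f) {
    // chmod succeeds because we own the path, even when a crashed test
    // stripped permissions before its TearDown() could restore them.
    f.setReadable(true);
    f.setWritable(true);
    f.setExecutable(true);
    File[] children = f.listFiles();  // null for regular files
    if (children != null) {
      for (File child : children) {
        forceDelete(child);
      }
    }
    f.delete();
  }
}

Run as a workspace cleanup step ahead of the clean goal, something like this
could keep a single bad test run from wedging every later build on the slave.
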
Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Kengo Seki <se...@gmail.com>.
+1. From a user's viewpoint, recent improvements to test-patch have made my
work really efficient:
for example, quick feedback by skipping unnecessary tests, automated
build environment setup thanks to Docker support, automated patch download
from JIRA, automated shellcheck and whitespace checks, etc.
I believe it is worth spreading these ideas as a TLP to other projects
facing the same problems, such as a long QA process.

2015-06-16 15:08 GMT+09:00 Chris Douglas <cd...@apache.org>:

> +1 A separate project sounds great. It'd be great to have more
> standard tooling across the ecosystem.
>
> As a practical matter, how should projects consume releases? -C
>
> On Mon, Jun 15, 2015 at 4:47 PM, Sean Busbey <bu...@cloudera.com> wrote:
> > Oof. I had meant to push on this again but life got in the way and now
> the
> > June board meeting is upon us. Sorry everyone. In the event that this
> ends
> > up contentious, hopefully one of the copied communities can give us a
> > branch to work in.
> >
> > I know everyone is busy, so here's the short version of this email: I'd
> > like to move some of the code currently in Hadoop (test-patch) into a new
> > TLP focused on QA tooling. I'm not sure what the best format for priming
> > this conversation is. ORC filled in the incubator project proposal
> > template, but I'm not sure how much that confused the issue. So to start,
> > I'll just write what I'm hoping we can accomplish in general terms here.
> >
> > All software development projects that are community based (that is,
> > accepting outside contributions) face a common QA problem for vetting
> > in-coming contributions. Hadoop is fortunate enough to be sufficiently
> > popular that the weight of the problem drove tool development (i.e.
> > test-patch). That tool is generalizable enough that a bunch of other TLPs
> > have adopted their own forks. Unfortunately, in most projects this kind
> of
> > QA work is an enabler rather than a primary concern, so often the tooling
> > is worked on ad-hoc and few shared improvements happen across
> > projects. Since the tooling itself is never a primary concern, any
> > improvement made is rarely reused outside of ASF projects.
> >
> > Over the last couple months a few of us have been working on generalizing
> > the tooling present in the Hadoop code base (because it was the most
> mature
> > out of all those in the various projects) and it's reached a point where
> we
> > think we can start bringing on other downstream users. This means we need
> > to start establishing things like a release cadence and to grow the new
> > contributors we have so they can handle more project responsibility. Personally, I
> > think that means it's time to move out from under Hadoop to drive things
> as
> > our own community. Eventually, I hope the community can help draw in a
> > group of folks traditionally underrepresented in ASF projects, namely QA
> > and operations folks.
> >
> > I think test-patch by itself has enough scope to justify a project.
> Having
> > a solid set of build tools that are customizable to fit the norms of
> > different software communities is a bunch of work. Making it work well in
> > both the context of automated test systems like Jenkins and for
> individual
> > developers is even more work. We could easily also take over maintenance
> of
> > things like shelldocs, since test-patch is the primary consumer of that
> > currently but it's generally useful tooling.
> >
> > In addition to test-patch, I think the proposed project has some future
> > growth potential. Given some adoption of test-patch to prove utility, the
> > project could build on the ties it makes to start building tools to help
> > projects do their own longer-run testing. Note that I'm talking about the
> > tools to build QA processes and not a particular set of tested
> components.
> > Specifically, I think the ChaosMonkey work that's in HBase should be
> > generalizable as a fault injection framework (either based on that code
> or
> > something like it). Doing this for arbitrary software is obviously very
> > difficult, and a part of easing that will be to make (and then favor)
> > tooling to allow projects to have operational glue that looks the same.
> > Namely, the shell work that's been done in hadoop-functions.sh would be a
> > great foundational layer that could bring good daemon handling practices
> to
> > a whole slew of software projects. In the event that these frameworks and
> > tools get adopted by parts of the Hadoop ecosystem, that could make the job
> > of, e.g., Bigtop substantially easier.
> >
> > I've reached out to a few folks who have been involved in the current
> > test-patch work or expressed interest in helping out on getting it used
> in
> > other projects. Right now, the proposed PMC would be (alphabetical by
> last
> > name):
> >
> > * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> > pmc, sqoop pmc, all around Jenkins expert)
> > * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> > * Nick Dimiduk (hbase pmc, phoenix pmc)
> > * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> > * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> > phoenix pmc)
> > * Allen Wittenauer (hadoop committer)
> >
> > That PMC gives us several members and a bunch of folks familiar with the
> > ASF. Combined with the code already existing in Apache spaces, I think
> that
> > gives us sufficient justification for a direct board proposal.
> >
> > The planned project name is "Apache Yetus". It's an archaic genus of sea
> > snail and most of our project will be focused on shell scripts.
> >
> > N.b.: this does not mean that the Hadoop community would _have_ to rely
> on
> > the new TLP, but I hope that once we have a release that can be evaluated
> > there'd be enough benefit to strongly encourage it.
> >
> > This has mostly been focused on scope and community issues, and I'd love
> to
> > talk through any feedback on that. Additionally, are there any other
> points
> > folks want to make sure are covered before we have a resolution?
> >
> > On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bu...@cloudera.com>
> wrote:
> >
> >> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
> >>
> >>
> >>
> >> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bu...@cloudera.com>
> wrote:
> >>
> >>> Hi Folks!
> >>>
> >>> After working on test-patch with other folks for the last few months, I
> >>> think we've reached the point where we can make the fastest progress
> >>> towards the goal of a general use pre-commit patch tester by spinning
> >>> things into a project focused on just that. I think we have a mature
> enough
> >>> code base and a sufficient fledgling community, so I'm going to put
> >>> together a tlp proposal.
> >>>
> >>> Thanks for the feedback thus far from use within Hadoop. I hope we can
> >>> continue to make things more useful.
> >>>
> >>> -Sean
> >>>
> >>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com>
> wrote:
> >>>
> >>>> HBase's dev-support folder is where the scripts and support files
> live.
> >>>> We've only recently started adding anything to the maven builds that's
> >>>> specific to jenkins[1]; so far it's diagnostic stuff, but that's
> where I'd
> >>>> add in more if we ran into the same permissions problems y'all are
> having.
> >>>>
> >>>> There's also our precommit job itself, though it isn't large[2].
> AFAIK,
> >>>> we don't properly back this up anywhere, we just notify each other of
> >>>> changes on a particular mail thread[3].
> >>>>
> >>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> >>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
> >>>> red because I just finished fixing "mvn site" running out of permgen)
> >>>> [3]: http://s.apache.org/NT0
> >>>>
> >>>>
> >>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <
> cnauroth@hortonworks.com
> >>>> > wrote:
> >>>>
> >>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
> >>>>> HBase
> >>>>> repo?  Is there any additional context we need to be aware of?
> >>>>>
> >>>>> Chris Nauroth
> >>>>> Hortonworks
> >>>>> http://hortonworks.com/
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
> >>>>>
> >>>>> >+dev@hbase
> >>>>> >
> >>>>> >HBase has recently been cleaning up our precommit jenkins jobs to
> make
> >>>>> >them
> >>>>> >more robust. From what I can tell our stuff started off as an
> earlier
> >>>>> >version of what Hadoop uses for testing.
> >>>>> >
> >>>>> >Folks on either side open to an experiment of combining our
> precommit
> >>>>> >check
> >>>>> >tooling? In principle we should be looking for the same kinds of
> >>>>> things.
> >>>>> >
> >>>>> >Naturally we'll still need different jenkins jobs to handle
> different
> >>>>> >resource needs and we'd need to figure out where stuff eventually
> >>>>> lives,
> >>>>> >but that could come later.
> >>>>> >
> >>>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
> >>>>> cnauroth@hortonworks.com>
> >>>>> >wrote:
> >>>>> >
> >>>>> >> The only thing I'm aware of is the failOnError option:
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>>
> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-erro
> >>>>> >>rs
> >>>>> >> .html
> >>>>> >>
> >>>>> >>
> >>>>> >> I prefer that we don't disable this, because ignoring different
> >>>>> kinds of
> >>>>> >> failures could leave our build directories in an indeterminate
> state.
> >>>>> >>For
> >>>>> >> example, we could end up with an old class file on the classpath
> for
> >>>>> >>test
> >>>>> >> runs that was supposedly deleted.
> >>>>> >>
> >>>>> >> I think it's worth exploring Eddy's suggestion to try simulating
> >>>>> failure
> >>>>> >> by placing a file where the code expects to see a directory.  That
> >>>>> might
> >>>>> >> even let us enable some of these tests that are skipped on
> Windows,
> >>>>> >> because Windows allows access for the owner even after permissions
> >>>>> have
> >>>>> >> been stripped.
> >>>>> >>
> >>>>> >> Chris Nauroth
> >>>>> >> Hortonworks
> >>>>> >> http://hortonworks.com/
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu>
> wrote:
> >>>>> >>
> >>>>> >> >Is there a maven plugin or setting we can use to simply remove
> >>>>> >> >directories that have no executable permissions on them?  Clearly we
> >>>>> >> >have the permission to do this from a technical point of view (since
> >>>>> >> >we created the directories as the jenkins user); it's simply that the
> >>>>> >> >code refuses to do it.
> >>>>> >> >
> >>>>> >> >Otherwise I guess we can just fix those tests...
> >>>>> >> >
> >>>>> >> >Colin
> >>>>> >> >
> >>>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com>
> wrote:
> >>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
> >>>>> >> >>
> >>>>> >> >> In HDFS-7722:
> >>>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions
> in
> >>>>> >> >>TearDown().
> >>>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally
> clause.
> >>>>> >> >>
> >>>>> >> >> Also I ran mvn test several times on my machine and all tests
> >>>>> passed.
> >>>>> >> >>
> >>>>> >> >> However, since in DiskChecker#checkDirAccess():
> >>>>> >> >>
> >>>>> >> >> private static void checkDirAccess(File dir) throws
> >>>>> >>DiskErrorException {
> >>>>> >> >>   if (!dir.isDirectory()) {
> >>>>> >> >>     throw new DiskErrorException("Not a directory: "
> >>>>> >> >>                                  + dir.toString());
> >>>>> >> >>   }
> >>>>> >> >>
> >>>>> >> >>   checkAccessByFileMethods(dir);
> >>>>> >> >> }
> >>>>> >> >>
> >>>>> >> >> One potentially safer alternative is replacing data dir with a regular
> >>>>> >> >> file to simulate disk failures.
> >>>>> >> >>
> >>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
> >>>>> >> >><cn...@hortonworks.com> wrote:
> >>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >>>>> >> >>> TestDataNodeVolumeFailureReporting, and
> >>>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
> >>>>> >>permissions
> >>>>> >> >>>from
> >>>>> >> >>> directories like the one Colin mentioned to simulate disk
> >>>>> failures
> >>>>> >>at
> >>>>> >> >>>data
> >>>>> >> >>> nodes.  I reviewed the code for all of those, and they all
> appear
> >>>>> >>to be
> >>>>> >> >>> doing the necessary work to restore executable permissions at
> the
> >>>>> >>end
> >>>>> >> >>>of
> >>>>> >> >>> the test.  The only recent uncommitted patch I've seen that
> makes
> >>>>> >> >>>changes
> >>>>> >> >>> in these test suites is HDFS-7722.  That patch still looks
> fine
> >>>>> >> >>>though.  I
> >>>>> >> >>> don't know if there are other uncommitted patches that changed
> >>>>> these
> >>>>> >> >>>test
> >>>>> >> >>> suites.
> >>>>> >> >>>
> >>>>> >> >>> I suppose it's also possible that the JUnit process
> unexpectedly
> >>>>> >>died
> >>>>> >> >>> after removing executable permissions but before restoring
> them.
> >>>>> >>That
> >>>>> >> >>> always would have been a weakness of these test suites,
> >>>>> regardless
> >>>>> >>of
> >>>>> >> >>>any
> >>>>> >> >>> recent changes.
> >>>>> >> >>>
> >>>>> >> >>> Chris Nauroth
> >>>>> >> >>> Hortonworks
> >>>>> >> >>> http://hortonworks.com/
> >>>>> >> >>>
> >>>>> >> >>>
> >>>>> >> >>>
> >>>>> >> >>>
> >>>>> >> >>>
> >>>>> >> >>>
> >>>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com>
> wrote:
> >>>>> >> >>>
> >>>>> >> >>>>Hey Colin,
> >>>>> >> >>>>
> >>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's
> going
> >>>>> on
> >>>>> >>with
> >>>>> >> >>>>these boxes. He took a look and concluded that some perms are
> >>>>> being
> >>>>> >> >>>>set in
> >>>>> >> >>>>those directories by our unit tests which are precluding those
> >>>>> files
> >>>>> >> >>>>from
> >>>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but
> we
> >>>>> >>should
> >>>>> >> >>>>expect this to keep happening until we can fix the test in
> >>>>> question
> >>>>> >>to
> >>>>> >> >>>>properly clean up after itself.
> >>>>> >> >>>>
> >>>>> >> >>>>To help narrow down which commit it was that started this,
> Andrew
> >>>>> >>sent
> >>>>> >> >>>>me
> >>>>> >> >>>>this info:
> >>>>> >> >>>>
> >>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> >>>>> >>
> >>>>>
> >>>>>
> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>>> >>>>>>/
> >>>>> >> >>>>has
> >>>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
> >>>>> since
> >>>>> >>9:32
> >>>>> >> >>>>UTC
> >>>>> >> >>>>on March 5th."
> >>>>> >> >>>>
> >>>>> >> >>>>--
> >>>>> >> >>>>Aaron T. Myers
> >>>>> >> >>>>Software Engineer, Cloudera
> >>>>> >> >>>>
> >>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
> >>>>> >><cm...@apache.org>
> >>>>> >> >>>>wrote:
> >>>>> >> >>>>
> >>>>> >> >>>>> Hi all,
> >>>>> >> >>>>>
> >>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't
> find
> >>>>> any
> >>>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
> >>>>> them
> >>>>> >> >>>>>seem
> >>>>> >> >>>>> to be failing with some variant of this message:
> >>>>> >> >>>>>
> >>>>> >> >>>>> [ERROR] Failed to execute goal
> >>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> >>>>> >>(default-clean)
> >>>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
> >>>>> delete
> >>>>> >> >>>>>
> >>>>> >> >>>>>
> >>>>> >>
> >>>>>
> >>>>>
> >>>>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hd
> >>>>> >>>>>>>fs
> >>>>> >> >>>>>-pr
> >>>>> >> >>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>>> >> >>>>> -> [Help 1]
> >>>>> >> >>>>>
> >>>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting
> wrong
> >>>>> >> >>>>> permissions?
> >>>>> >> >>>>>
> >>>>> >> >>>>> Colin
> >>>>> >> >>>>>
> >>>>> >> >>>
> >>>>> >> >>
> >>>>> >> >>
> >>>>> >> >>
> >>>>> >> >> --
> >>>>> >> >> Lei (Eddy) Xu
> >>>>> >> >> Software Engineer, Cloudera
> >>>>> >>
> >>>>> >>
> >>>>> >
> >>>>> >
> >>>>> >--
> >>>>> >Sean
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Sean
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Sean
> >>>
> >>
> >>
> >>
> >> --
> >> Sean
> >>
> >
> >
> >
> > --
> > Sean
>

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Chris Douglas <cd...@apache.org>.
+1 A separate project sounds great. It'd be great to have more
standard tooling across the ecosystem.

As a practical matter, how should projects consume releases? -C

On Mon, Jun 15, 2015 at 4:47 PM, Sean Busbey <bu...@cloudera.com> wrote:
> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> in-coming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so often the tooling
> is worked on ad-hoc and few shared improvements happen across
> projects. Since the tooling itself is never a primary concern, any
> improvement made is rarely reused outside of ASF projects.
>
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow the new
> contributors we have so they can handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could easily also take over maintenance of
> things like shelldocs, since test-patch is the primary consumer of that
> currently but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talking about the
> tools to build QA processes and not a particular set of tested components.
> Specifically, I think the ChaosMonkey work that's in HBase should be
> generalizable as a fault injection framework (either based on that code or
> something like it). Doing this for arbitrary software is obviously very
> difficult, and a part of easing that will be to make (and then favor)
> tooling to allow projects to have operational glue that looks the same.
> Namely, the shell work that's been done in hadoop-functions.sh would be a
> great foundational layer that could bring good daemon handling practices to
> a whole slew of software projects. In the event that these frameworks and
> tools get adopted by parts of the Hadoop ecosystem, that could make the job
> of, e.g., Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current
> test-patch work or expressed interest in helping out on getting it used in
> other projects. Right now, the proposed PMC would be (alphabetical by last
> name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> pmc, sqoop pmc, all around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several members and a bunch of folks familiar with the
> ASF. Combined with the code already existing in Apache spaces, I think that
> gives us sufficient justification for a direct board proposal.
>
> The planned project name is "Apache Yetus". It's an archaic genus of sea
> snail and most of our project will be focused on shell scripts.
>
> N.b.: this does not mean that the Hadoop community would _have_ to rely on
> the new TLP, but I hope that once we have a release that can be evaluated
> there'd be enough benefit to strongly encourage it.
>
> This has mostly been focused on scope and community issues, and I'd love to
> talk through any feedback on that. Additionally, are there any other points
> folks want to make sure are covered before we have a resolution?
>
> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>
>>
>>
>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>>> Hi Folks!
>>>
>>> After working on test-patch with other folks for the last few months, I
>>> think we've reached the point where we can make the fastest progress
>>> towards the goal of a general use pre-commit patch tester by spinning
>>> things into a project focused on just that. I think we have a mature enough
>>> code base and a sufficient fledgling community, so I'm going to put
>>> together a tlp proposal.
>>>
>>> Thanks for the feedback thus far from use within Hadoop. I hope we can
>>> continue to make things more useful.
>>>
>>> -Sean
>>>
>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>>
>>>> HBase's dev-support folder is where the scripts and support files live.
>>>> We've only recently started adding anything to the maven builds that's
>>>> specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
>>>> add in more if we ran into the same permissions problems y'all are having.
>>>>
>>>> There's also our precommit job itself, though it isn't large[2]. AFAIK,
>>>> we don't properly back this up anywhere, we just notify each other of
>>>> changes on a particular mail thread[3].
>>>>
>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
>>>> read because I just finished fixing "mvn site" running out of permgen)
>>>> [3]: http://s.apache.org/NT0
>>>>
>>>>
>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnauroth@hortonworks.com
>>>> > wrote:
>>>>
>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>>>> HBase
>>>>> repo?  Is there any additional context we need to be aware of?
>>>>>
>>>>> Chris Nauroth
>>>>> Hortonworks
>>>>> http://hortonworks.com/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>>>>>
>>>>> >+dev@hbase
>>>>> >
>>>>> >HBase has recently been cleaning up our precommit jenkins jobs to make
>>>>> >them
>>>>> >more robust. From what I can tell our stuff started off as an earlier
>>>>> >version of what Hadoop uses for testing.
>>>>> >
>>>>> >Folks on either side open to an experiment of combining our precommit
>>>>> >check
>>>>> >tooling? In principle we should be looking for the same kinds of
>>>>> things.
>>>>> >
>>>>> >Naturally we'll still need different jenkins jobs to handle different
>>>>> >resource needs and we'd need to figure out where stuff eventually
>>>>> lives,
>>>>> >but that could come later.
>>>>> >
>>>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
>>>>> cnauroth@hortonworks.com>
>>>>> >wrote:
>>>>> >
>>>>> >> The only thing I'm aware of is the failOnError option:
>>>>> >>
>>>>> >>
>>>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
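
For reference, that option is exposed as a user property, so it can be
toggled per invocation (standard maven-clean-plugin behavior), though the
next paragraph argues against doing so:

    # skip failures during clean -- not recommended, per the objection below
    mvn clean -Dmaven.clean.failOnError=false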
>>>>> >>
>>>>> >>
>>>>> >> I prefer that we don't disable this, because ignoring different
>>>>> kinds of
>>>>> >> failures could leave our build directories in an indeterminate state.
>>>>> >>For
>>>>> >> example, we could end up with an old class file on the classpath for
>>>>> >>test
>>>>> >> runs that was supposedly deleted.
>>>>> >>
>>>>> >> I think it's worth exploring Eddy's suggestion to try simulating
>>>>> failure
>>>>> >> by placing a file where the code expects to see a directory.  That
>>>>> might
>>>>> >> even let us enable some of these tests that are skipped on Windows,
>>>>> >> because Windows allows access for the owner even after permissions
>>>>> have
>>>>> >> been stripped.
>>>>> >>
>>>>> >> Chris Nauroth
>>>>> >> Hortonworks
>>>>> >> http://hortonworks.com/
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>>>>> >>
>>>>> >> >Is there a maven plugin or setting we can use to simply remove
>>>>> >> >directories that have no executable permissions on them?  Clearly we
>>>>> >> >have the permission to do this from a technical point of view (since
>>>>> >> >we created the directories as the jenkins user), it's simply that
>>>>> the
>>>>> >> >code refuses to do it.
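
One shell-level workaround for what Colin describes (an editor's sketch;
the module path is illustrative) is to give the owner back permissions on
offending directories before the clean runs:

    # restore owner rwx on any directory that lost its exec bit, so
    # maven-clean-plugin can traverse and delete the build tree (GNU find)
    find hadoop-hdfs-project -type d ! -perm -u+x -exec chmod u+rwx {} +
    mvn clean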
>>>>> >> >
>>>>> >> >Otherwise I guess we can just fix those tests...
>>>>> >> >
>>>>> >> >Colin
>>>>> >> >
>>>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>>>> >> >>
>>>>> >> >> In HDFS-7722:
>>>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>> >> >>TearDown().
>>>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>>>> >> >>
>>>>> >> >> Also I ran mvn test several times on my machine and all tests
>>>>> passed.
>>>>> >> >>
>>>>> >> >> However, since in DiskChecker#checkDirAccess():
>>>>> >> >>
>>>>> >> >> private static void checkDirAccess(File dir) throws
>>>>> >>DiskErrorException {
>>>>> >> >>   if (!dir.isDirectory()) {
>>>>> >> >>     throw new DiskErrorException("Not a directory: "
>>>>> >> >>                                  + dir.toString());
>>>>> >> >>   }
>>>>> >> >>
>>>>> >> >>   checkAccessByFileMethods(dir);
>>>>> >> >> }
>>>>> >> >>
>>>>> >> >> One potentially safer alternative is replacing data dir with a
>>>>> >>regular
>>>>> >> >> file to simulate disk failures.
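
A shell-level sketch of the two strategies under discussion (an editor's
illustration; the path is representative): stripping permissions needs a
restore step that never runs if the JVM dies, while swapping the directory
for a regular file trips the isDirectory() check above and cleans up with
an ordinary delete:

    DATA_DIR=target/test/data/dfs/data/data3   # representative path

    # strategy 1: strip permissions -- must be undone or "mvn clean" breaks
    chmod 000 "${DATA_DIR}"
    # ... run the failure-handling test ...
    chmod 755 "${DATA_DIR}"    # never happens if the JUnit process dies

    # strategy 2: replace the dir with a regular file -- checkDirAccess()
    # fails its isDirectory() test, and cleanup is a plain delete
    mv "${DATA_DIR}" "${DATA_DIR}.bak"
    touch "${DATA_DIR}"
    # ... run the failure-handling test ...
    rm "${DATA_DIR}" && mv "${DATA_DIR}.bak" "${DATA_DIR}"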
>>>>> >> >>
>>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>> >> >><cn...@hortonworks.com> wrote:
>>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>> >> >>> TestDataNodeVolumeFailureReporting, and
>>>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>>>>> >>permissions
>>>>> >> >>>from
>>>>> >> >>> directories like the one Colin mentioned to simulate disk
>>>>> failures
>>>>> >>at
>>>>> >> >>>data
>>>>> >> >>> nodes.  I reviewed the code for all of those, and they all appear
>>>>> >>to be
>>>>> >> >>> doing the necessary work to restore executable permissions at the
>>>>> >>end
>>>>> >> >>>of
>>>>> >> >>> the test.  The only recent uncommitted patch I've seen that makes
>>>>> >> >>>changes
>>>>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
>>>>> >> >>>though.  I
>>>>> >> >>> don't know if there are other uncommitted patches that changed
>>>>> these
>>>>> >> >>>test
>>>>> >> >>> suites.
>>>>> >> >>>
>>>>> >> >>> I suppose it's also possible that the JUnit process unexpectedly
>>>>> >>died
>>>>> >> >>> after removing executable permissions but before restoring them.
>>>>> >>That
>>>>> >> >>> always would have been a weakness of these test suites,
>>>>> regardless
>>>>> >>of
>>>>> >> >>>any
>>>>> >> >>> recent changes.
>>>>> >> >>>
>>>>> >> >>> Chris Nauroth
>>>>> >> >>> Hortonworks
>>>>> >> >>> http://hortonworks.com/
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>>>> >> >>>
>>>>> >> >>>>Hey Colin,
>>>>> >> >>>>
>>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going
>>>>> on
>>>>> >>with
>>>>> >> >>>>these boxes. He took a look and concluded that some perms are
>>>>> being
>>>>> >> >>>>set in
>>>>> >> >>>>those directories by our unit tests which are precluding those
>>>>> files
>>>>> >> >>>>from
>>>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but we
>>>>> >>should
>>>>> >> >>>>expect this to keep happening until we can fix the test in
>>>>> question
>>>>> >>to
>>>>> >> >>>>properly clean up after itself.
>>>>> >> >>>>
>>>>> >> >>>>To help narrow down which commit it was that started this, Andrew
>>>>> >>sent
>>>>> >> >>>>me
>>>>> >> >>>>this info:
>>>>> >> >>>>
>>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>> >>
>>>>>
>>>>> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> >>>>>>/
>>>>> >> >>>>has
>>>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
>>>>> since
>>>>> >>9:32
>>>>> >> >>>>UTC
>>>>> >> >>>>on March 5th."
>>>>> >> >>>>
>>>>> >> >>>>--
>>>>> >> >>>>Aaron T. Myers
>>>>> >> >>>>Software Engineer, Cloudera
>>>>> >> >>>>
>>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>> >><cm...@apache.org>
>>>>> >> >>>>wrote:
>>>>> >> >>>>
>>>>> >> >>>>> Hi all,
>>>>> >> >>>>>
>>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't find
>>>>> any
>>>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of
>>>>> them
>>>>> >> >>>>>seem
>>>>> >> >>>>> to be failing with some variant of this message:
>>>>> >> >>>>>
>>>>> >> >>>>> [ERROR] Failed to execute goal
>>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>> >>(default-clean)
>>>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>>>> delete
>>>>> >> >>>>>
>>>>> >> >>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> >> >>>>> -> [Help 1]
>>>>> >> >>>>>
>>>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>> >> >>>>> permissions?
>>>>> >> >>>>>
>>>>> >> >>>>> Colin
>>>>> >> >>>>>
>>>>> >> >>>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> --
>>>>> >> >> Lei (Eddy) Xu
>>>>> >> >> Software Engineer, Cloudera
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> >--
>>>>> >Sean
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>>
>>
>> --
>> Sean
>>
>
>
>
> --
> Sean

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Chris Douglas <cd...@apache.org>.
+1 A separate project sounds great. It would be good to have more
standard tooling across the ecosystem.

As a practical matter, how should projects consume releases? -C


Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Allen Wittenauer <aw...@altiscale.com>.
	I'm clearly +1 on this idea.  As part of the test-patch rewrite in Hadoop, it was amazing to see how far and wide this bit of code has spread.  So I see consolidating everyone's efforts as a huge win for a large number of projects.  (Especially considering how many I saw suffering from a variety of identified bugs!)

	But….

	I think it's important for people involved in those other projects to speak up and voice an opinion as to whether this is useful. 

To summarize:

	In the short term, a single location to get/use a precommit patch tester rather than everyone building/supporting their own in their spare time. 

	 FWIW, we've already got the code base modified to be pluggable.  We've written some basic/simple plugins that support Hadoop, HBase, Tajo, Tez, Pig, and Flink.  For HBase and Flink, this does include their custom checks.  Adding support for other projects shouldn't be hard.  Simple projects take almost no time after seeing the basic pattern.

	I think it's worthwhile highlighting that means support for both JIRA and GitHub as well as Ant and Maven from the same code base.
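
For readers who haven't seen the pluggable code base: a project-specific
check drops in as a small shell file of callback functions. The sketch
below is an editor's illustration of the general shape of such a plugin,
not the actual test-patch API; all function names are hypothetical:

    # myproject.sh -- hypothetical test-patch plugin sketch
    add_plugin myproject_style

    # declare interest in the changed files this check cares about
    myproject_style_filefilter() {
      local filename="$1"
      [[ ${filename} =~ \.java$ ]] && add_test myproject_style
    }

    # run the check against the patched tree and record a vote
    myproject_style_postapply() {
      if mvn checkstyle:check -q; then
        add_vote_table +1 style "patch passes project style rules"
      else
        add_vote_table -1 style "patch introduces style violations"
      fi
    }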

Longer term:

	Well, we clearly have ideas of things that we want to do. Adding more features to test-patch (review board? gradle?) is obvious. But what about teasing apart and generalizing some of the other shell bits from projects? A common library covering everything from building CLI tools to fault injection to release-documentation tooling to …  I'd even like to see us get as advanced as a "run this program to auto-generate daemon stop/start bits".
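
The auto-generation idea fits in a handful of lines: given a daemon name
and its command, emit a uniform wrapper script. This is an editor's sketch
that assumes the hypothetical daemon_start/daemon_stop helpers shown
earlier in the thread live in a shared daemon-functions.sh:

    # gen-daemon.sh -- emit a uniform start/stop wrapper for one daemon
    name="$1"; shift
    {
      printf '#!/usr/bin/env bash\n'
      printf '. "$(dirname "$0")/../libexec/daemon-functions.sh"\n'
      printf 'case "$1" in\n'
      printf '  start) daemon_start %s %s ;;\n' "${name}" "$*"
      printf '  stop)  daemon_stop  %s ;;\n' "${name}"
      printf '  *)     echo "usage: $0 start|stop" >&2; exit 1 ;;\n'
      printf 'esac\n'
    } > "bin/${name}-daemon.sh"
    chmod +x "bin/${name}-daemon.sh"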

	I had a few chats with people about this idea at Hadoop Summit.  What's truly exciting are the ideas that people had once they realized what kinds of problems we're trying to solve.  It's always amazing the problems that projects have that could be solved by these types of solutions.  Let's stop hiding our cool toys in this area.

	So, what feedback and ideas do you have in this area?  Are you a yay or a nay?





Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

Posted by Sean Busbey <bu...@cloudera.com>.
Oof. I had meant to push on this again, but life got in the way, and now
the June board meeting is upon us. Sorry, everyone. In the event that this
ends up contentious, hopefully one of the copied communities can give us a
branch to work in.

I know everyone is busy, so here's the short version of this email: I'd
like to move some of the code currently in Hadoop (test-patch) into a new
TLP focused on QA tooling. I'm not sure what the best format for priming
this conversation is. ORC filled in the incubator project proposal
template, but I'm not sure how much that confused the issue. So to start,
I'll just write what I'm hoping we can accomplish in general terms here.

All community-based software development projects (that is, projects
accepting outside contributions) face a common QA problem: vetting
incoming contributions. Hadoop is fortunate enough to be sufficiently
popular that the weight of the problem drove tool development (i.e.
test-patch). That tool is generalizable enough that a bunch of other TLPs
have adopted their own forks. Unfortunately, in most projects this kind of
QA work is an enabler rather than a primary concern, so the tooling tends
to be worked on ad hoc and few improvements get shared across projects.
Since the tooling itself is never a primary concern, any improvement that
does get made is rarely reused outside of ASF projects.

Over the last couple of months a few of us have been working on
generalizing the tooling in the Hadoop code base (because it was the most
mature of those in the various projects), and it's reached a point where
we think we can start bringing on other downstream users. This means we
need to start establishing things like a release cadence and to grow our
new contributors so they can handle more project responsibility.
Personally, I think that means it's time to move out from under Hadoop and
drive things as our own community. Eventually, I hope the community can
help draw in a group of folks traditionally underrepresented in ASF
projects, namely QA and operations folks.

I think test-patch by itself has enough scope to justify a project. Having
a solid set of build tools that can be customized to fit the norms of
different software communities is a bunch of work. Making them work well
both in automated test systems like Jenkins and for individual developers
is even more work. We could also easily take over maintenance of things
like shelldocs: test-patch is currently its primary consumer, but it's
generally useful tooling on its own.
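
To make "customizable" concrete, here's a rough sketch of the kind of
pluggable check I have in mind. The names below (whitespace_patchfile,
add_vote_table) are mine for illustration only, not the actual test-patch
API:

    # Hypothetical test-patch-style plugin: vote -1 when a patch adds
    # trailing whitespace. A sketch only, not the real plugin API.

    # stand-in for the real vote-reporting glue
    function add_vote_table
    {
      printf '%s\t%s\t%s\n' "$1" "$2" "$3"
    }

    function whitespace_patchfile
    {
      local patchfile=$1

      # added lines start with '+'; flag any ending in a space or tab
      if grep -Eq '^\+.*[[:blank:]]$' "${patchfile}"; then
        add_vote_table -1 whitespace "The patch adds trailing whitespace."
        return 1
      fi
      add_vote_table +1 whitespace "The patch adds no trailing whitespace."
      return 0
    }

The point is that the same small check runs unchanged whether Jenkins
invokes it on a precommit job or a developer runs it by hand against a
local patch file.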

In addition to test-patch, I think the proposed project has some future
growth potential. Given some adoption of test-patch to prove utility, the
project could build on the ties it makes and start producing tools that
help projects do their own longer-run testing. Note that I'm talking about
the tools to build QA processes, not a particular set of tested
components. Specifically, I think the ChaosMonkey work that's in HBase
should be generalizable as a fault injection framework (either based on
that code or something like it). Doing this for arbitrary software is
obviously very difficult, and part of easing it will be making (and then
favoring) tooling that lets projects share operational glue that looks the
same. Namely, the shell work that's been done in hadoop-functions.sh would
be a great foundational layer that could bring good daemon handling
practices to a whole slew of software projects. If these frameworks and
tools get adopted by parts of the Hadoop ecosystem, that could make the
job of projects like Bigtop substantially easier.
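
As a sketch of what I mean by uniform operational glue, imagine every
project starting its daemons through the same few functions. The function
below is illustrative only, not the actual hadoop-functions.sh API:

    # Illustrative daemon-start glue; assumes a writable pidfile path.
    function sketch_start_daemon
    {
      local name=$1
      local pidfile=$2
      shift 2

      # refuse to double-start when the recorded pid is still alive
      if [[ -f "${pidfile}" ]] && kill -0 "$(cat "${pidfile}")" 2>/dev/null; then
        echo "ERROR: ${name} already running (pid $(cat "${pidfile}"))" >&2
        return 1
      fi

      "$@" &                   # launch the daemon command
      echo $! > "${pidfile}"   # record its pid for stop/status checks
    }

    # usage sketch: sketch_start_daemon datanode /tmp/dn.pid hdfs datanode

Once daemons share an interface like that, a fault injection framework can
kill and restart processes without knowing each project's bespoke init
scripts, which is exactly the leverage ChaosMonkey-style tooling needs.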

I've reached out to a few folks who have been involved in the current
test-patch work or expressed interest in helping out on getting it used in
other projects. Right now, the proposed PMC would be (alphabetical by last
name):

* Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
pmc, sqoop pmc, all around Jenkins expert)
* Sean Busbey (ASF member, accumulo pmc, hbase pmc)
* Nick Dimiduk (hbase pmc, phoenix pmc)
* Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
* Andrew Purtell (ASF member, incubator pmc, bigtop pmc, hbase pmc,
phoenix pmc)
* Allen Wittenauer (hadoop committer)

That PMC gives us several members and a bunch of folks familiar with the
ASF. Combined with the code already existing in Apache spaces, I think that
gives us sufficient justification for a direct board proposal.

The planned project name is "Apache Yetus". It's an archaic genus of sea
snail and most of our project will be focused on shell scripts.

N.b.: this does not mean that the Hadoop community would _have_ to rely on
the new TLP, but I hope that once we have a release that can be evaluated
there'd be enough benefit to strongly encourage it.

This has mostly been focused on scope and community issues, and I'd love to
talk through any feedback on that. Additionally, are there any other points
folks want to make sure are covered before we have a resolution?




-- 
Sean