Posted to dev@hbase.apache.org by Sean Busbey <bu...@cloudera.com> on 2015/03/11 22:44:10 UTC

Re: upstream jenkins build broken?

+dev@hbase

HBase has recently been cleaning up our precommit jenkins jobs to make them
more robust. From what I can tell our stuff started off as an earlier
version of what Hadoop uses for testing.

Folks on either side open to an experiment of combining our precommit check
tooling? In principle we should be looking for the same kinds of things.

Naturally we'll still need different jenkins jobs to handle different
resource needs and we'd need to figure out where stuff eventually lives,
but that could come later.

On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cn...@hortonworks.com>
wrote:

> The only thing I'm aware of is the failOnError option:
>
> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>
>
> I prefer that we don't disable this, because ignoring different kinds of
> failures could leave our build directories in an indeterminate state.  For
> example, we could end up with an old class file on the classpath for test
> runs that was supposedly deleted.
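[Editor's note: for readers unfamiliar with the option, disabling it would look like the following pom.xml fragment. It is shown only to make the trade-off concrete; as Chris says, leaving failOnError at its default of true is the safer choice.]

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-clean-plugin</artifactId>
  <version>2.5</version>
  <configuration>
    <!-- Not recommended here: ignoring delete failures can leave stale
         class files on the test classpath. -->
    <failOnError>false</failOnError>
  </configuration>
</plugin>
```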
>
> I think it's worth exploring Eddy's suggestion to try simulating failure
> by placing a file where the code expects to see a directory.  That might
> even let us enable some of these tests that are skipped on Windows,
> because Windows allows access for the owner even after permissions have
> been stripped.
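[Editor's note: a rough sketch of that idea, with made-up class and helper names rather than the actual HDFS test code. The data directory is swapped for a plain file, so any isDirectory() check fails without touching permission bits.]

```java
import java.io.File;
import java.io.IOException;

// Sketch only: simulate a failed volume by replacing the data directory
// with a plain file, so directory checks (like DiskChecker's) fail
// without relying on permission bits that Windows handles differently.
public class SimulatedDiskFailure {

    // Replace the data directory with an empty regular file of the same name.
    static void failVolume(File dataDir) throws IOException {
        deleteRecursively(dataDir);
        if (!dataDir.createNewFile()) {
            throw new IOException("could not create placeholder at " + dataDir);
        }
    }

    // Undo the simulated failure: drop the placeholder, recreate the dir.
    static void restoreVolume(File dataDir) {
        dataDir.delete();
        dataDir.mkdirs();
    }

    private static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) {
                deleteRecursively(c);
            }
        }
        f.delete();
    }
}
```

Because no permission bits are touched, a crashed JUnit process would leave behind an ordinary file that maven-clean can still delete, unlike a directory with stripped permissions.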
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
>
>
>
> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>
> >Is there a maven plugin or setting we can use to simply remove
> >directories that have no executable permissions on them?  Clearly we
> >have the permission to do this from a technical point of view (since
> >we created the directories as the jenkins user), it's simply that the
> >code refuses to do it.
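[Editor's note: the mechanics Colin describes can be sketched as a hypothetical pre-clean helper, not an existing Maven plugin or goal. Since the Jenkins user owns the directories, it can restore the executable bit itself before deleting.]

```java
import java.io.File;

// Sketch of a pre-clean helper: re-grant the executable bit on every
// directory we own, then delete the tree, so maven-clean never sees a
// directory it cannot traverse. Hypothetical, not an existing plugin.
public class ForceClean {

    static void restoreAndDelete(File dir) {
        // As the owner we may still chmod the directory, even though a
        // test stripped its executable bit.
        dir.setExecutable(true, false);
        File[] children = dir.listFiles();
        if (children != null) {
            for (File child : children) {
                if (child.isDirectory()) {
                    restoreAndDelete(child);
                } else {
                    child.delete();
                }
            }
        }
        dir.delete();
    }
}
```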
> >
> >Otherwise I guess we can just fix those tests...
> >
> >Colin
> >
> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
> >> Thanks a lot for looking into HDFS-7722, Chris.
> >>
> >> In HDFS-7722:
> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >>TearDown().
> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
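[Editor's note: both cleanup styles Lei mentions give the same guarantee; the finally-clause form looks roughly like this, with illustrative names rather than the real test code.]

```java
import java.io.File;

// Sketch of the cleanup pattern: fake a dead volume by stripping the
// executable bit, and guarantee the bit comes back even if the test
// body throws, so later "mvn clean" runs can delete the directory.
public class VolumeFailureCleanupSketch {

    static void exercise(File dataDir) {
        dataDir.setExecutable(false, false);
        try {
            // ... exercise DataNode volume-failure handling here ...
        } finally {
            // Without this, a failed assertion would leave an
            // undeletable directory behind on the build slave.
            dataDir.setExecutable(true, false);
        }
    }
}
```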
> >>
> >> Also I ran mvn test several times on my machine and all tests passed.
> >>
> >> However, since in DiskChecker#checkDirAccess():
> >>
> >> private static void checkDirAccess(File dir) throws DiskErrorException {
> >>   if (!dir.isDirectory()) {
> >>     throw new DiskErrorException("Not a directory: "
> >>                                  + dir.toString());
> >>   }
> >>
> >>   checkAccessByFileMethods(dir);
> >> }
> >>
> >> One potentially safer alternative is replacing data dir with a regular
> >> file to simulate disk failures.
> >>
> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
> >><cn...@hortonworks.com> wrote:
> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >>> TestDataNodeVolumeFailureReporting, and
> >>> TestDataNodeVolumeFailureToleration all remove executable permissions
> >>>from
> >>> directories like the one Colin mentioned to simulate disk failures at
> >>>data
> >>> nodes.  I reviewed the code for all of those, and they all appear to be
> >>> doing the necessary work to restore executable permissions at the end
> >>>of
> >>> the test.  The only recent uncommitted patch I've seen that makes
> >>>changes
> >>> in these test suites is HDFS-7722.  That patch still looks fine
> >>>though.  I
> >>> don't know if there are other uncommitted patches that changed these
> >>>test
> >>> suites.
> >>>
> >>> I suppose it's also possible that the JUnit process unexpectedly died
> >>> after removing executable permissions but before restoring them.  That
> >>> always would have been a weakness of these test suites, regardless of
> >>>any
> >>> recent changes.
> >>>
> >>> Chris Nauroth
> >>> Hortonworks
> >>> http://hortonworks.com/
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
> >>>
> >>>>Hey Colin,
> >>>>
> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
> >>>>these boxes. He took a look and concluded that some perms are being
> >>>>set in
> >>>>those directories by our unit tests which are precluding those files
> >>>>from
> >>>>getting deleted. He's going to clean up the boxes for us, but we should
> >>>>expect this to keep happening until we can fix the test in question to
> >>>>properly clean up after itself.
> >>>>
> >>>>To help narrow down which commit it was that started this, Andrew sent
> >>>>me
> >>>>this info:
> >>>>
> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> >>>>has 500 perms, so I'm guessing that's the problem. Been that way since
> >>>>9:32 UTC on March 5th."
> >>>>
> >>>>--
> >>>>Aaron T. Myers
> >>>>Software Engineer, Cloudera
> >>>>
> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cm...@apache.org>
> >>>>wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> A very quick (and not thorough) survey shows that I can't find any
> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
> >>>>>seem
> >>>>> to be failing with some variant of this message:
> >>>>>
> >>>>> [ERROR] Failed to execute goal
> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
> >>>>>
> >>>>>
> >>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>>> -> [Help 1]
> >>>>>
> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
> >>>>> permissions?
> >>>>>
> >>>>> Colin
> >>>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Lei (Eddy) Xu
> >> Software Engineer, Cloudera
>
>


-- 
Sean

Re: upstream jenkins build broken?

Posted by Sean Busbey <bu...@cloudera.com>.
Hi Folks!

After working on test-patch with other folks for the last few months, I
think we've reached the point where we can make the fastest progress
towards the goal of a general use pre-commit patch tester by spinning
things into a project focused on just that. I think we have a mature enough
code base and a sufficient fledgling community, so I'm going to put
together a TLP proposal.

Thanks for the feedback thus far from its use within Hadoop. I hope we can
continue to make things more useful.

-Sean

On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com> wrote:

> HBase's dev-support folder is where the scripts and support files live.
> We've only recently started adding anything to the maven builds that's
> specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
> add in more if we ran into the same permissions problems y'all are having.
>
> There's also our precommit job itself, though it isn't large[2]. AFAIK, we
> don't properly back this up anywhere, we just notify each other of changes
> on a particular mail thread[3].
>
> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
> red because I just finished fixing "mvn site" running out of permgen)
> [3]: http://s.apache.org/NT0
>
>
> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
>> Sure, thanks Sean!  Do we just look in the dev-support folder in the HBase
>> repo?  Is there any additional context we need to be aware of?
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>>
>>
>>
>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>>
>> >+dev@hbase
>> >
>> >HBase has recently been cleaning up our precommit jenkins jobs to make
>> >them
>> >more robust. From what I can tell our stuff started off as an earlier
>> >version of what Hadoop uses for testing.
>> >
>> >Folks on either side open to an experiment of combining our precommit
>> >check
>> >tooling? In principle we should be looking for the same kinds of things.
>> >
>> >Naturally we'll still need different jenkins jobs to handle different
>> >resource needs and we'd need to figure out where stuff eventually lives,
>> >but that could come later.
>> >
>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnauroth@hortonworks.com
>> >
>> >wrote:
>> >
>> >> The only thing I'm aware of is the failOnError option:
>> >>
>> >>
>> >>
>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>> >>
>> >>
>> >> I prefer that we don't disable this, because ignoring different kinds
>> of
>> >> failures could leave our build directories in an indeterminate state.
>> >>For
>> >> example, we could end up with an old class file on the classpath for
>> >>test
>> >> runs that was supposedly deleted.
>> >>
>> >> I think it's worth exploring Eddy's suggestion to try simulating
>> failure
>> >> by placing a file where the code expects to see a directory.  That
>> might
>> >> even let us enable some of these tests that are skipped on Windows,
>> >> because Windows allows access for the owner even after permissions have
>> >> been stripped.
>> >>
>> >> Chris Nauroth
>> >> Hortonworks
>> >> http://hortonworks.com/
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>> >>
>> >> >Is there a maven plugin or setting we can use to simply remove
>> >> >directories that have no executable permissions on them?  Clearly we
>> >> >have the permission to do this from a technical point of view (since
>> >> >we created the directories as the jenkins user), it's simply that the
>> >> >code refuses to do it.
>> >> >
>> >> >Otherwise I guess we can just fix those tests...
>> >> >
>> >> >Colin
>> >> >
>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>> >> >>
>> >> >> In HDFS-7722:
>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>> >> >>TearDown().
>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>> >> >>
>> >> >> Also I ran mvn test several times on my machine and all tests
>> passed.
>> >> >>
>> >> >> However, since in DiskChecker#checkDirAccess():
>> >> >>
>> >> >> private static void checkDirAccess(File dir) throws
>> >>DiskErrorException {
>> >> >>   if (!dir.isDirectory()) {
>> >> >>     throw new DiskErrorException("Not a directory: "
>> >> >>                                  + dir.toString());
>> >> >>   }
>> >> >>
>> >> >>   checkAccessByFileMethods(dir);
>> >> >> }
>> >> >>
>> >> >> One potentially safer alternative is replacing data dir with a
>> >>regular
>> >> >> file to simulate disk failures.
>> >> >>
>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>> >> >><cn...@hortonworks.com> wrote:
>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> >> >>> TestDataNodeVolumeFailureReporting, and
>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>> >>permissions
>> >> >>>from
>> >> >>> directories like the one Colin mentioned to simulate disk failures
>> >>at
>> >> >>>data
>> >> >>> nodes.  I reviewed the code for all of those, and they all appear
>> >>to be
>> >> >>> doing the necessary work to restore executable permissions at the
>> >>end
>> >> >>>of
>> >> >>> the test.  The only recent uncommitted patch I've seen that makes
>> >> >>>changes
>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
>> >> >>>though.  I
>> >> >>> don't know if there are other uncommitted patches that changed
>> these
>> >> >>>test
>> >> >>> suites.
>> >> >>>
>> >> >>> I suppose it's also possible that the JUnit process unexpectedly
>> >>died
>> >> >>> after removing executable permissions but before restoring them.
>> >>That
>> >> >>> always would have been a weakness of these test suites, regardless
>> >>of
>> >> >>>any
>> >> >>> recent changes.
>> >> >>>
>> >> >>> Chris Nauroth
>> >> >>> Hortonworks
>> >> >>> http://hortonworks.com/
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>> >> >>>
>> >> >>>>Hey Colin,
>> >> >>>>
>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>> >>with
>> >> >>>>these boxes. He took a look and concluded that some perms are being
>> >> >>>>set in
>> >> >>>>those directories by our unit tests which are precluding those
>> files
>> >> >>>>from
>> >> >>>>getting deleted. He's going to clean up the boxes for us, but we
>> >>should
>> >> >>>>expect this to keep happening until we can fix the test in question
>> >>to
>> >> >>>>properly clean up after itself.
>> >> >>>>
>> >> >>>>To help narrow down which commit it was that started this, Andrew
>> >>sent
>> >> >>>>me
>> >> >>>>this info:
>> >> >>>>
>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> >> >>>>has 500 perms, so I'm guessing that's the problem. Been that way
>> >> >>>>since 9:32 UTC on March 5th."
>> >> >>>>
>> >> >>>>--
>> >> >>>>Aaron T. Myers
>> >> >>>>Software Engineer, Cloudera
>> >> >>>>
>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>> >><cm...@apache.org>
>> >> >>>>wrote:
>> >> >>>>
>> >> >>>>> Hi all,
>> >> >>>>>
>> >> >>>>> A very quick (and not thorough) survey shows that I can't find
>> any
>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>> >> >>>>>seem
>> >> >>>>> to be failing with some variant of this message:
>> >> >>>>>
>> >> >>>>> [ERROR] Failed to execute goal
>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>> >>(default-clean)
>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>> >> >>>>> -> [Help 1]
>> >> >>>>>
>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>> >> >>>>> permissions?
>> >> >>>>>
>> >> >>>>> Colin
>> >> >>>>>
>> >> >>>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Lei (Eddy) Xu
>> >> >> Software Engineer, Cloudera
>> >>
>> >>
>> >
>> >
>> >--
>> >Sean
>>
>>
>
>
> --
> Sean
>



-- 
Sean


Re: upstream jenkins build broken?

Posted by Sean Busbey <bu...@cloudera.com>.
HBase's dev-support folder is where the scripts and support files live.
We've only recently started adding anything to the maven builds that's
specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
add in more if we ran into the same permissions problems y'all are having.

There's also our precommit job itself, though it isn't large[2]. AFAIK, we
don't properly back this up anywhere, we just notify each other of changes
on a particular mail thread[3].

[1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
[2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all red
because I just finished fixing "mvn site" running out of permgen)
[3]: http://s.apache.org/NT0


On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cn...@hortonworks.com>
wrote:

> Sure, thanks Sean!  Do we just look in the dev-support folder in the HBase
> repo?  Is there any additional context we need to be aware of?
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
>
>
>
> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>
> >+dev@hbase
> >
> >HBase has recently been cleaning up our precommit jenkins jobs to make
> >them
> >more robust. From what I can tell our stuff started off as an earlier
> >version of what Hadoop uses for testing.
> >
> >Folks on either side open to an experiment of combining our precommit
> >check
> >tooling? In principle we should be looking for the same kinds of things.
> >
> >Naturally we'll still need different jenkins jobs to handle different
> >resource needs and we'd need to figure out where stuff eventually lives,
> >but that could come later.
> >
> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cn...@hortonworks.com>
> >wrote:
> >
> >> The only thing I'm aware of is the failOnError option:
> >>
> >>
> >>
> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-erro
> >>rs
> >> .html
> >>
> >>
> >> I prefer that we don't disable this, because ignoring different kinds of
> >> failures could leave our build directories in an indeterminate state.
> >>For
> >> example, we could end up with an old class file on the classpath for
> >>test
> >> runs that was supposedly deleted.
> >>
> >> I think it's worth exploring Eddy's suggestion to try simulating failure
> >> by placing a file where the code expects to see a directory.  That might
> >> even let us enable some of these tests that are skipped on Windows,
> >> because Windows allows access for the owner even after permissions have
> >> been stripped.
> >>
> >> Chris Nauroth
> >> Hortonworks
> >> http://hortonworks.com/
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
> >>
> >> >Is there a maven plugin or setting we can use to simply remove
> >> >directories that have no executable permissions on them?  Clearly we
> >> >have the permission to do this from a technical point of view (since
> >> >we created the directories as the jenkins user), it's simply that the
> >> >code refuses to do it.
> >> >
> >> >Otherwise I guess we can just fix those tests...
> >> >
> >> >Colin
> >> >
> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
> >> >> Thanks a lot for looking into HDFS-7722, Chris.
> >> >>
> >> >> In HDFS-7722:
> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >> >>TearDown().
> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
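The teardown pattern Eddy describes (drop permissions to fake a failed volume, then restore them in a finally clause so a later "mvn clean" can still delete the workspace) could be sketched as follows. The class and directory names here are illustrative only, not the actual HDFS test code:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class VolumeFailureTestPattern {
    public static void main(String[] args) throws IOException {
        File dataDir = Files.createTempDirectory("data1").toFile();
        try {
            // Simulate a failed volume by stripping execute permission.
            dataDir.setExecutable(false);
            // ... exercise the failure-handling code under test here ...
        } finally {
            // Restore permissions no matter how the test exits, so the
            // build's clean step can delete the directory afterwards.
            dataDir.setExecutable(true);
        }
        System.out.println(dataDir.canExecute());
        dataDir.delete();
    }
}
```

The weakness Chris notes still applies: if the JVM dies between the setExecutable(false) and the finally block, the directory is left unreadable.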
> >> >>
> >> >> Also I ran mvn test several times on my machine and all tests passed.
> >> >>
> >> >> However, since in DiskChecker#checkDirAccess():
> >> >>
> >> >> private static void checkDirAccess(File dir) throws
> >>DiskErrorException {
> >> >>   if (!dir.isDirectory()) {
> >> >>     throw new DiskErrorException("Not a directory: "
> >> >>                                  + dir.toString());
> >> >>   }
> >> >>
> >> >>   checkAccessByFileMethods(dir);
> >> >> }
> >> >>
> >> >> One potentially safer alternative is replacing data dir with a
> >>regular
> >> >> file to simulate disk failures.
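Eddy's safer alternative, placing a regular file where the code expects a directory, trips the isDirectory() check quoted above without ever touching permissions. A minimal self-contained sketch (mirroring, not reusing, the real DiskChecker):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class DiskFailureSimulation {
    // Mirrors the quoted DiskChecker#checkDirAccess logic: a regular
    // file where a directory is expected fails the isDirectory() test.
    static void checkDirAccess(File dir) throws IOException {
        if (!dir.isDirectory()) {
            throw new IOException("Not a directory: " + dir);
        }
        // checkAccessByFileMethods(dir) would follow here in the real code.
    }

    public static void main(String[] args) throws IOException {
        // A plain file standing in for a data directory.
        File dataDir = Files.createTempFile("data3", null).toFile();
        dataDir.deleteOnExit();
        boolean failed = false;
        try {
            checkDirAccess(dataDir);  // regular file, so this throws
        } catch (IOException expected) {
            failed = true;
        }
        System.out.println(failed ? "disk failure simulated" : "no failure");
    }
}
```

Because no permissions change, nothing is left behind for "mvn clean" to choke on even if the test process dies mid-run.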
> >> >>
> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
> >> >><cn...@hortonworks.com> wrote:
> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >> >>> TestDataNodeVolumeFailureReporting, and
> >> >>> TestDataNodeVolumeFailureToleration all remove executable
> >>permissions
> >> >>>from
> >> >>> directories like the one Colin mentioned to simulate disk failures
> >>at
> >> >>>data
> >> >>> nodes.  I reviewed the code for all of those, and they all appear
> >>to be
> >> >>> doing the necessary work to restore executable permissions at the
> >>end
> >> >>>of
> >> >>> the test.  The only recent uncommitted patch I've seen that makes
> >> >>>changes
> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
> >> >>>though.  I
> >> >>> don't know if there are other uncommitted patches that changed these
> >> >>>test
> >> >>> suites.
> >> >>>
> >> >>> I suppose it's also possible that the JUnit process unexpectedly
> >>died
> >> >>> after removing executable permissions but before restoring them.
> >>That
> >> >>> always would have been a weakness of these test suites, regardless
> >>of
> >> >>>any
> >> >>> recent changes.
> >> >>>
> >> >>> Chris Nauroth
> >> >>> Hortonworks
> >> >>> http://hortonworks.com/
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
> >> >>>
> >> >>>>Hey Colin,
> >> >>>>
> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on
> >>with
> >> >>>>these boxes. He took a look and concluded that some perms are being
> >> >>>>set in
> >> >>>>those directories by our unit tests which are precluding those files
> >> >>>>from
> >> >>>>getting deleted. He's going to clean up the boxes for us, but we
> >>should
> >> >>>>expect this to keep happening until we can fix the test in question
> >>to
> >> >>>>properly clean up after itself.
> >> >>>>
> >> >>>>To help narrow down which commit it was that started this, Andrew
> >>sent
> >> >>>>me
> >> >>>>this info:
> >> >>>>
> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> >>
> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>>>>/
> >> >>>>has
> >> >>>>500 perms, so I'm guessing that's the problem. Been that way since
> >>9:32
> >> >>>>UTC
> >> >>>>on March 5th."
> >> >>>>
> >> >>>>--
> >> >>>>Aaron T. Myers
> >> >>>>Software Engineer, Cloudera
> >> >>>>
> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
> >><cm...@apache.org>
> >> >>>>wrote:
> >> >>>>
> >> >>>>> Hi all,
> >> >>>>>
> >> >>>>> A very quick (and not thorough) survey shows that I can't find any
> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
> >> >>>>>seem
> >> >>>>> to be failing with some variant of this message:
> >> >>>>>
> >> >>>>> [ERROR] Failed to execute goal
> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> >>(default-clean)
> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
> >> >>>>>
> >> >>>>>
> >>
> >>>>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hd
> >>>>>>>fs
> >> >>>>>-pr
> >> >>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
> >> >>>>> -> [Help 1]
> >> >>>>>
> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
> >> >>>>> permissions?
> >> >>>>>
> >> >>>>> Colin
> >> >>>>>
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Lei (Eddy) Xu
> >> >> Software Engineer, Cloudera
> >>
> >>
> >
> >
> >--
> >Sean
>
>


-- 
Sean
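As for Colin's question about a plugin or setting to simply remove directories that lack executable permissions: no stock Maven option does this, but a cleanup helper that first restores owner permissions and then deletes recursively might look like the sketch below. This is hypothetical illustration code, not part of Hadoop or any Maven plugin:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ForceClean {
    // Restore owner permissions, then delete recursively -- roughly what
    // a clean step would need for directories left with 500 perms.
    static void forceDelete(File f) {
        f.setExecutable(true);  // needed to traverse a directory
        f.setWritable(true);    // needed to delete its children
        File[] children = f.listFiles();
        if (children != null) {
            for (File child : children) {
                forceDelete(child);
            }
        }
        f.delete();
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("data3").toFile();
        new File(root, "blk_1234").createNewFile();
        // The state the Jenkins slaves were left in: no execute bit.
        root.setExecutable(false);
        root.setWritable(false);
        forceDelete(root);
        System.out.println(root.exists() ? "left behind" : "cleaned");
    }
}
```

Running something like this from an antrun or exec step before maven-clean would unblock the build, though fixing the tests to restore permissions remains the right long-term answer.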

Re: upstream jenkins build broken?

Posted by Chris Nauroth <cn...@hortonworks.com>.
Sure, thanks Sean!  Do we just look in the dev-support folder in the HBase
repo?  Is there any additional context we need to be aware of?

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:

>+dev@hbase
>
>HBase has recently been cleaning up our precommit jenkins jobs to make
>them
>more robust. From what I can tell our stuff started off as an earlier
>version of what Hadoop uses for testing.
>
>Folks on either side open to an experiment of combining our precommit
>check
>tooling? In principle we should be looking for the same kinds of things.
>
>Naturally we'll still need different jenkins jobs to handle different
>resource needs and we'd need to figure out where stuff eventually lives,
>but that could come later.
>
>On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cn...@hortonworks.com>
>wrote:
>
>> The only thing I'm aware of is the failOnError option:
>>
>> 
>>http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-erro
>>rs
>> .html
>>
>>
>> I prefer that we don't disable this, because ignoring different kinds of
>> failures could leave our build directories in an indeterminate state.
>>For
>> example, we could end up with an old class file on the classpath for
>>test
>> runs that was supposedly deleted.
>>
>> I think it's worth exploring Eddy's suggestion to try simulating failure
>> by placing a file where the code expects to see a directory.  That might
>> even let us enable some of these tests that are skipped on Windows,
>> because Windows allows access for the owner even after permissions have
>> been stripped.
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>>
>>
>>
>> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>>
>> >Is there a maven plugin or setting we can use to simply remove
>> >directories that have no executable permissions on them?  Clearly we
>> >have the permission to do this from a technical point of view (since
>> >we created the directories as the jenkins user), it's simply that the
>> >code refuses to do it.
>> >
>> >Otherwise I guess we can just fix those tests...
>> >
>> >Colin
>> >
>> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>> >> Thanks a lot for looking into HDFS-7722, Chris.
>> >>
>> >> In HDFS-7722:
>> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>> >>TearDown().
>> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>> >>
>> >> Also I ran mvn test several times on my machine and all tests passed.
>> >>
>> >> However, since in DiskChecker#checkDirAccess():
>> >>
>> >> private static void checkDirAccess(File dir) throws
>>DiskErrorException {
>> >>   if (!dir.isDirectory()) {
>> >>     throw new DiskErrorException("Not a directory: "
>> >>                                  + dir.toString());
>> >>   }
>> >>
>> >>   checkAccessByFileMethods(dir);
>> >> }
>> >>
>> >> One potentially safer alternative is replacing data dir with a
>>regular
>> >> file to simulate disk failures.
>> >>
>> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>> >><cn...@hortonworks.com> wrote:
>> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> >>> TestDataNodeVolumeFailureReporting, and
>> >>> TestDataNodeVolumeFailureToleration all remove executable
>>permissions
>> >>>from
>> >>> directories like the one Colin mentioned to simulate disk failures
>>at
>> >>>data
>> >>> nodes.  I reviewed the code for all of those, and they all appear
>>to be
>> >>> doing the necessary work to restore executable permissions at the
>>end
>> >>>of
>> >>> the test.  The only recent uncommitted patch I've seen that makes
>> >>>changes
>> >>> in these test suites is HDFS-7722.  That patch still looks fine
>> >>>though.  I
>> >>> don't know if there are other uncommitted patches that changed these
>> >>>test
>> >>> suites.
>> >>>
>> >>> I suppose it's also possible that the JUnit process unexpectedly
>>died
>> >>> after removing executable permissions but before restoring them.
>>That
>> >>> always would have been a weakness of these test suites, regardless
>>of
>> >>>any
>> >>> recent changes.
>> >>>
>> >>> Chris Nauroth
>> >>> Hortonworks
>> >>> http://hortonworks.com/
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>> >>>
>> >>>>Hey Colin,
>> >>>>
>> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>>with
>> >>>>these boxes. He took a look and concluded that some perms are being
>> >>>>set in
>> >>>>those directories by our unit tests which are precluding those files
>> >>>>from
>> >>>>getting deleted. He's going to clean up the boxes for us, but we
>>should
>> >>>>expect this to keep happening until we can fix the test in question
>>to
>> >>>>properly clean up after itself.
>> >>>>
>> >>>>To help narrow down which commit it was that started this, Andrew
>>sent
>> >>>>me
>> >>>>this info:
>> >>>>
>> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>> 
>>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>/
>> >>>>has
>> >>>>500 perms, so I'm guessing that's the problem. Been that way since
>>9:32
>> >>>>UTC
>> >>>>on March 5th."
>> >>>>
>> >>>>--
>> >>>>Aaron T. Myers
>> >>>>Software Engineer, Cloudera
>> >>>>
>> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>><cm...@apache.org>
>> >>>>wrote:
>> >>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> A very quick (and not thorough) survey shows that I can't find any
>> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>> >>>>>seem
>> >>>>> to be failing with some variant of this message:
>> >>>>>
>> >>>>> [ERROR] Failed to execute goal
>> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>(default-clean)
>> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>> >>>>>
>> >>>>>
>> 
>>>>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hd
>>>>>>>fs
>> >>>>>-pr
>> >>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>> >>>>> -> [Help 1]
>> >>>>>
>> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>> >>>>> permissions?
>> >>>>>
>> >>>>> Colin
>> >>>>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Lei (Eddy) Xu
>> >> Software Engineer, Cloudera
>>
>>
>
>
>-- 
>Sean

