Posted to common-dev@hadoop.apache.org by "Colin P. McCabe" <cm...@apache.org> on 2015/03/10 21:24:00 UTC

upstream jenkins build broken?

Hi all,

A very quick (and not thorough) survey shows that I can't find any
jenkins jobs that succeeded from the last 24 hours.  Most of them seem
to be failing with some variant of this message:

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
on project hadoop-hdfs: Failed to clean project: Failed to delete
/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
-> [Help 1]

Any ideas how this happened?  Bad disk, unit test setting wrong permissions?

Colin
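
One quick way to test the "unit test setting wrong permissions" theory is to scan the
workspace for directories the jenkins user can no longer write to or traverse. The
following is only a rough sketch (Java 7+ NIO, POSIX filesystem assumed; the default
path is just an example, not part of any Hadoop tooling):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.attribute.PosixFilePermission;
    import java.nio.file.attribute.PosixFilePermissions;
    import java.util.Set;
    import java.util.stream.Stream;

    public final class FindBrokenDirs {
      public static void main(String[] args) throws IOException {
        // Placeholder root; point it at the failing job's workspace.
        Path root = Paths.get(args.length > 0 ? args[0]
            : "hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data");
        try (Stream<Path> paths = Files.walk(root)) {
          paths.filter(Files::isDirectory).forEach(dir -> {
            try {
              Set<PosixFilePermission> perms = Files.getPosixFilePermissions(dir);
              // A directory the owner cannot write to or traverse will break "mvn clean".
              if (!perms.contains(PosixFilePermission.OWNER_WRITE)
                  || !perms.contains(PosixFilePermission.OWNER_EXECUTE)) {
                System.out.println(dir + " " + PosixFilePermissions.toString(perms));
              }
            } catch (IOException e) {
              System.out.println(dir + " (unreadable: " + e.getMessage() + ")");
            }
          });
        }
      }
    }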

Re: upstream jenkins build broken?

Posted by Sean Busbey <bu...@cloudera.com>.
Hi Folks!

After working on test-patch with other folks for the last few months, I
think we've reached the point where we can make the fastest progress
towards the goal of a general use pre-commit patch tester by spinning
things into a project focused on just that. I think we have a mature enough
code base and a sufficient fledgling community, so I'm going to put
together a TLP proposal.

Thanks for the feedback thus far from use within Hadoop. I hope we can
continue to make things more useful.

-Sean

On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com> wrote:

> HBase's dev-support folder is where the scripts and support files live.
> We've only recently started adding anything to the maven builds that's
> specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
> add in more if we ran into the same permissions problems y'all are having.
>
> There's also our precommit job itself, though it isn't large[2]. AFAIK, we
> don't properly back this up anywhere, we just notify each other of changes
> on a particular mail thread[3].
>
> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
> red because I just finished fixing "mvn site" running out of permgen)
> [3]: http://s.apache.org/NT0
>
>
> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
>
>> Sure, thanks Sean!  Do we just look in the dev-support folder in the HBase
>> repo?  Is there any additional context we need to be aware of?
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>>
>>
>>
>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>>
>> >+dev@hbase
>> >
>> >HBase has recently been cleaning up our precommit jenkins jobs to make
>> >them
>> >more robust. From what I can tell our stuff started off as an earlier
>> >version of what Hadoop uses for testing.
>> >
>> >Folks on either side open to an experiment of combining our precommit
>> >check
>> >tooling? In principle we should be looking for the same kinds of things.
>> >
>> >Naturally we'll still need different jenkins jobs to handle different
>> >resource needs and we'd need to figure out where stuff eventually lives,
>> >but that could come later.
>> >
>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnauroth@hortonworks.com
>> >
>> >wrote:
>> >
>> >> The only thing I'm aware of is the failOnError option:
>> >>
>> >>
>> >>
>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>> >>
>> >>
>> >> I prefer that we don't disable this, because ignoring different kinds
>> of
>> >> failures could leave our build directories in an indeterminate state.
>> >>For
>> >> example, we could end up with an old class file on the classpath for
>> >>test
>> >> runs that was supposedly deleted.
>> >>
>> >> I think it's worth exploring Eddy's suggestion to try simulating
>> failure
>> >> by placing a file where the code expects to see a directory.  That
>> might
>> >> even let us enable some of these tests that are skipped on Windows,
>> >> because Windows allows access for the owner even after permissions have
>> >> been stripped.
>> >>
>> >> Chris Nauroth
>> >> Hortonworks
>> >> http://hortonworks.com/
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>> >>
>> >> >Is there a maven plugin or setting we can use to simply remove
>> >> >directories that have no executable permissions on them?  Clearly we
>> >> >have the permission to do this from a technical point of view (since
>> >> >we created the directories as the jenkins user), it's simply that the
>> >> >code refuses to do it.
>> >> >
>> >> >Otherwise I guess we can just fix those tests...
>> >> >
>> >> >Colin
>> >> >
>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>> >> >>
>> >> >> In HDFS-7722:
>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>> >> >>TearDown().
>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>> >> >>
>> >> >> Also I ran mvn test several times on my machine and all tests
>> passed.
>> >> >>
>> >> >> However, since in DiskChecker#checkDirAccess():
>> >> >>
>> >> >> private static void checkDirAccess(File dir) throws
>> >>DiskErrorException {
>> >> >>   if (!dir.isDirectory()) {
>> >> >>     throw new DiskErrorException("Not a directory: "
>> >> >>                                  + dir.toString());
>> >> >>   }
>> >> >>
>> >> >>   checkAccessByFileMethods(dir);
>> >> >> }
>> >> >>
>> >> >> One potentially safer alternative is replacing data dir with a
>> >>regular
>> >> >> file to simulate disk failures.
>> >> >>
>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>> >> >><cn...@hortonworks.com> wrote:
>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> >> >>> TestDataNodeVolumeFailureReporting, and
>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>> >>permissions
>> >> >>>from
>> >> >>> directories like the one Colin mentioned to simulate disk failures
>> >>at
>> >> >>>data
>> >> >>> nodes.  I reviewed the code for all of those, and they all appear
>> >>to be
>> >> >>> doing the necessary work to restore executable permissions at the
>> >>end
>> >> >>>of
>> >> >>> the test.  The only recent uncommitted patch I've seen that makes
>> >> >>>changes
>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
>> >> >>>though.  I
>> >> >>> don't know if there are other uncommitted patches that changed
>> these
>> >> >>>test
>> >> >>> suites.
>> >> >>>
>> >> >>> I suppose it's also possible that the JUnit process unexpectedly
>> >>died
>> >> >>> after removing executable permissions but before restoring them.
>> >>That
>> >> >>> always would have been a weakness of these test suites, regardless
>> >>of
>> >> >>>any
>> >> >>> recent changes.
>> >> >>>
>> >> >>> Chris Nauroth
>> >> >>> Hortonworks
>> >> >>> http://hortonworks.com/
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>> >> >>>
>> >> >>>>Hey Colin,
>> >> >>>>
>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>> >>with
>> >> >>>>these boxes. He took a look and concluded that some perms are being
>> >> >>>>set in
>> >> >>>>those directories by our unit tests which are precluding those
>> files
>> >> >>>>from
>> >> >>>>getting deleted. He's going to clean up the boxes for us, but we
>> >>should
>> >> >>>>expect this to keep happening until we can fix the test in question
>> >>to
>> >> >>>>properly clean up after itself.
>> >> >>>>
>> >> >>>>To help narrow down which commit it was that started this, Andrew
>> >>sent
>> >> >>>>me
>> >> >>>>this info:
>> >> >>>>
>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> >> >>>>has 500 perms, so I'm guessing that's the problem. Been that way since
>> >> >>>>9:32 UTC on March 5th."
>> >> >>>>
>> >> >>>>--
>> >> >>>>Aaron T. Myers
>> >> >>>>Software Engineer, Cloudera
>> >> >>>>
>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>> >><cm...@apache.org>
>> >> >>>>wrote:
>> >> >>>>
>> >> >>>>> Hi all,
>> >> >>>>>
>> >> >>>>> A very quick (and not thorough) survey shows that I can't find
>> any
>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>> >> >>>>>seem
>> >> >>>>> to be failing with some variant of this message:
>> >> >>>>>
>> >> >>>>> [ERROR] Failed to execute goal
>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>> >>(default-clean)
>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>> >> >>>>> -> [Help 1]
>> >> >>>>>
>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>> >> >>>>> permissions?
>> >> >>>>>
>> >> >>>>> Colin
>> >> >>>>>
>> >> >>>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Lei (Eddy) Xu
>> >> >> Software Engineer, Cloudera
>> >>
>> >>
>> >
>> >
>> >--
>> >Sean
>>
>>
>
>
> --
> Sean
>



-- 
Sean

Re: upstream jenkins build broken?

Posted by Sean Busbey <bu...@cloudera.com>.
HBase's dev-support folder is where the scripts and support files live.
We've only recently started adding anything to the maven builds that's
specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
add in more if we ran into the same permissions problems y'all are having.

There's also our precommit job itself, though it isn't large[2]. AFAIK, we
don't properly back this up anywhere, we just notify each other of changes
on a particular mail thread[3].

[1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
[2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all red
because I just finished fixing "mvn site" running out of permgen)
[3]: http://s.apache.org/NT0


On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cn...@hortonworks.com>
wrote:

> Sure, thanks Sean!  Do we just look in the dev-support folder in the HBase
> repo?  Is there any additional context we need to be aware of?
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
>
>
>
> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>
> >+dev@hbase
> >
> >HBase has recently been cleaning up our precommit jenkins jobs to make
> >them
> >more robust. From what I can tell our stuff started off as an earlier
> >version of what Hadoop uses for testing.
> >
> >Folks on either side open to an experiment of combining our precommit
> >check
> >tooling? In principle we should be looking for the same kinds of things.
> >
> >Naturally we'll still need different jenkins jobs to handle different
> >resource needs and we'd need to figure out where stuff eventually lives,
> >but that could come later.
> >
> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cn...@hortonworks.com>
> >wrote:
> >
> >> The only thing I'm aware of is the failOnError option:
> >>
> >>
> >>
> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
> >>
> >>
> >> I prefer that we don't disable this, because ignoring different kinds of
> >> failures could leave our build directories in an indeterminate state.
> >>For
> >> example, we could end up with an old class file on the classpath for
> >>test
> >> runs that was supposedly deleted.
> >>
> >> I think it's worth exploring Eddy's suggestion to try simulating failure
> >> by placing a file where the code expects to see a directory.  That might
> >> even let us enable some of these tests that are skipped on Windows,
> >> because Windows allows access for the owner even after permissions have
> >> been stripped.
> >>
> >> Chris Nauroth
> >> Hortonworks
> >> http://hortonworks.com/
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
> >>
> >> >Is there a maven plugin or setting we can use to simply remove
> >> >directories that have no executable permissions on them?  Clearly we
> >> >have the permission to do this from a technical point of view (since
> >> >we created the directories as the jenkins user), it's simply that the
> >> >code refuses to do it.
> >> >
> >> >Otherwise I guess we can just fix those tests...
> >> >
> >> >Colin
> >> >
> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
> >> >> Thanks a lot for looking into HDFS-7722, Chris.
> >> >>
> >> >> In HDFS-7722:
> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >> >>TearDown().
> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
> >> >>
> >> >> Also I ran mvn test several times on my machine and all tests passed.
> >> >>
> >> >> However, since in DiskChecker#checkDirAccess():
> >> >>
> >> >> private static void checkDirAccess(File dir) throws
> >>DiskErrorException {
> >> >>   if (!dir.isDirectory()) {
> >> >>     throw new DiskErrorException("Not a directory: "
> >> >>                                  + dir.toString());
> >> >>   }
> >> >>
> >> >>   checkAccessByFileMethods(dir);
> >> >> }
> >> >>
> >> >> One potentially safer alternative is replacing data dir with a
> >>regular
> >> >> file to simulate disk failures.
> >> >>
> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
> >> >><cn...@hortonworks.com> wrote:
> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >> >>> TestDataNodeVolumeFailureReporting, and
> >> >>> TestDataNodeVolumeFailureToleration all remove executable
> >>permissions
> >> >>>from
> >> >>> directories like the one Colin mentioned to simulate disk failures
> >>at
> >> >>>data
> >> >>> nodes.  I reviewed the code for all of those, and they all appear
> >>to be
> >> >>> doing the necessary work to restore executable permissions at the
> >>end
> >> >>>of
> >> >>> the test.  The only recent uncommitted patch I've seen that makes
> >> >>>changes
> >> >>> in these test suites is HDFS-7722.  That patch still looks fine
> >> >>>though.  I
> >> >>> don't know if there are other uncommitted patches that changed these
> >> >>>test
> >> >>> suites.
> >> >>>
> >> >>> I suppose it's also possible that the JUnit process unexpectedly
> >>died
> >> >>> after removing executable permissions but before restoring them.
> >>That
> >> >>> always would have been a weakness of these test suites, regardless
> >>of
> >> >>>any
> >> >>> recent changes.
> >> >>>
> >> >>> Chris Nauroth
> >> >>> Hortonworks
> >> >>> http://hortonworks.com/
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
> >> >>>
> >> >>>>Hey Colin,
> >> >>>>
> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on
> >>with
> >> >>>>these boxes. He took a look and concluded that some perms are being
> >> >>>>set in
> >> >>>>those directories by our unit tests which are precluding those files
> >> >>>>from
> >> >>>>getting deleted. He's going to clean up the boxes for us, but we
> >>should
> >> >>>>expect this to keep happening until we can fix the test in question
> >>to
> >> >>>>properly clean up after itself.
> >> >>>>
> >> >>>>To help narrow down which commit it was that started this, Andrew
> >>sent
> >> >>>>me
> >> >>>>this info:
> >> >>>>
> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> >> >>>>has 500 perms, so I'm guessing that's the problem. Been that way since
> >> >>>>9:32 UTC on March 5th."
> >> >>>>
> >> >>>>--
> >> >>>>Aaron T. Myers
> >> >>>>Software Engineer, Cloudera
> >> >>>>
> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
> >><cm...@apache.org>
> >> >>>>wrote:
> >> >>>>
> >> >>>>> Hi all,
> >> >>>>>
> >> >>>>> A very quick (and not thorough) survey shows that I can't find any
> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
> >> >>>>>seem
> >> >>>>> to be failing with some variant of this message:
> >> >>>>>
> >> >>>>> [ERROR] Failed to execute goal
> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> >>(default-clean)
> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
> >> >>>>>
> >> >>>>>
> >> >>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >> >>>>> -> [Help 1]
> >> >>>>>
> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
> >> >>>>> permissions?
> >> >>>>>
> >> >>>>> Colin
> >> >>>>>
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Lei (Eddy) Xu
> >> >> Software Engineer, Cloudera
> >>
> >>
> >
> >
> >--
> >Sean
>
>


-- 
Sean

Re: upstream jenkins build broken?

Posted by Chris Nauroth <cn...@hortonworks.com>.
Sure, thanks Sean!  Do we just look in the dev-support folder in the HBase
repo?  Is there any additional context we need to be aware of?

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:

>+dev@hbase
>
>HBase has recently been cleaning up our precommit jenkins jobs to make
>them
>more robust. From what I can tell our stuff started off as an earlier
>version of what Hadoop uses for testing.
>
>Folks on either side open to an experiment of combining our precommit
>check
>tooling? In principle we should be looking for the same kinds of things.
>
>Naturally we'll still need different jenkins jobs to handle different
>resource needs and we'd need to figure out where stuff eventually lives,
>but that could come later.
>
>On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cn...@hortonworks.com>
>wrote:
>
>> The only thing I'm aware of is the failOnError option:
>>
>> 
>> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>
>>
>> I prefer that we don't disable this, because ignoring different kinds of
>> failures could leave our build directories in an indeterminate state.
>>For
>> example, we could end up with an old class file on the classpath for
>>test
>> runs that was supposedly deleted.
>>
>> I think it's worth exploring Eddy's suggestion to try simulating failure
>> by placing a file where the code expects to see a directory.  That might
>> even let us enable some of these tests that are skipped on Windows,
>> because Windows allows access for the owner even after permissions have
>> been stripped.
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>>
>>
>>
>> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>>
>> >Is there a maven plugin or setting we can use to simply remove
>> >directories that have no executable permissions on them?  Clearly we
>> >have the permission to do this from a technical point of view (since
>> >we created the directories as the jenkins user), it's simply that the
>> >code refuses to do it.
>> >
>> >Otherwise I guess we can just fix those tests...
>> >
>> >Colin
>> >
>> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>> >> Thanks a lot for looking into HDFS-7722, Chris.
>> >>
>> >> In HDFS-7722:
>> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>> >>TearDown().
>> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>> >>
>> >> Also I ran mvn test several times on my machine and all tests passed.
>> >>
>> >> However, since in DiskChecker#checkDirAccess():
>> >>
>> >> private static void checkDirAccess(File dir) throws
>>DiskErrorException {
>> >>   if (!dir.isDirectory()) {
>> >>     throw new DiskErrorException("Not a directory: "
>> >>                                  + dir.toString());
>> >>   }
>> >>
>> >>   checkAccessByFileMethods(dir);
>> >> }
>> >>
>> >> One potentially safer alternative is replacing data dir with a
>>regular
>> >> file to simulate disk failures.
>> >>
>> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>> >><cn...@hortonworks.com> wrote:
>> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> >>> TestDataNodeVolumeFailureReporting, and
>> >>> TestDataNodeVolumeFailureToleration all remove executable
>>permissions
>> >>>from
>> >>> directories like the one Colin mentioned to simulate disk failures
>>at
>> >>>data
>> >>> nodes.  I reviewed the code for all of those, and they all appear
>>to be
>> >>> doing the necessary work to restore executable permissions at the
>>end
>> >>>of
>> >>> the test.  The only recent uncommitted patch I've seen that makes
>> >>>changes
>> >>> in these test suites is HDFS-7722.  That patch still looks fine
>> >>>though.  I
>> >>> don't know if there are other uncommitted patches that changed these
>> >>>test
>> >>> suites.
>> >>>
>> >>> I suppose it's also possible that the JUnit process unexpectedly
>>died
>> >>> after removing executable permissions but before restoring them.
>>That
>> >>> always would have been a weakness of these test suites, regardless
>>of
>> >>>any
>> >>> recent changes.
>> >>>
>> >>> Chris Nauroth
>> >>> Hortonworks
>> >>> http://hortonworks.com/
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>> >>>
>> >>>>Hey Colin,
>> >>>>
>> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>>with
>> >>>>these boxes. He took a look and concluded that some perms are being
>> >>>>set in
>> >>>>those directories by our unit tests which are precluding those files
>> >>>>from
>> >>>>getting deleted. He's going to clean up the boxes for us, but we
>>should
>> >>>>expect this to keep happening until we can fix the test in question
>>to
>> >>>>properly clean up after itself.
>> >>>>
>> >>>>To help narrow down which commit it was that started this, Andrew
>>sent
>> >>>>me
>> >>>>this info:
>> >>>>
>> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> >>>>has 500 perms, so I'm guessing that's the problem. Been that way since
>> >>>>9:32 UTC on March 5th."
>> >>>>
>> >>>>--
>> >>>>Aaron T. Myers
>> >>>>Software Engineer, Cloudera
>> >>>>
>> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>><cm...@apache.org>
>> >>>>wrote:
>> >>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> A very quick (and not thorough) survey shows that I can't find any
>> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>> >>>>>seem
>> >>>>> to be failing with some variant of this message:
>> >>>>>
>> >>>>> [ERROR] Failed to execute goal
>> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>(default-clean)
>> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>> >>>>>
>> >>>>>
>> >>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>> >>>>> -> [Help 1]
>> >>>>>
>> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>> >>>>> permissions?
>> >>>>>
>> >>>>> Colin
>> >>>>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Lei (Eddy) Xu
>> >> Software Engineer, Cloudera
>>
>>
>
>
>-- 
>Sean


Re: upstream jenkins build broken?

Posted by Sean Busbey <bu...@cloudera.com>.
+dev@hbase

HBase has recently been cleaning up our precommit jenkins jobs to make them
more robust. From what I can tell our stuff started off as an earlier
version of what Hadoop uses for testing.

Folks on either side open to an experiment of combining our precommit check
tooling? In principle we should be looking for the same kinds of things.

Naturally we'll still need different jenkins jobs to handle different
resource needs and we'd need to figure out where stuff eventually lives,
but that could come later.

On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cn...@hortonworks.com>
wrote:

> The only thing I'm aware of is the failOnError option:
>
> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>
>
> I prefer that we don't disable this, because ignoring different kinds of
> failures could leave our build directories in an indeterminate state.  For
> example, we could end up with an old class file on the classpath for test
> runs that was supposedly deleted.
>
> I think it's worth exploring Eddy's suggestion to try simulating failure
> by placing a file where the code expects to see a directory.  That might
> even let us enable some of these tests that are skipped on Windows,
> because Windows allows access for the owner even after permissions have
> been stripped.
>
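
As a rough sketch of that file-in-place-of-a-directory idea (assuming JUnit 4; the
class name and path below are illustrative, not the actual Hadoop test code), the
simulation needs no permission changes, so there is nothing for the clean step to
trip over afterwards:

    import static org.junit.Assert.assertTrue;

    import java.io.File;
    import org.junit.Test;

    public class SimulatedVolumeFailureSketch {

      @Test
      public void testVolumeFailureSimulatedByRegularFile() throws Exception {
        // Illustrative location; a real test would use the mini-cluster's data dir.
        File dataDir = new File("target/test/data/dfs/data/data3");
        assertTrue(dataDir.isDirectory() || dataDir.mkdirs());

        // Move the directory aside and drop a plain file in its place, so
        // DiskChecker#checkDirAccess fails its isDirectory() check.
        File saved = new File(dataDir.getParent(), dataDir.getName() + ".saved");
        assertTrue(dataDir.renameTo(saved));
        assertTrue(dataDir.createNewFile());

        try {
          // Exercise the datanode code under test here; it should report the
          // volume as failed with "Not a directory: ...".
        } finally {
          // Cleanup is a delete and a rename back -- no chmod to forget.
          assertTrue(dataDir.delete());
          assertTrue(saved.renameTo(dataDir));
        }
      }
    }
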
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
>
>
>
> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>
> >Is there a maven plugin or setting we can use to simply remove
> >directories that have no executable permissions on them?  Clearly we
> >have the permission to do this from a technical point of view (since
> >we created the directories as the jenkins user), it's simply that the
> >code refuses to do it.
> >
> >Otherwise I guess we can just fix those tests...
> >
> >Colin
> >
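
The failOnError option quoted above only ignores the failure rather than removing the
directory. Nothing in this thread identifies a Maven setting that repairs permissions
before deleting, so the following is just a sketch of what such a pre-clean step could
do (plain Java 7+ NIO, POSIX filesystem assumed; the class and method names are made
up, and it assumes the directory is still readable, as a mode like 500 is):

    import java.io.IOException;
    import java.nio.file.FileVisitResult;
    import java.nio.file.Files;
    import java.nio.file.LinkOption;
    import java.nio.file.Path;
    import java.nio.file.SimpleFileVisitor;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.nio.file.attribute.PosixFilePermission;
    import java.util.EnumSet;

    public final class ForceCleanSketch {

      /** Give the owner rwx on every directory under root, then delete the tree. */
      public static void deleteWithPermissionRepair(Path root) throws IOException {
        if (!Files.exists(root, LinkOption.NOFOLLOW_LINKS)) {
          return;
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
          @Override
          public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
              throws IOException {
            // A data dir left at mode 500 blocks deletion of its children;
            // restore owner rwx before descending into it.
            Files.setPosixFilePermissions(dir, EnumSet.of(
                PosixFilePermission.OWNER_READ,
                PosixFilePermission.OWNER_WRITE,
                PosixFilePermission.OWNER_EXECUTE));
            return FileVisitResult.CONTINUE;
          }

          @Override
          public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
              throws IOException {
            Files.delete(file);
            return FileVisitResult.CONTINUE;
          }

          @Override
          public FileVisitResult postVisitDirectory(Path dir, IOException exc)
              throws IOException {
            if (exc != null) {
              throw exc;
            }
            Files.delete(dir);
            return FileVisitResult.CONTINUE;
          }
        });
      }
    }

A shell pre-step on the Jenkins side ("chmod -R u+rwx <dir> && rm -rf <dir>") would
accomplish the same thing without touching the Maven build.
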
> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
> >> Thanks a lot for looking into HDFS-7722, Chris.
> >>
> >> In HDFS-7722:
> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >>TearDown().
> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
> >>
> >> Also I ran mvn test several times on my machine and all tests passed.
> >>
> >> However, since in DiskChecker#checkDirAccess():
> >>
> >> private static void checkDirAccess(File dir) throws DiskErrorException {
> >>   if (!dir.isDirectory()) {
> >>     throw new DiskErrorException("Not a directory: "
> >>                                  + dir.toString());
> >>   }
> >>
> >>   checkAccessByFileMethods(dir);
> >> }
> >>
> >> One potentially safer alternative is replacing data dir with a regular
> >> file to simulate disk failures.
> >>
> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
> >><cn...@hortonworks.com> wrote:
> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >>> TestDataNodeVolumeFailureReporting, and
> >>> TestDataNodeVolumeFailureToleration all remove executable permissions
> >>>from
> >>> directories like the one Colin mentioned to simulate disk failures at
> >>>data
> >>> nodes.  I reviewed the code for all of those, and they all appear to be
> >>> doing the necessary work to restore executable permissions at the end
> >>>of
> >>> the test.  The only recent uncommitted patch I've seen that makes
> >>>changes
> >>> in these test suites is HDFS-7722.  That patch still looks fine
> >>>though.  I
> >>> don't know if there are other uncommitted patches that changed these
> >>>test
> >>> suites.
> >>>
> >>> I suppose it's also possible that the JUnit process unexpectedly died
> >>> after removing executable permissions but before restoring them.  That
> >>> always would have been a weakness of these test suites, regardless of
> >>>any
> >>> recent changes.
> >>>
> >>> Chris Nauroth
> >>> Hortonworks
> >>> http://hortonworks.com/
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
> >>>
> >>>>Hey Colin,
> >>>>
> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
> >>>>these boxes. He took a look and concluded that some perms are being
> >>>>set in
> >>>>those directories by our unit tests which are precluding those files
> >>>>from
> >>>>getting deleted. He's going to clean up the boxes for us, but we should
> >>>>expect this to keep happening until we can fix the test in question to
> >>>>properly clean up after itself.
> >>>>
> >>>>To help narrow down which commit it was that started this, Andrew sent
> >>>>me
> >>>>this info:
> >>>>
> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> >>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> >>>>has
> >>>>500 perms, so I'm guessing that's the problem. Been that way since 9:32
> >>>>UTC
> >>>>on March 5th."
> >>>>
> >>>>--
> >>>>Aaron T. Myers
> >>>>Software Engineer, Cloudera
> >>>>
> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cm...@apache.org>
> >>>>wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> A very quick (and not thorough) survey shows that I can't find any
> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
> >>>>>seem
> >>>>> to be failing with some variant of this message:
> >>>>>
> >>>>> [ERROR] Failed to execute goal
> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
> >>>>>
> >>>>>
> >>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs
> >>>>>-pr
> >>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>>> -> [Help 1]
> >>>>>
> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
> >>>>> permissions?
> >>>>>
> >>>>> Colin
> >>>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Lei (Eddy) Xu
> >> Software Engineer, Cloudera
>
>


-- 
Sean

Re: upstream jenkins build broken?

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
On Wed, Mar 11, 2015 at 2:34 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
> The only thing I'm aware of is the failOnError option:
>
> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors
> .html
>
>
> I prefer that we don't disable this, because ignoring different kinds of
> failures could leave our build directories in an indeterminate state.  For
> example, we could end up with an old class file on the classpath for test
> runs that was supposedly deleted.
>
> I think it's worth exploring Eddy's suggestion to try simulating failure
> by placing a file where the code expects to see a directory.  That might
> even let us enable some of these tests that are skipped on Windows,
> because Windows allows access for the owner even after permissions have
> been stripped.

+1.  JIRA?

Colin

>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
>
>
>
> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>
>>Is there a maven plugin or setting we can use to simply remove
>>directories that have no executable permissions on them?  Clearly we
>>have the permission to do this from a technical point of view (since
>>we created the directories as the jenkins user), it's simply that the
>>code refuses to do it.
>>
>>Otherwise I guess we can just fix those tests...
>>
>>Colin
>>
>>On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>> Thanks a lot for looking into HDFS-7722, Chris.
>>>
>>> In HDFS-7722:
>>> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>TearDown().
>>> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>>
>>> Also I ran mvn test several times on my machine and all tests passed.
>>>
>>> However, since in DiskChecker#checkDirAccess():
>>>
>>> private static void checkDirAccess(File dir) throws DiskErrorException {
>>>   if (!dir.isDirectory()) {
>>>     throw new DiskErrorException("Not a directory: "
>>>                                  + dir.toString());
>>>   }
>>>
>>>   checkAccessByFileMethods(dir);
>>> }
>>>
>>> One potentially safer alternative is replacing data dir with a regular
>>> file to simulate disk failures.
>>>
>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>><cn...@hortonworks.com> wrote:
>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>> TestDataNodeVolumeFailureReporting, and
>>>> TestDataNodeVolumeFailureToleration all remove executable permissions
>>>>from
>>>> directories like the one Colin mentioned to simulate disk failures at
>>>>data
>>>> nodes.  I reviewed the code for all of those, and they all appear to be
>>>> doing the necessary work to restore executable permissions at the end
>>>>of
>>>> the test.  The only recent uncommitted patch I've seen that makes
>>>>changes
>>>> in these test suites is HDFS-7722.  That patch still looks fine
>>>>though.  I
>>>> don't know if there are other uncommitted patches that changed these
>>>>test
>>>> suites.
>>>>
>>>> I suppose it's also possible that the JUnit process unexpectedly died
>>>> after removing executable permissions but before restoring them.  That
>>>> always would have been a weakness of these test suites, regardless of
>>>>any
>>>> recent changes.
>>>>
>>>> Chris Nauroth
>>>> Hortonworks
>>>> http://hortonworks.com/
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>>>
>>>>>Hey Colin,
>>>>>
>>>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>>>>>these boxes. He took a look and concluded that some perms are being
>>>>>set in
>>>>>those directories by our unit tests which are precluding those files
>>>>>from
>>>>>getting deleted. He's going to clean up the boxes for us, but we should
>>>>>expect this to keep happening until we can fix the test in question to
>>>>>properly clean up after itself.
>>>>>
>>>>>To help narrow down which commit it was that started this, Andrew sent
>>>>>me
>>>>>this info:
>>>>>
>>>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>>>has
>>>>>500 perms, so I'm guessing that's the problem. Been that way since 9:32
>>>>>UTC
>>>>>on March 5th."
>>>>>
>>>>>--
>>>>>Aaron T. Myers
>>>>>Software Engineer, Cloudera
>>>>>
>>>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cm...@apache.org>
>>>>>wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> A very quick (and not thorough) survey shows that I can't find any
>>>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>>>>>>seem
>>>>>> to be failing with some variant of this message:
>>>>>>
>>>>>> [ERROR] Failed to execute goal
>>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>>>
>>>>>>
>>>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs
>>>>>>-pr
>>>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>> -> [Help 1]
>>>>>>
>>>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>>> permissions?
>>>>>>
>>>>>> Colin
>>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Lei (Eddy) Xu
>>> Software Engineer, Cloudera
>

Re: upstream jenkins build broken?

Posted by Chris Nauroth <cn...@hortonworks.com>.
The only thing I'm aware of is the failOnError option:

http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors
.html


I prefer that we don't disable this, because ignoring different kinds of
failures could leave our build directories in an indeterminate state.  For
example, we could end up with an old class file on the classpath for test
runs that was supposedly deleted.

I think it's worth exploring Eddy's suggestion to try simulating failure
by placing a file where the code expects to see a directory.  That might
even let us enable some of these tests that are skipped on Windows,
because Windows allows access for the owner even after permissions have
been stripped.
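
If it helps, here is a rough sketch of what that could look like (hypothetical
test code with assumed paths and class name, not an actual patch):

import java.io.File;
import org.apache.hadoop.fs.FileUtil;
import org.junit.Test;
import static org.junit.Assert.assertTrue;

public class VolumeFailureByRegularFileSketch {
  @Test
  public void volumeFailureWithoutTouchingPermissions() throws Exception {
    // Same layout the Jenkins error message shows under target/.
    File dataDir = new File("target/test/data/dfs/data");
    assertTrue(dataDir.isDirectory() || dataDir.mkdirs());

    File failedVolume = new File(dataDir, "data3");
    FileUtil.fullyDelete(failedVolume);        // drop the real directory
    assertTrue(failedVolume.createNewFile());  // a plain file now sits where the
                                               // code expects a directory, so
                                               // checkDirAccess() fails fast
    try {
      // ... start the MiniDFSCluster, trigger the disk check, and assert that
      // the volume is reported as failed ...
    } finally {
      assertTrue(failedVolume.delete());       // cleanup is an ordinary delete;
      assertTrue(failedVolume.mkdirs());       // no permission bits to restore
    }
  }
}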

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:

>Is there a maven plugin or setting we can use to simply remove
>directories that have no executable permissions on them?  Clearly we
>have the permission to do this from a technical point of view (since
>we created the directories as the jenkins user), it's simply that the
>code refuses to do it.
>
>Otherwise I guess we can just fix those tests...
>
>Colin
>
>On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>> Thanks a lot for looking into HDFS-7722, Chris.
>>
>> In HDFS-7722:
>> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>TearDown().
>> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>
>> Also I ran mvn test several times on my machine and all tests passed.
>>
>> However, since in DiskChecker#checkDirAccess():
>>
>> private static void checkDirAccess(File dir) throws DiskErrorException {
>>   if (!dir.isDirectory()) {
>>     throw new DiskErrorException("Not a directory: "
>>                                  + dir.toString());
>>   }
>>
>>   checkAccessByFileMethods(dir);
>> }
>>
>> One potentially safer alternative is replacing data dir with a regular
>> file to simulate disk failures.
>>
>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>><cn...@hortonworks.com> wrote:
>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>> TestDataNodeVolumeFailureReporting, and
>>> TestDataNodeVolumeFailureToleration all remove executable permissions
>>>from
>>> directories like the one Colin mentioned to simulate disk failures at
>>>data
>>> nodes.  I reviewed the code for all of those, and they all appear to be
>>> doing the necessary work to restore executable permissions at the end
>>>of
>>> the test.  The only recent uncommitted patch I've seen that makes
>>>changes
>>> in these test suites is HDFS-7722.  That patch still looks fine
>>>though.  I
>>> don't know if there are other uncommitted patches that changed these
>>>test
>>> suites.
>>>
>>> I suppose it's also possible that the JUnit process unexpectedly died
>>> after removing executable permissions but before restoring them.  That
>>> always would have been a weakness of these test suites, regardless of
>>>any
>>> recent changes.
>>>
>>> Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>>
>>>>Hey Colin,
>>>>
>>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>>>>these boxes. He took a look and concluded that some perms are being
>>>>set in
>>>>those directories by our unit tests which are precluding those files
>>>>from
>>>>getting deleted. He's going to clean up the boxes for us, but we should
>>>>expect this to keep happening until we can fix the test in question to
>>>>properly clean up after itself.
>>>>
>>>>To help narrow down which commit it was that started this, Andrew sent
>>>>me
>>>>this info:
>>>>
>>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>>has
>>>>500 perms, so I'm guessing that's the problem. Been that way since 9:32
>>>>UTC
>>>>on March 5th."
>>>>
>>>>--
>>>>Aaron T. Myers
>>>>Software Engineer, Cloudera
>>>>
>>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cm...@apache.org>
>>>>wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> A very quick (and not thorough) survey shows that I can't find any
>>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>>>>>seem
>>>>> to be failing with some variant of this message:
>>>>>
>>>>> [ERROR] Failed to execute goal
>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>>
>>>>>
>>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs
>>>>>-pr
>>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> -> [Help 1]
>>>>>
>>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>> permissions?
>>>>>
>>>>> Colin
>>>>>
>>>
>>
>>
>>
>> --
>> Lei (Eddy) Xu
>> Software Engineer, Cloudera


Re: upstream jenkins build broken?

Posted by Sean Busbey <bu...@cloudera.com>.
Yeah I can do that now.

On Tue, Mar 17, 2015 at 2:53 AM, Vinayakumar B <vi...@apache.org>
wrote:

> Seems like all builds of Precommit-HDFS-Build are failing with the problem below.
>
> FATAL: Command "git clean -fdx" returned status code 1:
> stdout:
> stderr: hudson.plugins.git.GitException
> <
> http://stacktrace.jenkins-ci.org/search?query=hudson.plugins.git.GitException
> >:
> Command "git clean -fdx" returned status code 1:
> stdout:
> stderr:
>
>
>
> Can someone remove "git clean -fdx" from the build configurations of
> Precommit-HDFS-Build?
>
>
> Regards,
> Vinay
>
> On Tue, Mar 17, 2015 at 12:59 PM, Vinayakumar B <vi...@apache.org>
> wrote:
>
> > I have simulated the problem in my env and verified that both 'git clean
> > -xdf' and 'mvn clean' will not remove the directory.
> > mvn fails, whereas git simply ignores (does not even display any warning) the
> > problem.
> >
> >
> >
> > Regards,
> > Vinay
> >
> > On Tue, Mar 17, 2015 at 2:32 AM, Sean Busbey <bu...@cloudera.com>
> wrote:
> >
> >> Can someone point me to an example build that is broken?
> >>
> >> On Mon, Mar 16, 2015 at 3:52 PM, Sean Busbey <bu...@cloudera.com>
> wrote:
> >>
> >> > I'm on it. HADOOP-11721
> >> >
> >> > On Mon, Mar 16, 2015 at 3:44 PM, Haohui Mai <wh...@apache.org>
> wrote:
> >> >
> >> >> +1 for git clean.
> >> >>
> >> >> Colin, can you please get it in ASAP? Currently due to the jenkins
> >> >> issues, we cannot close the 2.7 blockers.
> >> >>
> >> >> Thanks,
> >> >> Haohui
> >> >>
> >> >>
> >> >>
> >> >> On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <
> cmccabe@apache.org>
> >> >> wrote:
> >> >> > If all it takes is someone creating a test that makes a directory
> >> >> > without -x, this is going to happen over and over.
> >> >> >
> >> >> > Let's just fix the problem at the root by running "git clean -fqdx"
> >> in
> >> >> > our jenkins scripts.  If there's no objections I will add this in
> and
> >> >> > un-break the builds.
> >> >> >
> >> >> > best,
> >> >> > Colin
> >> >> >
> >> >> > On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <le...@cloudera.com> wrote:
> >> >> >> I filed HDFS-7917 to change the way to simulate disk failures.
> >> >> >>
> >> >> >> But I think we still need infrastructure folks to help with
> jenkins
> >> >> >> scripts to clean the dirs left today.
> >> >> >>
> >> >> >> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ri...@gmail.com>
> >> >> wrote:
> >> >> >>> Any updates on this issues? It seems that all HDFS jenkins builds
> >> are
> >> >> >>> still failing.
> >> >> >>>
> >> >> >>> Regards,
> >> >> >>> Haohui
> >> >> >>>
> >> >> >>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <
> >> >> vinayakumarb@apache.org> wrote:
> >> >> >>>> I think the problem started from here.
> >> >> >>>>
> >> >> >>>>
> >> >>
> >>
> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
> >> >> >>>>
> >> >> >>>> As Chris mentioned TestDataNodeVolumeFailure is changing the
> >> >> permission.
> >> >> >>>> But in this patch, ReplicationMonitor got NPE and it got
> terminate
> >> >> signal,
> >> >> >>>> due to which MiniDFSCluster.shutdown() throwing Exception.
> >> >> >>>>
> >> >> >>>> But, TestDataNodeVolumeFailure#teardown() is restoring those
> >> >> permission
> >> >> >>>> after shutting down cluster. So in this case IMO, permissions
> were
> >> >> never
> >> >> >>>> restored.
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>   @After
> >> >> >>>>   public void tearDown() throws Exception {
> >> >> >>>>     if(data_fail != null) {
> >> >> >>>>       FileUtil.setWritable(data_fail, true);
> >> >> >>>>     }
> >> >> >>>>     if(failedDir != null) {
> >> >> >>>>       FileUtil.setWritable(failedDir, true);
> >> >> >>>>     }
> >> >> >>>>     if(cluster != null) {
> >> >> >>>>       cluster.shutdown();
> >> >> >>>>     }
> >> >> >>>>     for (int i = 0; i < 3; i++) {
> >> >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)),
> >> >> true);
> >> >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)),
> >> >> true);
> >> >> >>>>     }
> >> >> >>>>   }
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> Regards,
> >> >> >>>> Vinay
> >> >> >>>>
> >> >> >>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <
> >> >> vinayakumarb@apache.org>
> >> >> >>>> wrote:
> >> >> >>>>
> >> >> >>>>> When I see the history of these kind of builds, All these are
> >> >> failed on
> >> >> >>>>> node H9.
> >> >> >>>>>
> >> >> >>>>> I think some or the other uncommitted patch would have created
> >> the
> >> >> problem
> >> >> >>>>> and left it there.
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>> Regards,
> >> >> >>>>> Vinay
> >> >> >>>>>
> >> >> >>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <
> >> busbey@cloudera.com>
> >> >> wrote:
> >> >> >>>>>
> >> >> >>>>>> You could rely on a destructive git clean call instead of
> maven
> >> to
> >> >> do the
> >> >> >>>>>> directory removal.
> >> >> >>>>>>
> >> >> >>>>>> --
> >> >> >>>>>> Sean
> >> >> >>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <
> cmccabe@alumni.cmu.edu
> >> >
> >> >> wrote:
> >> >> >>>>>>
> >> >> >>>>>> > Is there a maven plugin or setting we can use to simply
> remove
> >> >> >>>>>> > directories that have no executable permissions on them?
> >> >> Clearly we
> >> >> >>>>>> > have the permission to do this from a technical point of
> view
> >> >> (since
> >> >> >>>>>> > we created the directories as the jenkins user), it's simply
> >> >> that the
> >> >> >>>>>> > code refuses to do it.
> >> >> >>>>>> >
> >> >> >>>>>> > Otherwise I guess we can just fix those tests...
> >> >> >>>>>> >
> >> >> >>>>>> > Colin
> >> >> >>>>>> >
> >> >> >>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com>
> >> >> wrote:
> >> >> >>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
> >> >> >>>>>> > >
> >> >> >>>>>> > > In HDFS-7722:
> >> >> >>>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir
> >> permissions
> >> >> in
> >> >> >>>>>> > TearDown().
> >> >> >>>>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally
> >> >> clause.
> >> >> >>>>>> > >
> >> >> >>>>>> > > Also I ran mvn test several times on my machine and all
> >> tests
> >> >> passed.
> >> >> >>>>>> > >
> >> >> >>>>>> > > However, since in DiskChecker#checkDirAccess():
> >> >> >>>>>> > >
> >> >> >>>>>> > > private static void checkDirAccess(File dir) throws
> >> >> >>>>>> DiskErrorException {
> >> >> >>>>>> > >   if (!dir.isDirectory()) {
> >> >> >>>>>> > >     throw new DiskErrorException("Not a directory: "
> >> >> >>>>>> > >                                  + dir.toString());
> >> >> >>>>>> > >   }
> >> >> >>>>>> > >
> >> >> >>>>>> > >   checkAccessByFileMethods(dir);
> >> >> >>>>>> > > }
> >> >> >>>>>> > >
> >> >> >>>>>> > > One potentially safer alternative is replacing data dir
> >> with a
> >> >> regular
> >> >>>>>> > > file to simulate disk failures.
> >> >> >>>>>> > >
> >> >> >>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
> >> >> >>>>>> cnauroth@hortonworks.com>
> >> >> >>>>>> > wrote:
> >> >> >>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >> >> >>>>>> > >> TestDataNodeVolumeFailureReporting, and
> >> >> >>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable
> >> >> permissions
> >> >> >>>>>> > from
> >> >> >>>>>> > >> directories like the one Colin mentioned to simulate disk
> >> >> failures at
> >> >> >>>>>> > data
> >> >> >>>>>> > >> nodes.  I reviewed the code for all of those, and they
> all
> >> >> appear to
> >> >> >>>>>> be
> >> >> >>>>>> > >> doing the necessary work to restore executable
> permissions
> >> at
> >> >> the
> >> >> >>>>>> end of
> >> >>>>>> > >> the test.  The only recent uncommitted patch I've seen
> that
> >> >> makes
> >> >> >>>>>> > changes
> >> >> >>>>>> > >> in these test suites is HDFS-7722.  That patch still
> looks
> >> >> fine
> >> >> >>>>>> > though.  I
> >> >>>>>> > >> don't know if there are other uncommitted patches that
> >> >> changed these
> >> >> >>>>>> > test
> >> >> >>>>>> > >> suites.
> >> >> >>>>>> > >>
> >> >>>>>> > >> I suppose it's also possible that the JUnit process
> >> >> unexpectedly died
> >> >> >>>>>> > >> after removing executable permissions but before
> restoring
> >> >> them.
> >> >> >>>>>> That
> >> >> >>>>>> > >> always would have been a weakness of these test suites,
> >> >> regardless of
> >> >> >>>>>> > any
> >> >> >>>>>> > >> recent changes.
> >> >> >>>>>> > >>
> >> >> >>>>>> > >> Chris Nauroth
> >> >> >>>>>> > >> Hortonworks
> >> >> >>>>>> > >> http://hortonworks.com/
> >> >> >>>>>> > >>
> >> >> >>>>>> > >>
> >> >> >>>>>> > >>
> >> >> >>>>>> > >>
> >> >> >>>>>> > >>
> >> >> >>>>>> > >>
> >> >> >>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com>
> >> >> wrote:
> >> >> >>>>>> > >>
> >> >> >>>>>> > >>>Hey Colin,
> >> >> >>>>>> > >>>
> >> >> >>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's
> >> >> going on
> >> >> >>>>>> with
> >> >> >>>>>> > >>>these boxes. He took a look and concluded that some perms
> >> are
> >> >> being
> >> >> >>>>>> set
> >> >> >>>>>> > in
> >> >> >>>>>> > >>>those directories by our unit tests which are precluding
> >> >> those files
> >> >> >>>>>> > from
> >> >> >>>>>> > >>>getting deleted. He's going to clean up the boxes for us,
> >> but
> >> >> we
> >> >> >>>>>> should
> >> >> >>>>>> > >>>expect this to keep happening until we can fix the test
> in
> >> >> question
> >> >> >>>>>> to
> >> >> >>>>>> > >>>properly clean up after itself.
> >> >> >>>>>> > >>>
> >> >> >>>>>> > >>>To help narrow down which commit it was that started
> this,
> >> >> Andrew
> >> >> >>>>>> sent
> >> >> >>>>>> > me
> >> >> >>>>>> > >>>this info:
> >> >> >>>>>> > >>>
> >> >> >>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> >> >> >>>>>> >
> >> >> >>>>>>
> >> >>
> >>
> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> >> >> >>>>>> > has
> >> >> >>>>>> > >>>500 perms, so I'm guessing that's the problem. Been that
> >> way
> >> >> since
> >> >> >>>>>> 9:32
> >> >> >>>>>> > >>>UTC
> >> >> >>>>>> > >>>on March 5th."
> >> >> >>>>>> > >>>
> >> >> >>>>>> > >>>--
> >> >> >>>>>> > >>>Aaron T. Myers
> >> >> >>>>>> > >>>Software Engineer, Cloudera
> >> >> >>>>>> > >>>
> >> >> >>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <
> >> >> cmccabe@apache.org
> >> >> >>>>>> >
> >> >> >>>>>> > >>>wrote:
> >> >> >>>>>> > >>>
> >> >> >>>>>> > >>>> Hi all,
> >> >> >>>>>> > >>>>
> >> >> >>>>>> > >>>> A very quick (and not thorough) survey shows that I
> can't
> >> >> find any
> >> >> >>>>>> > >>>> jenkins jobs that succeeded from the last 24 hours.
> Most
> >> >> of them
> >> >> >>>>>> seem
> >> >> >>>>>> > >>>> to be failing with some variant of this message:
> >> >> >>>>>> > >>>>
> >> >> >>>>>> > >>>> [ERROR] Failed to execute goal
> >> >> >>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> >> >> >>>>>> (default-clean)
> >> >> >>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed
> >> to
> >> >> delete
> >> >> >>>>>> > >>>>
> >> >> >>>>>> > >>>>
> >> >> >>>>>> >
> >> >> >>>>>> >
> >> >> >>>>>>
> >> >>
> >>
> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
> >> >> >>>>>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
> >> >> >>>>>> > >>>> -> [Help 1]
> >> >> >>>>>> > >>>>
> >> >> >>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test
> setting
> >> >> wrong
> >> >> >>>>>> > >>>> permissions?
> >> >> >>>>>> > >>>>
> >> >> >>>>>> > >>>> Colin
> >> >> >>>>>> > >>>>
> >> >> >>>>>> > >>
> >> >> >>>>>> > >
> >> >> >>>>>> > >
> >> >> >>>>>> > >
> >> >> >>>>>> > > --
> >> >> >>>>>> > > Lei (Eddy) Xu
> >> >> >>>>>> > > Software Engineer, Cloudera
> >> >> >>>>>> >
> >> >> >>>>>>
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> Lei (Eddy) Xu
> >> >> >> Software Engineer, Cloudera
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Sean
> >> >
> >>
> >>
> >>
> >> --
> >> Sean
> >>
> >
> >
>



-- 
Sean

Re: upstream jenkins build broken?

Posted by Vinayakumar B <vi...@apache.org>.
Seems like all builds of Precommit-HDFS-Build are failing with the problem below.

FATAL: Command "git clean -fdx" returned status code 1:
stdout:
stderr: hudson.plugins.git.GitException
<http://stacktrace.jenkins-ci.org/search?query=hudson.plugins.git.GitException>:
Command "git clean -fdx" returned status code 1:
stdout:
stderr:



Can someone remove "git clean -fdx" from the build configurations of
Precommit-HDFS-Build?


Regards,
Vinay

On Tue, Mar 17, 2015 at 12:59 PM, Vinayakumar B <vi...@apache.org>
wrote:

> I have simulated the problem in my env and verified that both 'git clean
> -xdf' and 'mvn clean' will not remove the directory.
> mvn fails, whereas git simply ignores (does not even display any warning) the
> problem.
>
>
>
> Regards,
> Vinay
>
> On Tue, Mar 17, 2015 at 2:32 AM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> Can someone point me to an example build that is broken?
>>
>> On Mon, Mar 16, 2015 at 3:52 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>> > I'm on it. HADOOP-11721
>> >
>> > On Mon, Mar 16, 2015 at 3:44 PM, Haohui Mai <wh...@apache.org> wrote:
>> >
>> >> +1 for git clean.
>> >>
>> >> Colin, can you please get it in ASAP? Currently due to the jenkins
>> >> issues, we cannot close the 2.7 blockers.
>> >>
>> >> Thanks,
>> >> Haohui
>> >>
>> >>
>> >>
>> >> On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <cm...@apache.org>
>> >> wrote:
>> >> > If all it takes is someone creating a test that makes a directory
>> >> > without -x, this is going to happen over and over.
>> >> >
>> >> > Let's just fix the problem at the root by running "git clean -fqdx"
>> in
>> >> > our jenkins scripts.  If there's no objections I will add this in and
>> >> > un-break the builds.
>> >> >
>> >> > best,
>> >> > Colin
>> >> >
>> >> > On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <le...@cloudera.com> wrote:
>> >> >> I filed HDFS-7917 to change the way to simulate disk failures.
>> >> >>
>> >> >> But I think we still need infrastructure folks to help with jenkins
>> >> >> scripts to clean the dirs left today.
>> >> >>
>> >> >> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ri...@gmail.com>
>> >> wrote:
>> >> >>> Any updates on this issues? It seems that all HDFS jenkins builds
>> are
>> >> >>> still failing.
>> >> >>>
>> >> >>> Regards,
>> >> >>> Haohui
>> >> >>>
>> >> >>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <
>> >> vinayakumarb@apache.org> wrote:
>> >> >>>> I think the problem started from here.
>> >> >>>>
>> >> >>>>
>> >>
>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>> >> >>>>
>> >> >>>> As Chris mentioned TestDataNodeVolumeFailure is changing the
>> >> permission.
>> >> >>>> But in this patch, ReplicationMonitor got NPE and it got terminate
>> >> signal,
>> >> >>>> due to which MiniDFSCluster.shutdown() throwing Exception.
>> >> >>>>
>> >> >>>> But, TestDataNodeVolumeFailure#teardown() is restoring those
>> >> permission
>> >> >>>> after shutting down cluster. So in this case IMO, permissions were
>> >> never
>> >> >>>> restored.
>> >> >>>>
>> >> >>>>
>> >> >>>>   @After
>> >> >>>>   public void tearDown() throws Exception {
>> >> >>>>     if(data_fail != null) {
>> >> >>>>       FileUtil.setWritable(data_fail, true);
>> >> >>>>     }
>> >> >>>>     if(failedDir != null) {
>> >> >>>>       FileUtil.setWritable(failedDir, true);
>> >> >>>>     }
>> >> >>>>     if(cluster != null) {
>> >> >>>>       cluster.shutdown();
>> >> >>>>     }
>> >> >>>>     for (int i = 0; i < 3; i++) {
>> >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)),
>> >> true);
>> >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)),
>> >> true);
>> >> >>>>     }
>> >> >>>>   }
>> >> >>>>
>> >> >>>>
>> >> >>>> Regards,
>> >> >>>> Vinay
>> >> >>>>
>> >> >>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <
>> >> vinayakumarb@apache.org>
>> >> >>>> wrote:
>> >> >>>>
>> >> >>>>> When I see the history of these kind of builds, All these are
>> >> failed on
>> >> >>>>> node H9.
>> >> >>>>>
>> >> >>>>> I think some or the other uncommitted patch would have created
>> the
>> >> problem
>> >> >>>>> and left it there.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> Regards,
>> >> >>>>> Vinay
>> >> >>>>>
>> >> >>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <
>> busbey@cloudera.com>
>> >> wrote:
>> >> >>>>>
>> >> >>>>>> You could rely on a destructive git clean call instead of maven
>> to
>> >> do the
>> >> >>>>>> directory removal.
>> >> >>>>>>
>> >> >>>>>> --
>> >> >>>>>> Sean
>> >> >>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu
>> >
>> >> wrote:
>> >> >>>>>>
>> >> >>>>>> > Is there a maven plugin or setting we can use to simply remove
>> >> >>>>>> > directories that have no executable permissions on them?
>> >> Clearly we
>> >> >>>>>> > have the permission to do this from a technical point of view
>> >> (since
>> >> >>>>>> > we created the directories as the jenkins user), it's simply
>> >> that the
>> >> >>>>>> > code refuses to do it.
>> >> >>>>>> >
>> >> >>>>>> > Otherwise I guess we can just fix those tests...
>> >> >>>>>> >
>> >> >>>>>> > Colin
>> >> >>>>>> >
>> >> >>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com>
>> >> wrote:
>> >> >>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
>> >> >>>>>> > >
>> >> >>>>>> > > In HDFS-7722:
>> >> >>>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir
>> permissions
>> >> in
>> >> >>>>>> > TearDown().
>> >> >>>>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally
>> >> clause.
>> >> >>>>>> > >
>> >> >>>>>> > > Also I ran mvn test several times on my machine and all
>> tests
>> >> passed.
>> >> >>>>>> > >
>> >> >>>>>> > > However, since in DiskChecker#checkDirAccess():
>> >> >>>>>> > >
>> >> >>>>>> > > private static void checkDirAccess(File dir) throws
>> >> >>>>>> DiskErrorException {
>> >> >>>>>> > >   if (!dir.isDirectory()) {
>> >> >>>>>> > >     throw new DiskErrorException("Not a directory: "
>> >> >>>>>> > >                                  + dir.toString());
>> >> >>>>>> > >   }
>> >> >>>>>> > >
>> >> >>>>>> > >   checkAccessByFileMethods(dir);
>> >> >>>>>> > > }
>> >> >>>>>> > >
>> >> >>>>>> > > One potentially safer alternative is replacing data dir
>> with a
>> >> regular
>> >> >>>>>> > > file to simulate disk failures.
>> >> >>>>>> > >
>> >> >>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
>> >> >>>>>> cnauroth@hortonworks.com>
>> >> >>>>>> > wrote:
>> >> >>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> >> >>>>>> > >> TestDataNodeVolumeFailureReporting, and
>> >> >>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable
>> >> permissions
>> >> >>>>>> > from
>> >> >>>>>> > >> directories like the one Colin mentioned to simulate disk
>> >> failures at
>> >> >>>>>> > data
>> >> >>>>>> > >> nodes.  I reviewed the code for all of those, and they all
>> >> appear to
>> >> >>>>>> be
>> >> >>>>>> > >> doing the necessary work to restore executable permissions
>> at
>> >> the
>> >> >>>>>> end of
>> >> >>>>>> > >> the test.  The only recent uncommitted patch I've seen that
>> >> makes
>> >> >>>>>> > changes
>> >> >>>>>> > >> in these test suites is HDFS-7722.  That patch still looks
>> >> fine
>> >> >>>>>> > though.  I
>> >> >>>>>> > >> don't know if there are other uncommitted patches that
>> >> changed these
>> >> >>>>>> > test
>> >> >>>>>> > >> suites.
>> >> >>>>>> > >>
>> >> >>>>>> > >> I suppose it's also possible that the JUnit process
>> >> unexpectedly died
>> >> >>>>>> > >> after removing executable permissions but before restoring
>> >> them.
>> >> >>>>>> That
>> >> >>>>>> > >> always would have been a weakness of these test suites,
>> >> regardless of
>> >> >>>>>> > any
>> >> >>>>>> > >> recent changes.
>> >> >>>>>> > >>
>> >> >>>>>> > >> Chris Nauroth
>> >> >>>>>> > >> Hortonworks
>> >> >>>>>> > >> http://hortonworks.com/
>> >> >>>>>> > >>
>> >> >>>>>> > >>
>> >> >>>>>> > >>
>> >> >>>>>> > >>
>> >> >>>>>> > >>
>> >> >>>>>> > >>
>> >> >>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com>
>> >> wrote:
>> >> >>>>>> > >>
>> >> >>>>>> > >>>Hey Colin,
>> >> >>>>>> > >>>
>> >> >>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's
>> >> going on
>> >> >>>>>> with
>> >> >>>>>> > >>>these boxes. He took a look and concluded that some perms
>> are
>> >> being
>> >> >>>>>> set
>> >> >>>>>> > in
>> >> >>>>>> > >>>those directories by our unit tests which are precluding
>> >> those files
>> >> >>>>>> > from
>> >> >>>>>> > >>>getting deleted. He's going to clean up the boxes for us,
>> but
>> >> we
>> >> >>>>>> should
>> >> >>>>>> > >>>expect this to keep happening until we can fix the test in
>> >> question
>> >> >>>>>> to
>> >> >>>>>> > >>>properly clean up after itself.
>> >> >>>>>> > >>>
>> >> >>>>>> > >>>To help narrow down which commit it was that started this,
>> >> Andrew
>> >> >>>>>> sent
>> >> >>>>>> > me
>> >> >>>>>> > >>>this info:
>> >> >>>>>> > >>>
>> >> >>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>> >> >>>>>> >
>> >> >>>>>>
>> >>
>> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> >> >>>>>> > has
>> >> >>>>>> > >>>500 perms, so I'm guessing that's the problem. Been that
>> way
>> >> since
>> >> >>>>>> 9:32
>> >> >>>>>> > >>>UTC
>> >> >>>>>> > >>>on March 5th."
>> >> >>>>>> > >>>
>> >> >>>>>> > >>>--
>> >> >>>>>> > >>>Aaron T. Myers
>> >> >>>>>> > >>>Software Engineer, Cloudera
>> >> >>>>>> > >>>
>> >> >>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <
>> >> cmccabe@apache.org
>> >> >>>>>> >
>> >> >>>>>> > >>>wrote:
>> >> >>>>>> > >>>
>> >> >>>>>> > >>>> Hi all,
>> >> >>>>>> > >>>>
>> >> >>>>>> > >>>> A very quick (and not thorough) survey shows that I can't
>> >> find any
>> >> >>>>>> > >>>> jenkins jobs that succeeded from the last 24 hours.  Most
>> >> of them
>> >> >>>>>> seem
>> >> >>>>>> > >>>> to be failing with some variant of this message:
>> >> >>>>>> > >>>>
>> >> >>>>>> > >>>> [ERROR] Failed to execute goal
>> >> >>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>> >> >>>>>> (default-clean)
>> >> >>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed
>> to
>> >> delete
>> >> >>>>>> > >>>>
>> >> >>>>>> > >>>>
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>>
>> >>
>> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
>> >> >>>>>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>> >> >>>>>> > >>>> -> [Help 1]
>> >> >>>>>> > >>>>
>> >> >>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting
>> >> wrong
>> >> >>>>>> > >>>> permissions?
>> >> >>>>>> > >>>>
>> >> >>>>>> > >>>> Colin
>> >> >>>>>> > >>>>
>> >> >>>>>> > >>
>> >> >>>>>> > >
>> >> >>>>>> > >
>> >> >>>>>> > >
>> >> >>>>>> > > --
>> >> >>>>>> > > Lei (Eddy) Xu
>> >> >>>>>> > > Software Engineer, Cloudera
>> >> >>>>>> >
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Lei (Eddy) Xu
>> >> >> Software Engineer, Cloudera
>> >>
>> >
>> >
>> >
>> > --
>> > Sean
>> >
>>
>>
>>
>> --
>> Sean
>>
>
>

Re: upstream jenkins build broken?

Posted by Vinayakumar B <vi...@apache.org>.
This problem seems to be gone, at least for now.
I have made a temporary (as of now) commit to restore the execute permissions
for the hadoop-hdfs/target/test/data directory.

The problem was often seen on the H9 node, but multiple builds have since
executed on this node.

Regards,
Vinay

On Tue, Mar 17, 2015 at 9:53 PM, Vinayakumar B <vi...@apache.org>
wrote:

> Yes, just create a directory with some contents in it within the target
> directory, and set its permission to 600.
> Then you can run either 'mvn clean' or 'git clean'.
>
> -Vinay
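
For anyone who wants to reproduce that locally, the recipe above boils down to
roughly this (a standalone sketch for a POSIX machine, not part of any build
script; the paths and class name are made up):

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

public class UndeletableDirRepro {
  public static void main(String[] args) throws IOException {
    // A directory with some contents, then mode 600 (no execute/search bit):
    // the same state the aborted tests leave behind under target/.
    Path dir = Paths.get("target", "stuck-dir");
    Files.createDirectories(dir);
    Files.write(dir.resolve("child.txt"),
        "some contents".getBytes(StandardCharsets.UTF_8));
    Files.setPosixFilePermissions(dir,
        PosixFilePermissions.fromString("rw-------"));

    // Without the search bit on the parent, the child cannot be unlinked, so
    // the recursive delete that "mvn clean" performs fails on this tree.
    File child = dir.resolve("child.txt").toFile();
    System.out.println("delete child.txt -> " + child.delete());  // false here

    // Put the bit back so the sketch itself can be cleaned up afterwards.
    Files.setPosixFilePermissions(dir,
        PosixFilePermissions.fromString("rwx------"));
  }
}
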
>
> On Tue, Mar 17, 2015 at 9:13 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> Is the simulation just removing the executable bit on the directory? I'd
>> like to get something I can reproduce locally.
>>
>> On Tue, Mar 17, 2015 at 2:29 AM, Vinayakumar B <vi...@apache.org>
>> wrote:
>>
>> > I have simulated the problem in my env and verified that both 'git
>> clean
>> > -xdf' and 'mvn clean' will not remove the directory.
>> > mvn fails, whereas git simply ignores (does not even display any warning) the
>> > problem.
>> >
>> >
>> >
>> > Regards,
>> > Vinay
>> >
>> > On Tue, Mar 17, 2015 at 2:32 AM, Sean Busbey <bu...@cloudera.com>
>> wrote:
>> >
>> > > Can someone point me to an example build that is broken?
>> > >
>> > > On Mon, Mar 16, 2015 at 3:52 PM, Sean Busbey <bu...@cloudera.com>
>> > wrote:
>> > >
>> > > > I'm on it. HADOOP-11721
>> > > >
>> > > > On Mon, Mar 16, 2015 at 3:44 PM, Haohui Mai <wh...@apache.org>
>> wrote:
>> > > >
>> > > >> +1 for git clean.
>> > > >>
>> > > >> Colin, can you please get it in ASAP? Currently due to the jenkins
>> > > >> issues, we cannot close the 2.7 blockers.
>> > > >>
>> > > >> Thanks,
>> > > >> Haohui
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <
>> cmccabe@apache.org
>> > >
>> > > >> wrote:
>> > > >> > If all it takes is someone creating a test that makes a directory
>> > > >> > without -x, this is going to happen over and over.
>> > > >> >
>> > > >> > Let's just fix the problem at the root by running "git clean
>> -fqdx"
>> > in
>> > > >> > our jenkins scripts.  If there's no objections I will add this in
>> > and
>> > > >> > un-break the builds.
>> > > >> >
>> > > >> > best,
>> > > >> > Colin
>> > > >> >
>> > > >> > On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <le...@cloudera.com>
>> wrote:
>> > > >> >> I filed HDFS-7917 to change the way to simulate disk failures.
>> > > >> >>
>> > > >> >> But I think we still need infrastructure folks to help with
>> jenkins
>> > > >> >> scripts to clean the dirs left today.
>> > > >> >>
>> > > >> >> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ricetons@gmail.com
>> >
>> > > >> wrote:
>> > > >> >>> Any updates on this issues? It seems that all HDFS jenkins
>> builds
>> > > are
>> > > >> >>> still failing.
>> > > >> >>>
>> > > >> >>> Regards,
>> > > >> >>> Haohui
>> > > >> >>>
>> > > >> >>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <
>> > > >> vinayakumarb@apache.org> wrote:
>> > > >> >>>> I think the problem started from here.
>> > > >> >>>>
>> > > >> >>>>
>> > > >>
>> > >
>> >
>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>> > > >> >>>>
>> > > >> >>>> As Chris mentioned TestDataNodeVolumeFailure is changing the
>> > > >> permission.
>> > > >> >>>> But in this patch, ReplicationMonitor got NPE and it got
>> > terminate
>> > > >> signal,
>> > > >> >>>> due to which MiniDFSCluster.shutdown() throwing Exception.
>> > > >> >>>>
>> > > >> >>>> But, TestDataNodeVolumeFailure#teardown() is restoring those
>> > > >> permission
>> > > >> >>>> after shutting down cluster. So in this case IMO, permissions
>> > were
>> > > >> never
>> > > >> >>>> restored.
>> > > >> >>>>
>> > > >> >>>>
>> > > >> >>>>   @After
>> > > >> >>>>   public void tearDown() throws Exception {
>> > > >> >>>>     if(data_fail != null) {
>> > > >> >>>>       FileUtil.setWritable(data_fail, true);
>> > > >> >>>>     }
>> > > >> >>>>     if(failedDir != null) {
>> > > >> >>>>       FileUtil.setWritable(failedDir, true);
>> > > >> >>>>     }
>> > > >> >>>>     if(cluster != null) {
>> > > >> >>>>       cluster.shutdown();
>> > > >> >>>>     }
>> > > >> >>>>     for (int i = 0; i < 3; i++) {
>> > > >> >>>>       FileUtil.setExecutable(new File(dataDir,
>> "data"+(2*i+1)),
>> > > >> true);
>> > > >> >>>>       FileUtil.setExecutable(new File(dataDir,
>> "data"+(2*i+2)),
>> > > >> true);
>> > > >> >>>>     }
>> > > >> >>>>   }
>> > > >> >>>>
>> > > >> >>>>
>> > > >> >>>> Regards,
>> > > >> >>>> Vinay
>> > > >> >>>>
>> > > >> >>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <
>> > > >> vinayakumarb@apache.org>
>> > > >> >>>> wrote:
>> > > >> >>>>
>> > > >> >>>>> When I look at the history of these kinds of builds, they all
>> > > >> >>>>> failed on node H9.
>> > > >> >>>>>
>> > > >> >>>>> I think some uncommitted patch or other created the problem and
>> > > >> >>>>> left it there.
>> > > >> >>>>>
>> > > >> >>>>>
>> > > >> >>>>> Regards,
>> > > >> >>>>> Vinay
>> > > >> >>>>>
>> > > >> >>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <
>> > busbey@cloudera.com
>> > > >
>> > > >> wrote:
>> > > >> >>>>>
>> > > >> >>>>>> You could rely on a destructive git clean call instead of
>> maven
>> > > to
>> > > >> do the
>> > > >> >>>>>> directory removal.
>> > > >> >>>>>>
>> > > >> >>>>>> --
>> > > >> >>>>>> Sean
>> > > >> >>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <
>> > cmccabe@alumni.cmu.edu>
>> > > >> wrote:
>> > > >> >>>>>>
>> > > >> >>>>>> > Is there a maven plugin or setting we can use to simply
>> > remove
>> > > >> >>>>>> > directories that have no executable permissions on them?
>> > > >> Clearly we
>> > > >> >>>>>> > have the permission to do this from a technical point of
>> view
>> > > >> (since
>> > > >> >>>>>> > we created the directories as the jenkins user), it's
>> simply
>> > > >> that the
>> > > >> >>>>>> > code refuses to do it.
>> > > >> >>>>>> >
>> > > >> >>>>>> > Otherwise I guess we can just fix those tests...
>> > > >> >>>>>> >
>> > > >> >>>>>> > Colin
>> > > >> >>>>>> >
>> > > >> >>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com
>> >
>> > > >> wrote:
>> > > >> >>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
>> > > >> >>>>>> > >
>> > > >> >>>>>> > > In HDFS-7722:
>> > > >> >>>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir
>> > permissions
>> > > >> in
>> > > >> >>>>>> > TearDown().
>> > > >> >>>>>> > > TestDataNodeHotSwapVolumes reset permissions in a
>> finally
>> > > >> clause.
>> > > >> >>>>>> > >
>> > > >> >>>>>> > > Also I ran mvn test several times on my machine and all
>> > tests
>> > > >> passed.
>> > > >> >>>>>> > >
>> > > >> >>>>>> > > However, since in DiskChecker#checkDirAccess():
>> > > >> >>>>>> > >
>> > > >> >>>>>> > > private static void checkDirAccess(File dir) throws
>> > > >> >>>>>> DiskErrorException {
>> > > >> >>>>>> > >   if (!dir.isDirectory()) {
>> > > >> >>>>>> > >     throw new DiskErrorException("Not a directory: "
>> > > >> >>>>>> > >                                  + dir.toString());
>> > > >> >>>>>> > >   }
>> > > >> >>>>>> > >
>> > > >> >>>>>> > >   checkAccessByFileMethods(dir);
>> > > >> >>>>>> > > }
>> > > >> >>>>>> > >
>> > > >> >>>>>> > > One potentially safer alternative is replacing data dir with
>> > > >> >>>>>> > > a regular file to simulate disk failures.
>> > > >> >>>>>> > >
>> > > >> >>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
>> > > >> >>>>>> cnauroth@hortonworks.com>
>> > > >> >>>>>> > wrote:
>> > > >> >>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> > > >> >>>>>> > >> TestDataNodeVolumeFailureReporting, and
>> > > >> >>>>>> > >> TestDataNodeVolumeFailureToleration all remove
>> executable
>> > > >> permissions
>> > > >> >>>>>> > from
>> > > >> >>>>>> > >> directories like the one Colin mentioned to simulate
>> disk
>> > > >> failures at
>> > > >> >>>>>> > data
>> > > >> >>>>>> > >> nodes.  I reviewed the code for all of those, and they
>> all
>> > > >> appear to
>> > > >> >>>>>> be
>> > > >> >>>>>> > >> doing the necessary work to restore executable
>> permissions
>> > > at
>> > > >> the
>> > > >> >>>>>> end of
>> > > >> >>>>>> > >> the test.  The only recent uncommitted patch I've seen
>> > that
>> > > >> makes
>> > > >> >>>>>> > changes
>> > > >> >>>>>> > >> in these test suites is HDFS-7722.  That patch still
>> looks
>> > > >> fine
>> > > >> >>>>>> > though.  I
>> > > >> >>>>>> > >> don't know if there are other uncommitted patches that
>> > > >> changed these
>> > > >> >>>>>> > test
>> > > >> >>>>>> > >> suites.
>> > > >> >>>>>> > >>
>> > > >> >>>>>> > >> I suppose it's also possible that the JUnit process
>> > > >> unexpectedly died
>> > > >> >>>>>> > >> after removing executable permissions but before
>> restoring
>> > > >> them.
>> > > >> >>>>>> That
>> > > >> >>>>>> > >> always would have been a weakness of these test suites,
>> > > >> regardless of
>> > > >> >>>>>> > any
>> > > >> >>>>>> > >> recent changes.
>> > > >> >>>>>> > >>
>> > > >> >>>>>> > >> Chris Nauroth
>> > > >> >>>>>> > >> Hortonworks
>> > > >> >>>>>> > >> http://hortonworks.com/
>> > > >> >>>>>> > >>
>> > > >> >>>>>> > >>
>> > > >> >>>>>> > >>
>> > > >> >>>>>> > >>
>> > > >> >>>>>> > >>
>> > > >> >>>>>> > >>
>> > > >> >>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <
>> atm@cloudera.com>
>> > > >> wrote:
>> > > >> >>>>>> > >>
>> > > >> >>>>>> > >>>Hey Colin,
>> > > >> >>>>>> > >>>
>> > > >> >>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra,
>> what's
>> > > >> going on
>> > > >> >>>>>> with
>> > > >> >>>>>> > >>>these boxes. He took a look and concluded that some
>> perms
>> > > are
>> > > >> being
>> > > >> >>>>>> set
>> > > >> >>>>>> > in
>> > > >> >>>>>> > >>>those directories by our unit tests which are
>> precluding
>> > > >> those files
>> > > >> >>>>>> > from
>> > > >> >>>>>> > >>>getting deleted. He's going to clean up the boxes for
>> us,
>> > > but
>> > > >> we
>> > > >> >>>>>> should
>> > > >> >>>>>> > >>>expect this to keep happening until we can fix the
>> test in
>> > > >> question
>> > > >> >>>>>> to
>> > > >> >>>>>> > >>>properly clean up after itself.
>> > > >> >>>>>> > >>>
>> > > >> >>>>>> > >>>To help narrow down which commit it was that started
>> this,
>> > > >> Andrew
>> > > >> >>>>>> sent
>> > > >> >>>>>> > me
>> > > >> >>>>>> > >>>this info:
>> > > >> >>>>>> > >>>
>> > > >> >>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>> > > >> >>>>>> >
>> > > >> >>>>>>
>> > > >>
>> > >
>> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> > > >> >>>>>> > has
>> > > >> >>>>>> > >>>500 perms, so I'm guessing that's the problem. Been
>> that
>> > way
>> > > >> since
>> > > >> >>>>>> 9:32
>> > > >> >>>>>> > >>>UTC
>> > > >> >>>>>> > >>>on March 5th."
>> > > >> >>>>>> > >>>
>> > > >> >>>>>> > >>>--
>> > > >> >>>>>> > >>>Aaron T. Myers
>> > > >> >>>>>> > >>>Software Engineer, Cloudera
>> > > >> >>>>>> > >>>
>> > > >> >>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <
>> > > >> cmccabe@apache.org
>> > > >> >>>>>> >
>> > > >> >>>>>> > >>>wrote:
>> > > >> >>>>>> > >>>
>> > > >> >>>>>> > >>>> Hi all,
>> > > >> >>>>>> > >>>>
>> > > >> >>>>>> > >>>> A very quick (and not thorough) survey shows that I
>> > can't
>> > > >> find any
>> > > >> >>>>>> > >>>> jenkins jobs that succeeded from the last 24 hours.
>> > Most
>> > > >> of them
>> > > >> >>>>>> seem
>> > > >> >>>>>> > >>>> to be failing with some variant of this message:
>> > > >> >>>>>> > >>>>
>> > > >> >>>>>> > >>>> [ERROR] Failed to execute goal
>> > > >> >>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>> > > >> >>>>>> (default-clean)
>> > > >> >>>>>> > >>>> on project hadoop-hdfs: Failed to clean project:
>> Failed
>> > to
>> > > >> delete
>> > > >> >>>>>> > >>>>
>> > > >> >>>>>> > >>>>
>> > > >> >>>>>> >
>> > > >> >>>>>> >
>> > > >> >>>>>>
>> > > >>
>> > >
>> >
>> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
>> > > >> >>>>>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>> > > >> >>>>>> > >>>> -> [Help 1]
>> > > >> >>>>>> > >>>>
>> > > >> >>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test
>> > setting
>> > > >> wrong
>> > > >> >>>>>> > >>>> permissions?
>> > > >> >>>>>> > >>>>
>> > > >> >>>>>> > >>>> Colin
>> > > >> >>>>>> > >>>>
>> > > >> >>>>>> > >>
>> > > >> >>>>>> > >
>> > > >> >>>>>> > >
>> > > >> >>>>>> > >
>> > > >> >>>>>> > > --
>> > > >> >>>>>> > > Lei (Eddy) Xu
>> > > >> >>>>>> > > Software Engineer, Cloudera
>> > > >> >>>>>> >
>> > > >> >>>>>>
>> > > >> >>>>>
>> > > >> >>>>>
>> > > >> >>
>> > > >> >>
>> > > >> >>
>> > > >> >> --
>> > > >> >> Lei (Eddy) Xu
>> > > >> >> Software Engineer, Cloudera
>> > > >>
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Sean
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Sean
>> > >
>> >
>>
>>
>>
>> --
>> Sean
>>
>
>

Re: upstream jenkins build broken?

Posted by Vinayakumar B <vi...@apache.org>.
Yes, just create some directory with some contents in it within the
target directory, and set its permission to 600.
Then run either 'mvn clean' or 'git clean'.
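
(For anyone who wants to script that locally, here is a minimal Java sketch of
the setup. The path and file name below are only placeholders, not taken from
the builds, and it assumes a POSIX filesystem:)

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.attribute.PosixFilePermissions;

public class SimulateLeftoverDir {
  public static void main(String[] args) throws IOException {
    // Create a directory with some content under target/, like the one the
    // failed tests leave behind on the build slaves.
    File dir = new File("target/test/data/dfs/data/data3");  // placeholder path
    dir.mkdirs();
    try (FileWriter w = new FileWriter(new File(dir, "dummy"))) {
      w.write("some content");
    }
    // Drop the directory to mode 600: owner read/write, no execute bit, so the
    // directory can no longer be traversed.
    Files.setPosixFilePermissions(dir.toPath(),
        PosixFilePermissions.fromString("rw-------"));
    // Then try 'mvn clean' or 'git clean -xdf' from the project root.
  }
}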

-Vinay

On Tue, Mar 17, 2015 at 9:13 PM, Sean Busbey <bu...@cloudera.com> wrote:

> Is the simulation just removing the executable bit on the directory? I'd
> like to get something I can reproduce locally.
>
> On Tue, Mar 17, 2015 at 2:29 AM, Vinayakumar B <vi...@apache.org>
> wrote:
>
> > I have simulated the problem in my env and verified that both 'git
> > clean -xdf' and 'mvn clean' will not remove the directory.
> > mvn fails, whereas git simply ignores the problem (without even
> > displaying a warning).
> >
> >
> >
> > Regards,
> > Vinay
> >
> > On Tue, Mar 17, 2015 at 2:32 AM, Sean Busbey <bu...@cloudera.com>
> wrote:
> >
> > > Can someone point me to an example build that is broken?
> > >
> > > On Mon, Mar 16, 2015 at 3:52 PM, Sean Busbey <bu...@cloudera.com>
> > wrote:
> > >
> > > > I'm on it. HADOOP-11721
> > > >
> > > > On Mon, Mar 16, 2015 at 3:44 PM, Haohui Mai <wh...@apache.org>
> wrote:
> > > >
> > > >> +1 for git clean.
> > > >>
> > > >> Colin, can you please get it in ASAP? Currently due to the jenkins
> > > >> issues, we cannot close the 2.7 blockers.
> > > >>
> > > >> Thanks,
> > > >> Haohui
> > > >>
> > > >>
> > > >>
> > > >> On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <
> cmccabe@apache.org
> > >
> > > >> wrote:
> > > >> > If all it takes is someone creating a test that makes a directory
> > > >> > without -x, this is going to happen over and over.
> > > >> >
> > > >> > Let's just fix the problem at the root by running "git clean
> -fqdx"
> > in
> > > >> > our jenkins scripts.  If there's no objections I will add this in
> > and
> > > >> > un-break the builds.
> > > >> >
> > > >> > best,
> > > >> > Colin
> > > >> >
> > > >> > On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <le...@cloudera.com> wrote:
> > > >> >> I filed HDFS-7917 to change the way to simulate disk failures.
> > > >> >>
> > > >> >> But I think we still need infrastructure folks to help with
> jenkins
> > > >> >> scripts to clean the dirs left today.
> > > >> >>
> > > >> >> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ri...@gmail.com>
> > > >> wrote:
> > > >> >>> Any updates on this issues? It seems that all HDFS jenkins
> builds
> > > are
> > > >> >>> still failing.
> > > >> >>>
> > > >> >>> Regards,
> > > >> >>> Haohui
> > > >> >>>
> > > >> >>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <
> > > >> vinayakumarb@apache.org> wrote:
> > > >> >>>> I think the problem started from here.
> > > >> >>>>
> > > >> >>>>
> > > >>
> > >
> >
> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
> > > >> >>>>
> > > >> >>>> As Chris mentioned, TestDataNodeVolumeFailure is changing the
> > > >> >>>> permission. But in this patch, ReplicationMonitor got an NPE and a
> > > >> >>>> terminate signal, due to which MiniDFSCluster.shutdown() threw an
> > > >> >>>> Exception.
> > > >> >>>>
> > > >> >>>> But TestDataNodeVolumeFailure#tearDown() restores those permissions
> > > >> >>>> only after shutting down the cluster, so in this case IMO the
> > > >> >>>> permissions were never restored.
> > > >> >>>>
> > > >> >>>>
> > > >> >>>>   @After
> > > >> >>>>   public void tearDown() throws Exception {
> > > >> >>>>     if(data_fail != null) {
> > > >> >>>>       FileUtil.setWritable(data_fail, true);
> > > >> >>>>     }
> > > >> >>>>     if(failedDir != null) {
> > > >> >>>>       FileUtil.setWritable(failedDir, true);
> > > >> >>>>     }
> > > >> >>>>     if(cluster != null) {
> > > >> >>>>       cluster.shutdown();
> > > >> >>>>     }
> > > >> >>>>     for (int i = 0; i < 3; i++) {
> > > >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)),
> > > >> true);
> > > >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)),
> > > >> true);
> > > >> >>>>     }
> > > >> >>>>   }
> > > >> >>>>
> > > >> >>>>
> > > >> >>>> Regards,
> > > >> >>>> Vinay
> > > >> >>>>
> > > >> >>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <
> > > >> vinayakumarb@apache.org>
> > > >> >>>> wrote:
> > > >> >>>>
> > > >> >>>>> When I look at the history of these kinds of builds, they all
> > > >> >>>>> failed on node H9.
> > > >> >>>>>
> > > >> >>>>> I think some uncommitted patch or other created the problem and
> > > >> >>>>> left it there.
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> Regards,
> > > >> >>>>> Vinay
> > > >> >>>>>
> > > >> >>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <
> > busbey@cloudera.com
> > > >
> > > >> wrote:
> > > >> >>>>>
> > > >> >>>>>> You could rely on a destructive git clean call instead of
> maven
> > > to
> > > >> do the
> > > >> >>>>>> directory removal.
> > > >> >>>>>>
> > > >> >>>>>> --
> > > >> >>>>>> Sean
> > > >> >>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <
> > cmccabe@alumni.cmu.edu>
> > > >> wrote:
> > > >> >>>>>>
> > > >> >>>>>> > Is there a maven plugin or setting we can use to simply
> > remove
> > > >> >>>>>> > directories that have no executable permissions on them?
> > > >> Clearly we
> > > >> >>>>>> > have the permission to do this from a technical point of
> view
> > > >> (since
> > > >> >>>>>> > we created the directories as the jenkins user), it's
> simply
> > > >> that the
> > > >> >>>>>> > code refuses to do it.
> > > >> >>>>>> >
> > > >> >>>>>> > Otherwise I guess we can just fix those tests...
> > > >> >>>>>> >
> > > >> >>>>>> > Colin
> > > >> >>>>>> >
> > > >> >>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com>
> > > >> wrote:
> > > >> >>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
> > > >> >>>>>> > >
> > > >> >>>>>> > > In HDFS-7722:
> > > >> >>>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir
> > permissions
> > > >> in
> > > >> >>>>>> > TearDown().
> > > >> >>>>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally
> > > >> clause.
> > > >> >>>>>> > >
> > > >> >>>>>> > > Also I ran mvn test several times on my machine and all
> > tests
> > > >> passed.
> > > >> >>>>>> > >
> > > >> >>>>>> > > However, since in DiskChecker#checkDirAccess():
> > > >> >>>>>> > >
> > > >> >>>>>> > > private static void checkDirAccess(File dir) throws
> > > >> >>>>>> DiskErrorException {
> > > >> >>>>>> > >   if (!dir.isDirectory()) {
> > > >> >>>>>> > >     throw new DiskErrorException("Not a directory: "
> > > >> >>>>>> > >                                  + dir.toString());
> > > >> >>>>>> > >   }
> > > >> >>>>>> > >
> > > >> >>>>>> > >   checkAccessByFileMethods(dir);
> > > >> >>>>>> > > }
> > > >> >>>>>> > >
> > > >> >>>>>> > > One potentially safer alternative is replacing data dir with
> > > >> >>>>>> > > a regular file to simulate disk failures.
> > > >> >>>>>> > >
> > > >> >>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
> > > >> >>>>>> cnauroth@hortonworks.com>
> > > >> >>>>>> > wrote:
> > > >> >>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> > > >> >>>>>> > >> TestDataNodeVolumeFailureReporting, and
> > > >> >>>>>> > >> TestDataNodeVolumeFailureToleration all remove
> executable
> > > >> permissions
> > > >> >>>>>> > from
> > > >> >>>>>> > >> directories like the one Colin mentioned to simulate
> disk
> > > >> failures at
> > > >> >>>>>> > data
> > > >> >>>>>> > >> nodes.  I reviewed the code for all of those, and they
> all
> > > >> appear to
> > > >> >>>>>> be
> > > >> >>>>>> > >> doing the necessary work to restore executable
> permissions
> > > at
> > > >> the
> > > >> >>>>>> end of
> > > >> >>>>>> > >> the test.  The only recent uncommitted patch I've seen
> > that
> > > >> makes
> > > >> >>>>>> > changes
> > > >> >>>>>> > >> in these test suites is HDFS-7722.  That patch still
> looks
> > > >> fine
> > > >> >>>>>> > though.  I
> > > >> >>>>>> > >> don't know if there are other uncommitted patches that
> > > >> changed these
> > > >> >>>>>> > test
> > > >> >>>>>> > >> suites.
> > > >> >>>>>> > >>
> > > >> >>>>>> > >> I suppose it's also possible that the JUnit process
> > > >> unexpectedly died
> > > >> >>>>>> > >> after removing executable permissions but before
> restoring
> > > >> them.
> > > >> >>>>>> That
> > > >> >>>>>> > >> always would have been a weakness of these test suites,
> > > >> regardless of
> > > >> >>>>>> > any
> > > >> >>>>>> > >> recent changes.
> > > >> >>>>>> > >>
> > > >> >>>>>> > >> Chris Nauroth
> > > >> >>>>>> > >> Hortonworks
> > > >> >>>>>> > >> http://hortonworks.com/
> > > >> >>>>>> > >>
> > > >> >>>>>> > >>
> > > >> >>>>>> > >>
> > > >> >>>>>> > >>
> > > >> >>>>>> > >>
> > > >> >>>>>> > >>
> > > >> >>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com
> >
> > > >> wrote:
> > > >> >>>>>> > >>
> > > >> >>>>>> > >>>Hey Colin,
> > > >> >>>>>> > >>>
> > > >> >>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra,
> what's
> > > >> going on
> > > >> >>>>>> with
> > > >> >>>>>> > >>>these boxes. He took a look and concluded that some
> perms
> > > are
> > > >> being
> > > >> >>>>>> set
> > > >> >>>>>> > in
> > > >> >>>>>> > >>>those directories by our unit tests which are precluding
> > > >> those files
> > > >> >>>>>> > from
> > > >> >>>>>> > >>>getting deleted. He's going to clean up the boxes for
> us,
> > > but
> > > >> we
> > > >> >>>>>> should
> > > >> >>>>>> > >>>expect this to keep happening until we can fix the test
> in
> > > >> question
> > > >> >>>>>> to
> > > >> >>>>>> > >>>properly clean up after itself.
> > > >> >>>>>> > >>>
> > > >> >>>>>> > >>>To help narrow down which commit it was that started
> this,
> > > >> Andrew
> > > >> >>>>>> sent
> > > >> >>>>>> > me
> > > >> >>>>>> > >>>this info:
> > > >> >>>>>> > >>>
> > > >> >>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> > > >> >>>>>> >
> > > >> >>>>>>
> > > >>
> > >
> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> > > >> >>>>>> > has
> > > >> >>>>>> > >>>500 perms, so I'm guessing that's the problem. Been that
> > way
> > > >> since
> > > >> >>>>>> 9:32
> > > >> >>>>>> > >>>UTC
> > > >> >>>>>> > >>>on March 5th."
> > > >> >>>>>> > >>>
> > > >> >>>>>> > >>>--
> > > >> >>>>>> > >>>Aaron T. Myers
> > > >> >>>>>> > >>>Software Engineer, Cloudera
> > > >> >>>>>> > >>>
> > > >> >>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <
> > > >> cmccabe@apache.org
> > > >> >>>>>> >
> > > >> >>>>>> > >>>wrote:
> > > >> >>>>>> > >>>
> > > >> >>>>>> > >>>> Hi all,
> > > >> >>>>>> > >>>>
> > > >> >>>>>> > >>>> A very quick (and not thorough) survey shows that I
> > can't
> > > >> find any
> > > >> >>>>>> > >>>> jenkins jobs that succeeded from the last 24 hours.
> > Most
> > > >> of them
> > > >> >>>>>> seem
> > > >> >>>>>> > >>>> to be failing with some variant of this message:
> > > >> >>>>>> > >>>>
> > > >> >>>>>> > >>>> [ERROR] Failed to execute goal
> > > >> >>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> > > >> >>>>>> (default-clean)
> > > >> >>>>>> > >>>> on project hadoop-hdfs: Failed to clean project:
> Failed
> > to
> > > >> delete
> > > >> >>>>>> > >>>>
> > > >> >>>>>> > >>>>
> > > >> >>>>>> >
> > > >> >>>>>> >
> > > >> >>>>>>
> > > >>
> > >
> >
> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
> > > >> >>>>>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
> > > >> >>>>>> > >>>> -> [Help 1]
> > > >> >>>>>> > >>>>
> > > >> >>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test
> > setting
> > > >> wrong
> > > >> >>>>>> > >>>> permissions?
> > > >> >>>>>> > >>>>
> > > >> >>>>>> > >>>> Colin
> > > >> >>>>>> > >>>>
> > > >> >>>>>> > >>
> > > >> >>>>>> > >
> > > >> >>>>>> > >
> > > >> >>>>>> > >
> > > >> >>>>>> > > --
> > > >> >>>>>> > > Lei (Eddy) Xu
> > > >> >>>>>> > > Software Engineer, Cloudera
> > > >> >>>>>> >
> > > >> >>>>>>
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> --
> > > >> >> Lei (Eddy) Xu
> > > >> >> Software Engineer, Cloudera
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Sean
> > > >
> > >
> > >
> > >
> > > --
> > > Sean
> > >
> >
>
>
>
> --
> Sean
>

Re: upstream jenkins build broken?

Posted by Sean Busbey <bu...@cloudera.com>.
Is the simulation just removing the executable bit on the directory? I'd
like to get something I can reproduce locally.

On Tue, Mar 17, 2015 at 2:29 AM, Vinayakumar B <vi...@apache.org>
wrote:

> I have simulated the problem in my env and verified that both 'git
> clean -xdf' and 'mvn clean' will not remove the directory.
> mvn fails, whereas git simply ignores the problem (without even
> displaying a warning).
>
>
>
> Regards,
> Vinay
>
> On Tue, Mar 17, 2015 at 2:32 AM, Sean Busbey <bu...@cloudera.com> wrote:
>
> > Can someone point me to an example build that is broken?
> >
> > On Mon, Mar 16, 2015 at 3:52 PM, Sean Busbey <bu...@cloudera.com>
> wrote:
> >
> > > I'm on it. HADOOP-11721
> > >
> > > On Mon, Mar 16, 2015 at 3:44 PM, Haohui Mai <wh...@apache.org> wrote:
> > >
> > >> +1 for git clean.
> > >>
> > >> Colin, can you please get it in ASAP? Currently due to the jenkins
> > >> issues, we cannot close the 2.7 blockers.
> > >>
> > >> Thanks,
> > >> Haohui
> > >>
> > >>
> > >>
> > >> On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <cmccabe@apache.org
> >
> > >> wrote:
> > >> > If all it takes is someone creating a test that makes a directory
> > >> > without -x, this is going to happen over and over.
> > >> >
> > >> > Let's just fix the problem at the root by running "git clean -fqdx"
> in
> > >> > our jenkins scripts.  If there's no objections I will add this in
> and
> > >> > un-break the builds.
> > >> >
> > >> > best,
> > >> > Colin
> > >> >
> > >> > On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <le...@cloudera.com> wrote:
> > >> >> I filed HDFS-7917 to change the way to simulate disk failures.
> > >> >>
> > >> >> But I think we still need infrastructure folks to help with jenkins
> > >> >> scripts to clean the dirs left today.
> > >> >>
> > >> >> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ri...@gmail.com>
> > >> wrote:
> > >> >>> Any updates on this issues? It seems that all HDFS jenkins builds
> > are
> > >> >>> still failing.
> > >> >>>
> > >> >>> Regards,
> > >> >>> Haohui
> > >> >>>
> > >> >>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <
> > >> vinayakumarb@apache.org> wrote:
> > >> >>>> I think the problem started from here.
> > >> >>>>
> > >> >>>>
> > >>
> >
> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
> > >> >>>>
> > >> >>>> As Chris mentioned, TestDataNodeVolumeFailure is changing the
> > >> >>>> permission. But in this patch, ReplicationMonitor got an NPE and a
> > >> >>>> terminate signal, due to which MiniDFSCluster.shutdown() threw an
> > >> >>>> Exception.
> > >> >>>>
> > >> >>>> But TestDataNodeVolumeFailure#tearDown() restores those permissions
> > >> >>>> only after shutting down the cluster, so in this case IMO the
> > >> >>>> permissions were never restored.
> > >> >>>>
> > >> >>>>
> > >> >>>>   @After
> > >> >>>>   public void tearDown() throws Exception {
> > >> >>>>     if(data_fail != null) {
> > >> >>>>       FileUtil.setWritable(data_fail, true);
> > >> >>>>     }
> > >> >>>>     if(failedDir != null) {
> > >> >>>>       FileUtil.setWritable(failedDir, true);
> > >> >>>>     }
> > >> >>>>     if(cluster != null) {
> > >> >>>>       cluster.shutdown();
> > >> >>>>     }
> > >> >>>>     for (int i = 0; i < 3; i++) {
> > >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)),
> > >> true);
> > >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)),
> > >> true);
> > >> >>>>     }
> > >> >>>>   }
> > >> >>>>
> > >> >>>>
> > >> >>>> Regards,
> > >> >>>> Vinay
> > >> >>>>
> > >> >>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <
> > >> vinayakumarb@apache.org>
> > >> >>>> wrote:
> > >> >>>>
> > >> >>>>> When I look at the history of these kinds of builds, they all
> > >> >>>>> failed on node H9.
> > >> >>>>>
> > >> >>>>> I think some uncommitted patch or other created the problem and
> > >> >>>>> left it there.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Regards,
> > >> >>>>> Vinay
> > >> >>>>>
> > >> >>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <
> busbey@cloudera.com
> > >
> > >> wrote:
> > >> >>>>>
> > >> >>>>>> You could rely on a destructive git clean call instead of maven
> > to
> > >> do the
> > >> >>>>>> directory removal.
> > >> >>>>>>
> > >> >>>>>> --
> > >> >>>>>> Sean
> > >> >>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <
> cmccabe@alumni.cmu.edu>
> > >> wrote:
> > >> >>>>>>
> > >> >>>>>> > Is there a maven plugin or setting we can use to simply
> remove
> > >> >>>>>> > directories that have no executable permissions on them?
> > >> Clearly we
> > >> >>>>>> > have the permission to do this from a technical point of view
> > >> (since
> > >> >>>>>> > we created the directories as the jenkins user), it's simply
> > >> that the
> > >> >>>>>> > code refuses to do it.
> > >> >>>>>> >
> > >> >>>>>> > Otherwise I guess we can just fix those tests...
> > >> >>>>>> >
> > >> >>>>>> > Colin
> > >> >>>>>> >
> > >> >>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com>
> > >> wrote:
> > >> >>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
> > >> >>>>>> > >
> > >> >>>>>> > > In HDFS-7722:
> > >> >>>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir
> permissions
> > >> in
> > >> >>>>>> > TearDown().
> > >> >>>>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally
> > >> clause.
> > >> >>>>>> > >
> > >> >>>>>> > > Also I ran mvn test several times on my machine and all
> tests
> > >> passed.
> > >> >>>>>> > >
> > >> >>>>>> > > However, since in DiskChecker#checkDirAccess():
> > >> >>>>>> > >
> > >> >>>>>> > > private static void checkDirAccess(File dir) throws
> > >> >>>>>> DiskErrorException {
> > >> >>>>>> > >   if (!dir.isDirectory()) {
> > >> >>>>>> > >     throw new DiskErrorException("Not a directory: "
> > >> >>>>>> > >                                  + dir.toString());
> > >> >>>>>> > >   }
> > >> >>>>>> > >
> > >> >>>>>> > >   checkAccessByFileMethods(dir);
> > >> >>>>>> > > }
> > >> >>>>>> > >
> > >> >>>>>> > > One potentially safer alternative is replacing data dir with
> > >> >>>>>> > > a regular file to simulate disk failures.
> > >> >>>>>> > >
> > >> >>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
> > >> >>>>>> cnauroth@hortonworks.com>
> > >> >>>>>> > wrote:
> > >> >>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> > >> >>>>>> > >> TestDataNodeVolumeFailureReporting, and
> > >> >>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable
> > >> permissions
> > >> >>>>>> > from
> > >> >>>>>> > >> directories like the one Colin mentioned to simulate disk
> > >> failures at
> > >> >>>>>> > data
> > >> >>>>>> > >> nodes.  I reviewed the code for all of those, and they all
> > >> appear to
> > >> >>>>>> be
> > >> >>>>>> > >> doing the necessary work to restore executable permissions
> > at
> > >> the
> > >> >>>>>> end of
> > >> >>>>>> > >> the test.  The only recent uncommitted patch I've seen
> that
> > >> makes
> > >> >>>>>> > changes
> > >> >>>>>> > >> in these test suites is HDFS-7722.  That patch still looks
> > >> fine
> > >> >>>>>> > though.  I
> > >> >>>>>> > >> don't know if there are other uncommitted patches that
> > >> changed these
> > >> >>>>>> > test
> > >> >>>>>> > >> suites.
> > >> >>>>>> > >>
> > >> >>>>>> > >> I suppose it's also possible that the JUnit process
> > >> unexpectedly died
> > >> >>>>>> > >> after removing executable permissions but before restoring
> > >> them.
> > >> >>>>>> That
> > >> >>>>>> > >> always would have been a weakness of these test suites,
> > >> regardless of
> > >> >>>>>> > any
> > >> >>>>>> > >> recent changes.
> > >> >>>>>> > >>
> > >> >>>>>> > >> Chris Nauroth
> > >> >>>>>> > >> Hortonworks
> > >> >>>>>> > >> http://hortonworks.com/
> > >> >>>>>> > >>
> > >> >>>>>> > >>
> > >> >>>>>> > >>
> > >> >>>>>> > >>
> > >> >>>>>> > >>
> > >> >>>>>> > >>
> > >> >>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com>
> > >> wrote:
> > >> >>>>>> > >>
> > >> >>>>>> > >>>Hey Colin,
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's
> > >> going on
> > >> >>>>>> with
> > >> >>>>>> > >>>these boxes. He took a look and concluded that some perms
> > are
> > >> being
> > >> >>>>>> set
> > >> >>>>>> > in
> > >> >>>>>> > >>>those directories by our unit tests which are precluding
> > >> those files
> > >> >>>>>> > from
> > >> >>>>>> > >>>getting deleted. He's going to clean up the boxes for us,
> > but
> > >> we
> > >> >>>>>> should
> > >> >>>>>> > >>>expect this to keep happening until we can fix the test in
> > >> question
> > >> >>>>>> to
> > >> >>>>>> > >>>properly clean up after itself.
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>To help narrow down which commit it was that started this,
> > >> Andrew
> > >> >>>>>> sent
> > >> >>>>>> > me
> > >> >>>>>> > >>>this info:
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> > >> >>>>>> >
> > >> >>>>>>
> > >>
> > >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> > >> >>>>>> > has
> > >> >>>>>> > >>>500 perms, so I'm guessing that's the problem. Been that
> way
> > >> since
> > >> >>>>>> 9:32
> > >> >>>>>> > >>>UTC
> > >> >>>>>> > >>>on March 5th."
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>--
> > >> >>>>>> > >>>Aaron T. Myers
> > >> >>>>>> > >>>Software Engineer, Cloudera
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <
> > >> cmccabe@apache.org
> > >> >>>>>> >
> > >> >>>>>> > >>>wrote:
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>> Hi all,
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>>> A very quick (and not thorough) survey shows that I
> can't
> > >> find any
> > >> >>>>>> > >>>> jenkins jobs that succeeded from the last 24 hours.
> Most
> > >> of them
> > >> >>>>>> seem
> > >> >>>>>> > >>>> to be failing with some variant of this message:
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>>> [ERROR] Failed to execute goal
> > >> >>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> > >> >>>>>> (default-clean)
> > >> >>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed
> to
> > >> delete
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>>>
> > >> >>>>>> >
> > >> >>>>>> >
> > >> >>>>>>
> > >>
> >
> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
> > >> >>>>>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
> > >> >>>>>> > >>>> -> [Help 1]
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test
> setting
> > >> wrong
> > >> >>>>>> > >>>> permissions?
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>>> Colin
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>
> > >> >>>>>> > >
> > >> >>>>>> > >
> > >> >>>>>> > >
> > >> >>>>>> > > --
> > >> >>>>>> > > Lei (Eddy) Xu
> > >> >>>>>> > > Software Engineer, Cloudera
> > >> >>>>>> >
> > >> >>>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Lei (Eddy) Xu
> > >> >> Software Engineer, Cloudera
> > >>
> > >
> > >
> > >
> > > --
> > > Sean
> > >
> >
> >
> >
> > --
> > Sean
> >
>



-- 
Sean

Re: upstream jenkins build broken?

Posted by Vinayakumar B <vi...@apache.org>.
I have simulated the problem in my env and verified that both 'git
clean -xdf' and 'mvn clean' will not remove the directory.
mvn fails, whereas git simply ignores the problem (without even
displaying a warning).
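
(Not from any committed script, just a rough Java sketch of the kind of
pre-clean step that would make removal possible again: it restores owner rwx
on everything under the build tree before 'mvn clean' or 'git clean -xdf' is
run. The starting path is illustrative.)

import java.io.File;

public class RestorePermsBeforeClean {

  // Re-grant owner read/write/execute on a file or directory, then recurse.
  // The parent's permissions are fixed before its children are visited, so
  // the children become reachable again.
  static void restore(File f) {
    f.setReadable(true, true);
    f.setWritable(true, true);
    f.setExecutable(true, true);
    File[] children = f.listFiles();  // null for plain files
    if (children != null) {
      for (File child : children) {
        restore(child);
      }
    }
  }

  public static void main(String[] args) {
    restore(new File("hadoop-hdfs-project/hadoop-hdfs/target"));  // illustrative path
  }
}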



Regards,
Vinay

On Tue, Mar 17, 2015 at 2:32 AM, Sean Busbey <bu...@cloudera.com> wrote:

> Can someone point me to an example build that is broken?
>
> On Mon, Mar 16, 2015 at 3:52 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
> > I'm on it. HADOOP-11721
> >
> > On Mon, Mar 16, 2015 at 3:44 PM, Haohui Mai <wh...@apache.org> wrote:
> >
> >> +1 for git clean.
> >>
> >> Colin, can you please get it in ASAP? Currently due to the jenkins
> >> issues, we cannot close the 2.7 blockers.
> >>
> >> Thanks,
> >> Haohui
> >>
> >>
> >>
> >> On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <cm...@apache.org>
> >> wrote:
> >> > If all it takes is someone creating a test that makes a directory
> >> > without -x, this is going to happen over and over.
> >> >
> >> > Let's just fix the problem at the root by running "git clean -fqdx" in
> >> > our jenkins scripts.  If there's no objections I will add this in and
> >> > un-break the builds.
> >> >
> >> > best,
> >> > Colin
> >> >
> >> > On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <le...@cloudera.com> wrote:
> >> >> I filed HDFS-7917 to change the way to simulate disk failures.
> >> >>
> >> >> But I think we still need infrastructure folks to help with jenkins
> >> >> scripts to clean the dirs left today.
> >> >>
> >> >> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ri...@gmail.com>
> >> wrote:
> >> >>> Any updates on this issues? It seems that all HDFS jenkins builds
> are
> >> >>> still failing.
> >> >>>
> >> >>> Regards,
> >> >>> Haohui
> >> >>>
> >> >>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <
> >> vinayakumarb@apache.org> wrote:
> >> >>>> I think the problem started from here.
> >> >>>>
> >> >>>>
> >>
> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
> >> >>>>
> >> >>>> As Chris mentioned, TestDataNodeVolumeFailure is changing the
> >> >>>> permission. But in this patch, ReplicationMonitor got an NPE and a
> >> >>>> terminate signal, due to which MiniDFSCluster.shutdown() threw an
> >> >>>> Exception.
> >> >>>>
> >> >>>> But TestDataNodeVolumeFailure#tearDown() restores those permissions
> >> >>>> only after shutting down the cluster, so in this case IMO the
> >> >>>> permissions were never restored.
> >> >>>>
> >> >>>>
> >> >>>>   @After
> >> >>>>   public void tearDown() throws Exception {
> >> >>>>     if(data_fail != null) {
> >> >>>>       FileUtil.setWritable(data_fail, true);
> >> >>>>     }
> >> >>>>     if(failedDir != null) {
> >> >>>>       FileUtil.setWritable(failedDir, true);
> >> >>>>     }
> >> >>>>     if(cluster != null) {
> >> >>>>       cluster.shutdown();
> >> >>>>     }
> >> >>>>     for (int i = 0; i < 3; i++) {
> >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)),
> >> true);
> >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)),
> >> true);
> >> >>>>     }
> >> >>>>   }
> >> >>>>
> >> >>>>
> >> >>>> Regards,
> >> >>>> Vinay
> >> >>>>
> >> >>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <
> >> vinayakumarb@apache.org>
> >> >>>> wrote:
> >> >>>>
> >> >>>>> When I look at the history of these kinds of builds, they all
> >> >>>>> failed on node H9.
> >> >>>>>
> >> >>>>> I think some uncommitted patch or other created the problem and
> >> >>>>> left it there.
> >> >>>>>
> >> >>>>>
> >> >>>>> Regards,
> >> >>>>> Vinay
> >> >>>>>
> >> >>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <busbey@cloudera.com
> >
> >> wrote:
> >> >>>>>
> >> >>>>>> You could rely on a destructive git clean call instead of maven
> to
> >> do the
> >> >>>>>> directory removal.
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>> Sean
> >> >>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cm...@alumni.cmu.edu>
> >> wrote:
> >> >>>>>>
> >> >>>>>> > Is there a maven plugin or setting we can use to simply remove
> >> >>>>>> > directories that have no executable permissions on them?
> >> Clearly we
> >> >>>>>> > have the permission to do this from a technical point of view
> >> (since
> >> >>>>>> > we created the directories as the jenkins user), it's simply
> >> that the
> >> >>>>>> > code refuses to do it.
> >> >>>>>> >
> >> >>>>>> > Otherwise I guess we can just fix those tests...
> >> >>>>>> >
> >> >>>>>> > Colin
> >> >>>>>> >
> >> >>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com>
> >> wrote:
> >> >>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
> >> >>>>>> > >
> >> >>>>>> > > In HDFS-7722:
> >> >>>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions
> >> in
> >> >>>>>> > TearDown().
> >> >>>>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally
> >> clause.
> >> >>>>>> > >
> >> >>>>>> > > Also I ran mvn test several times on my machine and all tests
> >> passed.
> >> >>>>>> > >
> >> >>>>>> > > However, since in DiskChecker#checkDirAccess():
> >> >>>>>> > >
> >> >>>>>> > > private static void checkDirAccess(File dir) throws
> >> >>>>>> DiskErrorException {
> >> >>>>>> > >   if (!dir.isDirectory()) {
> >> >>>>>> > >     throw new DiskErrorException("Not a directory: "
> >> >>>>>> > >                                  + dir.toString());
> >> >>>>>> > >   }
> >> >>>>>> > >
> >> >>>>>> > >   checkAccessByFileMethods(dir);
> >> >>>>>> > > }
> >> >>>>>> > >
> >> >>>>>> > > One potentially safer alternative is replacing data dir with a
> >> >>>>>> > > regular file to simulate disk failures.
> >> >>>>>> > >
> >> >>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
> >> >>>>>> cnauroth@hortonworks.com>
> >> >>>>>> > wrote:
> >> >>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >> >>>>>> > >> TestDataNodeVolumeFailureReporting, and
> >> >>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable
> >> permissions
> >> >>>>>> > from
> >> >>>>>> > >> directories like the one Colin mentioned to simulate disk
> >> failures at
> >> >>>>>> > data
> >> >>>>>> > >> nodes.  I reviewed the code for all of those, and they all
> >> appear to
> >> >>>>>> be
> >> >>>>>> > >> doing the necessary work to restore executable permissions
> at
> >> the
> >> >>>>>> end of
> >> >>>>>> > >> the test.  The only recent uncommitted patch I've seen that
> >> makes
> >> >>>>>> > changes
> >> >>>>>> > >> in these test suites is HDFS-7722.  That patch still looks
> >> fine
> >> >>>>>> > though.  I
> >> >>>>>> > >> don't know if there are other uncommitted patches that
> >> changed these
> >> >>>>>> > test
> >> >>>>>> > >> suites.
> >> >>>>>> > >>
> >> >>>>>> > >> I suppose it's also possible that the JUnit process
> >> unexpectedly died
> >> >>>>>> > >> after removing executable permissions but before restoring
> >> them.
> >> >>>>>> That
> >> >>>>>> > >> always would have been a weakness of these test suites,
> >> regardless of
> >> >>>>>> > any
> >> >>>>>> > >> recent changes.
> >> >>>>>> > >>
> >> >>>>>> > >> Chris Nauroth
> >> >>>>>> > >> Hortonworks
> >> >>>>>> > >> http://hortonworks.com/
> >> >>>>>> > >>
> >> >>>>>> > >>
> >> >>>>>> > >>
> >> >>>>>> > >>
> >> >>>>>> > >>
> >> >>>>>> > >>
> >> >>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com>
> >> wrote:
> >> >>>>>> > >>
> >> >>>>>> > >>>Hey Colin,
> >> >>>>>> > >>>
> >> >>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's
> >> going on
> >> >>>>>> with
> >> >>>>>> > >>>these boxes. He took a look and concluded that some perms
> are
> >> being
> >> >>>>>> set
> >> >>>>>> > in
> >> >>>>>> > >>>those directories by our unit tests which are precluding
> >> those files
> >> >>>>>> > from
> >> >>>>>> > >>>getting deleted. He's going to clean up the boxes for us,
> but
> >> we
> >> >>>>>> should
> >> >>>>>> > >>>expect this to keep happening until we can fix the test in
> >> question
> >> >>>>>> to
> >> >>>>>> > >>>properly clean up after itself.
> >> >>>>>> > >>>
> >> >>>>>> > >>>To help narrow down which commit it was that started this,
> >> Andrew
> >> >>>>>> sent
> >> >>>>>> > me
> >> >>>>>> > >>>this info:
> >> >>>>>> > >>>
> >> >>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> >> >>>>>> >
> >> >>>>>>
> >>
> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> >> >>>>>> > has
> >> >>>>>> > >>>500 perms, so I'm guessing that's the problem. Been that way
> >> since
> >> >>>>>> 9:32
> >> >>>>>> > >>>UTC
> >> >>>>>> > >>>on March 5th."
> >> >>>>>> > >>>
> >> >>>>>> > >>>--
> >> >>>>>> > >>>Aaron T. Myers
> >> >>>>>> > >>>Software Engineer, Cloudera
> >> >>>>>> > >>>
> >> >>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <
> >> cmccabe@apache.org
> >> >>>>>> >
> >> >>>>>> > >>>wrote:
> >> >>>>>> > >>>
> >> >>>>>> > >>>> Hi all,
> >> >>>>>> > >>>>
> >> >>>>>> > >>>> A very quick (and not thorough) survey shows that I can't
> >> find any
> >> >>>>>> > >>>> jenkins jobs that succeeded from the last 24 hours.  Most
> >> of them
> >> >>>>>> seem
> >> >>>>>> > >>>> to be failing with some variant of this message:
> >> >>>>>> > >>>>
> >> >>>>>> > >>>> [ERROR] Failed to execute goal
> >> >>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> >> >>>>>> (default-clean)
> >> >>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to
> >> delete
> >> >>>>>> > >>>>
> >> >>>>>> > >>>>
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>>
> >>
> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
> >> >>>>>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
> >> >>>>>> > >>>> -> [Help 1]
> >> >>>>>> > >>>>
> >> >>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting
> >> wrong
> >> >>>>>> > >>>> permissions?
> >> >>>>>> > >>>>
> >> >>>>>> > >>>> Colin
> >> >>>>>> > >>>>
> >> >>>>>> > >>
> >> >>>>>> > >
> >> >>>>>> > >
> >> >>>>>> > >
> >> >>>>>> > > --
> >> >>>>>> > > Lei (Eddy) Xu
> >> >>>>>> > > Software Engineer, Cloudera
> >> >>>>>> >
> >> >>>>>>
> >> >>>>>
> >> >>>>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Lei (Eddy) Xu
> >> >> Software Engineer, Cloudera
> >>
> >
> >
> >
> > --
> > Sean
> >
>
>
>
> --
> Sean
>

Re: upstream jenkins build broken?

Posted by Sean Busbey <bu...@cloudera.com>.
Can someone point me to an example build that is broken?

On Mon, Mar 16, 2015 at 3:52 PM, Sean Busbey <bu...@cloudera.com> wrote:

> I'm on it. HADOOP-11721
>
> On Mon, Mar 16, 2015 at 3:44 PM, Haohui Mai <wh...@apache.org> wrote:
>
>> +1 for git clean.
>>
>> Colin, can you please get it in ASAP? Currently due to the jenkins
>> issues, we cannot close the 2.7 blockers.
>>
>> Thanks,
>> Haohui
>>
>>
>>
>> On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <cm...@apache.org>
>> wrote:
>> > If all it takes is someone creating a test that makes a directory
>> > without -x, this is going to happen over and over.
>> >
>> > Let's just fix the problem at the root by running "git clean -fqdx" in
>> > our jenkins scripts.  If there's no objections I will add this in and
>> > un-break the builds.
>> >
>> > best,
>> > Colin
>> >
>> > On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <le...@cloudera.com> wrote:
>> >> I filed HDFS-7917 to change the way to simulate disk failures.
>> >>
>> >> But I think we still need infrastructure folks to help with jenkins
>> >> scripts to clean the dirs left today.
>> >>
>> >> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ri...@gmail.com>
>> wrote:
>> >>> Any updates on this issues? It seems that all HDFS jenkins builds are
>> >>> still failing.
>> >>>
>> >>> Regards,
>> >>> Haohui
>> >>>
>> >>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <
>> vinayakumarb@apache.org> wrote:
>> >>>> I think the problem started from here.
>> >>>>
>> >>>>
>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>> >>>>
>> >>>> As Chris mentioned, TestDataNodeVolumeFailure is changing the
>> >>>> permission. But in this patch, ReplicationMonitor got an NPE and a
>> >>>> terminate signal, due to which MiniDFSCluster.shutdown() threw an
>> >>>> Exception.
>> >>>>
>> >>>> But TestDataNodeVolumeFailure#tearDown() restores those permissions
>> >>>> only after shutting down the cluster, so in this case IMO the
>> >>>> permissions were never restored.
>> >>>>
>> >>>>
>> >>>>   @After
>> >>>>   public void tearDown() throws Exception {
>> >>>>     if(data_fail != null) {
>> >>>>       FileUtil.setWritable(data_fail, true);
>> >>>>     }
>> >>>>     if(failedDir != null) {
>> >>>>       FileUtil.setWritable(failedDir, true);
>> >>>>     }
>> >>>>     if(cluster != null) {
>> >>>>       cluster.shutdown();
>> >>>>     }
>> >>>>     for (int i = 0; i < 3; i++) {
>> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)),
>> true);
>> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)),
>> true);
>> >>>>     }
>> >>>>   }
>> >>>>
>> >>>>
>> >>>> Regards,
>> >>>> Vinay
>> >>>>
>> >>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <
>> vinayakumarb@apache.org>
>> >>>> wrote:
>> >>>>
>> >>>>> When I look at the history of these kinds of builds, they all
>> >>>>> failed on node H9.
>> >>>>>
>> >>>>> I think some uncommitted patch or other created the problem and
>> >>>>> left it there.
>> >>>>>
>> >>>>>
>> >>>>> Regards,
>> >>>>> Vinay
>> >>>>>
>> >>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bu...@cloudera.com>
>> wrote:
>> >>>>>
>> >>>>>> You could rely on a destructive git clean call instead of maven to
>> do the
>> >>>>>> directory removal.
>> >>>>>>
>> >>>>>> --
>> >>>>>> Sean
>> >>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cm...@alumni.cmu.edu>
>> wrote:
>> >>>>>>
>> >>>>>> > Is there a maven plugin or setting we can use to simply remove
>> >>>>>> > directories that have no executable permissions on them?
>> Clearly we
>> >>>>>> > have the permission to do this from a technical point of view
>> (since
>> >>>>>> > we created the directories as the jenkins user), it's simply
>> that the
>> >>>>>> > code refuses to do it.
>> >>>>>> >
>> >>>>>> > Otherwise I guess we can just fix those tests...
>> >>>>>> >
>> >>>>>> > Colin
>> >>>>>> >
>> >>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com>
>> wrote:
>> >>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
>> >>>>>> > >
>> >>>>>> > > In HDFS-7722:
>> >>>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions
>> in
>> >>>>>> > TearDown().
>> >>>>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally
>> clause.
>> >>>>>> > >
>> >>>>>> > > Also I ran mvn test several times on my machine and all tests
>> passed.
>> >>>>>> > >
>> >>>>>> > > However, since in DiskChecker#checkDirAccess():
>> >>>>>> > >
>> >>>>>> > > private static void checkDirAccess(File dir) throws
>> >>>>>> DiskErrorException {
>> >>>>>> > >   if (!dir.isDirectory()) {
>> >>>>>> > >     throw new DiskErrorException("Not a directory: "
>> >>>>>> > >                                  + dir.toString());
>> >>>>>> > >   }
>> >>>>>> > >
>> >>>>>> > >   checkAccessByFileMethods(dir);
>> >>>>>> > > }
>> >>>>>> > >
>> >>>>>> > > One potentially safer alternative is replacing data dir with a
>> >>>>>> > > regular file to simulate disk failures.
>> >>>>>> > >
>> >>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
>> >>>>>> cnauroth@hortonworks.com>
>> >>>>>> > wrote:
>> >>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> >>>>>> > >> TestDataNodeVolumeFailureReporting, and
>> >>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable
>> permissions
>> >>>>>> > from
>> >>>>>> > >> directories like the one Colin mentioned to simulate disk
>> failures at
>> >>>>>> > data
>> >>>>>> > >> nodes.  I reviewed the code for all of those, and they all
>> appear to
>> >>>>>> be
>> >>>>>> > >> doing the necessary work to restore executable permissions at
>> the
>> >>>>>> end of
>> >>>>>> > >> the test.  The only recent uncommitted patch I've seen that
>> makes
>> >>>>>> > changes
>> >>>>>> > >> in these test suites is HDFS-7722.  That patch still looks
>> fine
>> >>>>>> > though.  I
>> >>>>>> > >> don't know if there are other uncommitted patches that
>> changed these
>> >>>>>> > test
>> >>>>>> > >> suites.
>> >>>>>> > >>
>> >>>>>> > >> I suppose it's also possible that the JUnit process
>> unexpectedly died
>> >>>>>> > >> after removing executable permissions but before restoring
>> them.
>> >>>>>> That
>> >>>>>> > >> always would have been a weakness of these test suites,
>> regardless of
>> >>>>>> > any
>> >>>>>> > >> recent changes.
>> >>>>>> > >>
>> >>>>>> > >> Chris Nauroth
>> >>>>>> > >> Hortonworks
>> >>>>>> > >> http://hortonworks.com/
>> >>>>>> > >>
>> >>>>>> > >>
>> >>>>>> > >>
>> >>>>>> > >>
>> >>>>>> > >>
>> >>>>>> > >>
>> >>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com>
>> wrote:
>> >>>>>> > >>
>> >>>>>> > >>>Hey Colin,
>> >>>>>> > >>>
>> >>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's
>> going on
>> >>>>>> with
>> >>>>>> > >>>these boxes. He took a look and concluded that some perms are
>> being
>> >>>>>> set
>> >>>>>> > in
>> >>>>>> > >>>those directories by our unit tests which are precluding
>> those files
>> >>>>>> > from
>> >>>>>> > >>>getting deleted. He's going to clean up the boxes for us, but
>> we
>> >>>>>> should
>> >>>>>> > >>>expect this to keep happening until we can fix the test in
>> question
>> >>>>>> to
>> >>>>>> > >>>properly clean up after itself.
>> >>>>>> > >>>
>> >>>>>> > >>>To help narrow down which commit it was that started this,
>> Andrew
>> >>>>>> sent
>> >>>>>> > me
>> >>>>>> > >>>this info:
>> >>>>>> > >>>
>> >>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>> >>>>>> >
>> >>>>>>
>> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> >>>>>> > has
>> >>>>>> > >>>500 perms, so I'm guessing that's the problem. Been that way
>> since
>> >>>>>> 9:32
>> >>>>>> > >>>UTC
>> >>>>>> > >>>on March 5th."
>> >>>>>> > >>>
>> >>>>>> > >>>--
>> >>>>>> > >>>Aaron T. Myers
>> >>>>>> > >>>Software Engineer, Cloudera
>> >>>>>> > >>>
>> >>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <
>> cmccabe@apache.org
>> >>>>>> >
>> >>>>>> > >>>wrote:
>> >>>>>> > >>>
>> >>>>>> > >>>> Hi all,
>> >>>>>> > >>>>
>> >>>>>> > >>>> A very quick (and not thorough) survey shows that I can't
>> find any
>> >>>>>> > >>>> jenkins jobs that succeeded from the last 24 hours.  Most
>> of them
>> >>>>>> seem
>> >>>>>> > >>>> to be failing with some variant of this message:
>> >>>>>> > >>>>
>> >>>>>> > >>>> [ERROR] Failed to execute goal
>> >>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>> >>>>>> (default-clean)
>> >>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to
>> delete
>> >>>>>> > >>>>
>> >>>>>> > >>>>
>> >>>>>> >
>> >>>>>> >
>> >>>>>>
>> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
>> >>>>>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>> >>>>>> > >>>> -> [Help 1]
>> >>>>>> > >>>>
>> >>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting
>> wrong
>> >>>>>> > >>>> permissions?
>> >>>>>> > >>>>
>> >>>>>> > >>>> Colin
>> >>>>>> > >>>>
>> >>>>>> > >>
>> >>>>>> > >
>> >>>>>> > >
>> >>>>>> > >
>> >>>>>> > > --
>> >>>>>> > > Lei (Eddy) Xu
>> >>>>>> > > Software Engineer, Cloudera
>> >>>>>> >
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>
>> >>
>> >>
>> >> --
>> >> Lei (Eddy) Xu
>> >> Software Engineer, Cloudera
>>
>
>
>
> --
> Sean
>



-- 
Sean

Re: upstream jenkins build broken?

Posted by Sean Busbey <bu...@cloudera.com>.
I'm on it. HADOOP-11721

On Mon, Mar 16, 2015 at 3:44 PM, Haohui Mai <wh...@apache.org> wrote:

> +1 for git clean.
>
> Colin, can you please get it in ASAP? Currently due to the jenkins
> issues, we cannot close the 2.7 blockers.
>
> Thanks,
> Haohui
>
>
>
> On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <cm...@apache.org>
> wrote:
> > If all it takes is someone creating a test that makes a directory
> > without -x, this is going to happen over and over.
> >
> > Let's just fix the problem at the root by running "git clean -fqdx" in
> > our jenkins scripts.  If there's no objections I will add this in and
> > un-break the builds.
> >
> > best,
> > Colin
> >
> > On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <le...@cloudera.com> wrote:
> >> I filed HDFS-7917 to change the way to simulate disk failures.
> >>
> >> But I think we still need infrastructure folks to help with jenkins
> >> scripts to clean the dirs left today.
> >>
> >> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ri...@gmail.com> wrote:
> >>> Any updates on this issues? It seems that all HDFS jenkins builds are
> >>> still failing.
> >>>
> >>> Regards,
> >>> Haohui
> >>>
> >>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <
> vinayakumarb@apache.org> wrote:
> >>>> I think the problem started from here.
> >>>>
> >>>>
> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
> >>>>
> >>>> As Chris mentioned, TestDataNodeVolumeFailure is changing the
> >>>> permission. But in this patch, ReplicationMonitor got an NPE and a
> >>>> terminate signal, due to which MiniDFSCluster.shutdown() threw an
> >>>> Exception.
> >>>>
> >>>> But TestDataNodeVolumeFailure#tearDown() restores those permissions
> >>>> only after shutting down the cluster, so in this case IMO the
> >>>> permissions were never restored.
> >>>>
> >>>>
> >>>>   @After
> >>>>   public void tearDown() throws Exception {
> >>>>     if(data_fail != null) {
> >>>>       FileUtil.setWritable(data_fail, true);
> >>>>     }
> >>>>     if(failedDir != null) {
> >>>>       FileUtil.setWritable(failedDir, true);
> >>>>     }
> >>>>     if(cluster != null) {
> >>>>       cluster.shutdown();
> >>>>     }
> >>>>     for (int i = 0; i < 3; i++) {
> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
> >>>>     }
> >>>>   }
> >>>>
> >>>>
> >>>> Regards,
> >>>> Vinay
> >>>>
> >>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <
> vinayakumarb@apache.org>
> >>>> wrote:
> >>>>
> >>>>> When I look at the history of these kinds of builds, they all
> >>>>> failed on node H9.
> >>>>>
> >>>>> I think some uncommitted patch or other created the problem and
> >>>>> left it there.
> >>>>>
> >>>>>
> >>>>> Regards,
> >>>>> Vinay
> >>>>>
> >>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bu...@cloudera.com>
> wrote:
> >>>>>
> >>>>>> You could rely on a destructive git clean call instead of maven to
> do the
> >>>>>> directory removal.
> >>>>>>
> >>>>>> --
> >>>>>> Sean
> >>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cm...@alumni.cmu.edu>
> wrote:
> >>>>>>
> >>>>>> > Is there a maven plugin or setting we can use to simply remove
> >>>>>> > directories that have no executable permissions on them?  Clearly
> we
> >>>>>> > have the permission to do this from a technical point of view
> (since
> >>>>>> > we created the directories as the jenkins user), it's simply that
> the
> >>>>>> > code refuses to do it.
> >>>>>> >
> >>>>>> > Otherwise I guess we can just fix those tests...
> >>>>>> >
> >>>>>> > Colin
> >>>>>> >
> >>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
> >>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
> >>>>>> > >
> >>>>>> > > In HDFS-7722:
> >>>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >>>>>> > TearDown().
> >>>>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally
> clause.
> >>>>>> > >
> >>>>>> > > Also I ran mvn test several times on my machine and all tests
> passed.
> >>>>>> > >
> >>>>>> > > However, since in DiskChecker#checkDirAccess():
> >>>>>> > >
> >>>>>> > > private static void checkDirAccess(File dir) throws
> >>>>>> DiskErrorException {
> >>>>>> > >   if (!dir.isDirectory()) {
> >>>>>> > >     throw new DiskErrorException("Not a directory: "
> >>>>>> > >                                  + dir.toString());
> >>>>>> > >   }
> >>>>>> > >
> >>>>>> > >   checkAccessByFileMethods(dir);
> >>>>>> > > }
> >>>>>> > >
> >>>>>> > > One potentially safer alternative is replacing data dir with a
> regular
> >>>>>> > > file to simulate disk failures.
> >>>>>> > >
> >>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
> >>>>>> cnauroth@hortonworks.com>
> >>>>>> > wrote:
> >>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >>>>>> > >> TestDataNodeVolumeFailureReporting, and
> >>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable
> permissions
> >>>>>> > from
> >>>>>> > >> directories like the one Colin mentioned to simulate disk
> failures at
> >>>>>> > data
> >>>>>> > >> nodes.  I reviewed the code for all of those, and they all
> appear to
> >>>>>> be
> >>>>>> > >> doing the necessary work to restore executable permissions at
> the
> >>>>>> end of
> >>>>>> > >> the test.  The only recent uncommitted patch I've seen that
> makes
> >>>>>> > changes
> >>>>>> > >> in these test suites is HDFS-7722.  That patch still looks fine
> >>>>>> > though.  I
> >>>>>> > >> don't know if there are other uncommitted patches that changed
> these
> >>>>>> > test
> >>>>>> > >> suites.
> >>>>>> > >>
> >>>>>> > >> I suppose it's also possible that the JUnit process
> unexpectedly died
> >>>>>> > >> after removing executable permissions but before restoring
> them.
> >>>>>> That
> >>>>>> > >> always would have been a weakness of these test suites,
> regardless of
> >>>>>> > any
> >>>>>> > >> recent changes.
> >>>>>> > >>
> >>>>>> > >> Chris Nauroth
> >>>>>> > >> Hortonworks
> >>>>>> > >> http://hortonworks.com/
> >>>>>> > >>
> >>>>>> > >>
> >>>>>> > >>
> >>>>>> > >>
> >>>>>> > >>
> >>>>>> > >>
> >>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com>
> wrote:
> >>>>>> > >>
> >>>>>> > >>>Hey Colin,
> >>>>>> > >>>
> >>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's
> going on
> >>>>>> with
> >>>>>> > >>>these boxes. He took a look and concluded that some perms are
> being
> >>>>>> set
> >>>>>> > in
> >>>>>> > >>>those directories by our unit tests which are precluding those
> files
> >>>>>> > from
> >>>>>> > >>>getting deleted. He's going to clean up the boxes for us, but
> we
> >>>>>> should
> >>>>>> > >>>expect this to keep happening until we can fix the test in
> question
> >>>>>> to
> >>>>>> > >>>properly clean up after itself.
> >>>>>> > >>>
> >>>>>> > >>>To help narrow down which commit it was that started this,
> Andrew
> >>>>>> sent
> >>>>>> > me
> >>>>>> > >>>this info:
> >>>>>> > >>>
> >>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> >>>>>> >
> >>>>>>
> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> >>>>>> > has
> >>>>>> > >>>500 perms, so I'm guessing that's the problem. Been that way
> since
> >>>>>> 9:32
> >>>>>> > >>>UTC
> >>>>>> > >>>on March 5th."
> >>>>>> > >>>
> >>>>>> > >>>--
> >>>>>> > >>>Aaron T. Myers
> >>>>>> > >>>Software Engineer, Cloudera
> >>>>>> > >>>
> >>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <
> cmccabe@apache.org
> >>>>>> >
> >>>>>> > >>>wrote:
> >>>>>> > >>>
> >>>>>> > >>>> Hi all,
> >>>>>> > >>>>
> >>>>>> > >>>> A very quick (and not thorough) survey shows that I can't
> find any
> >>>>>> > >>>> jenkins jobs that succeeded from the last 24 hours.  Most of
> them
> >>>>>> seem
> >>>>>> > >>>> to be failing with some variant of this message:
> >>>>>> > >>>>
> >>>>>> > >>>> [ERROR] Failed to execute goal
> >>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> >>>>>> (default-clean)
> >>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to
> delete
> >>>>>> > >>>>
> >>>>>> > >>>>
> >>>>>> >
> >>>>>> >
> >>>>>>
> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
> >>>>>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>>>> > >>>> -> [Help 1]
> >>>>>> > >>>>
> >>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting
> wrong
> >>>>>> > >>>> permissions?
> >>>>>> > >>>>
> >>>>>> > >>>> Colin
> >>>>>> > >>>>
> >>>>>> > >>
> >>>>>> > >
> >>>>>> > >
> >>>>>> > >
> >>>>>> > > --
> >>>>>> > > Lei (Eddy) Xu
> >>>>>> > > Software Engineer, Cloudera
> >>>>>> >
> >>>>>>
> >>>>>
> >>>>>
> >>
> >>
> >>
> >> --
> >> Lei (Eddy) Xu
> >> Software Engineer, Cloudera
>



-- 
Sean

Re: upstream jenkins build broken?

Posted by Haohui Mai <wh...@apache.org>.
+1 for git clean.

Colin, can you please get it in ASAP? Currently due to the jenkins
issues, we cannot close the 2.7 blockers.

Thanks,
Haohui



On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <cm...@apache.org> wrote:
> If all it takes is someone creating a test that makes a directory
> without -x, this is going to happen over and over.
>
> Let's just fix the problem at the root by running "git clean -fqdx" in
> our jenkins scripts.  If there are no objections I will add this in and
> un-break the builds.
>
> best,
> Colin
>
> On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <le...@cloudera.com> wrote:
>> I filed HDFS-7917 to change the way to simulate disk failures.
>>
>> But I think we still need infrastructure folks to help with jenkins
>> scripts to clean the dirs left today.
>>
>> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ri...@gmail.com> wrote:
>>> Any updates on this issue? It seems that all HDFS jenkins builds are
>>> still failing.
>>>
>>> Regards,
>>> Haohui
>>>
>>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <vi...@apache.org> wrote:
>>>> I think the problem started from here.
>>>>
>>>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>>>>
>>>> As Chris mentioned TestDataNodeVolumeFailure is changing the permission.
>>>> But in this patch, ReplicationMonitor got NPE and it got terminate signal,
>>>> due to which MiniDFSCluster.shutdown() throwing Exception.
>>>>
>>>> But, TestDataNodeVolumeFailure#teardown() is restoring those permission
>>>> after shutting down cluster. So in this case IMO, permissions were never
>>>> restored.
>>>>
>>>>
>>>>   @After
>>>>   public void tearDown() throws Exception {
>>>>     if(data_fail != null) {
>>>>       FileUtil.setWritable(data_fail, true);
>>>>     }
>>>>     if(failedDir != null) {
>>>>       FileUtil.setWritable(failedDir, true);
>>>>     }
>>>>     if(cluster != null) {
>>>>       cluster.shutdown();
>>>>     }
>>>>     for (int i = 0; i < 3; i++) {
>>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>>>>     }
>>>>   }
>>>>
>>>>
>>>> Regards,
>>>> Vinay
>>>>
>>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vi...@apache.org>
>>>> wrote:
>>>>
>>>>> When I see the history of these kind of builds, All these are failed on
>>>>> node H9.
>>>>>
>>>>> I think some or the other uncommitted patch would have created the problem
>>>>> and left it there.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Vinay
>>>>>
>>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bu...@cloudera.com> wrote:
>>>>>
>>>>>> You could rely on a destructive git clean call instead of maven to do the
>>>>>> directory removal.
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>>>>>>
>>>>>> > Is there a maven plugin or setting we can use to simply remove
>>>>>> > directories that have no executable permissions on them?  Clearly we
>>>>>> > have the permission to do this from a technical point of view (since
>>>>>> > we created the directories as the jenkins user), it's simply that the
>>>>>> > code refuses to do it.
>>>>>> >
>>>>>> > Otherwise I guess we can just fix those tests...
>>>>>> >
>>>>>> > Colin
>>>>>> >
>>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
>>>>>> > >
>>>>>> > > In HDFS-7722:
>>>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>>> > TearDown().
>>>>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>>>>> > >
>>>>>> > > Also I ran mvn test several times on my machine and all tests passed.
>>>>>> > >
>>>>>> > > However, since in DiskChecker#checkDirAccess():
>>>>>> > >
>>>>>> > > private static void checkDirAccess(File dir) throws
>>>>>> DiskErrorException {
>>>>>> > >   if (!dir.isDirectory()) {
>>>>>> > >     throw new DiskErrorException("Not a directory: "
>>>>>> > >                                  + dir.toString());
>>>>>> > >   }
>>>>>> > >
>>>>>> > >   checkAccessByFileMethods(dir);
>>>>>> > > }
>>>>>> > >
>>>>>> > > One potentially safer alternative is replacing data dir with a regular
>>>>>> > > file to simulate disk failures.
>>>>>> > >
>>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
>>>>>> cnauroth@hortonworks.com>
>>>>>> > wrote:
>>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>> > >> TestDataNodeVolumeFailureReporting, and
>>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable permissions
>>>>>> > from
>>>>>> > >> directories like the one Colin mentioned to simulate disk failures at
>>>>>> > data
>>>>>> > >> nodes.  I reviewed the code for all of those, and they all appear to
>>>>>> be
>>>>>> > >> doing the necessary work to restore executable permissions at the
>>>>>> end of
>>>>>> > >> the test.  The only recent uncommitted patch I've seen that makes
>>>>>> > changes
>>>>>> > >> in these test suites is HDFS-7722.  That patch still looks fine
>>>>>> > though.  I
>>>>>> > >> don't know if there are other uncommitted patches that changed these
>>>>>> > test
>>>>>> > >> suites.
>>>>>> > >>
>>>>>> > >> I suppose it's also possible that the JUnit process unexpectedly died
>>>>>> > >> after removing executable permissions but before restoring them.
>>>>>> That
>>>>>> > >> always would have been a weakness of these test suites, regardless of
>>>>>> > any
>>>>>> > >> recent changes.
>>>>>> > >>
>>>>>> > >> Chris Nauroth
>>>>>> > >> Hortonworks
>>>>>> > >> http://hortonworks.com/
>>>>>> > >>
>>>>>> > >>
>>>>>> > >>
>>>>>> > >>
>>>>>> > >>
>>>>>> > >>
>>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>>>>> > >>
>>>>>> > >>>Hey Colin,
>>>>>> > >>>
>>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>>>>>> with
>>>>>> > >>>these boxes. He took a look and concluded that some perms are being
>>>>>> set
>>>>>> > in
>>>>>> > >>>those directories by our unit tests which are precluding those files
>>>>>> > from
>>>>>> > >>>getting deleted. He's going to clean up the boxes for us, but we
>>>>>> should
>>>>>> > >>>expect this to keep happening until we can fix the test in question
>>>>>> to
>>>>>> > >>>properly clean up after itself.
>>>>>> > >>>
>>>>>> > >>>To help narrow down which commit it was that started this, Andrew
>>>>>> sent
>>>>>> > me
>>>>>> > >>>this info:
>>>>>> > >>>
>>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>>> >
>>>>>> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>>>> > has
>>>>>> > >>>500 perms, so I'm guessing that's the problem. Been that way since
>>>>>> 9:32
>>>>>> > >>>UTC
>>>>>> > >>>on March 5th."
>>>>>> > >>>
>>>>>> > >>>--
>>>>>> > >>>Aaron T. Myers
>>>>>> > >>>Software Engineer, Cloudera
>>>>>> > >>>
>>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org
>>>>>> >
>>>>>> > >>>wrote:
>>>>>> > >>>
>>>>>> > >>>> Hi all,
>>>>>> > >>>>
>>>>>> > >>>> A very quick (and not thorough) survey shows that I can't find any
>>>>>> > >>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>>>>>> seem
>>>>>> > >>>> to be failing with some variant of this message:
>>>>>> > >>>>
>>>>>> > >>>> [ERROR] Failed to execute goal
>>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>>> (default-clean)
>>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> >
>>>>>> >
>>>>>> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
>>>>>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>> > >>>> -> [Help 1]
>>>>>> > >>>>
>>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>>> > >>>> permissions?
>>>>>> > >>>>
>>>>>> > >>>> Colin
>>>>>> > >>>>
>>>>>> > >>
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > --
>>>>>> > > Lei (Eddy) Xu
>>>>>> > > Software Engineer, Cloudera
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>
>>
>>
>> --
>> Lei (Eddy) Xu
>> Software Engineer, Cloudera

Re: upstream jenkins build broken?

Posted by Chris Nauroth <cn...@hortonworks.com>.
+1 for the git clean command.

HDFS-7917 still might be valuable for enabling us to run a few unit tests
on Windows that are currently skipped.  Let's please keep it open, but
it's less urgent.

Thanks!

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 3/16/15, 11:54 AM, "Colin P. McCabe" <cm...@apache.org> wrote:

>If all it takes is someone creating a test that makes a directory
>without -x, this is going to happen over and over.
>
>Let's just fix the problem at the root by running "git clean -fqdx" in
>our jenkins scripts.  If there are no objections I will add this in and
>un-break the builds.
>
>best,
>Colin
>
>On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <le...@cloudera.com> wrote:
>> I filed HDFS-7917 to change the way to simulate disk failures.
>>
>> But I think we still need infrastructure folks to help with jenkins
>> scripts to clean the dirs left today.
>>
>> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ri...@gmail.com> wrote:
>>> Any updates on this issue? It seems that all HDFS jenkins builds are
>>> still failing.
>>>
>>> Regards,
>>> Haohui
>>>
>>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B
>>><vi...@apache.org> wrote:
>>>> I think the problem started from here.
>>>>
>>>> 
>>>>https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/juni
>>>>t/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/test
>>>>UnderReplicationAfterVolFailure/
>>>>
>>>> As Chris mentioned TestDataNodeVolumeFailure is changing the
>>>>permission.
>>>> But in this patch, ReplicationMonitor got NPE and it got terminate
>>>>signal,
>>>> due to which MiniDFSCluster.shutdown() throwing Exception.
>>>>
>>>> But, TestDataNodeVolumeFailure#teardown() is restoring those
>>>>permission
>>>> after shutting down cluster. So in this case IMO, permissions were
>>>>never
>>>> restored.
>>>>
>>>>
>>>>   @After
>>>>   public void tearDown() throws Exception {
>>>>     if(data_fail != null) {
>>>>       FileUtil.setWritable(data_fail, true);
>>>>     }
>>>>     if(failedDir != null) {
>>>>       FileUtil.setWritable(failedDir, true);
>>>>     }
>>>>     if(cluster != null) {
>>>>       cluster.shutdown();
>>>>     }
>>>>     for (int i = 0; i < 3; i++) {
>>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>>>>     }
>>>>   }
>>>>
>>>>
>>>> Regards,
>>>> Vinay
>>>>
>>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B
>>>><vi...@apache.org>
>>>> wrote:
>>>>
>>>>> When I see the history of these kind of builds, All these are failed
>>>>>on
>>>>> node H9.
>>>>>
>>>>> I think some or the other uncommitted patch would have created the
>>>>>problem
>>>>> and left it there.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Vinay
>>>>>
>>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bu...@cloudera.com>
>>>>>wrote:
>>>>>
>>>>>> You could rely on a destructive git clean call instead of maven to
>>>>>>do the
>>>>>> directory removal.
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cm...@alumni.cmu.edu>
>>>>>>wrote:
>>>>>>
>>>>>> > Is there a maven plugin or setting we can use to simply remove
>>>>>> > directories that have no executable permissions on them?  Clearly
>>>>>>we
>>>>>> > have the permission to do this from a technical point of view
>>>>>>(since
>>>>>> > we created the directories as the jenkins user), it's simply that
>>>>>>the
>>>>>> > code refuses to do it.
>>>>>> >
>>>>>> > Otherwise I guess we can just fix those tests...
>>>>>> >
>>>>>> > Colin
>>>>>> >
>>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
>>>>>> > >
>>>>>> > > In HDFS-7722:
>>>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>>> > TearDown().
>>>>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally
>>>>>>clause.
>>>>>> > >
>>>>>> > > Also I ran mvn test several times on my machine and all tests
>>>>>>passed.
>>>>>> > >
>>>>>> > > However, since in DiskChecker#checkDirAccess():
>>>>>> > >
>>>>>> > > private static void checkDirAccess(File dir) throws
>>>>>> DiskErrorException {
>>>>>> > >   if (!dir.isDirectory()) {
>>>>>> > >     throw new DiskErrorException("Not a directory: "
>>>>>> > >                                  + dir.toString());
>>>>>> > >   }
>>>>>> > >
>>>>>> > >   checkAccessByFileMethods(dir);
>>>>>> > > }
>>>>>> > >
>>>>>> > > One potentially safer alternative is replacing data dir with a
>>>>>>regular
>>>>>> > > file to simulate disk failures.
>>>>>> > >
>>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
>>>>>> cnauroth@hortonworks.com>
>>>>>> > wrote:
>>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>> > >> TestDataNodeVolumeFailureReporting, and
>>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable
>>>>>>permissions
>>>>>> > from
>>>>>> > >> directories like the one Colin mentioned to simulate disk
>>>>>>failures at
>>>>>> > data
>>>>>> > >> nodes.  I reviewed the code for all of those, and they all
>>>>>>appear to
>>>>>> be
>>>>>> > >> doing the necessary work to restore executable permissions at
>>>>>>the
>>>>>> end of
>>>>>> > >> the test.  The only recent uncommitted patch I've seen that
>>>>>>makes
>>>>>> > changes
>>>>>> > >> in these test suites is HDFS-7722.  That patch still looks fine
>>>>>> > though.  I
>>>>>> > >> don't know if there are other uncommitted patches that changed
>>>>>>these
>>>>>> > test
>>>>>> > >> suites.
>>>>>> > >>
>>>>>> > >> I suppose it's also possible that the JUnit process
>>>>>>unexpectedly died
>>>>>> > >> after removing executable permissions but before restoring
>>>>>>them.
>>>>>> That
>>>>>> > >> always would have been a weakness of these test suites,
>>>>>>regardless of
>>>>>> > any
>>>>>> > >> recent changes.
>>>>>> > >>
>>>>>> > >> Chris Nauroth
>>>>>> > >> Hortonworks
>>>>>> > >> http://hortonworks.com/
>>>>>> > >>
>>>>>> > >>
>>>>>> > >>
>>>>>> > >>
>>>>>> > >>
>>>>>> > >>
>>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>>>>> > >>
>>>>>> > >>>Hey Colin,
>>>>>> > >>>
>>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's
>>>>>>going on
>>>>>> with
>>>>>> > >>>these boxes. He took a look and concluded that some perms are
>>>>>>being
>>>>>> set
>>>>>> > in
>>>>>> > >>>those directories by our unit tests which are precluding those
>>>>>>files
>>>>>> > from
>>>>>> > >>>getting deleted. He's going to clean up the boxes for us, but
>>>>>>we
>>>>>> should
>>>>>> > >>>expect this to keep happening until we can fix the test in
>>>>>>question
>>>>>> to
>>>>>> > >>>properly clean up after itself.
>>>>>> > >>>
>>>>>> > >>>To help narrow down which commit it was that started this,
>>>>>>Andrew
>>>>>> sent
>>>>>> > me
>>>>>> > >>>this info:
>>>>>> > >>>
>>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>>> >
>>>>>> 
>>>>>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/da
>>>>>>>>>ta3/
>>>>>> > has
>>>>>> > >>>500 perms, so I'm guessing that's the problem. Been that way
>>>>>>since
>>>>>> 9:32
>>>>>> > >>>UTC
>>>>>> > >>>on March 5th."
>>>>>> > >>>
>>>>>> > >>>--
>>>>>> > >>>Aaron T. Myers
>>>>>> > >>>Software Engineer, Cloudera
>>>>>> > >>>
>>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>>><cmccabe@apache.org
>>>>>> >
>>>>>> > >>>wrote:
>>>>>> > >>>
>>>>>> > >>>> Hi all,
>>>>>> > >>>>
>>>>>> > >>>> A very quick (and not thorough) survey shows that I can't
>>>>>>find any
>>>>>> > >>>> jenkins jobs that succeeded from the last 24 hours.  Most of
>>>>>>them
>>>>>> seem
>>>>>> > >>>> to be failing with some variant of this message:
>>>>>> > >>>>
>>>>>> > >>>> [ERROR] Failed to execute goal
>>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>>> (default-clean)
>>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>>>>>delete
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> >
>>>>>> >
>>>>>> 
>>>>>>>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop
>>>>>>>>>>-hdfs-pr
>>>>>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>> > >>>> -> [Help 1]
>>>>>> > >>>>
>>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting
>>>>>>wrong
>>>>>> > >>>> permissions?
>>>>>> > >>>>
>>>>>> > >>>> Colin
>>>>>> > >>>>
>>>>>> > >>
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > --
>>>>>> > > Lei (Eddy) Xu
>>>>>> > > Software Engineer, Cloudera
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>
>>
>>
>> --
>> Lei (Eddy) Xu
>> Software Engineer, Cloudera


Re: upstream jenkins build broken?

Posted by "Colin P. McCabe" <cm...@apache.org>.
If all it takes is someone creating a test that makes a directory
without -x, this is going to happen over and over.

Let's just fix the problem at the root by running "git clean -fqdx" in
our jenkins scripts.  If there are no objections I will add this in and
un-break the builds.

best,
Colin

On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <le...@cloudera.com> wrote:
> I filed HDFS-7917 to change the way to simulate disk failures.
>
> But I think we still need infrastructure folks to help with jenkins
> scripts to clean the dirs left today.
>
> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ri...@gmail.com> wrote:
>> Any updates on this issue? It seems that all HDFS jenkins builds are
>> still failing.
>>
>> Regards,
>> Haohui
>>
>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <vi...@apache.org> wrote:
>>> I think the problem started from here.
>>>
>>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>>>
>>> As Chris mentioned TestDataNodeVolumeFailure is changing the permission.
>>> But in this patch, ReplicationMonitor got NPE and it got terminate signal,
>>> due to which MiniDFSCluster.shutdown() throwing Exception.
>>>
>>> But, TestDataNodeVolumeFailure#teardown() is restoring those permission
>>> after shutting down cluster. So in this case IMO, permissions were never
>>> restored.
>>>
>>>
>>>   @After
>>>   public void tearDown() throws Exception {
>>>     if(data_fail != null) {
>>>       FileUtil.setWritable(data_fail, true);
>>>     }
>>>     if(failedDir != null) {
>>>       FileUtil.setWritable(failedDir, true);
>>>     }
>>>     if(cluster != null) {
>>>       cluster.shutdown();
>>>     }
>>>     for (int i = 0; i < 3; i++) {
>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>>>     }
>>>   }
>>>
>>>
>>> Regards,
>>> Vinay
>>>
>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vi...@apache.org>
>>> wrote:
>>>
>>>> When I see the history of these kind of builds, All these are failed on
>>>> node H9.
>>>>
>>>> I think some or the other uncommitted patch would have created the problem
>>>> and left it there.
>>>>
>>>>
>>>> Regards,
>>>> Vinay
>>>>
>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bu...@cloudera.com> wrote:
>>>>
>>>>> You could rely on a destructive git clean call instead of maven to do the
>>>>> directory removal.
>>>>>
>>>>> --
>>>>> Sean
>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>>>>>
>>>>> > Is there a maven plugin or setting we can use to simply remove
>>>>> > directories that have no executable permissions on them?  Clearly we
>>>>> > have the permission to do this from a technical point of view (since
>>>>> > we created the directories as the jenkins user), it's simply that the
>>>>> > code refuses to do it.
>>>>> >
>>>>> > Otherwise I guess we can just fix those tests...
>>>>> >
>>>>> > Colin
>>>>> >
>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
>>>>> > >
>>>>> > > In HDFS-7722:
>>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>> > TearDown().
>>>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>>>> > >
>>>>> > > Also I ran mvn test several times on my machine and all tests passed.
>>>>> > >
>>>>> > > However, since in DiskChecker#checkDirAccess():
>>>>> > >
>>>>> > > private static void checkDirAccess(File dir) throws
>>>>> DiskErrorException {
>>>>> > >   if (!dir.isDirectory()) {
>>>>> > >     throw new DiskErrorException("Not a directory: "
>>>>> > >                                  + dir.toString());
>>>>> > >   }
>>>>> > >
>>>>> > >   checkAccessByFileMethods(dir);
>>>>> > > }
>>>>> > >
>>>>> > > One potentially safer alternative is replacing data dir with a regular
>>>>> > > file to simulate disk failures.
>>>>> > >
>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
>>>>> cnauroth@hortonworks.com>
>>>>> > wrote:
>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>> > >> TestDataNodeVolumeFailureReporting, and
>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable permissions
>>>>> > from
>>>>> > >> directories like the one Colin mentioned to simulate disk failures at
>>>>> > data
>>>>> > >> nodes.  I reviewed the code for all of those, and they all appear to
>>>>> be
>>>>> > >> doing the necessary work to restore executable permissions at the
>>>>> end of
>>>>> > >> the test.  The only recent uncommitted patch I've seen that makes
>>>>> > changes
>>>>> > >> in these test suites is HDFS-7722.  That patch still looks fine
>>>>> > though.  I
>>>>> > >> don't know if there are other uncommitted patches that changed these
>>>>> > test
>>>>> > >> suites.
>>>>> > >>
>>>>> > >> I suppose it's also possible that the JUnit process unexpectedly died
>>>>> > >> after removing executable permissions but before restoring them.
>>>>> That
>>>>> > >> always would have been a weakness of these test suites, regardless of
>>>>> > any
>>>>> > >> recent changes.
>>>>> > >>
>>>>> > >> Chris Nauroth
>>>>> > >> Hortonworks
>>>>> > >> http://hortonworks.com/
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>>>> > >>
>>>>> > >>>Hey Colin,
>>>>> > >>>
>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>>>>> with
>>>>> > >>>these boxes. He took a look and concluded that some perms are being
>>>>> set
>>>>> > in
>>>>> > >>>those directories by our unit tests which are precluding those files
>>>>> > from
>>>>> > >>>getting deleted. He's going to clean up the boxes for us, but we
>>>>> should
>>>>> > >>>expect this to keep happening until we can fix the test in question
>>>>> to
>>>>> > >>>properly clean up after itself.
>>>>> > >>>
>>>>> > >>>To help narrow down which commit it was that started this, Andrew
>>>>> sent
>>>>> > me
>>>>> > >>>this info:
>>>>> > >>>
>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>> >
>>>>> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>>> > has
>>>>> > >>>500 perms, so I'm guessing that's the problem. Been that way since
>>>>> 9:32
>>>>> > >>>UTC
>>>>> > >>>on March 5th."
>>>>> > >>>
>>>>> > >>>--
>>>>> > >>>Aaron T. Myers
>>>>> > >>>Software Engineer, Cloudera
>>>>> > >>>
>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org
>>>>> >
>>>>> > >>>wrote:
>>>>> > >>>
>>>>> > >>>> Hi all,
>>>>> > >>>>
>>>>> > >>>> A very quick (and not thorough) survey shows that I can't find any
>>>>> > >>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>>>>> seem
>>>>> > >>>> to be failing with some variant of this message:
>>>>> > >>>>
>>>>> > >>>> [ERROR] Failed to execute goal
>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>> (default-clean)
>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>> > >>>>
>>>>> > >>>>
>>>>> >
>>>>> >
>>>>> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
>>>>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> > >>>> -> [Help 1]
>>>>> > >>>>
>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>> > >>>> permissions?
>>>>> > >>>>
>>>>> > >>>> Colin
>>>>> > >>>>
>>>>> > >>
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > --
>>>>> > > Lei (Eddy) Xu
>>>>> > > Software Engineer, Cloudera
>>>>> >
>>>>>
>>>>
>>>>
>
>
>
> --
> Lei (Eddy) Xu
> Software Engineer, Cloudera

Re: upstream jenkins build broken?

Posted by Lei Xu <le...@cloudera.com>.
I filed HDFS-7917 to change the way to simulate disk failures.

But I think we still need infrastructure folks to help with jenkins
scripts to clean the dirs left today.

On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ri...@gmail.com> wrote:
> Any updates on this issue? It seems that all HDFS jenkins builds are
> still failing.
>
> Regards,
> Haohui
>
> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <vi...@apache.org> wrote:
>> I think the problem started from here.
>>
>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>>
>> As Chris mentioned TestDataNodeVolumeFailure is changing the permission.
>> But in this patch, ReplicationMonitor got NPE and it got terminate signal,
>> due to which MiniDFSCluster.shutdown() throwing Exception.
>>
>> But, TestDataNodeVolumeFailure#teardown() is restoring those permission
>> after shutting down cluster. So in this case IMO, permissions were never
>> restored.
>>
>>
>>   @After
>>   public void tearDown() throws Exception {
>>     if(data_fail != null) {
>>       FileUtil.setWritable(data_fail, true);
>>     }
>>     if(failedDir != null) {
>>       FileUtil.setWritable(failedDir, true);
>>     }
>>     if(cluster != null) {
>>       cluster.shutdown();
>>     }
>>     for (int i = 0; i < 3; i++) {
>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>>     }
>>   }
>>
>>
>> Regards,
>> Vinay
>>
>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vi...@apache.org>
>> wrote:
>>
>>> When I see the history of these kind of builds, All these are failed on
>>> node H9.
>>>
>>> I think some or the other uncommitted patch would have created the problem
>>> and left it there.
>>>
>>>
>>> Regards,
>>> Vinay
>>>
>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bu...@cloudera.com> wrote:
>>>
>>>> You could rely on a destructive git clean call instead of maven to do the
>>>> directory removal.
>>>>
>>>> --
>>>> Sean
>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>>>>
>>>> > Is there a maven plugin or setting we can use to simply remove
>>>> > directories that have no executable permissions on them?  Clearly we
>>>> > have the permission to do this from a technical point of view (since
>>>> > we created the directories as the jenkins user), it's simply that the
>>>> > code refuses to do it.
>>>> >
>>>> > Otherwise I guess we can just fix those tests...
>>>> >
>>>> > Colin
>>>> >
>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
>>>> > >
>>>> > > In HDFS-7722:
>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>> > TearDown().
>>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>>> > >
>>>> > > Also I ran mvn test several times on my machine and all tests passed.
>>>> > >
>>>> > > However, since in DiskChecker#checkDirAccess():
>>>> > >
>>>> > > private static void checkDirAccess(File dir) throws
>>>> DiskErrorException {
>>>> > >   if (!dir.isDirectory()) {
>>>> > >     throw new DiskErrorException("Not a directory: "
>>>> > >                                  + dir.toString());
>>>> > >   }
>>>> > >
>>>> > >   checkAccessByFileMethods(dir);
>>>> > > }
>>>> > >
>>>> > > One potentially safer alternative is replacing data dir with a regular
>>>> > > file to simulate disk failures.
>>>> > >
>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
>>>> cnauroth@hortonworks.com>
>>>> > wrote:
>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>> > >> TestDataNodeVolumeFailureReporting, and
>>>> > >> TestDataNodeVolumeFailureToleration all remove executable permissions
>>>> > from
>>>> > >> directories like the one Colin mentioned to simulate disk failures at
>>>> > data
>>>> > >> nodes.  I reviewed the code for all of those, and they all appear to
>>>> be
>>>> > >> doing the necessary work to restore executable permissions at the
>>>> end of
>>>> > >> the test.  The only recent uncommitted patch I've seen that makes
>>>> > changes
>>>> > >> in these test suites is HDFS-7722.  That patch still looks fine
>>>> > though.  I
>>>> > >> don't know if there are other uncommitted patches that changed these
>>>> > test
>>>> > >> suites.
>>>> > >>
>>>> > >> I suppose it's also possible that the JUnit process unexpectedly died
>>>> > >> after removing executable permissions but before restoring them.
>>>> That
>>>> > >> always would have been a weakness of these test suites, regardless of
>>>> > any
>>>> > >> recent changes.
>>>> > >>
>>>> > >> Chris Nauroth
>>>> > >> Hortonworks
>>>> > >> http://hortonworks.com/
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>>> > >>
>>>> > >>>Hey Colin,
>>>> > >>>
>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>>>> with
>>>> > >>>these boxes. He took a look and concluded that some perms are being
>>>> set
>>>> > in
>>>> > >>>those directories by our unit tests which are precluding those files
>>>> > from
>>>> > >>>getting deleted. He's going to clean up the boxes for us, but we
>>>> should
>>>> > >>>expect this to keep happening until we can fix the test in question
>>>> to
>>>> > >>>properly clean up after itself.
>>>> > >>>
>>>> > >>>To help narrow down which commit it was that started this, Andrew
>>>> sent
>>>> > me
>>>> > >>>this info:
>>>> > >>>
>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>> >
>>>> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>> > has
>>>> > >>>500 perms, so I'm guessing that's the problem. Been that way since
>>>> 9:32
>>>> > >>>UTC
>>>> > >>>on March 5th."
>>>> > >>>
>>>> > >>>--
>>>> > >>>Aaron T. Myers
>>>> > >>>Software Engineer, Cloudera
>>>> > >>>
>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org
>>>> >
>>>> > >>>wrote:
>>>> > >>>
>>>> > >>>> Hi all,
>>>> > >>>>
>>>> > >>>> A very quick (and not thorough) survey shows that I can't find any
>>>> > >>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>>>> seem
>>>> > >>>> to be failing with some variant of this message:
>>>> > >>>>
>>>> > >>>> [ERROR] Failed to execute goal
>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>> (default-clean)
>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>> > >>>>
>>>> > >>>>
>>>> >
>>>> >
>>>> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
>>>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>>> > >>>> -> [Help 1]
>>>> > >>>>
>>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>> > >>>> permissions?
>>>> > >>>>
>>>> > >>>> Colin
>>>> > >>>>
>>>> > >>
>>>> > >
>>>> > >
>>>> > >
>>>> > > --
>>>> > > Lei (Eddy) Xu
>>>> > > Software Engineer, Cloudera
>>>> >
>>>>
>>>
>>>



-- 
Lei (Eddy) Xu
Software Engineer, Cloudera

Re: upstream jenkins build broken?

Posted by Mai Haohui <ri...@gmail.com>.
Any updates on this issue? It seems that all HDFS jenkins builds are
still failing.

Regards,
Haohui

On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <vi...@apache.org> wrote:
> I think the problem started from here.
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>
> As Chris mentioned TestDataNodeVolumeFailure is changing the permission.
> But in this patch, ReplicationMonitor got NPE and it got terminate signal,
> due to which MiniDFSCluster.shutdown() throwing Exception.
>
> But, TestDataNodeVolumeFailure#teardown() is restoring those permission
> after shutting down cluster. So in this case IMO, permissions were never
> restored.
>
>
>   @After
>   public void tearDown() throws Exception {
>     if(data_fail != null) {
>       FileUtil.setWritable(data_fail, true);
>     }
>     if(failedDir != null) {
>       FileUtil.setWritable(failedDir, true);
>     }
>     if(cluster != null) {
>       cluster.shutdown();
>     }
>     for (int i = 0; i < 3; i++) {
>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>     }
>   }
>
>
> Regards,
> Vinay
>
> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vi...@apache.org>
> wrote:
>
>> When I see the history of these kind of builds, All these are failed on
>> node H9.
>>
>> I think some or the other uncommitted patch would have created the problem
>> and left it there.
>>
>>
>> Regards,
>> Vinay
>>
>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>>> You could rely on a destructive git clean call instead of maven to do the
>>> directory removal.
>>>
>>> --
>>> Sean
>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>>>
>>> > Is there a maven plugin or setting we can use to simply remove
>>> > directories that have no executable permissions on them?  Clearly we
>>> > have the permission to do this from a technical point of view (since
>>> > we created the directories as the jenkins user), it's simply that the
>>> > code refuses to do it.
>>> >
>>> > Otherwise I guess we can just fix those tests...
>>> >
>>> > Colin
>>> >
>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>>> > > Thanks a lot for looking into HDFS-7722, Chris.
>>> > >
>>> > > In HDFS-7722:
>>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>> > TearDown().
>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>> > >
>>> > > Also I ran mvn test several times on my machine and all tests passed.
>>> > >
>>> > > However, since in DiskChecker#checkDirAccess():
>>> > >
>>> > > private static void checkDirAccess(File dir) throws
>>> DiskErrorException {
>>> > >   if (!dir.isDirectory()) {
>>> > >     throw new DiskErrorException("Not a directory: "
>>> > >                                  + dir.toString());
>>> > >   }
>>> > >
>>> > >   checkAccessByFileMethods(dir);
>>> > > }
>>> > >
>>> > > One potentially safer alternative is replacing data dir with a regular
>>> > > file to simulate disk failures.
>>> > >
>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
>>> cnauroth@hortonworks.com>
>>> > wrote:
>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>> > >> TestDataNodeVolumeFailureReporting, and
>>> > >> TestDataNodeVolumeFailureToleration all remove executable permissions
>>> > from
>>> > >> directories like the one Colin mentioned to simulate disk failures at
>>> > data
>>> > >> nodes.  I reviewed the code for all of those, and they all appear to
>>> be
>>> > >> doing the necessary work to restore executable permissions at the
>>> end of
>>> > >> the test.  The only recent uncommitted patch I've seen that makes
>>> > changes
>>> > >> in these test suites is HDFS-7722.  That patch still looks fine
>>> > though.  I
>>> > >> don't know if there are other uncommitted patches that changed these
>>> > test
>>> > >> suites.
>>> > >>
>>> > >> I suppose it's also possible that the JUnit process unexpectedly died
>>> > >> after removing executable permissions but before restoring them.
>>> That
>>> > >> always would have been a weakness of these test suites, regardless of
>>> > any
>>> > >> recent changes.
>>> > >>
>>> > >> Chris Nauroth
>>> > >> Hortonworks
>>> > >> http://hortonworks.com/
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>> > >>
>>> > >>>Hey Colin,
>>> > >>>
>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>>> with
>>> > >>>these boxes. He took a look and concluded that some perms are being
>>> set
>>> > in
>>> > >>>those directories by our unit tests which are precluding those files
>>> > from
>>> > >>>getting deleted. He's going to clean up the boxes for us, but we
>>> should
>>> > >>>expect this to keep happening until we can fix the test in question
>>> to
>>> > >>>properly clean up after itself.
>>> > >>>
>>> > >>>To help narrow down which commit it was that started this, Andrew
>>> sent
>>> > me
>>> > >>>this info:
>>> > >>>
>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>> >
>>> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>> > has
>>> > >>>500 perms, so I'm guessing that's the problem. Been that way since
>>> 9:32
>>> > >>>UTC
>>> > >>>on March 5th."
>>> > >>>
>>> > >>>--
>>> > >>>Aaron T. Myers
>>> > >>>Software Engineer, Cloudera
>>> > >>>
>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org
>>> >
>>> > >>>wrote:
>>> > >>>
>>> > >>>> Hi all,
>>> > >>>>
>>> > >>>> A very quick (and not thorough) survey shows that I can't find any
>>> > >>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>>> seem
>>> > >>>> to be failing with some variant of this message:
>>> > >>>>
>>> > >>>> [ERROR] Failed to execute goal
>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>> (default-clean)
>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>> > >>>>
>>> > >>>>
>>> >
>>> >
>>> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
>>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>> > >>>> -> [Help 1]
>>> > >>>>
>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>> > >>>> permissions?
>>> > >>>>
>>> > >>>> Colin
>>> > >>>>
>>> > >>
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Lei (Eddy) Xu
>>> > > Software Engineer, Cloudera
>>> >
>>>
>>
>>

Re: upstream jenkins build broken?

Posted by Vinayakumar B <vi...@apache.org>.
I think the problem started from here.

https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/

As Chris mentioned, TestDataNodeVolumeFailure changes the permissions.
But with this patch, ReplicationMonitor got an NPE and received a terminate
signal, due to which MiniDFSCluster.shutdown() threw an exception.

But TestDataNodeVolumeFailure#tearDown() only restores those permissions
after shutting down the cluster, so in this case, IMO, the permissions were
never restored.


  @After
  public void tearDown() throws Exception {
    if(data_fail != null) {
      FileUtil.setWritable(data_fail, true);
    }
    if(failedDir != null) {
      FileUtil.setWritable(failedDir, true);
    }
    if(cluster != null) {
      cluster.shutdown();
    }
    for (int i = 0; i < 3; i++) {
      FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
      FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
    }
  }
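
One way to avoid this in the future would be to restore the permissions
before anything that can throw, instead of after cluster.shutdown().
A minimal sketch only, reusing the same fields as the test above (this is
not the actual HDFS-7917 change):

  @After
  public void tearDown() throws Exception {
    // Restore permissions first, so an exception from shutdown() can no
    // longer leave non-executable directories behind for the next build.
    if (data_fail != null) {
      FileUtil.setWritable(data_fail, true);
    }
    if (failedDir != null) {
      FileUtil.setWritable(failedDir, true);
    }
    for (int i = 0; i < 3; i++) {
      FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 1)), true);
      FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 2)), true);
    }
    if (cluster != null) {
      cluster.shutdown();
    }
  }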


Regards,
Vinay

On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vi...@apache.org>
wrote:

> When I see the history of these kind of builds, All these are failed on
> node H9.
>
> I think some or the other uncommitted patch would have created the problem
> and left it there.
>
>
> Regards,
> Vinay
>
> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> You could rely on a destructive git clean call instead of maven to do the
>> directory removal.
>>
>> --
>> Sean
>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>>
>> > Is there a maven plugin or setting we can use to simply remove
>> > directories that have no executable permissions on them?  Clearly we
>> > have the permission to do this from a technical point of view (since
>> > we created the directories as the jenkins user), it's simply that the
>> > code refuses to do it.
>> >
>> > Otherwise I guess we can just fix those tests...
>> >
>> > Colin
>> >
>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>> > > Thanks a lot for looking into HDFS-7722, Chris.
>> > >
>> > > In HDFS-7722:
>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>> > TearDown().
>> > > TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>> > >
>> > > Also I ran mvn test several times on my machine and all tests passed.
>> > >
>> > > However, since in DiskChecker#checkDirAccess():
>> > >
>> > > private static void checkDirAccess(File dir) throws
>> DiskErrorException {
>> > >   if (!dir.isDirectory()) {
>> > >     throw new DiskErrorException("Not a directory: "
>> > >                                  + dir.toString());
>> > >   }
>> > >
>> > >   checkAccessByFileMethods(dir);
>> > > }
>> > >
>> > > One potentially safer alternative is replacing data dir with a regular
>> > > file to simulate disk failures.
>> > >
>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
>> cnauroth@hortonworks.com>
>> > wrote:
>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> > >> TestDataNodeVolumeFailureReporting, and
>> > >> TestDataNodeVolumeFailureToleration all remove executable permissions
>> > from
>> > >> directories like the one Colin mentioned to simulate disk failures at
>> > data
>> > >> nodes.  I reviewed the code for all of those, and they all appear to
>> be
>> > >> doing the necessary work to restore executable permissions at the
>> end of
>> > >> the test.  The only recent uncommitted patch I've seen that makes
>> > changes
>> > >> in these test suites is HDFS-7722.  That patch still looks fine
>> > though.  I
>> > >> don't know if there are other uncommitted patches that changed these
>> > test
>> > >> suites.
>> > >>
>> > >> I suppose it's also possible that the JUnit process unexpectedly died
>> > >> after removing executable permissions but before restoring them.
>> That
>> > >> always would have been a weakness of these test suites, regardless of
>> > any
>> > >> recent changes.
>> > >>
>> > >> Chris Nauroth
>> > >> Hortonworks
>> > >> http://hortonworks.com/
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>> > >>
>> > >>>Hey Colin,
>> > >>>
>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>> with
>> > >>>these boxes. He took a look and concluded that some perms are being
>> set
>> > in
>> > >>>those directories by our unit tests which are precluding those files
>> > from
>> > >>>getting deleted. He's going to clean up the boxes for us, but we
>> should
>> > >>>expect this to keep happening until we can fix the test in question
>> to
>> > >>>properly clean up after itself.
>> > >>>
>> > >>>To help narrow down which commit it was that started this, Andrew
>> sent
>> > me
>> > >>>this info:
>> > >>>
>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>> >
>> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> > has
>> > >>>500 perms, so I'm guessing that's the problem. Been that way since
>> 9:32
>> > >>>UTC
>> > >>>on March 5th."
>> > >>>
>> > >>>--
>> > >>>Aaron T. Myers
>> > >>>Software Engineer, Cloudera
>> > >>>
>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org
>> >
>> > >>>wrote:
>> > >>>
>> > >>>> Hi all,
>> > >>>>
>> > >>>> A very quick (and not thorough) survey shows that I can't find any
>> > >>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>> seem
>> > >>>> to be failing with some variant of this message:
>> > >>>>
>> > >>>> [ERROR] Failed to execute goal
>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>> (default-clean)
>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>> > >>>>
>> > >>>>
>> >
>> >
>> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
>> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>> > >>>> -> [Help 1]
>> > >>>>
>> > >>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>> > >>>> permissions?
>> > >>>>
>> > >>>> Colin
>> > >>>>
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > Lei (Eddy) Xu
>> > > Software Engineer, Cloudera
>> >
>>
>
>

Re: upstream jenkins build broken?

Posted by Vinayakumar B <vi...@apache.org>.
When I look at the history of these kinds of builds, they have all failed on
node H9.

I think some uncommitted patch or other would have created the problem
and left it there.


Regards,
Vinay

On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bu...@cloudera.com> wrote:

> You could rely on a destructive git clean call instead of maven to do the
> directory removal.
>
> --
> Sean
> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:
>
> > Is there a maven plugin or setting we can use to simply remove
> > directories that have no executable permissions on them?  Clearly we
> > have the permission to do this from a technical point of view (since
> > we created the directories as the jenkins user), it's simply that the
> > code refuses to do it.
> >
> > Otherwise I guess we can just fix those tests...
> >
> > Colin
> >
> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
> > > Thanks a lot for looking into HDFS-7722, Chris.
> > >
> > > In HDFS-7722:
> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> > TearDown().
> > > TestDataNodeHotSwapVolumes reset permissions in a finally clause.
> > >
> > > Also I ran mvn test several times on my machine and all tests passed.
> > >
> > > However, since in DiskChecker#checkDirAccess():
> > >
> > > private static void checkDirAccess(File dir) throws DiskErrorException
> {
> > >   if (!dir.isDirectory()) {
> > >     throw new DiskErrorException("Not a directory: "
> > >                                  + dir.toString());
> > >   }
> > >
> > >   checkAccessByFileMethods(dir);
> > > }
> > >
> > > One potentially safer alternative is replacing data dir with a regular
> > > file to simulate disk failures.
> > >
> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
> cnauroth@hortonworks.com>
> > wrote:
> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> > >> TestDataNodeVolumeFailureReporting, and
> > >> TestDataNodeVolumeFailureToleration all remove executable permissions
> > from
> > >> directories like the one Colin mentioned to simulate disk failures at
> > data
> > >> nodes.  I reviewed the code for all of those, and they all appear to
> be
> > >> doing the necessary work to restore executable permissions at the end
> of
> > >> the test.  The only recent uncommitted patch I've seen that makes
> > changes
> > >> in these test suites is HDFS-7722.  That patch still looks fine
> > though.  I
> > >> don't know if there are other uncommitted patches that changed these
> > test
> > >> suites.
> > >>
> > >> I suppose it's also possible that the JUnit process unexpectedly died
> > >> after removing executable permissions but before restoring them.  That
> > >> always would have been a weakness of these test suites, regardless of
> > any
> > >> recent changes.
> > >>
> > >> Chris Nauroth
> > >> Hortonworks
> > >> http://hortonworks.com/
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
> > >>
> > >>>Hey Colin,
> > >>>
> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going on
> with
> > >>>these boxes. He took a look and concluded that some perms are being
> set
> > in
> > >>>those directories by our unit tests which are precluding those files
> > from
> > >>>getting deleted. He's going to clean up the boxes for us, but we
> should
> > >>>expect this to keep happening until we can fix the test in question to
> > >>>properly clean up after itself.
> > >>>
> > >>>To help narrow down which commit it was that started this, Andrew sent
> > me
> > >>>this info:
> > >>>
> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> > >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> > has
> > >>>500 perms, so I'm guessing that's the problem. Been that way since
> 9:32
> > >>>UTC
> > >>>on March 5th."
> > >>>
> > >>>--
> > >>>Aaron T. Myers
> > >>>Software Engineer, Cloudera
> > >>>
> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cm...@apache.org>
> > >>>wrote:
> > >>>
> > >>>> Hi all,
> > >>>>
> > >>>> A very quick (and not thorough) survey shows that I can't find any
> > >>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
> seem
> > >>>> to be failing with some variant of this message:
> > >>>>
> > >>>> [ERROR] Failed to execute goal
> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> (default-clean)
> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
> > >>>>
> > >>>>
> >
> >
> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
> > >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
> > >>>> -> [Help 1]
> > >>>>
> > >>>> Any ideas how this happened?  Bad disk, unit test setting wrong
> > >>>> permissions?
> > >>>>
> > >>>> Colin
> > >>>>
> > >>
> > >
> > >
> > >
> > > --
> > > Lei (Eddy) Xu
> > > Software Engineer, Cloudera
> >
>

Re: upstream jenkins build broken?

Posted by Sean Busbey <bu...@cloudera.com>.
You could rely on a destructive git clean call instead of maven to do the
directory removal.

-- 
Sean
On Mar 11, 2015 4:11 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:

> Is there a maven plugin or setting we can use to simply remove
> directories that have no executable permissions on them?  Clearly we
> have the permission to do this from a technical point of view (since
> we created the directories as the jenkins user), it's simply that the
> code refuses to do it.
>
> Otherwise I guess we can just fix those tests...
>
> Colin
>
> On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
> > Thanks a lot for looking into HDFS-7722, Chris.
> >
> > In HDFS-7722:
> > TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> TearDown().
> > TestDataNodeHotSwapVolumes reset permissions in a finally clause.
> >
> > Also I ran mvn test several times on my machine and all tests passed.
> >
> > However, since in DiskChecker#checkDirAccess():
> >
> > private static void checkDirAccess(File dir) throws DiskErrorException {
> >   if (!dir.isDirectory()) {
> >     throw new DiskErrorException("Not a directory: "
> >                                  + dir.toString());
> >   }
> >
> >   checkAccessByFileMethods(dir);
> > }
> >
> > One potentially safer alternative is replacing data dir with a regular
> > file to simulate disk failures.
> >
> > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cn...@hortonworks.com>
> wrote:
> >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >> TestDataNodeVolumeFailureReporting, and
> >> TestDataNodeVolumeFailureToleration all remove executable permissions
> from
> >> directories like the one Colin mentioned to simulate disk failures at
> data
> >> nodes.  I reviewed the code for all of those, and they all appear to be
> >> doing the necessary work to restore executable permissions at the end of
> >> the test.  The only recent uncommitted patch I've seen that makes
> changes
> >> in these test suites is HDFS-7722.  That patch still looks fine
> though.  I
> >> don't know if there are other uncommitted patches that changed these
> test
> >> suites.
> >>
> >> I suppose it's also possible that the JUnit process unexpectedly died
> >> after removing executable permissions but before restoring them.  That
> >> always would have been a weakness of these test suites, regardless of
> any
> >> recent changes.
> >>
> >> Chris Nauroth
> >> Hortonworks
> >> http://hortonworks.com/
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
> >>
> >>>Hey Colin,
> >>>
> >>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
> >>>these boxes. He took a look and concluded that some perms are being set
> in
> >>>those directories by our unit tests which are precluding those files
> from
> >>>getting deleted. He's going to clean up the boxes for us, but we should
> >>>expect this to keep happening until we can fix the test in question to
> >>>properly clean up after itself.
> >>>
> >>>To help narrow down which commit it was that started this, Andrew sent
> me
> >>>this info:
> >>>
> >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> has
> >>>500 perms, so I'm guessing that's the problem. Been that way since 9:32
> >>>UTC
> >>>on March 5th."
> >>>
> >>>--
> >>>Aaron T. Myers
> >>>Software Engineer, Cloudera
> >>>
> >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cm...@apache.org>
> >>>wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> A very quick (and not thorough) survey shows that I can't find any
> >>>> jenkins jobs that succeeded from the last 24 hours.  Most of them seem
> >>>> to be failing with some variant of this message:
> >>>>
> >>>> [ERROR] Failed to execute goal
> >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
> >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
> >>>>
> >>>>
>
> >>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
> >>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>> -> [Help 1]
> >>>>
> >>>> Any ideas how this happened?  Bad disk, unit test setting wrong
> >>>> permissions?
> >>>>
> >>>> Colin
> >>>>
> >>
> >
> >
> >
> > --
> > Lei (Eddy) Xu
> > Software Engineer, Cloudera
>

Re: upstream jenkins build broken?

Posted by Chris Nauroth <cn...@hortonworks.com>.
The only thing I'm aware of is the failOnError option:

http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html


I prefer that we don't disable this, because ignoring different kinds of
failures could leave our build directories in an indeterminate state.  For
example, we could end up with an old class file on the classpath for test
runs that was supposedly deleted.

I think it's worth exploring Eddy's suggestion to try simulating failure
by placing a file where the code expects to see a directory.  That might
even let us enable some of these tests that are skipped on Windows,
because Windows allows access for the owner even after permissions have
been stripped.

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu> wrote:

>Is there a maven plugin or setting we can use to simply remove
>directories that have no executable permissions on them?  Clearly we
>have the permission to do this from a technical point of view (since
>we created the directories as the jenkins user), it's simply that the
>code refuses to do it.
>
>Otherwise I guess we can just fix those tests...
>
>Colin
>
>On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
>> Thanks a lot for looking into HDFS-7722, Chris.
>>
>> In HDFS-7722:
>> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>TearDown().
>> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>
>> Also I ran mvn test several times on my machine and all tests passed.
>>
>> However, since DiskChecker#checkDirAccess() first checks that the path is a directory:
>>
>> private static void checkDirAccess(File dir) throws DiskErrorException {
>>   if (!dir.isDirectory()) {
>>     throw new DiskErrorException("Not a directory: "
>>                                  + dir.toString());
>>   }
>>
>>   checkAccessByFileMethods(dir);
>> }
>>
>> One potentially safer alternative is replacing the data dir with a regular
>> file to simulate disk failures.
>>
>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>><cn...@hortonworks.com> wrote:
>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>> TestDataNodeVolumeFailureReporting, and
>>> TestDataNodeVolumeFailureToleration all remove executable permissions
>>>from
>>> directories like the one Colin mentioned to simulate disk failures at
>>>data
>>> nodes.  I reviewed the code for all of those, and they all appear to be
>>> doing the necessary work to restore executable permissions at the end
>>>of
>>> the test.  The only recent uncommitted patch I've seen that makes
>>>changes
>>> in these test suites is HDFS-7722.  That patch still looks fine
>>>though.  I
>>> don't know if there are other uncommitted patches that changed these
>>>test
>>> suites.
>>>
>>> I suppose it's also possible that the JUnit process unexpectedly died
>>> after removing executable permissions but before restoring them.  That
>>> always would have been a weakness of these test suites, regardless of
>>>any
>>> recent changes.
>>>
>>> Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>>
>>>>Hey Colin,
>>>>
>>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>>>>these boxes. He took a look and concluded that some perms are being
>>>>set in
>>>>those directories by our unit tests which are precluding those files
>>>>from
>>>>getting deleted. He's going to clean up the boxes for us, but we should
>>>>expect this to keep happening until we can fix the test in question to
>>>>properly clean up after itself.
>>>>
>>>>To help narrow down which commit it was that started this, Andrew sent
>>>>me
>>>>this info:
>>>>
>>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>>has
>>>>500 perms, so I'm guessing that's the problem. Been that way since 9:32
>>>>UTC
>>>>on March 5th."
>>>>
>>>>--
>>>>Aaron T. Myers
>>>>Software Engineer, Cloudera
>>>>
>>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cm...@apache.org>
>>>>wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> A very quick (and not thorough) survey shows that I can't find any
>>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>>>>>seem
>>>>> to be failing with some variant of this message:
>>>>>
>>>>> [ERROR] Failed to execute goal
>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>>
>>>>>
>>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs
>>>>>-pr
>>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>> -> [Help 1]
>>>>>
>>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>> permissions?
>>>>>
>>>>> Colin
>>>>>
>>>
>>
>>
>>
>> --
>> Lei (Eddy) Xu
>> Software Engineer, Cloudera


Re: upstream jenkins build broken?

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
Is there a maven plugin or setting we can use to simply remove
directories that have no executable permissions on them?  Clearly we
have the permission to do this from a technical point of view (since
we created the directories as the jenkins user), it's simply that the
code refuses to do it.
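
(Assuming no such clean-plugin setting exists, a hand-rolled workaround would
probably look something like the sketch below.  The class and method names are
made up and this isn't code we have anywhere today; it just walks the tree,
gives each directory back owner rwx, and deletes everything bottom-up.)

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.PosixFilePermissions;

public class ForceDelete {
  // Restore owner rwx on each directory before descending into it, then
  // delete files and directories bottom-up.  Assumes a POSIX file system
  // and that the jenkins user owns everything under the root.
  public static void forceDelete(Path root) throws IOException {
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
          throws IOException {
        Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwx------"));
        return FileVisitResult.CONTINUE;
      }
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
      }
      @Override
      public FileVisitResult postVisitDirectory(Path dir, IOException exc)
          throws IOException {
        if (exc != null) {
          throw exc;
        }
        Files.delete(dir);
        return FileVisitResult.CONTINUE;
      }
    });
  }

  public static void main(String[] args) throws IOException {
    forceDelete(Paths.get(args[0]));
  }
}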

Otherwise I guess we can just fix those tests...

Colin

On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com> wrote:
> Thanks a lot for looking into HDFS-7722, Chris.
>
> In HDFS-7722:
> TestDataNodeVolumeFailureXXX tests reset data dir permissions in TearDown().
> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>
> Also I ran mvn test several times on my machine and all tests passed.
>
> However, since DiskChecker#checkDirAccess() first checks that the path is a directory:
>
> private static void checkDirAccess(File dir) throws DiskErrorException {
>   if (!dir.isDirectory()) {
>     throw new DiskErrorException("Not a directory: "
>                                  + dir.toString());
>   }
>
>   checkAccessByFileMethods(dir);
> }
>
> One potentially safer alternative is replacing the data dir with a regular
> file to simulate disk failures.
>
> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> TestDataNodeVolumeFailureReporting, and
>> TestDataNodeVolumeFailureToleration all remove executable permissions from
>> directories like the one Colin mentioned to simulate disk failures at data
>> nodes.  I reviewed the code for all of those, and they all appear to be
>> doing the necessary work to restore executable permissions at the end of
>> the test.  The only recent uncommitted patch I've seen that makes changes
>> in these test suites is HDFS-7722.  That patch still looks fine though.  I
>> don't know if there are other uncommitted patches that changed these test
>> suites.
>>
>> I suppose it's also possible that the JUnit process unexpectedly died
>> after removing executable permissions but before restoring them.  That
>> always would have been a weakness of these test suites, regardless of any
>> recent changes.
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>>
>>
>>
>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>>
>>>Hey Colin,
>>>
>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>>>these boxes. He took a look and concluded that some perms are being set in
>>>those directories by our unit tests which are precluding those files from
>>>getting deleted. He's going to clean up the boxes for us, but we should
>>>expect this to keep happening until we can fix the test in question to
>>>properly clean up after itself.
>>>
>>>To help narrow down which commit it was that started this, Andrew sent me
>>>this info:
>>>
>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/ has
>>>500 perms, so I'm guessing that's the problem. Been that way since 9:32
>>>UTC
>>>on March 5th."
>>>
>>>--
>>>Aaron T. Myers
>>>Software Engineer, Cloudera
>>>
>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cm...@apache.org>
>>>wrote:
>>>
>>>> Hi all,
>>>>
>>>> A very quick (and not thorough) survey shows that I can't find any
>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them seem
>>>> to be failing with some variant of this message:
>>>>
>>>> [ERROR] Failed to execute goal
>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>
>>>>
>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>>> -> [Help 1]
>>>>
>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>> permissions?
>>>>
>>>> Colin
>>>>
>>
>
>
>
> --
> Lei (Eddy) Xu
> Software Engineer, Cloudera

Re: upstream jenkins build broken?

Posted by Lei Xu <le...@cloudera.com>.
Thanks a lot for looking into HDFS-7722, Chris.

In HDFS-7722:
TestDataNodeVolumeFailureXXX tests reset data dir permissions in TearDown().
TestDataNodeHotSwapVolumes reset permissions in a finally clause.

Also I ran mvn test several times on my machine and all tests passed.

However, since DiskChecker#checkDirAccess() first checks that the path is a directory:

private static void checkDirAccess(File dir) throws DiskErrorException {
  if (!dir.isDirectory()) {
    throw new DiskErrorException("Not a directory: "
                                 + dir.toString());
  }

  checkAccessByFileMethods(dir);
}

One potentially safer alternative is replacing the data dir with a regular
file to simulate disk failures.
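
As a rough illustration of that idea (the test class, temp-dir location and
assertions below are invented for the example, not the real test code):
swapping the directory for a plain file makes DiskChecker reject it with a
DiskErrorException, and there are no permission bits to restore afterwards.

import static org.junit.Assert.fail;

import java.io.File;
import org.apache.commons.io.FileUtils;
import org.apache.hadoop.util.DiskChecker;
import org.apache.hadoop.util.DiskChecker.DiskErrorException;
import org.junit.Test;

public class TestDiskFailureByFile {
  @Test
  public void diskCheckerRejectsRegularFile() throws Exception {
    // Hypothetical stand-in for a DataNode volume directory.
    File dataDir = new File(System.getProperty("java.io.tmpdir"), "data3");
    FileUtils.deleteQuietly(dataDir);
    if (!dataDir.createNewFile()) {
      fail("could not create placeholder file " + dataDir);
    }
    try {
      // checkDir() throws DiskErrorException when the path is not a usable
      // directory; this is the same failure the chmod-based tests provoke.
      DiskChecker.checkDir(dataDir);
      fail("expected DiskErrorException for a non-directory");
    } catch (DiskErrorException expected) {
      // nothing to clean up permission-wise
    } finally {
      FileUtils.deleteQuietly(dataDir);
    }
  }
}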

On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> TestDataNodeVolumeFailureReporting, and
> TestDataNodeVolumeFailureToleration all remove executable permissions from
> directories like the one Colin mentioned to simulate disk failures at data
> nodes.  I reviewed the code for all of those, and they all appear to be
> doing the necessary work to restore executable permissions at the end of
> the test.  The only recent uncommitted patch I've seen that makes changes
> in these test suites is HDFS-7722.  That patch still looks fine though.  I
> don't know if there are other uncommitted patches that changed these test
> suites.
>
> I suppose it's also possible that the JUnit process unexpectedly died
> after removing executable permissions but before restoring them.  That
> always would have been a weakness of these test suites, regardless of any
> recent changes.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
>
>
>
> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:
>
>>Hey Colin,
>>
>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>>these boxes. He took a look and concluded that some perms are being set in
>>those directories by our unit tests which are precluding those files from
>>getting deleted. He's going to clean up the boxes for us, but we should
>>expect this to keep happening until we can fix the test in question to
>>properly clean up after itself.
>>
>>To help narrow down which commit it was that started this, Andrew sent me
>>this info:
>>
>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/ has
>>500 perms, so I'm guessing that's the problem. Been that way since 9:32
>>UTC
>>on March 5th."
>>
>>--
>>Aaron T. Myers
>>Software Engineer, Cloudera
>>
>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cm...@apache.org>
>>wrote:
>>
>>> Hi all,
>>>
>>> A very quick (and not thorough) survey shows that I can't find any
>>> jenkins jobs that succeeded from the last 24 hours.  Most of them seem
>>> to be failing with some variant of this message:
>>>
>>> [ERROR] Failed to execute goal
>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>
>>>
>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>>> -> [Help 1]
>>>
>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>> permissions?
>>>
>>> Colin
>>>
>



-- 
Lei (Eddy) Xu
Software Engineer, Cloudera

Re: upstream jenkins build broken?

Posted by Chris Nauroth <cn...@hortonworks.com>.
TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
TestDataNodeVolumeFailureReporting, and
TestDataNodeVolumeFailureToleration all remove executable permissions from
directories like the one Colin mentioned to simulate disk failures at data
nodes.  I reviewed the code for all of those, and they all appear to be
doing the necessary work to restore executable permissions at the end of
the test.  The only recent uncommitted patch I've seen that makes changes
in these test suites is HDFS-7722.  That patch still looks fine though.  I
don't know if there are other uncommitted patches that changed these test
suites.

I suppose it's also possible that the JUnit process unexpectedly died
after removing executable permissions but before restoring them.  That
always would have been a weakness of these test suites, regardless of any
recent changes.
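
(To illustrate the pattern those suites rely on, with invented class, field
and directory names rather than the actual test code: the restore normally
sits in an @After or finally block, which covers a failing test body but not
a JVM that dies outright.)

import java.io.File;
import org.apache.hadoop.fs.FileUtil;
import org.junit.After;
import org.junit.Test;

public class TestVolumeFailureSketch {
  // Hypothetical handle to the directory whose permissions the test strips.
  private File failedVolume;

  @Test
  public void simulateVolumeFailure() throws Exception {
    failedVolume = new File(System.getProperty("java.io.tmpdir"), "data3");
    if (!failedVolume.isDirectory() && !failedVolume.mkdirs()) {
      throw new IllegalStateException("could not create " + failedVolume);
    }
    FileUtil.chmod(failedVolume.getAbsolutePath(), "000");  // "fail" the disk
    // ... exercise the DataNode against the now-inaccessible volume ...
  }

  @After
  public void restorePermissions() throws Exception {
    // Runs even if the test body throws, but not if the JVM itself is killed.
    if (failedVolume != null) {
      FileUtil.chmod(failedVolume.getAbsolutePath(), "755");
    }
  }
}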

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com> wrote:

>Hey Colin,
>
>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>these boxes. He took a look and concluded that some perms are being set in
>those directories by our unit tests which are precluding those files from
>getting deleted. He's going to clean up the boxes for us, but we should
>expect this to keep happening until we can fix the test in question to
>properly clean up after itself.
>
>To help narrow down which commit it was that started this, Andrew sent me
>this info:
>
>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/ has
>500 perms, so I'm guessing that's the problem. Been that way since 9:32
>UTC
>on March 5th."
>
>--
>Aaron T. Myers
>Software Engineer, Cloudera
>
>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cm...@apache.org>
>wrote:
>
>> Hi all,
>>
>> A very quick (and not thorough) survey shows that I can't find any
>> jenkins jobs that succeeded from the last 24 hours.  Most of them seem
>> to be failing with some variant of this message:
>>
>> [ERROR] Failed to execute goal
>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>
>> 
>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-pr
>>oject/hadoop-hdfs/target/test/data/dfs/data/data3
>> -> [Help 1]
>>
>> Any ideas how this happened?  Bad disk, unit test setting wrong
>> permissions?
>>
>> Colin
>>


Re: upstream jenkins build broken?

Posted by "Aaron T. Myers" <at...@cloudera.com>.
Hey Colin,

I asked Andrew Bayer, who works with Apache Infra, what's going on with
these boxes. He took a look and concluded that some perms are being set in
those directories by our unit tests which are precluding those files from
getting deleted. He's going to clean up the boxes for us, but we should
expect this to keep happening until we can fix the test in question to
properly clean up after itself.

To help narrow down which commit it was that started this, Andrew sent me
this info:

"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/ has
500 perms, so I'm guessing that's the problem. Been that way since 9:32 UTC
on March 5th."

--
Aaron T. Myers
Software Engineer, Cloudera

On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cm...@apache.org> wrote:

> Hi all,
>
> A very quick (and not thorough) survey shows that I can't find any
> jenkins jobs that succeeded from the last 24 hours.  Most of them seem
> to be failing with some variant of this message:
>
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
> on project hadoop-hdfs: Failed to clean project: Failed to delete
>
> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> -> [Help 1]
>
> Any ideas how this happened?  Bad disk, unit test setting wrong
> permissions?
>
> Colin
>
