Posted to dev@hbase.apache.org by Sean Busbey <bu...@cloudera.com> on 2015/07/12 07:20:00 UTC

Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

As mentioned on HADOOP-12111, there is now an incubator-style proposal:
http://wiki.apache.org/incubator/YetusProposal

On Wed, Jun 24, 2015 at 9:41 AM, Sean Busbey <bu...@cloudera.com> wrote:

> Hi Folks!
>
> Work in a feature branch is now being tracked by HADOOP-12111.
>
> On Thu, Jun 18, 2015 at 10:07 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> It looks like we have consensus.
>>
>> I'll start drafting up a proposal for the next board meeting (July 15th).
>> Once we work out the name I'll submit a PODLINGNAMESEARCH jira to track
>> that we did due diligence on whatever we pick.
>>
>> In the meantime, Hadoop PMC, would y'all be willing to host us in a
>> branch so that we can start prepping things now? We would want branch
>> commit rights for the proposed new PMC.
>>
>>
>> -Sean
>>
>>
>> On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>>> Oof. I had meant to push on this again but life got in the way and now
>>> the June board meeting is upon us. Sorry everyone. In the event that this
>>> ends up contentious, hopefully one of the copied communities can give us a
>>> branch to work in.
>>>
>>> I know everyone is busy, so here's the short version of this email: I'd
>>> like to move some of the code currently in Hadoop (test-patch) into a new
>>> TLP focused on QA tooling. I'm not sure what the best format for priming
>>> this conversation is. ORC filled in the incubator project proposal
>>> template, but I'm not sure how much that confused the issue. So to start,
>>> I'll just write what I'm hoping we can accomplish in general terms here.
>>>
>>> All software development projects that are community based (that is,
>>> accepting outside contributions) face a common QA problem for vetting
>>> incoming contributions. Hadoop is fortunate enough to be sufficiently
>>> popular that the weight of the problem drove tool development (i.e.
>>> test-patch). That tool is generalizable enough that a bunch of other TLPs
>>> have adopted their own forks. Unfortunately, in most projects this kind of
>>> QA work is an enabler rather than a primary concern, so the tooling tends to
>>> be worked on ad hoc and improvements are rarely shared across projects. And
>>> since the tooling itself is never a primary concern, any improvements that
>>> are made rarely get reused outside of ASF projects.
>>>
>>> Over the last couple months a few of us have been working on
>>> generalizing the tooling present in the Hadoop code base (because it was
>>> the most mature out of all those in the various projects) and it's reached
>>> a point where we think we can start bringing on other downstream users.
>>> This means we need to start establishing things like a release cadence, and
>>> to grow our new contributors into handling more project responsibility.
>>> Personally, I think that means it's time to move out from under Hadoop to
>>> drive things as our own community. Eventually, I hope the community can
>>> help draw in a group of folks traditionally underrepresented in ASF
>>> projects, namely QA and operations folks.
>>>
>>> I think test-patch by itself has enough scope to justify a project.
>>> Having a solid set of build tools that are customizable to fit the norms of
>>> different software communities is a bunch of work. Making it work well in
>>> both the context of automated test systems like Jenkins and for individual
>>> developers is even more work. We could easily also take over maintenance of
>>> things like shelldocs: test-patch is currently its primary consumer, but
>>> it's generally useful tooling in its own right.
>>>
>>> In addition to test-patch, I think the proposed project has some future
>>> growth potential. Given some adoption of test-patch to prove utility, the
>>> project could build on the ties it makes to start building tools to help
>>> projects do their own longer-run testing. Note that I'm talking about the
>>> tools to build QA processes and not a particular set of tested components.
>>> Specifically, I think the ChaosMonkey work that's in HBase should be
>>> generalizable as a fault injection framework (either based on that code or
>>> something like it). Doing this for arbitrary software is obviously very
>>> difficult, and a part of easing that will be to make (and then favor)
>>> tooling to allow projects to have operational glue that looks the same.
>>> Namely, the shell work that's been done in hadoop-functions.sh would be a
>>> great foundational layer that could bring good daemon handling practices to
>>> a whole slew of software projects. In the event that these frameworks and
>>> tools get adopted by parts of the Hadoop ecosystem, that could make the job
>>> of, e.g., Bigtop substantially easier.
>>>
>>> I've reached out to a few folks who have been involved in the current
>>> test-patch work or expressed interest in helping out on getting it used in
>>> other projects. Right now, the proposed PMC would be (alphabetical by last
>>> name):
>>>
>>> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc,
>>> jclouds pmc, sqoop pmc, all around Jenkins expert)
>>> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
>>> * Nick Dimiduk (hbase pmc, phoenix pmc)
>>> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
>>> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
>>> phoenix pmc)
>>> * Allen Wittenauer (hadoop committer)
>>>
>>> That PMC gives us several ASF members and a bunch of folks familiar with the
>>> ASF. Combined with the code already existing in Apache spaces, I think that
>>> gives us sufficient justification for a direct board proposal.
>>>
>>> The planned project name is "Apache Yetus". It's an archaic genus of sea
>>> snail and most of our project will be focused on shell scripts.
>>>
>>> N.b.: this does not mean that the Hadoop community would _have_ to rely
>>> on the new TLP, but I hope that once we have a release that can be
>>> evaluated there'd be enough benefit to strongly encourage it.
>>>
>>> This has mostly been focused on scope and community issues, and I'd love
>>> to talk through any feedback on that. Additionally, are there any other
>>> points folks want to make sure are covered before we have a resolution?
>>>
>>> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bu...@cloudera.com>
>>> wrote:
>>>
>>>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>>>
>>>>
>>>>
>>>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bu...@cloudera.com>
>>>> wrote:
>>>>
>>>>> Hi Folks!
>>>>>
>>>>> After working on test-patch with other folks for the last few months,
>>>>> I think we've reached the point where we can make the fastest progress
>>>>> towards the goal of a general use pre-commit patch tester by spinning
>>>>> things into a project focused on just that. I think we have a mature enough
>>>>> code base and a sufficient fledgling community, so I'm going to put
>>>>> together a TLP proposal.
>>>>>
>>>>> Thanks for the feedback thus far from use within Hadoop. I hope we can
>>>>> continue to make things more useful.
>>>>>
>>>>> -Sean
>>>>>
>>>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bu...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> HBase's dev-support folder is where the scripts and support files
>>>>>> live. We've only recently started adding anything to the maven builds
>>>>>> that's specific to jenkins[1]; so far it's diagnostic stuff, but that's
>>>>>> where I'd add in more if we ran into the same permissions problems y'all
>>>>>> are having.
>>>>>>
>>>>>> There's also our precommit job itself, though it isn't large[2].
>>>>>> AFAIK, we don't properly back this up anywhere, we just notify each other
>>>>>> of changes on a particular mail thread[3].
>>>>>>
>>>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're
>>>>>> all red because I just finished fixing "mvn site" running out of permgen)
>>>>>> [3]: http://s.apache.org/NT0
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <
>>>>>> cnauroth@hortonworks.com> wrote:
>>>>>>
>>>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>>>>>> HBase
>>>>>>> repo?  Is there any additional context we need to be aware of?
>>>>>>>
>>>>>>> Chris Nauroth
>>>>>>> Hortonworks
>>>>>>> http://hortonworks.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>>>>>>>
>>>>>>> >+dev@hbase
>>>>>>> >
>>>>>>> >HBase has recently been cleaning up our precommit jenkins jobs to
>>>>>>> make
>>>>>>> >them
>>>>>>> >more robust. From what I can tell our stuff started off as an
>>>>>>> earlier
>>>>>>> >version of what Hadoop uses for testing.
>>>>>>> >
>>>>>>> >Folks on either side open to an experiment of combining our
>>>>>>> precommit
>>>>>>> >check
>>>>>>> >tooling? In principle we should be looking for the same kinds of
>>>>>>> things.
>>>>>>> >
>>>>>>> >Naturally we'll still need different jenkins jobs to handle
>>>>>>> different
>>>>>>> >resource needs and we'd need to figure out where stuff eventually
>>>>>>> lives,
>>>>>>> >but that could come later.
>>>>>>> >
>>>>>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
>>>>>>> cnauroth@hortonworks.com>
>>>>>>> >wrote:
>>>>>>> >
>>>>>>> >> The only thing I'm aware of is the failOnError option:
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
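[Editor's note: for concreteness, the option under discussion is a single flag on the clean plugin. A minimal pom.xml fragment might look like the following (the version shown is illustrative), though as the message goes on to argue, disabling the failure is probably unwise:]

```xml
<!-- Hypothetical placement; failOnError is a documented maven-clean-plugin
     option, but the version and surrounding build section are illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-clean-plugin</artifactId>
  <version>2.5</version>
  <configuration>
    <failOnError>false</failOnError>
  </configuration>
</plugin>
```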
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> I prefer that we don't disable this, because ignoring different
>>>>>>> kinds of
>>>>>>> >> failures could leave our build directories in an indeterminate
>>>>>>> state.
>>>>>>> >>For
>>>>>>> >> example, we could end up with an old class file on the classpath
>>>>>>> for
>>>>>>> >>test
>>>>>>> >> runs that was supposedly deleted.
>>>>>>> >>
>>>>>>> >> I think it's worth exploring Eddy's suggestion to try simulating
>>>>>>> failure
>>>>>>> >> by placing a file where the code expects to see a directory.
>>>>>>> That might
>>>>>>> >> even let us enable some of these tests that are skipped on
>>>>>>> Windows,
>>>>>>> >> because Windows allows access for the owner even after
>>>>>>> permissions have
>>>>>>> >> been stripped.
>>>>>>> >>
>>>>>>> >> Chris Nauroth
>>>>>>> >> Hortonworks
>>>>>>> >> http://hortonworks.com/
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cm...@alumni.cmu.edu>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> >Is there a maven plugin or setting we can use to simply remove
>>>>>>> >> >directories that have no executable permissions on them?
>>>>>>> Clearly we
>>>>>>> >> >have the permission to do this from a technical point of view
>>>>>>> (since
>>>>>>> >> >we created the directories as the jenkins user), it's simply
>>>>>>> that the
>>>>>>> >> >code refuses to do it.
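[Editor's note: there is no maven-clean-plugin setting for this, but a pre-clean step could re-add owner permissions itself before `mvn clean` runs. A rough sketch follows; `FixPerms` is a hypothetical helper, not existing Hadoop or Maven code:]

```java
import java.io.File;

// Sketch only: a tiny pre-clean pass, run before "mvn clean", that re-adds
// owner permissions on everything under a directory tree so leftover
// mode-500 directories become deletable again.
public class FixPerms {
    static void fixTree(File f) {
        f.setReadable(true);
        f.setWritable(true);
        if (f.isDirectory()) {
            f.setExecutable(true);      // +x is needed to traverse a directory
            File[] children = f.listFiles();
            if (children != null) {
                for (File child : children) {
                    fixTree(child);
                }
            }
        }
    }

    public static void main(String[] args) {
        for (String path : args) {
            fixTree(new File(path));
        }
    }
}
```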
>>>>>>> >> >
>>>>>>> >> >Otherwise I guess we can just fix those tests...
>>>>>>> >> >
>>>>>>> >> >Colin
>>>>>>> >> >
>>>>>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <le...@cloudera.com>
>>>>>>> wrote:
>>>>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>>> >> >>
>>>>>>> >> >> In HDFS-7722:
>>>>>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions
>>>>>>> in
>>>>>>> >> >>TearDown().
>>>>>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally
>>>>>>> clause.
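[Editor's note: the finally clause is the important part of that pattern. A stripped-down sketch, with a hypothetical test body, might look like:]

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Sketch of the restore-in-finally pattern described above. The point is
// that permissions come back even if the test body throws, so a failed
// run cannot leave a mode-500 directory behind for "mvn clean" to choke on.
public class VolumeFailurePatternDemo {
    public static void main(String[] args) throws IOException {
        File dataDir = Files.createTempDirectory("data3").toFile();
        dataDir.setExecutable(false, false);   // simulate the failed volume
        try {
            // ... assertions against the "failed" volume would go here ...
        } finally {
            // Always restore, even when an assertion above fails.
            dataDir.setExecutable(true, false);
        }
        if (!dataDir.delete()) {
            throw new IOException("cleanup failed: " + dataDir);
        }
    }
}
```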
>>>>>>> >> >>
>>>>>>> >> >> Also I ran mvn test several times on my machine and all tests
>>>>>>> passed.
>>>>>>> >> >>
>>>>>>> >> >> However, since in DiskChecker#checkDirAccess():
>>>>>>> >> >>
>>>>>>> >> >> private static void checkDirAccess(File dir) throws
>>>>>>> >>DiskErrorException {
>>>>>>> >> >>   if (!dir.isDirectory()) {
>>>>>>> >> >>     throw new DiskErrorException("Not a directory: "
>>>>>>> >> >>                                  + dir.toString());
>>>>>>> >> >>   }
>>>>>>> >> >>
>>>>>>> >> >>   checkAccessByFileMethods(dir);
>>>>>>> >> >> }
>>>>>>> >> >>
>>>>>>> >> >> One potentially safer alternative is replacing data dir with a
>>>>>>> >>regular
>>>>>>> >> >> file to simulate disk failures.
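[Editor's note: that alternative fits in a few lines. A sketch follows; the names are illustrative, and this is not the HDFS-7722 patch itself:]

```java
import java.io.File;
import java.io.IOException;

// Sketch of simulating a disk failure without touching permission bits:
// put a regular file where the code expects a data directory, so a check
// like DiskChecker#checkDirAccess sees !dir.isDirectory() and throws.
// Nothing is left behind that "mvn clean" cannot delete.
public class FileAsFailedDiskDemo {
    public static void main(String[] args) throws IOException {
        // createTempFile makes a plain file where the DataNode would expect
        // a directory; any directory-based health check now reports failure.
        File dataDir = File.createTempFile("data3", null);
        if (dataDir.isDirectory()) {
            throw new AssertionError("expected a plain file, not a directory");
        }
        if (!dataDir.delete()) {
            throw new IOException("cleanup failed: " + dataDir);
        }
    }
}
```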
>>>>>>> >> >>
>>>>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>>>> >> >><cn...@hortonworks.com> wrote:
>>>>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>>> >> >>> TestDataNodeVolumeFailureReporting, and
>>>>>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>>>>>>> >>permissions
>>>>>>> >> >>>from
>>>>>>> >> >>> directories like the one Colin mentioned to simulate disk
>>>>>>> failures
>>>>>>> >>at
>>>>>>> >> >>>data
>>>>>>> >> >>> nodes.  I reviewed the code for all of those, and they all
>>>>>>> appear
>>>>>>> >>to be
>>>>>>> >> >>> doing the necessary work to restore executable permissions at
>>>>>>> the
>>>>>>> >>end
>>>>>>> >> >>>of
>>>>>>> >> >>> the test.  The only recent uncommitted patch I've seen that
>>>>>>> makes
>>>>>>> >> >>>changes
>>>>>>> >> >>> in these test suites is HDFS-7722.  That patch still looks
>>>>>>> fine
>>>>>>> >> >>>though.  I
>>>>>>> >> >>> don't know if there are other uncommitted patches that
>>>>>>> changed these
>>>>>>> >> >>>test
>>>>>>> >> >>> suites.
>>>>>>> >> >>>
>>>>>>> >> >>> I suppose it's also possible that the JUnit process
>>>>>>> unexpectedly
>>>>>>> >>died
>>>>>>> >> >>> after removing executable permissions but before restoring
>>>>>>> them.
>>>>>>> >>That
>>>>>>> >> >>> always would have been a weakness of these test suites,
>>>>>>> regardless
>>>>>>> >>of
>>>>>>> >> >>>any
>>>>>>> >> >>> recent changes.
>>>>>>> >> >>>
>>>>>>> >> >>> Chris Nauroth
>>>>>>> >> >>> Hortonworks
>>>>>>> >> >>> http://hortonworks.com/
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <at...@cloudera.com>
>>>>>>> wrote:
>>>>>>> >> >>>
>>>>>>> >> >>>>Hey Colin,
>>>>>>> >> >>>>
>>>>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's
>>>>>>> going on
>>>>>>> >>with
>>>>>>> >> >>>>these boxes. He took a look and concluded that some perms are
>>>>>>> being
>>>>>>> >> >>>>set in
>>>>>>> >> >>>>those directories by our unit tests which are precluding
>>>>>>> those files
>>>>>>> >> >>>>from
>>>>>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but
>>>>>>> we
>>>>>>> >>should
>>>>>>> >> >>>>expect this to keep happening until we can fix the test in
>>>>>>> question
>>>>>>> >>to
>>>>>>> >> >>>>properly clean up after itself.
>>>>>>> >> >>>>
>>>>>>> >> >>>>To help narrow down which commit it was that started this,
>>>>>>> Andrew
>>>>>>> >>sent
>>>>>>> >> >>>>me
>>>>>>> >> >>>>this info:
>>>>>>> >> >>>>
>>>>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>>>>> >> >>>>has
>>>>>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way
>>>>>>> since
>>>>>>> >>9:32
>>>>>>> >> >>>>UTC
>>>>>>> >> >>>>on March 5th."
>>>>>>> >> >>>>
>>>>>>> >> >>>>--
>>>>>>> >> >>>>Aaron T. Myers
>>>>>>> >> >>>>Software Engineer, Cloudera
>>>>>>> >> >>>>
>>>>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>>>> >><cm...@apache.org>
>>>>>>> >> >>>>wrote:
>>>>>>> >> >>>>
>>>>>>> >> >>>>> Hi all,
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't
>>>>>>> find any
>>>>>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most
>>>>>>> of them
>>>>>>> >> >>>>>seem
>>>>>>> >> >>>>> to be failing with some variant of this message:
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> [ERROR] Failed to execute goal
>>>>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
>>>>>>> >>(default-clean)
>>>>>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to
>>>>>>> delete
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>
>>>>>>> >>
>>>>>>>
>>>>>>> >> >>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>> >> >>>>> -> [Help 1]
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting
>>>>>>> wrong
>>>>>>> >> >>>>> permissions?
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> Colin
>>>>>>> >> >>>>>
>>>>>>> >> >>>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >> --
>>>>>>> >> >> Lei (Eddy) Xu
>>>>>>> >> >> Software Engineer, Cloudera
>>>>>>> >>
>>>>>>> >>
>>>>>>> >
>>>>>>> >
>>>>>>> >--
>>>>>>> >Sean
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sean
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>>
>>
>> --
>> Sean
>>
>
>
>
> --
> Sean
>



-- 
Sean