
Posted to common-dev@hadoop.apache.org by Steve Loughran <st...@hortonworks.com> on 2015/02/02 12:45:24 UTC

Re: Patch review process

Given experience of Apache reviews, I don't know how much time to spend on it. I'm curious about Gerrit, but again, if JIRA integration is what is sought, Crucible sounds better.

Returning to other issues in the discussion:

1. Improving test times would make a big difference, locally as well as on JIRA.

2. How can we clear today's backlog without relying on some future piece of technology to magically fix it?

For clearing the backlog, I don't see any solution other than "people put in time". I know it's an obligation for committers to do this, but I also know how little time most of us have for anything other than dealing with our own failing tests. As a result, things that aren't viewed as critical get neglected: shell, build, object stores, cruft cleanup, etc. I think the people who care about these areas are going to have to get together and sync up. For some of this, progress may be quite fast: people may not have noticed, but a few of us have moved the build dependencies forward fairly quickly recently, with the goal of making Hadoop branch-2/trunk compatible with recent Guava versions and Java 8.

I've been doing some S3/object store work the last couple of weekends; that's slow, as test runs take 30+ minutes against the far end, and Jenkins doesn't run those tests. If anyone else wants to look at the fs/s3 and fs/swift queues, their input is welcome.

And of course AW went through the entire backlog of shell stuff & a lot of the not-in-branch-2 features.

So where now? What is a strategy to deal with all those things in the queue?

Re: Patch review process

Posted by Chris Nauroth <cn...@hortonworks.com>.

The main JIRA dashboard for each project has an Issues tab with useful
summary statistics and links to filtered queries, most notably links to
unresolved issues grouped by each project sub-component.

https://issues.apache.org/jira/browse/HADOOP/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel

https://issues.apache.org/jira/browse/HDFS/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel

https://issues.apache.org/jira/browse/MAPREDUCE/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel

https://issues.apache.org/jira/browse/YARN/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel

I can see that many of these represent a much smaller, more manageable
work queue than trying to sift through all Patch Available issues, or worse
yet, all unresolved ones.  For example, here are the results for Hadoop Common
native, HDFS snapshots, MapReduce Job History Server, and YARN Capacity
Scheduler.

https://issues.apache.org/jira/issues/?jql=project%20%3D%20HADOOP%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20native%20ORDER%20BY%20priority%20DESC

https://issues.apache.org/jira/issues/?jql=project%20%3D%20HDFS%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20snapshots%20ORDER%20BY%20priority%20DESC

https://issues.apache.org/jira/issues/?jql=project%20%3D%20MAPREDUCE%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20jobhistoryserver%20ORDER%20BY%20priority%20DESC

https://issues.apache.org/jira/issues/?jql=project%20%3D%20YARN%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20capacityscheduler%20ORDER%20BY%20priority%20DESC
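The per-component queries above differ only in project and component name, so whoever maintains these queues could generate them rather than hand-edit URLs. A minimal Python sketch, assuming the issue-navigator URL format and JQL used in the links above; the function names are hypothetical. Component names containing spaces or other JQL-reserved characters would need to be double-quoted inside the JQL, which this sketch does not do.

```python
from urllib.parse import quote

# Base URL of the Apache JIRA issue navigator, as used in the links above.
JIRA_ISSUES = "https://issues.apache.org/jira/issues/?jql="


def component_queue_jql(project: str, component: str) -> str:
    """JQL for the unresolved issues of one project component, highest priority first."""
    return (
        f"project = {project} AND resolution = Unresolved "
        f"AND component = {component} ORDER BY priority DESC"
    )


def queue_url(project: str, component: str) -> str:
    """Percent-encode the JQL (space -> %20, '=' -> %3D) and append it to the base URL."""
    return JIRA_ISSUES + quote(component_queue_jql(project, component), safe="")


if __name__ == "__main__":
    # The four example queues from this message.
    for project, component in [
        ("HADOOP", "native"),
        ("HDFS", "snapshots"),
        ("MAPREDUCE", "jobhistoryserver"),
        ("YARN", "capacityscheduler"),
    ]:
        print(queue_url(project, component))
```

Generating the links this way reproduces the hand-written URLs exactly, so adding a queue for another component is a one-line change.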

Suppose we ask for committers to act in the Patch Manager role associated
with these per-component queries as their work queues.  If people are
working in an area of expertise, then they'll likely process the queue
efficiently.  If people want to stretch into an area of code where they
are less familiar, then volunteering as Patch Manager could be a way to
ramp up.

The success of this approach would depend on the quality of our JIRA
metadata.  We'd need to be diligent about assigning each issue to its
correct component.  We may also find a need to restructure the component
breakdown over time.  Right now, it tends to mirror our Java package
structure pretty closely, but something like "namenode" is quite broad as
a patch queue.
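Since each queue is only as good as the component field, it may help to see what falls through the cracks. A minimal Python sketch, assuming issues have already been fetched into simple records; the `Issue` type and `build_queues` function are hypothetical, not a JIRA client API. It groups issue keys by component and collects issues with no component into a separate bucket that needs manual triage.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Issue:
    key: str
    # An issue may carry zero, one, or several components.
    components: list[str] = field(default_factory=list)


NO_COMPONENT = "(no component)"


def build_queues(issues: list[Issue]) -> dict[str, list[str]]:
    """Map each component to the issue keys in its queue.

    Issues with several components land in every matching queue; issues with
    none land in a catch-all bucket that someone has to triage by hand.
    """
    queues: dict[str, list[str]] = defaultdict(list)
    for issue in issues:
        for component in issue.components or [NO_COMPONENT]:
            queues[component].append(issue.key)
    return dict(queues)


if __name__ == "__main__":
    # Hypothetical issue keys, for illustration only.
    queues = build_queues([
        Issue("EXAMPLE-1", ["namenode"]),
        Issue("EXAMPLE-2", ["namenode", "snapshots"]),
        Issue("EXAMPLE-3"),
    ])
    for component, keys in sorted(queues.items()):
        print(f"{component}: {', '.join(keys)}")
```

A steadily growing "(no component)" bucket would be one signal that the metadata, rather than reviewer time, is the bottleneck for a given project.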

Thoughts?

Chris Nauroth
Hortonworks
http://hortonworks.com/

On 2/4/15, 11:14 AM, "Steve Loughran" <st...@hortonworks.com> wrote:

>
>I'm worrying more about the ongoing situation. As a release approaches,
>someone effectively goes full time as the gatekeeper; for a good release
>they should be saying "too late!" for most features and "only if it's low
>risk" to non-critical bug fixes.
>
>Which means that non-critical stuff doesn't get in as a release approaches.
>This is a good thing for release stability, but not for getting work in.
>
>Active patch queue management should be something going on when
>releases aren't being made (or even then, just not near the release
>branch).
>
>The problem is it takes time and effort: time to review the code, test
>it, maybe even tune it a bit to help.
>
>For the big bits of work, if you can get full-time/part-time support from
>committers then you do stand a chance of getting it in. But the effort
>needed for those projects usually means that the engineer in question has
>been allocated that time by their employer. If it's a big project and they
>don't have that support, I think the patch is going to be in trouble. The
>new NFS client proposal is an example of that: I can personally see why
>it'd be nice to have, but I'm not going to go near it.
>
>For the little bits of work, they take less continuous time and effort,
>but someone who understands the area in question does need to go through
>them, provide feedback and help get them in.
>
>I don't think we do enough there. I understand why not (time and effort),
>but I think we miss out in the process.
>
>
>
>On 4 February 2015 at 18:25:05, Colin P. McCabe
>(cmccabe@apache.org) wrote:
>
>I wonder if this work logically falls under the release manager role.
>
>
>During a release, we generally spend a little bit of time thinking
>about what new features we added, systems we stabilized, interfaces we
>changed, etc. etc. This gives us some perspective to look backwards
>at old JIRAs and either close them as no longer relevant, or target
>them for the next release (with appropriate encouragement to the
>people who might have the expertise to make that happen.)
>
>best,
>Colin

Re: Patch review process

Posted by Steve Loughran <st...@hortonworks.com>.

I'm worrying more about the ongoing situation. As a release approaches, someone effectively goes full time as the gatekeeper; for a good release they should be saying "too late!" for most features and "only if it's low risk" to non-critical bug fixes.

Which means that non-critical stuff doesn't get in as a release approaches. This is a good thing for release stability, but not for getting work in.

Active patch queue management should be something going on when releases aren't being made (or even then, just not near the release branch).

The problem is it takes time and effort: time to review the code, test it, maybe even tune it a bit to help.

For the big bits of work, if you can get full-time/part-time support from committers then you do stand a chance of getting it in. But the effort needed for those projects usually means that the engineer in question has been allocated that time by their employer. If it's a big project and they don't have that support, I think the patch is going to be in trouble. The new NFS client proposal is an example of that: I can personally see why it'd be nice to have, but I'm not going to go near it.

For the little bits of work, they take less continuous time and effort, but someone who understands the area in question does need to go through them, provide feedback and help get them in.

I don't think we do enough there. I understand why not (time and effort), but I think we miss out in the process.

On 4 February 2015 at 18:25:05, Colin P. McCabe (cmccabe@apache.org) wrote:

I wonder if this work logically falls under the release manager role.

During a release, we generally spend a little bit of time thinking
about what new features we added, systems we stabilized, interfaces we
changed, etc. etc. This gives us some perspective to look backwards
at old JIRAs and either close them as no longer relevant, or target
them for the next release (with appropriate encouragement to the
people who might have the expertise to make that happen.)

best,
Colin

Re: Patch review process

Posted by Chris Douglas <cd...@apache.org>.

On Wed, Feb 11, 2015 at 2:04 PM, Steve Loughran <st...@hortonworks.com> wrote:
> At the same time, if only one person is looking at a part of the codebase & submitting patches, they have inherently recused themselves from reviewing their own patches. Ideally you want more than one committer tracking a topic. That's someone with competence in the area too, obviously; a barrier to participation in the corner areas.

That was what I was trying to convey. With RTC, if there's only one
person working in an area, then they can't make progress.

> started with https://issues.apache.org/jira/browse/INFRA-9152

Great; thanks Steve.

> though I'm not sure of the difference between FishEye and Crucible here; they seem blurred

From [1]:
FishEye allows you to extract information from your source code
repository and display it in sophisticated reports.
Crucible allows you to request, perform and manage code reviews.

[1] https://confluence.atlassian.com/display/CRUCIBLE/Crucible+and+FishEye

-C

> On Tue, Feb 10, 2015 at 1:19 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
>> I don't anticipate a patch manager introducing a new bottleneck.
>>
>> As originally described by Chris D, the role of the patch manager is not
>> to review and commit all patches in an assigned area. Instead, the
>> responsibility is queue management: following up on dormant jiras to make
>> sure progress is made. This might involve the patch manager doing the
>> review and commit, but it also might mean contacting someone else for
>> review, closing it if it's a duplicate, or making a won't-fix decision.
>> It's the kind of activity that Allen and Steve have done a lot lately.
>>
>> I see the patch manager role as addressing the fact that the community
>> itself has grown large and complex. As others have mentioned, it's not
>> always clear to a new contributor who to ask for a code review. A patch
>> manager would be familiar enough with the community to help steer their
>> patches in the right direction.
>>
>> I suppose we don't need to formalize this too much. If anyone feels
>> capable of doing this kind of queue management in a certain area of
>> expertise, please dive in. Congratulations, you are now a patch manager!
>> I'm sure everyone would appreciate it.
>

Re: Patch review process

Posted by Steve Loughran <st...@hortonworks.com>.

On 11 February 2015 at 21:11:25, Chris Douglas (cdouglas@apache.org) wrote:

> +1; ChrisN's formulation is exactly right.
>
> The patch manager can't force (or shame) anyone into caring about your
> issue. One of the benefits of RTC is that parts of the code with a
> single maintainer are exposed. If you can't find collaborators, either
> (a) this isn't the right community for that module or (b) the project
> needs to acknowledge and address the "bus factor" [1] for that code.
> By observing and directing review, the patch manager accumulates
> context most contributors don't have.

At the same time, if only one person is looking at a part of the codebase & submitting patches, they have inherently recused themselves from reviewing their own patches. Ideally you want more than one committer tracking a topic. That's someone with competence in the area too, obviously; a barrier to participation in the corner areas.

> Does anyone want to work with INFRA to test Crucible? It looks like
> Ambari started exploring it last year [2]. From David's response, it
> sounds like they'd be willing to work with a project to experiment,
> but most requests have been for Gerrit. -C
>
> [1] http://en.wikipedia.org/wiki/Bus_factor
> [2] https://issues.apache.org/jira/browse/INFRA-8430

Started with https://issues.apache.org/jira/browse/INFRA-9152, though I'm not sure of the difference between FishEye and Crucible here; they seem blurred.

> On Tue, Feb 10, 2015 at 1:19 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
>> I don't anticipate a patch manager introducing a new bottleneck.
>>
>> As originally described by Chris D, the role of the patch manager is not
>> to review and commit all patches in an assigned area. Instead, the
>> responsibility is queue management: following up on dormant jiras to make
>> sure progress is made. This might involve the patch manager doing the
>> review and commit, but it also might mean contacting someone else for
>> review, closing it if it's a duplicate, or making a won't-fix decision.
>> It's the kind of activity that Allen and Steve have done a lot lately.
>>
>> I see the patch manager role as addressing the fact that the community
>> itself has grown large and complex. As others have mentioned, it's not
>> always clear to a new contributor who to ask for a code review. A patch
>> manager would be familiar enough with the community to help steer their
>> patches in the right direction.
>>
>> I suppose we don't need to formalize this too much. If anyone feels
>> capable of doing this kind of queue management in a certain area of
>> expertise, please dive in. Congratulations, you are now a patch manager!
>> I'm sure everyone would appreciate it.

Re: Patch review process

Posted by Chris Douglas <cd...@apache.org>.

+1; ChrisN's formulation is exactly right.

The patch manager can't force (or shame) anyone into caring about your
issue. One of the benefits of RTC is that parts of the code with a
single maintainer are exposed. If you can't find collaborators, either
(a) this isn't the right community for that module or (b) the project
needs to acknowledge and address the "bus factor" [1] for that code.
By observing and directing review, the patch manager accumulates
context most contributors don't have.
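One rough way to spot the single-maintainer areas described above is to count distinct reviewers per area. A minimal Python sketch, assuming review history is available as (area, reviewer) pairs, for example scraped from commit messages or JIRA comments; the input format, the function name `thin_coverage` and the default threshold of two are illustrative assumptions.

```python
from collections import defaultdict
from typing import Iterable


def thin_coverage(reviews: Iterable[tuple[str, str]], minimum: int = 2) -> list[str]:
    """Areas with fewer than `minimum` distinct reviewers, in sorted order.

    `reviews` holds (area, reviewer) pairs, one per observed review.
    Areas that never appear in `reviews` are not reported at all.
    """
    reviewers: dict[str, set[str]] = defaultdict(set)
    for area, reviewer in reviews:
        reviewers[area].add(reviewer)
    return sorted(area for area, people in reviewers.items() if len(people) < minimum)


if __name__ == "__main__":
    # Hypothetical history: one person reviews everything in fs/s3.
    history = [
        ("fs/s3", "alice"),
        ("fs/s3", "alice"),
        ("namenode", "bob"),
        ("namenode", "carol"),
    ]
    print(thin_coverage(history))
```

Repeated reviews by the same person do not raise an area's count, which is the point: the sketch measures how many people could take over, not how much review happened.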

Does anyone want to work with INFRA to test Crucible? It looks like
Ambari started exploring it last year [2]. From David's response, it
sounds like they'd be willing to work with a project to experiment,
but most requests have been for Gerrit. -C

[1] http://en.wikipedia.org/wiki/Bus_factor
[2] https://issues.apache.org/jira/browse/INFRA-8430

On Tue, Feb 10, 2015 at 1:19 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
> I don't anticipate a patch manager introducing a new bottleneck.
>
> As originally described by Chris D, the role of the patch manager is not
> to review and commit all patches in an assigned area.  Instead, the
> responsibility is queue management: following up on dormant jiras to make
> sure progress is made.  This might involve the patch manager doing the
> review and commit, but it also might mean contacting someone else for
> review, closing it if it's a duplicate, or making a won't-fix decision.
> It's the kind of activity that Allen and Steve have done a lot lately.
>
> I see the patch manager role as addressing the fact that the community
> itself has grown large and complex.  As others have mentioned, it's not
> always clear to a new contributor who to ask for a code review.  A patch
> manager would be familiar enough with the community to help steer their
> patches in the right direction.
>
> I suppose we don't need to formalize this too much.  If anyone feels
> capable of doing this kind of queue management in a certain area of
> expertise, please dive in.  Congratulations, you are now a patch manager!
> I'm sure everyone would appreciate it.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/

Re: Patch review process

Posted by Chris Nauroth <cn...@hortonworks.com>.

I don't anticipate a patch manager introducing a new bottleneck.

As originally described by Chris D, the role of the patch manager is not
to review and commit all patches in an assigned area.  Instead, the
responsibility is queue management: following up on dormant jiras to make
sure progress is made.  This might involve the patch manager doing the
review and commit, but it also might mean contacting someone else for
review, closing it if it's a duplicate, or making a won't-fix decision.
It's the kind of activity that Allen and Steve have done a lot lately.

I see the patch manager role as addressing the fact that the community
itself has grown large and complex.  As others have mentioned, it's not
always clear to a new contributor who to ask for a code review.  A patch
manager would be familiar enough with the community to help steer their
patches in the right direction.

I suppose we don't need to formalize this too much.  If anyone feels
capable of doing this kind of queue management in a certain area of
expertise, please dive in.  Congratulations, you are now a patch manager!
I'm sure everyone would appreciate it.

Chris Nauroth
Hortonworks
http://hortonworks.com/


On 2/10/15, 9:31 AM, "Tsuyoshi Ozawa" <oz...@apache.org> wrote:

>> How could we speed up?
>
>+1 for trying Crucible. We should try it to see whether it's integrated
>well and whether it can solve the problem of "splitting discussion". If
>Crucible solves it, it would be great.
>
>About the patch manager, I'm concerned that it can delay reviews if the
>patch size is too small and the amount of work for the patch manager
>grows more and more.
>
>About "long-lived old patches", how about making them open
>automatically once a specific period has passed? It could also act as a
>ping to the ML and save the time spent checking old patches.
>
>> - Some talk about how to improve precommit. Right now it takes hours to
>>run the unit tests, which slows down patch iterations. One solution is
>>running tests in parallel (and even distributed). Previous distributed
>>experiments have done a full unit test run in a couple minutes, but it'd
>>be a fair amount of work to actually make this production ready.
>> - Also mention of putting in place more linting and static analysis.
>>Automating this will save reviewer time.
>
>I'm very interested in working on this. If the distributed test
>environment can be prepared, it can accelerate the development of
>Hadoop.
>
>> To date I've been the sole committer running the tests, reviewing the
>>code and with a vague idea of what's been going on. That's because (a)
>>I care about object stores after my experience with getting swift://
>>in, and (b) I'm not recommending that anyone use it in production until
>>it's been field-tested more.
>
>I've heard that the Swift community has started to maintain the code.
>http://docs.openstack.org/developer/sahara/userdoc/hadoop-swift.html
>If we make the components production ready, we need to set up S3 or
>Swift stubs in the test environment. Is this feasible?
>
>BTW, an Agile board looks helpful for seeing the status of our
>projects at a glance. Mesos is using it.
>https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=1
>
>Thanks,
>- Tsuyoshi
>
>On Tue, Feb 10, 2015 at 7:10 PM, Steve Loughran <st...@hortonworks.com>
>wrote:
>>
>>
>>
>> On 9 February 2015 at 21:18:52, Colin P. McCabe
>>(cmccabe@apache.org) wrote:
>>
>> What happened with the Crucible experiment? Did we get a chance to
>> try that out? That would be a great way to speed up patch reviews,
>> and one that is well-integrated with JIRA.
>>
>> I am -1 on Gerrit unless we can find a way to mirror the comments to
>> JIRA. I think splitting the discussion is a far worse thing than
>> letting a few minor patches languish for a while (even assuming that
>> gerrit would solve this, which seems unclear to me). The health of
>> the community is most important.
>>
>> I think it is normal and healthy to post on hdfs-dev, email
>> developers, or hold a meeting to try to promote your patch and/or
>> idea. Some of the discussion here seems to be assuming that Hadoop is
>> a machine for turning patch available JIRAs into commits. It's not.
>> It's a community, and sometimes it is necessary to email people or
>> talk to them to get them to help with your JIRA.
>>
>>
>> I know your heart is in the right place, but the JIRA examples given
>> here are not that persuasive. Both of them are things that we would
>> not encounter on a real cluster (nobody uses Hadoop with ipv6, nobody
>> uses Hadoop without setting up DNS).
>>
>> Got some bad news there. The real world is messy, and the way Hadoop
>>tends to fail right now leaves Java stack traces that tend to leave
>>people assuming it's on the Hadoop side.
>>
>> Messy networks are extra commonplace amongst people learning to use
>>Hadoop themselves, future community members, and when you are bringing
>>up VMs.
>>
>> In production, well, talk to your colleagues in support and say "how
>>often do you field network-related problems?", followed by "do you think
>>Hadoop could do more to help here?"
>>
>>
>> But, if we find a specific set
>> of issues that the community has ignored (such as good error messages
>> in bad networking setups, configuration issues, etc.), then we could
>> create an umbrella JIRA and make a sustained effort to get it done.
>>
>>
>> Seems like a good strategy.
>>
>>  I've just created https://issues.apache.org/jira/browse/HADOOP-11571,
>>"get S3a production ready". It shipped in Hadoop 2.6; now that it's out in
>>the wild, the bug reports are starting to come back in. Mostly
>>scale-related; some failure handling, some improvements to work behind
>>proxies and with non-AWS endpoints.
>>
>>   1.  To date all the s3a code has come from non-committers, including the
>>original codebase,
>>   2.  Most of the ongoing dev is from Thomas Demoor at Amplidata,
>>   3.  There's been some support via AWS (HADOOP-10714),
>>   4.  There have been a couple of patches from Ted Yu after HBase backups
>>keeled over from too many threads
>>
>> One thing that is notable about the s3a (or any of the object store
>>filesystems) is that Jenkins does not run the tests. Anyone proposing to
>>+1 a patch based on a Jenkins run (see HADOOP-11488) is going to get a
>>-1 from me; it takes 30-60 minutes for a test run. You get a bill of
>>about 50c/month for participating in this project.
>>
>> To date I've been the sole committer running the tests, reviewing the
>>code and with a vague idea of what's been going on. That's because (a)
>>I care about object stores after my experience with getting swift://
>>in, and (b) I'm not recommending that anyone use it in production until
>>it's been field-tested more.
>>
>> Who is going to help me review and test these patches?
>>
>>
>> Perhaps we could also do things like batching findbugs fixes into
>> fewer JIRAs, as has been suggested before.
>>
>> A detail. Findbugs is not the problem

Re: Patch review process

Posted by Tsuyoshi Ozawa <oz...@apache.org>.

> How could we speed up?

+1 for trying Crucible. We should try it to see whether it's integrated
well and whether it can solve the problem of "splitting discussion". If
Crucible solves it, it would be great.

About the patch manager, I'm concerned that it can delay reviews if the
patch size is too small and the amount of work for the patch manager
grows more and more.

About "long-lived old patches", how about making them open
automatically once a specific period has passed? It could also act as a
ping to the ML and save the time spent checking old patches.
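A minimal Python sketch of the automatic reminder idea, assuming issue records with a status and a last-updated date are already available. The `PatchIssue` record, the 30-day threshold and the mail-body format are hypothetical choices for illustration, not an existing tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PatchIssue:
    key: str
    status: str
    last_updated: date


def stale_patches(issues: list[PatchIssue], today: date,
                  max_age: timedelta = timedelta(days=30)) -> list[str]:
    """Keys of 'Patch Available' issues untouched for longer than max_age, oldest first."""
    stale = [
        issue for issue in issues
        if issue.status == "Patch Available" and today - issue.last_updated > max_age
    ]
    stale.sort(key=lambda issue: issue.last_updated)
    return [issue.key for issue in stale]


def reminder_body(keys: list[str]) -> str:
    """Plain-text body for a periodic reminder mail to the dev list."""
    if not keys:
        return "No stale patches."
    return "Patches waiting for review:\n" + "\n".join(f"  {key}" for key in keys)


if __name__ == "__main__":
    # Hypothetical issues, for illustration only.
    issues = [
        PatchIssue("EXAMPLE-1", "Patch Available", date(2015, 1, 1)),
        PatchIssue("EXAMPLE-2", "Patch Available", date(2015, 2, 1)),
        PatchIssue("EXAMPLE-3", "Open", date(2014, 12, 1)),
        PatchIssue("EXAMPLE-4", "Patch Available", date(2014, 12, 15)),
    ]
    print(reminder_body(stale_patches(issues, today=date(2015, 2, 10))))
```

Listing the oldest patches first puts the longest-waiting contributors at the top of the reminder.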

> - Some talk about how to improve precommit. Right now it takes hours to run
> the unit tests, which slows down patch iterations. One solution is running
> tests in parallel (and even distributed). Previous distributed experiments
> have done a full unit test run in a couple minutes, but it'd be a fair
> amount of work to actually make this production ready.
> - Also mention of putting in place more linting and static analysis.
> Automating this will save reviewer time.

I'm very interested in working on this. If the distributed test
environment can be prepared, it can accelerate the development of
Hadoop.
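Running tests in parallel raises the question of how to split them. One common approach, swapped in here for illustration rather than taken from the experiments mentioned above, is greedy longest-processing-time scheduling: sort test classes by past duration and hand each one to the currently least-loaded worker. A minimal Python sketch, assuming per-class timings are known and the tests are independent of one another (no shared ports or directories); the test names and timings are made up.

```python
import heapq


def shard_tests(durations: dict[str, float], workers: int) -> list[list[str]]:
    """Split test classes across workers with the greedy longest-processing-time rule.

    Tests are taken longest first; each goes to the worker whose assigned
    total is currently smallest. Ties are broken by name, then worker index.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    load = [(0.0, w) for w in range(workers)]  # (seconds assigned, worker index)
    heapq.heapify(load)
    shards: list[list[str]] = [[] for _ in range(workers)]
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, w = heapq.heappop(load)
        shards[w].append(name)
        heapq.heappush(load, (total + secs, w))
    return shards


def makespan(durations: dict[str, float], shards: list[list[str]]) -> float:
    """Wall-clock time of the slowest shard, assuming all shards run concurrently."""
    return max(sum(durations[name] for name in shard) for shard in shards)


if __name__ == "__main__":
    # Hypothetical test classes and timings in seconds.
    timings = {"TestA": 10, "TestB": 7, "TestC": 5, "TestD": 4, "TestE": 2}
    shards = shard_tests(timings, workers=2)
    print(shards, makespan(timings, shards))
```

With these made-up timings, 28 seconds of serial work finishes in 14 seconds on two workers. In practice the slowest single test class puts a floor under the speed-up, which is one reason long-running classes would need splitting as well.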

> To date I've been the sole committer running the tests, reviewing the code and with a vague idea of what's been going on. That's because (a) I care about object stores after my experience with getting swift:// in, and (b) I'm not recommending that anyone use it in production until it's been field-tested more.

I've heard that the Swift community has started to maintain the code.
http://docs.openstack.org/developer/sahara/userdoc/hadoop-swift.html
If we make the components production ready, we need to set up S3 or
Swift stubs in the test environment. Is this feasible?
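As a rough illustration of what such a stub might need to mimic, here is a minimal in-memory object store in Python. It is a hypothetical sketch, not Hadoop's FileSystem API or the S3/Swift client APIs: keys live in a flat namespace, "directories" are just key prefixes, and rename is emulated as copy-then-delete, which is not atomic.

```python
class InMemoryObjectStore:
    """A toy stand-in for an object store such as S3 or Swift (hypothetical API)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        """Store (or overwrite) a whole object."""
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        """Return an object's bytes, raising FileNotFoundError if it is absent."""
        try:
            return self._objects[key]
        except KeyError:
            raise FileNotFoundError(key) from None

    def delete(self, key: str) -> None:
        """Delete an object; deleting a missing key is a no-op."""
        self._objects.pop(key, None)

    def list_keys(self, prefix: str = "") -> list[str]:
        """Keys starting with `prefix`, sorted; there are no real directories."""
        return sorted(k for k in self._objects if k.startswith(prefix))

    def copy(self, src: str, dst: str) -> None:
        self.put(dst, self.get(src))

    def rename(self, src: str, dst: str) -> None:
        """Emulated rename: copy then delete, so it is not atomic."""
        self.copy(src, dst)
        self.delete(src)


if __name__ == "__main__":
    store = InMemoryObjectStore()
    store.put("data/a.csv", b"1")
    store.put("data/b.csv", b"2")
    store.rename("data/a.csv", "data/c.csv")
    print(store.list_keys("data/"))
```

A stub like this could exercise code paths cheaply in Jenkins, but it would not reproduce behaviour of a real store such as latency, throttling or consistency quirks, so it would complement rather than replace runs against the far end.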

BTW, an Agile board looks helpful for seeing the status of our
projects at a glance. Mesos is using it.
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=1

Thanks,
- Tsuyoshi

On Tue, Feb 10, 2015 at 7:10 PM, Steve Loughran <st...@hortonworks.com> wrote:
>
>
>
> On 9 February 2015 at 21:18:52, Colin P. McCabe (cmccabe@apache.org) wrote:
>
> What happened with the Crucible experiment? Did we get a chance to
> try that out? That would be a great way to speed up patch reviews,
> and one that is well-integrated with JIRA.
>
> I am -1 on Gerrit unless we can find a way to mirror the comments to
> JIRA. I think splitting the discussion is a far worse thing than
> letting a few minor patches languish for a while (even assuming that
> gerrit would solve this, which seems unclear to me). The health of
> the community is most important.
>
> I think it is normal and healthy to post on hdfs-dev, email
> developers, or hold a meeting to try to promote your patch and/or
> idea. Some of the discussion here seems to be assuming that Hadoop is
> a machine for turning patch available JIRAs into commits. It's not.
> It's a community, and sometimes it is necessary to email people or
> talk to them to get them to help with your JIRA.
>
>
> I know your heart is in the right place, but the JIRA examples given
> here are not that persuasive. Both of them are things that we would
> not encounter on a real cluster (nobody uses Hadoop with ipv6, nobody
> uses Hadoop without setting up DNS).
>
> Got some bad news there. The real world is messy, and the way Hadoop tends to fail right now leaves Java stack traces that tend to leave people assuming it's on the Hadoop side.
>
> Messy networks are extra commonplace amongst people learning to use Hadoop themselves, future community members, and when you are bringing up VMs.
>
> In production, well, talk to your colleagues in support and say "how often do you field network-related problems?", followed by "do you think Hadoop could do more to help here?"
>
>
> But, if we find a specific set
> of issues that the community has ignored (such as good error messages
> in bad networking setups, configuration issues, etc.), then we could
> create an umbrella JIRA and make a sustained effort to get it done.
>
>
> Seems like a good strategy.
>
>  I've just created https://issues.apache.org/jira/browse/HADOOP-11571, "get S3a production ready". It shipped in Hadoop 2.6; now that it's out in the wild, the bug reports are starting to come in. They're mostly scale-related, with some failure handling and some improvements for working behind proxies and with non-AWS endpoints.
>
>   1.  To date all the s3a code has come from non-committers, including the original codebase.
>   2.  Most of the ongoing development is from Thomas Demoor at Amplidata.
>   3.  There's been some support via AWS (HADOOP-10714).
>   4.  There have been a couple of patches from Ted Yu after HBase backups keeled over from too many threads.
>
> One notable thing about s3a (or any of the object store filesystems) is that Jenkins does not run the tests. Anyone proposing to +1 a patch based on a Jenkins run (see HADOOP-11488) is going to get a -1 from me; a test run takes 30-60 minutes, and you get a bill of about 50c/month for participating in this project.
>
> To date I've been the sole committer running the tests, reviewing the code, and with a vague idea of what's been going on. That's because (a) I care about object stores after my experience with getting swift:// in, and (b) I'm not recommending that anyone use it in production until it's been field-tested more.
>
> Who is going to help me review and test these patches?
>
>
> Perhaps we could also do things like batching findbugs fixes into
> fewer JIRAs, as has been suggested before.
>
> A detail. Findbugs is not the problem.

Re: Patch review process

Posted by Steve Loughran <st...@hortonworks.com>.

On 9 February 2015 at 21:18:52, Colin P. McCabe (cmccabe@apache.org) wrote:

What happened with the Crucible experiment? Did we get a chance to
try that out? That would be a great way to speed up patch reviews,
and one that is well-integrated with JIRA.

I am -1 on Gerrit unless we can find a way to mirror the comments to
JIRA. I think splitting the discussion is a far worse thing than
letting a few minor patches languish for a while (even assuming that
Gerrit would solve this, which seems unclear to me). The health of
the community is most important.

I think it is normal and healthy to post on hdfs-dev, email
developers, or hold a meeting to try to promote your patch and/or
idea. Some of the discussion here seems to be assuming that Hadoop is
a machine for turning patch available JIRAs into commits. It's not.
It's a community, and sometimes it is necessary to email people or
talk to them to get them to help with your JIRA.

I know your heart is in the right place, but the JIRA examples given
here are not that persuasive. Both of them are things that we would
not encounter on a real cluster (nobody uses Hadoop with IPv6, nobody
uses Hadoop without setting up DNS).

Got some bad news there. The real world is messy, and the way Hadoop tends to fail right now leaves Java stack traces that lead people to assume the problem is on the Hadoop side.

Messy networks are especially common among people learning to use Hadoop on their own (our future community members), and whenever you are bringing up VMs.

In production, well, talk to your colleagues in support and say "how often do you field network-related problems?", followed by "do you think Hadoop could do more to help here?"

But, if we find a specific set
of issues that the community has ignored (such as good error messages
in bad networking setups, configuration issues, etc.), then we could
create an umbrella JIRA and make a sustained effort to get it done.

Seems like a good strategy.

I've just created https://issues.apache.org/jira/browse/HADOOP-11571, "get S3a production ready". It shipped in Hadoop 2.6; now that it's out in the wild, the bug reports are starting to come in. They're mostly scale-related, with some failure handling and some improvements for working behind proxies and with non-AWS endpoints.

1. To date all the s3a code has come from non-committers, including the original codebase.
2. Most of the ongoing development is from Thomas Demoor at Amplidata.
3. There's been some support via AWS (HADOOP-10714).
4. There have been a couple of patches from Ted Yu after HBase backups keeled over from too many threads.

One notable thing about s3a (or any of the object store filesystems) is that Jenkins does not run the tests. Anyone proposing to +1 a patch based on a Jenkins run (see HADOOP-11488) is going to get a -1 from me; a test run takes 30-60 minutes, and you get a bill of about 50c/month for participating in this project.
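A sketch, using Python's `unittest` and a hypothetical `TEST_OBJECTSTORE_URI` environment variable, of why a green CI run says little here: tests that need a live endpoint and credentials typically skip themselves when none are configured, and a skipped test does not make the run fail. This illustrates the general pattern only; it is not how Hadoop's own object store tests are configured.

```python
import os
import unittest

# Hypothetical variable naming a real bucket/container to test against.
ENDPOINT_VAR = "TEST_OBJECTSTORE_URI"


def endpoint_configured() -> bool:
    return bool(os.environ.get(ENDPOINT_VAR))


@unittest.skipUnless(endpoint_configured(),
                     f"{ENDPOINT_VAR} not set; skipping live object store tests")
class LiveObjectStoreTest(unittest.TestCase):
    def test_round_trip(self) -> None:
        uri = os.environ[ENDPOINT_VAR]
        # A real test would write, read back and delete an object at `uri`.
        # Such runs take many minutes and cost money, so CI never does them.
        self.assertTrue(uri)


def run_and_report() -> unittest.TestResult:
    """Run the live tests and return the raw result for inspection."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LiveObjectStoreTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Run without the variable set, the suite reports success with one skipped test, which is the result a reviewer would see if they relied on a CI run alone.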

To date I've been the sole committer running the tests, reviewing the code, and with a vague idea of what's been going on. That's because (a) I care about object stores after my experience with getting swift:// in, and (b) I'm not recommending that anyone use it in production until it's been field-tested more.

Who is going to help me review and test these patches?

Perhaps we could also do things like batching findbugs fixes into
fewer JIRAs, as has been suggested before.

A detail. Findbugs is not the problem.

Re: Patch review process

Posted by "Colin P. McCabe" <cm...@apache.org>.

What happened with the Crucible experiment?  Did we get a chance to
try that out?  That would be a great way to speed up patch reviews,
and one that is well-integrated with JIRA.

I am -1 on Gerrit unless we can find a way to mirror the comments to
JIRA.  I think splitting the discussion is a far worse thing than
letting a few minor patches languish for a while (even assuming that
Gerrit would solve this, which seems unclear to me).  The health of
the community is most important.

I think it is normal and healthy to post on hdfs-dev, email
developers, or hold a meeting to try to promote your patch and/or
idea.  Some of the discussion here seems to be assuming that Hadoop is
a machine for turning patch available JIRAs into commits.  It's not.
It's a community, and sometimes it is necessary to email people or
talk to them to get them to help with your JIRA.

I know your heart is in the right place, but the JIRA examples given
here are not that persuasive.  Both of them are things that we would
not encounter on a real cluster (nobody uses Hadoop with IPv6, nobody
uses Hadoop without setting up DNS).  But, if we find a specific set
of issues that the community has ignored (such as good error messages
in bad networking setups, configuration issues, etc.), then we could
create an umbrella JIRA and make a sustained effort to get it done.
Perhaps we could also do things like batching findbugs fixes into
fewer JIRAs, as has been suggested before.

best,
Colin

On Mon, Feb 9, 2015 at 2:55 AM, Steve Loughran <st...@hortonworks.com> wrote:
>
>
>
> On 8 February 2015 at 09:55:42, Karthik Kambatla (kasha@cloudera.com) wrote:
>
> On Fri, Feb 6, 2015 at 6:14 PM, Colin P. McCabe <cm...@apache.org> wrote:
>
>> I think it's healthy to have lots of JIRAs that are "patch available."
>> It means that there is a lot of interest in the project and people
>> want to contribute. It would be unhealthy if JIRAs that really needed
>> to get in were not getting in. But beyond a few horror stories, that
>> usually doesn't seem to happen.
>>
>> I agree that we should make an effort to review things that come from
>> new contributors. I always set aside some time each week to look
>> through the new JIRAs on the list and review ones that I feel like I
>> can do.
>>
>> I think the "patch manager" for a patch should be the person who
>> submitted it. As Chris suggested, if nobody is reviewing, email
>> people who reviewed earlier and ask why. Or email the list and ask if
>> this is the right approach, and bring attention to the issue.
>>
>
> It is definitely great if contributors could reach out to potential
> reviewers and follow-up. However, newer contributors find it hard to figure
> out who to reach out to, and leaving it on them is not very welcoming.
>
> Also, people often find one issue, post a fix out of good will and move on.
> They might not be motivated enough to be the "patch manager" for it. I
> understand that if it is an important patch, someone else will take it up
> and get it in. If it is not or if it is duplicated, that JIRA/patch needs
> to be cleaned up.
>
> I've just looked through the set of JIRAs which I filed but didn't submit patches for; they fall into the category "there's something not right here, but I'm not prepared to invest the time to get a patch in, as that patch is at risk of being neglected. Why bother?"
>
> Which means that queue length isn't a good metric of health, not if a long queue suppresses the creation and submission of new patches.
>
> Indeed, given that foundational bit of queueing theory, "a queue forms in a channel if egress rate < ingress rate", the metric of a functional process should really be channel throughput, ideally in a steady state where patches are being applied, returned for rework, rejected, or postponed for sound reasons ("this is a post-Java-7 feature"), with postponed ones being revisited later.
>
> That's what we should be measuring then: throughput (higher=better) and latency (lower=better).
>
>
>
> I put some time in on Saturday looking at issues. As well as closing some of my own and others' as WONTFIX, I did some detailed work on a couple.
>
> It took ~30 minutes to rigorously review a patch. One from an active contributor was bigger (HADOOP-11042), but they were already familiar with the expectations of testing and knew their way around the codebase. Another was a smaller patch (HADOOP-3619), but because the patch was further from what we like, it took more effort to provide feedback, and it will need another go-around. That's not the fault of the contributor; it's just that he wasn't familiar with the codebase and some of our test needs. I could have just fixed the code there and then, but I'd then be unable to review the patch myself. I also think that by helping the submitter complete the patch, we help them learn more about the Hadoop dev process, so their future patches will be better.
>
> That was me this weekend then: two patches nurtured along the way, and an implicit commitment to review their next iterations. One more hour booked out of my calendar for later this month.
>
> Which is why I would love to see what we can do about speeding up that review process with better tooling, Gerrit being one whose name keeps coming up. I know the argument against it ("it splits discussion"), but is that worse than "languishes without any feedback on the JIRA at all"?
>
>
> Getting a release out (for a release manager) is already some work. Adding
> more responsibilities to the RM (a voluntary role) makes it less enticing.
> And, distributing the work among multiple workers (with domain knowledge)
> might be more efficient.
>
>
>
> I think finding someone to volunteer as patch manager would be equally unenticing. We effectively have some: Chris N looks after Windows problems across the codebase, AW does bash, I do some of the build and object store stuff. But it's invariably "when there's spare time", and with a single individual worrying about the problem, the throughput in that area becomes a measure of how much spare time that person has. The queue length is therefore a function of the other work the individual is doing and their other personal commitments.
>
> At the very least, we need to identify >1 person who cares about an area, and have them collaborate. Having better tooling to aid that collaboration is also important. JIRA and manual patch viewing aren't enough, at least not for me, trying to review work in my spare time.
>
> So: where next? Can we set Gerrit up as an optional mechanism for reviewing things, so that those of us who want to experiment with using it to review patches can try it?
>
> -steve

Re: Patch review process

Posted by Steve Loughran <st...@hortonworks.com>.

On 8 February 2015 at 09:55:42, Karthik Kambatla (kasha@cloudera.com) wrote:

On Fri, Feb 6, 2015 at 6:14 PM, Colin P. McCabe <cm...@apache.org> wrote:

> I think it's healthy to have lots of JIRAs that are "patch available."
> It means that there is a lot of interest in the project and people
> want to contribute. It would be unhealthy if JIRAs that really needed
> to get in were not getting in. But beyond a few horror stories, that
> usually doesn't seem to happen.
>
> I agree that we should make an effort to review things that come from
> new contributors. I always set aside some time each week to look
> through the new JIRAs on the list and review ones that I feel like I
> can do.
>
> I think the "patch manager" for a patch should be the person who
> submitted it. As Chris suggested, if nobody is reviewing, email
> people who reviewed earlier and ask why. Or email the list and ask if
> this is the right approach, and bring attention to the issue.
>

It is definitely great if contributors could reach out to potential
reviewers and follow-up. However, newer contributors find it hard to figure
out who to reach out to, and leaving it on them is not very welcoming.

Also, people often find one issue, post a fix out of good will and move on.
They might not be motivated enough to be the "patch manager" for it. I
understand that if it is an important patch, someone else will take it up
and get it in. If it is not or if it is duplicated, that JIRA/patch needs
to be cleaned up.

I've just looked through the set of JIRAs which I filed but didn't submit patches for; they fall into the category "there's something not right here, but I'm not prepared to invest the time to get a patch in, as that patch is at risk of being neglected. Why bother?"

Which means that queue length isn't a good metric of health, not if a long queue suppresses the creation and submission of new patches.

Indeed, given that foundational bit of queueing theory, "a queue forms in a channel if egress rate < ingress rate", the metric of a functional process should really be channel throughput, ideally in a steady state where patches are being applied, returned for rework, rejected, or postponed for sound reasons ("this is a post-Java-7 feature"), with postponed ones being revisited later.

That's what we should be measuring then: throughput (higher=better) and latency (lower=better).
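To make the two proposed metrics concrete, here is a small Python sketch over a hypothetical list of patches, each with submission and resolution times in days. It also checks Little's law from standard queueing theory (not something stated in this thread): mean queue length = arrival rate × mean latency. That identity is one way to see why queue length alone is a poor health signal, since the same length can come from many patches handled quickly or few patches handled slowly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patch:
    # Hypothetical record; times are days since the start of the window.
    submitted: float
    resolved: Optional[float] = None  # None: still awaiting a decision


def arrival_rate(patches: List[Patch], window: float) -> float:
    """Patches submitted per day during the window."""
    return sum(1 for p in patches if p.submitted <= window) / window


def throughput(patches: List[Patch], window: float) -> float:
    """Patches resolved (committed, rejected, postponed) per day."""
    done = sum(1 for p in patches if p.resolved is not None and p.resolved <= window)
    return done / window


def mean_latency(patches: List[Patch]) -> float:
    """Mean days from submission to resolution, over resolved patches only."""
    waits = [p.resolved - p.submitted for p in patches if p.resolved is not None]
    if not waits:
        raise ValueError("no resolved patches in the sample")
    return sum(waits) / len(waits)


def mean_queue_length(patches: List[Patch], window: float) -> float:
    """Time-averaged number of patches awaiting a decision over [0, window]."""
    total = 0.0
    for p in patches:
        end = window if p.resolved is None else min(p.resolved, window)
        total += max(0.0, end - p.submitted)
    return total / window
```

For example, four patches submitted at days 0, 1, 3 and 6 and resolved at days 2, 4, 5 and 10 give, over a 10-day window, a throughput of 0.4 per day, a mean latency of 2.75 days and a mean queue length of 1.1, which equals 0.4 × 2.75. The identity is exact only when every patch is both submitted and resolved inside the window; with patches still open at the end it holds approximately.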

I put some time in on Saturday looking at issues. As well as closing some of my own and others' as WONTFIX, I did some detailed work on a couple.

It took ~30 minutes to rigorously review a patch. One from an active contributor was bigger (HADOOP-11042), but they were already familiar with the expectations of testing and knew their way around the codebase. Another was a smaller patch (HADOOP-3619), but because the patch was further from what we like, it took more effort to provide feedback, and it will need another go-around. That's not the fault of the contributor; it's just that he wasn't familiar with the codebase and some of our test needs. I could have just fixed the code there and then, but I'd then be unable to review the patch myself. I also think that by helping the submitter complete the patch, we help them learn more about the Hadoop dev process, so their future patches will be better.

That was me this weekend then: two patches nurtured along the way, and an implicit commitment to review their next iterations. One more hour booked out of my calendar for later this month.

Which is why I would love to see what we can do about speeding up that review process with better tooling, Gerrit being one whose name keeps coming up. I know the argument against it ("it splits discussion"), but is that worse than "languishes without any feedback on the JIRA at all"?

Getting a release out (for a release manager) is already some work. Adding
more responsibilities to the RM (a voluntary role) makes it less enticing.
And, distributing the work among multiple workers (with domain knowledge)
might be more efficient.

I think finding someone to volunteer as patch manager would be equally unenticing. We effectively have some: Chris N looks after Windows problems across the codebase, AW does bash, I do some of the build and object store stuff. But it's invariably "when there's spare time", and with a single individual worrying about the problem, the throughput in that area becomes a measure of how much spare time that person has. The queue length is therefore a function of the other work the individual is doing and their other personal commitments.

At the very least, we need to identify >1 person who cares about an area, and have them collaborate. Having better tooling to aid that collaboration is also important. JIRA and manual patch viewing aren't enough, at least not for me, trying to review work in my spare time.
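One way to write down "more than one person per area" is as data. The sketch below assumes a hypothetical mapping from JIRA components to volunteer reviewers (the area names are loosely based on this thread and the reviewer handles are placeholders). It generates a per-area JQL query for the Patch Available queue, using standard JIRA query syntax, and flags any area whose throughput rests on fewer than two people.

```python
from typing import Dict, List

# Placeholder data: illustrative areas and invented reviewer handles.
AREA_REVIEWERS: Dict[str, List[str]] = {
    "fs/s3": ["reviewer-a"],
    "fs/swift": ["reviewer-a", "reviewer-b"],
    "shell": ["reviewer-c", "reviewer-d"],
    "build": [],
}


def review_queue_jql(project: str, component: str) -> str:
    """JQL for one area's review queue, least recently updated first."""
    return (f'project = {project} AND status = "Patch Available" '
            f'AND component = "{component}" ORDER BY updated ASC')


def understaffed(areas: Dict[str, List[str]], minimum: int = 2) -> List[str]:
    """Areas whose queue depends on fewer than `minimum` distinct people."""
    return sorted(a for a, people in areas.items() if len(set(people)) < minimum)
```

Saved as JIRA filters, queries like these would give each area its own queue; the `understaffed` check is the point of the exercise, since a single-person area inherits that person's spare time as its throughput.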

So: where next? Can we set Gerrit up as an optional mechanism for reviewing things, so that those of us who want to experiment with using it to review patches can try it?

-steve

Re: Patch review process

Posted by Karthik Kambatla <ka...@cloudera.com>.

On Fri, Feb 6, 2015 at 6:14 PM, Colin P. McCabe <cm...@apache.org> wrote:

> I think it's healthy to have lots of JIRAs that are "patch available."
> It means that there is a lot of interest in the project and people
> want to contribute.  It would be unhealthy if JIRAs that really needed
> to get in were not getting in.  But beyond a few horror stories, that
> usually doesn't seem to happen.
>
> I agree that we should make an effort to review things that come from
> new contributors.  I always set aside some time each week to look
> through the new JIRAs on the list and review ones that I feel like I
> can do.
>
> I think the "patch manager" for a patch should be the person who
> submitted it.  As Chris suggested, if nobody is reviewing, email
> people who reviewed earlier and ask why.  Or email the list and ask if
> this is the right approach, and bring attention to the issue.
>

It is definitely great if contributors could reach out to potential
reviewers and follow-up. However, newer contributors find it hard to figure
out who to reach out to, and leaving it on them is not very welcoming.

Also, people often find one issue, post a fix out of good will and move on.
They might not be motivated enough to be the "patch manager" for it. I
understand that if it is an important patch, someone else will take it up
and get it in. If it is not or if it is duplicated, that JIRA/patch needs
to be cleaned up.
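The clean-up described above could start from a simple triage pass. This sketch assumes hypothetical issue records (last update date, whether the attached patch still applies, for example as checked with `git apply --check`, and any known duplicate) and suggests one action per issue; the 180-day threshold is an arbitrary illustrative value, not something proposed in the thread.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Issue:
    key: str                # illustrative issue key
    last_updated: date
    patch_applies: bool     # hypothetical result of a trial `git apply --check`
    duplicate_of: str = ""  # non-empty if the same fix exists elsewhere


def cleanup_candidates(issues: List[Issue], today: date,
                       stale_after_days: int = 180) -> List[str]:
    """Suggest at most one clean-up action per issue, most decisive first."""
    cutoff = today - timedelta(days=stale_after_days)
    picked: List[str] = []
    for issue in issues:
        if issue.duplicate_of:
            picked.append(f"{issue.key}: close as duplicate of {issue.duplicate_of}")
        elif not issue.patch_applies:
            picked.append(f"{issue.key}: patch no longer applies; "
                          "ask for a rebase or cancel Patch Available")
        elif issue.last_updated < cutoff:
            picked.append(f"{issue.key}: no activity since "
                          f"{issue.last_updated.isoformat()}; "
                          "find a reviewer or mark for Later")
    return picked
```

Issues that are recent, apply cleanly and are not duplicates produce no suggestion, so the output is a short worklist rather than a restatement of the whole queue.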


>
> I do like the idea of cleaning up old JIRAs that no longer apply or
> that have been abandoned.  And perhaps picking up on a few issues that
> we have forgotten about.  But it is part of release management in my
> mind.  The release manager decides that we need to get features and
> bugfixes X, Y, and Z in release Q, and then pushes on the JIRAs and
> committers responsible for making this happen.  Since JIRAs implement
> features and bugfixes they naturally fall under release management.
> This is how several companies that I've worked at have done it
> internally...
>

Getting a release out (for a release manager) is already some work. Adding
more responsibilities to the RM (a voluntary role) makes it less enticing.
And, distributing the work among multiple workers (with domain knowledge)
might be more efficient.


>
> cheers,
> Colin
>
> On Thu, Feb 5, 2015 at 4:18 PM, Akira AJISAKA <aj...@oss.nttdata.co.jp> wrote:
> > I think it's unhealthy to have over 1000 JIRAs in patch-available state.
> > Reviewers should be more welcoming and should review patches from
> > everywhere, to grow the number of developers and future reviewers.
> >
> > I'm not completely sure patch managers will make it healthy; however,
> > changing the process (and this discussion) would help improve our
> > mindsets.
> >
> > @Committers: Let's review more patches!
> > @Developers: You can also review patches you are interested in. Your
> > comments will help committers to review and merge them.
> > (As you can see, the above comments don't have any enforcement.)
> >
> > Regards,
> > Akira
> >
> >
> > On 2/4/15 13:52, Karthik Kambatla wrote:
> >>
> >> +1 to patch managers per component.
> >>
> >>
> >> On Wed, Feb 4, 2015 at 12:29 PM, Allen Wittenauer <aw...@altiscale.com> wrote:
> >>
> >>>
> >>>          Is process really the problem?  Or, more directly, how does any
> >>> of this actually increase the pool beyond the (I'm feeling generous
> >>> today) 10 or so committers (never mind PMC) that actually review
> >>> patches that come from outside their employers on a regular basis?
> >>>
> >>
> >> Process might not be the source of the problem; however, process will
> >> help with alleviating the current situation.
> >>
> >> It would definitely help to increase the number of active committers.
> >> It might not be very hard to add committers, but I don't know of a way
> >> to make them active.
> >>
> >>
> >>>
> >>>          To put this in perspective, there are over 1000 JIRAs in patch
> >>> available status across all three projects right now. That’s not even
> >>> counting the ones that I know I’ve personally removed the PA status on
> >>> because the patch no longer applies...
> >>>
> >>>
> >>> On Feb 4, 2015, at 12:10 PM, Chris Douglas <cd...@apache.org> wrote:
> >>>
> >>>> Release managers are just committers trying to roll releases; it's not
> >>>> an enduring role. A patch manager is just someone helping to track
> >>>> work and direct reviewers to issues. The job doesn't come with a hat.
> >>>> We could look into a badge and gun if that would help.
> >>>
> >>>
> >>
> >> Badge and gun will ensure a single patch-manager per component.
> >>
> >>
> >>>>
> >>>> This doesn't require a lot of hand-wringing or diagnosis. If you're
> >>>> concerned about the queue, then start trying to find reviewers for
> >>>> viable patches.
> >>>>
> >>>> We should also close issues that require too much work to fix, or at
> >>>> least mark them for "Later". Not every idea needs to end in a commit,
> >>>> but silence is frustrating for contributors. -C
> >>>
> >>>
> >>
> >> +1.
> >>
> >>
> >>>>
> >>>> On Wed, Feb 4, 2015 at 10:24 AM, Colin P. McCabe <cm...@apache.org> wrote:
> >>>>>
> >>>>> I wonder if this work logically falls under the release manager role.
> >>>>>
> >>>>> During a release, we generally spend a little bit of time thinking
> >>>>> about what new features we added, systems we stabilized, interfaces we
> >>>>> changed, etc. etc.  This gives us some perspective to look backwards
> >>>>> at old JIRAs and either close them as no longer relevant, or target
> >>>>> them for the next release (with appropriate encouragement to the
> >>>>> people who might have the expertise to make that happen.)
> >>>>>
> >>>>> best,
> >>>>> Colin
> >>>>>
> >>>>> On Mon, Feb 2, 2015 at 2:03 PM, Mai Haohui <ri...@gmail.com> wrote:
> >>>>>>
> >>>>>> +1 on the idea of patch managers. Since patch managers should have
> >>>>>> good expertise in specific areas, they will be more productive at
> >>>>>> reviewing patches and driving development in those areas forward.
> >>>>>>
> >>>>>>
> >>>>>> ~Haohui
> >>>>>>
> >>>>>> On Mon, Feb 2, 2015 at 12:55 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
> >>>>>>>
> >>>>>>> I like the idea of patch managers monitoring specific queues of
> >>>>>>> issues, perhaps implemented as a set of JIRA filters on different
> >>>>>>> values for the component or label fields.  Right now, looking at the
> >>>>>>> whole HADOOP backlog is daunting.  Using separate filtered review
> >>>>>>> queues could help each reviewer focus and parallelize the work.
> >>>>>>>
> >>>>>>> Going back to the topic of tooling, I just learned that multiple
> >>>>>>> Apache projects have expressed interest in Gerrit recently.  I've
> >>>>>>> never used Gerrit and so can't speak for or against it, but I think
> >>>>>>> consistency across Apache has benefits.  Issue INFRA-2205 has the
> >>>>>>> discussion.  The issue is closed, but there is recent discussion in
> >>>>>>> the comments.
> >>>>>>>
> >>>>>>> https://issues.apache.org/jira/browse/INFRA-2205
> >>>>>>>
> >>>>>>>
> >>>>>>> Chris Nauroth
> >>>>>>> Hortonworks
> >>>>>>> http://hortonworks.com/
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 2/2/15, 3:56 AM, "Chris Douglas" <cd...@apache.org> wrote:
> >>>>>>>
> >>>>>>>> Many projects have unofficial "patch managers":
> >>>>>>>>
> >>>>>>>> People who go through outstanding issues, ensuring that each has
> >>>>>>>> reached a stable state, or at least a willing reviewer. -C
> >>>>>>>>
> >>>>>>>> On Mon, Feb 2, 2015 at 3:45 AM, Steve Loughran <stevel@hortonworks.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Given experience of Apache reviews, I don't know how much time to
> >>>>>>>>> spend on it. I'm curious about Gerrit, but again, if JIRA
> >>>>>>>>> integration is what is sought, Crucible sounds better.
> >>>>>>>>>
> >>>>>>>>> Returning to other issues in the discussion:
> >>>>>>>>>
> >>>>>>>>> 1. Improving test times would make a big difference, locally as
> >>>>>>>>> well as on JIRA.
> >>>>>>>>>
> >>>>>>>>> 2. How can we clear through today's backlog without relying on a
> >>>>>>>>> future piece of technology to magically fix it?
> >>>>>>>>>
> >>>>>>>>> For clearing the backlog, I don't see any solution other than
> >>>>>>>>> "people put in time". I know it's an obligation for committers to
> >>>>>>>>> do this, but I also know how little time most of us have to do
> >>>>>>>>> things other than deal with our own tests failing. As a result,
> >>>>>>>>> things that aren't viewed as critical get neglected. Shell, build,
> >>>>>>>>> object stores, cruft cleanup, etc.: I think people who care about
> >>>>>>>>> these areas are going to have to get together and sync up. For
> >>>>>>>>> some of the stuff it may be quite fast; people may not have
> >>>>>>>>> noticed, but a few of us have brought the build dependencies
> >>>>>>>>> forward fairly fast recently, with a goal of Hadoop branch-2/trunk
> >>>>>>>>> being compatible with recent Guava versions and Java 8.
> >>>>>>>>>
> >>>>>>>>> I've been doing some S3/object store work the last couple of
> >>>>>>>>> weekends; that's slow, as test runs take 30+ minutes against the
> >>>>>>>>> far end, and Jenkins doesn't do those test runs. If anyone else
> >>>>>>>>> wants to look at the fs/s3 and fs/swift queues, their input is
> >>>>>>>>> welcome.
> >>>>>>>>>
> >>>>>>>>> And of course AW went through the entire backlog of shell stuff &
> >>>>>>>>> a lot of the not-in-branch-2 features.
> >>>>>>>>>
> >>>>>>>>> So where now? What is a strategy to deal with all those things in
> >>>>>>>>> the queue?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>
> >>>
> >>
> >>
> >
>



-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.
--------------------------------------------
http://five.sentenc.es

Re: Patch review process

Posted by Steve Loughran <st...@hortonworks.com>.

On 7 February 2015 at 02:14:39, Colin P. McCabe (cmccabe@apache.org) wrote:
I think it's healthy to have lots of JIRAs that are "patch available."
It means that there is a lot of interest in the project and people
want to contribute. It would be unhealthy if JIRAs that really needed
to get in were not getting in. But beyond a few horror stories, that
usually doesn't seem to happen.

I believe it is easier for you or me to assert that than it is for someone to submit a patch which really matters to them, only to find it languishing, ignored, because it doesn't appear to matter to anyone who has the rights to get it into the code.

I agree that we should make an effort to review things that come from
new contributors. I always set aside some time each week to look
through the new JIRAs on the list and review ones that I feel like I
can do.

I think the "patch manager" for a patch should be the person who
submitted it. As Chris suggested, if nobody is reviewing, email
people who reviewed earlier and ask why. Or email the list and ask if
this is the right approach, and bring attention to the issue.

Is having to keep asking people to look at your patch a good sign? It's certainly a sign that the submitter feels it matters, but it also shows there's no active queue management.

I suspect it also tends to be easier to pull off if you are already known in the community. I know a certain AW will now note that it helps to share employers with other committers, but we also tend to review and +1 work by people we already know and are reasonably good at working with (i.e. we don't fear their code, and we trust them to care about issues like compatibility, testing, etc.). Certainly I appreciate Alan's +1s for my languishing patches.

If you aren't known, and you have just one patch for a problem which appears to surface only in your environment, your patch is at risk of neglect.

Example:

https://issues.apache.org/jira/browse/HADOOP-3426 "Datanode does not start up if the local machines DNS isn't working right and dfs.datanode.dns.interface==default"

My home LAN, my broken /etc/resolv.conf, my patch. Until it was in Hadoop, I needed my own private branch for things to work. Now that it's in, I'm happy with that specific problem being addressed.
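The underlying problem in that JIRA is a daemon refusing to start with a low-level resolver error. As a language-neutral illustration of "failing better" (this is not the HADOOP-3426 patch, which is Java), here is a Python sketch that wraps hostname resolution and re-raises with an actionable message. `NetworkConfigError` and `resolve_local_host` are hypothetical names, and the resolver is injectable so the failure path can be tested without a broken network.

```python
import socket
from typing import Callable


class NetworkConfigError(RuntimeError):
    """Raised with an actionable message instead of a bare resolver error."""


def resolve_local_host(
    hostname: str,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Resolve `hostname`, explaining the likely cause if DNS is broken."""
    try:
        return resolver(hostname)
    except OSError as e:  # socket.gaierror is a subclass of OSError
        raise NetworkConfigError(
            f"Cannot resolve this machine's hostname '{hostname}' ({e}). "
            "Check /etc/hosts and /etc/resolv.conf, or configure an explicit "
            "address, before assuming the service itself is at fault."
        ) from e
```

Chaining with `from e` keeps the original resolver error available for debugging, while the first thing a user reads points at their network configuration rather than at the application.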

Except there's a related one, about failing better in an IPv6 world, that's been around for a while and that nobody has looked at:

https://issues.apache.org/jira/browse/HADOOP-3619

It's little ones like that which I think can fall by the wayside (I'm looking at it now). Here's someone pushing the boundaries by running without IPv6 disabled, and instead of us picking up the early lessons, those lessons are being ignored unless/until they become issues in the run-up to a release.

And, we are trying to be a community here, which means encouraging more contributions. Those of us working full time on it should be able to allocate some time, even if only weekends outside the release phase, to catching up with the work queue.

There's an article here that makes this point: OSS projects should be inclusive, not exclusive, which means encouraging a more diverse set of contributors.

http://www.curiousefficiency.org/posts/2015/01/abuse-is-not-ok.html

We can't do that if we restrict our reviews to work by known people,

The other issue I find with the "harass people until they commit it" strategy is that it scales badly. Not just from the # of people submitting patches, but from the #of patches. If I have a small 4 line patch, is it worth the effort of chasing people round to get it in, or should I save my effort for the more transformational patches?

Furthermore, as a recipient of such emails, after a while I get more ruthless about ignoring them. Still, I think I'll look at a few today, including one that a colleague of Colin's has been asking for (HADOOP-11293), as I feel sorry for anyone attempting a minor-but-wide-reaching bit of code cleanup.

> I do like the idea of cleaning up old JIRAs that no longer apply or
> that have been abandoned. And perhaps picking up on a few issues that
> we have forgotten about.
>
> But it is part of release management in my mind.
> The release manager decides that we need to get features and
> bugfixes X, Y, and Z in release Q, and then pushes on the JIRAs and
> committers responsible for making this happen. Since JIRAs implement
> features and bugfixes they naturally fall under release management.
> This is how several companies that I've worked at have done it
> internally...

At release time it's too late to do things that are important yet whose roll-out is considered a threat to the code. If you look at the history of any JIRA related to updating Jetty, you can see this: we know the problems, but don't want to go there, especially near release time. And, given the stress induced by the "great protobuf upgrade of 2013", I agree. Yet now that it's not release time, nobody has gone near Jetty again.

Anyway, I'm going to review some patches this weekend. Please DO NOT email suggestions to me, as that will only reinforce the "email-priority-scheduler" algorithm I have just argued against. I will pick some minor ones from people with little or no contribution history, or ones that I care about but have forgotten to review.

-Steve

Re: Patch review process

Posted by "Colin P. McCabe" <cm...@apache.org>.

I think it's healthy to have lots of JIRAs that are "patch available."
 It means that there is a lot of interest in the project and people
want to contribute.  It would be unhealthy if JIRAs that really needed
to get in were not getting in.  But beyond a few horror stories, that
usually doesn't seem to happen.

I agree that we should make an effort to review things that come from
new contributors.  I always set aside some time each week to look
through the new JIRAs on the list and review ones that I feel like I
can do.

I think the "patch manager" for a patch should be the person who
submitted it.  As Chris suggested, if nobody is reviewing, email
people who reviewed earlier and ask why.  Or email the list and ask if
this is the right approach, and bring attention to the issue.

I do like the idea of cleaning up old JIRAs that no longer apply or
that have been abandoned.  And perhaps picking up on a few issues that
we have forgotten about.  But it is part of release management in my
mind.  The release manager decides that we need to get features and
bugfixes X, Y, and Z in release Q, and then pushes on the JIRAs and
committers responsible for making this happen.  Since JIRAs implement
features and bugfixes they naturally fall under release management.
This is how several companies that I've worked at have done it
internally...

cheers,
Colin


Re: Patch review process

Posted by Akira AJISAKA <aj...@oss.nttdata.co.jp>.

I think it's unhealthy to have over 1000 JIRAs in patch available status.
Reviewers should be more welcoming and should review patches from
everywhere, to increase the number of developers and future reviewers.

I'm not completely sure patch managers will make it healthy; however,
changing the process (and this discussion) would help improve our
mindsets.

@Committers: Let's review more patches!
@Developers: You can also review patches you are interested in. Your
comments will help committers to review and merge them.
(As you can see, the above comments don't come with any enforcement.)

Regards,
Akira


Re: Patch review process

Posted by Karthik Kambatla <ka...@cloudera.com>.

+1 to patch managers per component.


On Wed, Feb 4, 2015 at 12:29 PM, Allen Wittenauer <aw...@altiscale.com> wrote:

>
>         Is process really the problem?  Or, more directly, how does any of
> this actually increase the pool beyond the (I’m feeling generous today) 10
> or so committers (never mind PMC) that actually review patches that come
> from outside their employers on a regular basis?
>

Process might not be the source of the problem; however, process will help
alleviate the current situation.

It would definitely help to increase the number of active committers. It might
not be very hard to add committers, but I don't know of a way to make them
active.


>
>         To put this in perspective, there are over 1000 JIRAs in patch
> available status across all three projects right now. That’s not even
> counting the ones that I know I’ve personally removed the PA status on
> because the patch no longer applies...
>
>
> On Feb 4, 2015, at 12:10 PM, Chris Douglas <cd...@apache.org> wrote:
>
> > Release managers are just committers trying to roll releases; it's not
> > an enduring role. A patch manager is just someone helping to track
> > work and direct reviewers to issues. The job doesn't come with a hat.
> > We could look into a badge and gun if that would help.
>

Badge and gun will ensure a single patch-manager per component.


> >
> > This doesn't require a lot of hand-wringing or diagnosis. If you're
> > concerned about the queue, then start trying to find reviewers for
> > viable patches.
> >
> > We should also close issues that require too much work to fix, or at
> > least mark them for "Later". Not every idea needs to end in a commit,
> > but silence is frustrating for contributors. -C
>

+1.




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.
--------------------------------------------
http://five.sentenc.es

Re: Patch review process

Posted by Chris Douglas <cd...@apache.org>.

There are many ways to find reviewers. Look at the set of watchers,
email people who work on that component (check git if you're unsure
who's been there recently), or even email random committers and ask
for leads. Privately ask people why they stopped responding to an
issue. Even if an issue has a +1 from a coworker or close
collaborator, solicit feedback from new committers to set their
expectations for the role. Be gracious to reviewers; you're inviting
them to volunteer.

A "patch manager" isn't process, but a role that's available, useful,
and appreciated. -C


Re: Patch review process

Posted by Allen Wittenauer <aw...@altiscale.com>.

	Is process really the problem?  Or, more directly, how does any of this actually increase the pool beyond the (I’m feeling generous today) 10 or so committers (never mind PMC) that actually review patches that come from outside their employers on a regular basis?

	To put this in perspective, there are over 1000 JIRAs in patch available status across all three projects right now. That’s not even counting the ones that I know I’ve personally removed the PA status on because the patch no longer applies...


On Feb 4, 2015, at 12:10 PM, Chris Douglas <cd...@apache.org> wrote:

> Release managers are just committers trying to roll releases; it's not
> an enduring role. A patch manager is just someone helping to track
> work and direct reviewers to issues. The job doesn't come with a hat.
> We could look into a badge and gun if that would help.
> 
> This doesn't require a lot of hand-wringing or diagnosis. If you're
> concerned about the queue, then start trying to find reviewers for
> viable patches.
> 
> We should also close issues that require too much work to fix, or at
> least mark them for "Later". Not every idea needs to end in a commit,
> but silence is frustrating for contributors. -C
> 

Re: Patch review process

Posted by Chris Douglas <cd...@apache.org>.

Release managers are just committers trying to roll releases; it's not
an enduring role. A patch manager is just someone helping to track
work and direct reviewers to issues. The job doesn't come with a hat.
We could look into a badge and gun if that would help.

This doesn't require a lot of hand-wringing or diagnosis. If you're
concerned about the queue, then start trying to find reviewers for
viable patches.

We should also close issues that require too much work to fix, or at
least mark them for "Later". Not every idea needs to end in a commit,
but silence is frustrating for contributors. -C

On Wed, Feb 4, 2015 at 10:24 AM, Colin P. McCabe <cm...@apache.org> wrote:
> I wonder if this work logically falls under the release manager role.
>
> During a release, we generally spend a little bit of time thinking
> about what new features we added, systems we stabilized, interfaces we
> changed, etc. etc.  This gives us some perspective to look backwards
> at old JIRAs and either close them as no longer relevant, or target
> them for the next release (with appropriate encouragement to the
> people who might have the expertise to make that happen.)
>
> best,
> Colin

Re: Patch review process

Posted by "Colin P. McCabe" <cm...@apache.org>.

I wonder if this work logically falls under the release manager role.

During a release, we generally spend a little bit of time thinking
about what new features we added, systems we stabilized, interfaces we
changed, etc. etc.  This gives us some perspective to look backwards
at old JIRAs and either close them as no longer relevant, or target
them for the next release (with appropriate encouragement to the
people who might have the expertise to make that happen.)

best,
Colin

On Mon, Feb 2, 2015 at 2:03 PM, Mai Haohui <ri...@gmail.com> wrote:
> +1 on the idea of patch managers. Since patch managers should have
> good expertise in their specific fields, they would be more productive
> at reviewing patches and driving development in those fields
> forward.
>
>
> ~Haohui

Re: Patch review process

Posted by Mai Haohui <ri...@gmail.com>.

+1 on the idea of patch managers. Since patch managers should have
good expertise in their specific fields, they would be more productive
at reviewing patches and driving development in those fields
forward.


~Haohui

On Mon, Feb 2, 2015 at 12:55 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
> I like the idea of patch managers monitoring specific queues of issues,
> perhaps implemented as a set of JIRA filters on different values for the
> component or label fields.  Right now, looking at the whole HADOOP backlog
> is daunting.  Using separate filtered review queues could help each
> reviewer focus and parallelize the work.
>
> Going back to the topic of tooling, I just learned that multiple Apache
> projects have expressed interest in Gerrit recently.  I've never used
> Gerrit and so can't speak in favor or against it, but I think consistency
> across Apache has benefits.  Issue INFRA-2205 has the discussion.  The
> issue is closed, but there is recent discussion in the comments.
>
> https://issues.apache.org/jira/browse/INFRA-2205
>
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/

Re: Patch review process

Posted by Chris Nauroth <cn...@hortonworks.com>.

I like the idea of patch managers monitoring specific queues of issues,
perhaps implemented as a set of JIRA filters on different values for the
component or label fields.  Right now, looking at the whole HADOOP backlog
is daunting.  Using separate filtered review queues could help each
reviewer focus and parallelize the work.
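A minimal sketch of such per-component queues, generating one JQL filter per area. It assumes the status is named "Patch Available" and that issues are tagged via JIRA's `component` field; the component list is hypothetical, built from areas mentioned in this thread (fs/s3, fs/swift, build), and the real component names in the HADOOP project may differ. Each string could be saved as a JIRA filter that one patch manager watches.

```python
# Hypothetical component list drawn from areas named in this thread; real names may differ.
COMPONENTS = ["fs/s3", "fs/swift", "build"]


def jql_quote(value: str) -> str:
    """Quote a value for JQL, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def review_queue_jql(project: str, component: str) -> str:
    """One review queue: issues with a patch awaiting review in one component, oldest first."""
    return (
        f"project = {project} "
        f'AND status = "Patch Available" '
        f"AND component = {jql_quote(component)} "
        f"ORDER BY updated ASC"
    )


def review_queues(project: str, components: list) -> dict:
    """Map each component to its own filter so reviewers can split the backlog."""
    return {c: review_queue_jql(project, c) for c in components}


if __name__ == "__main__":
    for component, jql in review_queues("HADOOP", COMPONENTS).items():
        print(f"{component}: {jql}")
```

Sorting by `updated ASC` puts the longest-neglected patches at the top of each queue; a label-based variant would swap the component clause for `labels = ...`.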

Going back to the topic of tooling, I just learned that multiple Apache
projects have expressed interest in Gerrit recently.  I've never used
Gerrit and so can't speak in favor or against it, but I think consistency
across Apache has benefits.  Issue INFRA-2205 has the discussion.  The
issue is closed, but there is recent discussion in the comments.

https://issues.apache.org/jira/browse/INFRA-2205


Chris Nauroth
Hortonworks
http://hortonworks.com/






On 2/2/15, 3:56 AM, "Chris Douglas" <cd...@apache.org> wrote:

>Many projects have unofficial "patch managers":
>
>http://producingoss.com/en/share-management.html#patch-manager
>
>People who go through outstanding issues, ensuring that each has
>reached a stable state, or at least a willing reviewer. -C

Re: Patch review process

Posted by Chris Douglas <cd...@apache.org>.

Many projects have unofficial "patch managers":

http://producingoss.com/en/share-management.html#patch-manager

People who go through outstanding issues, ensuring that each has
reached a stable state, or at least a willing reviewer. -C

On Mon, Feb 2, 2015 at 3:45 AM, Steve Loughran <st...@hortonworks.com> wrote:
>
> Given experience of Apache reviews, I don't know how much time to spend on it. I'm curious about Gerrit, but again, if JIRA integration is what is sought, Crucible sounds better.
>
> Returning to other issues in the discussion
>
> 1. Improving test times would make a big difference; locally as well as on Jira.
>
> 2. How can we clear through today's backlog without relying on some future piece of technology to magically fix it?
>
> For clearing the backlog, I don't see any solution other than "people put in time". I know it's an obligation for committers to do this, but I also know how little time most of us have to do things other than deal with our own tests failing. As a result, things that aren't viewed as critical get neglected: shell, build, object stores, cruft cleanup, etc. I think people who care about these areas are going to have to get together and sync up. For some of the stuff it may be quite fast; people may not have noticed, but a few of us have brought the build dependencies forward fairly fast recently, with a goal of Hadoop branch-2/trunk being compatible with recent Guava versions and Java 8.
>
> I've been doing some S3/object store work the last couple of weekends; that's slow, as test runs take 30+ minutes against the far end, and those are test runs Jenkins doesn't do. If anyone else wants to look at the fs/s3 and fs/swift queue, their input is welcome.
>
> And of course AW went through the entire backlog of shell stuff & a lot of the not-in-branch-2 features.
>
> So where now? What is a strategy to deal with all those things in the queue?