You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Thomas Weise <th...@apache.org> on 2018/11/17 02:49:13 UTC

[DISCUSS] Reverting commits on green post-commit status

We have observed instances of changes being reverted in master that have
been authored following the contributor guidelines and pass all tests (post
commit). While we generally seem to have quite a bit of revert action
happening [1], this thread is about those instances that are outside of our
documented policies.

For a contributor, it isn't a good experience to see reverts (especially
not out of the blue) after a PR has been reviewed, all tests pass and
generally care has been taken to do the right things.

Changes can have unforeseen effects downstream. Usually releases provide
the opportunity to mitigate such issues, not necessarily by a revert, but
in many cases by another change that keeps everyone happy. Unexpected
reverts can break someone else and are disruptive.

Some discussion already took place on one specific PR [3], but that is just
an example and by no means an isolated incident.

On some of these reverts, "internal" problems with an important runner are
cited, with little to no explanation. It would be nice if folks with more
insight can shed more light on this.

I hope that as outcome of this discussion, we can arrive at a better
understanding of why and when such reverts were necessary and possibly find
ways to largely avoid them going forward.

Thanks!
Thomas


[1]
https://github.com/apache/beam/search?o=desc&q=Revert&s=committer-date&type=Commits
[2] https://beam.apache.org/contribute/postcommits-policies/
[3] https://github.com/apache/beam/pull/7012

Re: [DISCUSS] Reverting commits on green post-commit status

Posted by Robert Bradshaw <ro...@google.com>.
Two hours is too quick for people around the world to respond. If
something is obviously wrong, that may be fine, but otherwise we
should give others time to respond.

I think there's another important distinction that impacts urgency and
impact: by definition, Beam's on Precommit/Postcommit tests run at
HEAD, but upstream projects are not so tightly constrained.

Thanks for offering to update the documentation! I'll be happy to
review once the policy settles here.

On Tue, Nov 20, 2018 at 10:15 PM Ruoyun Huang <ru...@google.com> wrote:
>
> The instructions in the post-commit policies page [1] is helpful, by not clear enough regarding what 'Rollback First' exactly means in the case of breaking a downstream project.  The discussions in this thread makes things more well defined.  I summarized things as an updating PR [2], please suggest if I missed anything, or whether we need that update at all.
>
> Also, one idea is we can have a time limit finding alternatives instead of rollbacks. That makes the steps more explicit and actionable, thus less likely anyone gets bad feelings, which then partially solves the concern that Maximilian brought up. Something like:  (4) If coulnd't figure out root cause within 2 hours, a rollback automatically becomes the best option.   Though that also depends on how much urgency such situation requires.
>
> [1] https://beam.apache.org/contribute/postcommits-policies-details/index.html#rollback_first
> [2] PR draft:  https://github.com/apache/beam/pull/7095
>
> On Mon, Nov 19, 2018 at 1:49 PM Maximilian Michels <mx...@apache.org> wrote:
>>
>> The way I read Thomas' original email is that it's generally not a nice
>> sign for a contributor if her work gets reverted. We all come from
>> different backgrounds. For some, reverting is just a tool to get the job
>> done, for others it might come across as offensive.
>>
>> I know of communities where reverting is the absolute last resort. Now,
>> Beam needs to find its own way. I think there are definitive advantages
>> to reverting quickly.
>>
>> In the most obvious case, when our tests are broken and a fix is not
>> viable, reverting unblocks other contributors to test their code. I
>> think this has been working fine in the past.
>>
>> In the less obvious case, an external Runner or system is broken due to
>> an update in the master. IMHO this does not warrant an immediate revert
>> on its own. As already mentioned, there should be some justification for
>> a rollback. This is not to make people's life harder but to figure out
>> whether the problem can be solved upstream or downstream, or with a
>> combination of both.
>>
>> I think Thomas wanted to address this latter case. It seems like we're
>> all more or less on the same page. The core problem is more related to
>> communicating reverts in a way that helps contributors to save face and
>> the community to work efficiently.
>>
>> Thanks,
>> Max
>>
>> On 19.11.18 10:51, Robert Bradshaw wrote:
>> > If something breaks Beam's post (or especially pre) commit tests, I
>> > agree that rollback is typically the best option and can be done
>> > quickly. The situation is totally different if it breaks downstream
>> > projects in which Kenn's three points are good criteria for determining
>> > if we should rollback, which should not be assumed to be the default option.
>> >
>> > I would say the root cause of the problem is insufficient visibility and
>> > testing. If external-to-beam tests (or production jobs) are broken in
>> > such a way that rollback is desired, I would say the onus (maybe not a
>> > hard requirement, but a high bar for exceptions) is on whoever is asking
>> > for the rollback to create and submit an external test that demonstrates
>> > the issue. It is their choice whether this is easier than rolling
>> > forward or otherwise working around the breakage. This seems like the
>> > only long-term sustainable option and should get us out of this bad
>> > situation.
>> >
>> > (As an aside, the bar for rolling back a runner-specific PR that brake
>> > that runner may be lower, though still not automatic as other changes
>> > may depend on it.)
>> >
>> > - Robert
>> >
>> > On Sat, Nov 17, 2018 at 7:35 PM Kenneth Knowles <kenn@apache.org
>> > <ma...@apache.org>> wrote:
>> >
>> >     Just adapting my PR commentary to this thread:
>> >
>> >     Our rollback first policy cannot apply to a change that passes all
>> >     of Beam's postcommit tests. It *does* apply to Beam's postcommit
>> >     suites for each and every runner; they are equally important in this
>> >     regard.
>> >
>> >     The purpose of rapid rollback without discussion is foremost to
>> >     restore test signal and not to disrupt the work of other
>> >     contributors, that is why it is OK to roll back before figuring out
>> >     if the change was actually bad. If that isn't at stake, the policy
>> >     doesn't make sense to apply.
>> >
>> >     But...
>> >
>> >       - We have at least three examples of runners where there are
>> >     probably tests outside the Beam repo: Dataflow, Samza runner, and
>> >     IBM Streams.
>> >       - We also may have users that try running their production loads
>> >     against Beam master branch to learn early whether the next release
>> >     will break them.
>> >
>> >     These are success stories for Beam. We should respect these other
>> >     sources of information for what they are: users and vendors giving
>> >     us a heads up about changes that will be a problem if we release them.
>> >
>> >     Often rollback is still a good option but IMO it is no longer
>> >     automatically the best option, and may not even be OK. I would say
>> >     that the case must be made clearly and *publicly* that
>> >
>> >     (1) something is actually broken
>> >     (2) the revert fixes the problem
>> >     (3) revert is the best option
>> >
>> >     In this scenario there is time to consider. An important and common
>> >     case is that a perfectly fine change exposes something already
>> >     broken, so the best option may be sickbaying downstream or pinning
>> >     their version/commit of Beam until they can fix.
>> >
>> >     Kenn
>> >
>> >
>> >     On Fri, Nov 16, 2018 at 8:15 PM Ahmet Altay <altay@google.com
>> >     <ma...@google.com>> wrote:
>> >      >
>> >      > It sounds like we are in agreement that addressing issues sooner
>> >     is better. I think reverting is in general the less stressful option
>> >     because it allows a solution to be developed in parallel. Even with
>> >     that, it is not the only option we have and based on the severity
>> >     and the complexity of the problem we can consider other options.
>> >     Fixing forward might be feasible in some cases.
>> >      >
>> >      > We can bring issues back to the mailing list. This would be akin
>> >     to bringing any issues to the mailing list. I think JIRA is a better
>> >     tool for that. These reverts are happening because of an issue, and
>> >     JIRA allows informing all involved parties, creates emails to the
>> >     issues list for later searching through mailing archives, and
>> >     creates a record of things in structured way with components.
>> >      >
>> >      > We could establish general policies about for all reverts to have
>> >     an issue (which we already do because they are regular PRs),
>> >     including all people in the discussion (including the author and
>> >     reviewers) and follow up with new tests to expands Beam's test coverage.
>> >      >
>> >      > On Fri, Nov 16, 2018 at 7:55 PM, Thomas Weise
>> >     <thomas.weise@gmail.com <ma...@gmail.com>> wrote:
>> >      >>
>> >      >>
>> >      >>
>> >      >> On Fri, Nov 16, 2018 at 7:39 PM Ahmet Altay <altay@google.com
>> >     <ma...@google.com>> wrote:
>> >      >>>
>> >      >>> Thank you for bringing this discussion back to the mailing list.
>> >      >>>
>> >      >>> On Fri, Nov 16, 2018 at 6:49 PM, Thomas Weise <thw@apache.org
>> >     <ma...@apache.org>> wrote:
>> >      >>>>
>> >      >>>> We have observed instances of changes being reverted in master
>> >     that have been authored following the contributor guidelines and
>> >     pass all tests (post commit). While we generally seem to have quite
>> >     a bit of revert action happening [1], this thread is about those
>> >     instances that are outside of our documented policies.
>> >      >>>>
>> >      >>>> For a contributor, it isn't a good experience to see reverts
>> >     (especially not out of the blue) after a PR has been reviewed, all
>> >     tests pass and generally care has been taken to do the right things.
>> >      >>>
>> >      >>>
>> >      >>> I completely agree. Everyone involved needs to have the context
>> >     about why a change is being reverted. A JIRA with information is
>> >     probably a good way to do it, similar to the any other issue we track.
>> >      >>>
>> >      >>>>
>> >      >>>>
>> >      >>>> Changes can have unforeseen effects downstream. Usually
>> >     releases provide the opportunity to mitigate such issues, not
>> >     necessarily by a revert, but in many cases by another change that
>> >     keeps everyone happy. Unexpected reverts can break someone else and
>> >     are disruptive.
>> >      >>>
>> >      >>>
>> >      >>> I disagree about waiting until release to resolve an identified
>> >     issue. Let's say we become aware of an issue through channels other
>> >     than Beam tests (e.g. user reports, a contributor running into
>> >     something not captured in Beam tests) and we know that it is a
>> >     credible issue that will block the release anyway. Addressing the
>> >     issue sooner will be less painful than addressing later. (I am not
>> >     suggesting addressing every such issue similar to we do not block
>> >     releases on every open issue. There needs to be a due diligence to
>> >     understand the severity and the impact.)
>> >      >>>
>> >      >>> We can improve on the above process. If we end up reverting a
>> >     change as a result of a report that is not covered by existing Beam
>> >     tests, we could expand the tests to catch same class of problems
>> >     even earlier by means of Beam's own tests.
>> >      >>
>> >      >>
>> >      >> I'm referring to the release process as an example how such
>> >     issues can be addressed. I'm not suggesting to wait until a release
>> >     to address issues; as you say the sooner they are identified the better.
>> >      >>
>> >      >> However, I don't agree with taking out the revert hammer on the
>> >     slightest sign that there could be a downstream problem. There are
>> >     better ways to handle this. First of all, I think that these issues
>> >     should be visible on the dev mailing list so that there is more
>> >     awareness overall (which will lead to better test coverage and
>> >     useful feedback in general). Second, we should make an effort to
>> >     resolve issues in a forward looking way. If after all it turns out
>> >     that a revert is most appropriate for the situation, it should
>> >     follow explicit agreement.
>> >      >>
>> >      >>>
>> >      >>>
>> >      >>>>
>> >      >>>>
>> >      >>>> Some discussion already took place on one specific PR [3], but
>> >     that is just an example and by no means an isolated incident.
>> >      >>>>
>> >      >>>> On some of these reverts, "internal" problems
>> >      >>>
>> >      >>>
>> >      >>> This is a communication issue that needs be addressed. Over
>> >     time we are getting new contributors and that is great, but it also
>> >     means these new folks need to understand operating conditions of
>> >     this community. Giving direct feedback would generally be most
>> >     efficient here.
>> >      >>>
>> >      >>>>
>> >      >>>> with an important runner are cited, with little to no
>> >     explanation. It would be nice if folks with more insight can shed
>> >     more light on this.
>> >      >>>
>> >      >>>
>> >      >>> All runners we accepted at out master branch and include in our
>> >     releases is important. I do not find a reference to an important
>> >     runner in the examples other than a question about what would the
>> >     outcome be for a runner other than Dataflow. I think the outcome
>> >     would be the same, and in my opinion it should be. We need to be
>> >     careful not breaking any runner we support.
>> >      >>>
>> >      >>>>
>> >      >>>> I hope that as outcome of this discussion, we can arrive at a
>> >     better understanding of why and when such reverts were necessary and
>> >     possibly find ways to largely avoid them going forward.
>> >      >>>>
>> >      >>>> Thanks!
>> >      >>>> Thomas
>> >      >>>>
>> >      >>>>
>> >      >>>> [1]
>> >     https://github.com/apache/beam/search?o=desc&q=Revert&s=committer-date&type=Commits
>> >      >>>> [2] https://beam.apache.org/contribute/postcommits-policies/
>> >      >>>> [3] https://github.com/apache/beam/pull/7012
>> >      >>>>
>> >      >>>
>> >
>
>
>
> --
> ================
> Ruoyun  Huang
>

Re: [DISCUSS] Reverting commits on green post-commit status

Posted by Mikhail Gryzykhin <mi...@google.com>.
Following the discussion, we are discussing as how to address failures
on external projects caused by changes to Beam project.

I believe that that rollback first in case of red pre/postcommit tests is a
valid option, since it blocks Beam development process.

In case if downstream project has test failure outside of Beam project,
doing a rollback in Beam can be really frustrating for Beam developers.

I'd support Ken and Robert on this:
1. It is best to start dev-list discussion for visibility and for a
decision on
further course of action.
2. Interested party is motivated expose or implement relevant testm(s)
inside Beam testing suites to prevent similar issue happening again.
This will also simplify the rollback question since the policy will be
applied due to tests going red. Ideally adding test will go together with
starting the discussion on dev-list and request of rollback.

--Mikhail

Have feedback <http://go/migryz-feedback>?


On Tue, Nov 20, 2018 at 1:15 PM Ruoyun Huang <ru...@google.com> wrote:

> The instructions in the post-commit policies page [1] is helpful, by not
> clear enough regarding what 'Rollback First' exactly means in the case of
> breaking a downstream project.  The discussions in this thread makes things
> more well defined.  I summarized things as an updating PR
> <https://github.com/apache/beam/pull/7095> [2], please suggest if I
> missed anything, or whether we need that update at all.
>
> Also, one idea is we can have a time limit finding alternatives instead of
> rollbacks. That makes the steps more explicit and actionable, thus less
> likely anyone gets bad feelings, which then partially solves the concern
> that Maximilian brought up. Something like:  (4) If coulnd't figure out
> root cause within 2 hours, a rollback automatically becomes the best
> option.   Though that also depends on how much urgency such situation
> requires.
>
> [1]
> https://beam.apache.org/contribute/postcommits-policies-details/index.html#rollback_first
> [2] PR draft:  https://github.com/apache/beam/pull/7095
>
> On Mon, Nov 19, 2018 at 1:49 PM Maximilian Michels <mx...@apache.org> wrote:
>
>> The way I read Thomas' original email is that it's generally not a nice
>> sign for a contributor if her work gets reverted. We all come from
>> different backgrounds. For some, reverting is just a tool to get the job
>> done, for others it might come across as offensive.
>>
>> I know of communities where reverting is the absolute last resort. Now,
>> Beam needs to find its own way. I think there are definitive advantages
>> to reverting quickly.
>>
>> In the most obvious case, when our tests are broken and a fix is not
>> viable, reverting unblocks other contributors to test their code. I
>> think this has been working fine in the past.
>>
>> In the less obvious case, an external Runner or system is broken due to
>> an update in the master. IMHO this does not warrant an immediate revert
>> on its own. As already mentioned, there should be some justification for
>> a rollback. This is not to make people's life harder but to figure out
>> whether the problem can be solved upstream or downstream, or with a
>> combination of both.
>>
>> I think Thomas wanted to address this latter case. It seems like we're
>> all more or less on the same page. The core problem is more related to
>> communicating reverts in a way that helps contributors to save face and
>> the community to work efficiently.
>>
>> Thanks,
>> Max
>>
>> On 19.11.18 10:51, Robert Bradshaw wrote:
>> > If something breaks Beam's post (or especially pre) commit tests, I
>> > agree that rollback is typically the best option and can be done
>> > quickly. The situation is totally different if it breaks downstream
>> > projects in which Kenn's three points are good criteria for determining
>> > if we should rollback, which should not be assumed to be the default
>> option.
>> >
>> > I would say the root cause of the problem is insufficient visibility
>> and
>> > testing. If external-to-beam tests (or production jobs) are broken in
>> > such a way that rollback is desired, I would say the onus (maybe not a
>> > hard requirement, but a high bar for exceptions) is on whoever is
>> asking
>> > for the rollback to create and submit an external test that
>> demonstrates
>> > the issue. It is their choice whether this is easier than rolling
>> > forward or otherwise working around the breakage. This seems like the
>> > only long-term sustainable option and should get us out of this bad
>> > situation.
>> >
>> > (As an aside, the bar for rolling back a runner-specific PR that brake
>> > that runner may be lower, though still not automatic as other changes
>> > may depend on it.)
>> >
>> > - Robert
>> >
>> > On Sat, Nov 17, 2018 at 7:35 PM Kenneth Knowles <kenn@apache.org
>> > <ma...@apache.org>> wrote:
>> >
>> >     Just adapting my PR commentary to this thread:
>> >
>> >     Our rollback first policy cannot apply to a change that passes all
>> >     of Beam's postcommit tests. It *does* apply to Beam's postcommit
>> >     suites for each and every runner; they are equally important in this
>> >     regard.
>> >
>> >     The purpose of rapid rollback without discussion is foremost to
>> >     restore test signal and not to disrupt the work of other
>> >     contributors, that is why it is OK to roll back before figuring out
>> >     if the change was actually bad. If that isn't at stake, the policy
>> >     doesn't make sense to apply.
>> >
>> >     But...
>> >
>> >       - We have at least three examples of runners where there are
>> >     probably tests outside the Beam repo: Dataflow, Samza runner, and
>> >     IBM Streams.
>> >       - We also may have users that try running their production loads
>> >     against Beam master branch to learn early whether the next release
>> >     will break them.
>> >
>> >     These are success stories for Beam. We should respect these other
>> >     sources of information for what they are: users and vendors giving
>> >     us a heads up about changes that will be a problem if we release
>> them.
>> >
>> >     Often rollback is still a good option but IMO it is no longer
>> >     automatically the best option, and may not even be OK. I would say
>> >     that the case must be made clearly and *publicly* that
>> >
>> >     (1) something is actually broken
>> >     (2) the revert fixes the problem
>> >     (3) revert is the best option
>> >
>> >     In this scenario there is time to consider. An important and common
>> >     case is that a perfectly fine change exposes something already
>> >     broken, so the best option may be sickbaying downstream or pinning
>> >     their version/commit of Beam until they can fix.
>> >
>> >     Kenn
>> >
>> >
>> >     On Fri, Nov 16, 2018 at 8:15 PM Ahmet Altay <altay@google.com
>> >     <ma...@google.com>> wrote:
>> >      >
>> >      > It sounds like we are in agreement that addressing issues sooner
>> >     is better. I think reverting is in general the less stressful option
>> >     because it allows a solution to be developed in parallel. Even with
>> >     that, it is not the only option we have and based on the severity
>> >     and the complexity of the problem we can consider other options.
>> >     Fixing forward might be feasible in some cases.
>> >      >
>> >      > We can bring issues back to the mailing list. This would be akin
>> >     to bringing any issues to the mailing list. I think JIRA is a better
>> >     tool for that. These reverts are happening because of an issue, and
>> >     JIRA allows informing all involved parties, creates emails to the
>> >     issues list for later searching through mailing archives, and
>> >     creates a record of things in structured way with components.
>> >      >
>> >      > We could establish general policies about for all reverts to have
>> >     an issue (which we already do because they are regular PRs),
>> >     including all people in the discussion (including the author and
>> >     reviewers) and follow up with new tests to expands Beam's test
>> coverage.
>> >      >
>> >      > On Fri, Nov 16, 2018 at 7:55 PM, Thomas Weise
>> >     <thomas.weise@gmail.com <ma...@gmail.com>> wrote:
>> >      >>
>> >      >>
>> >      >>
>> >      >> On Fri, Nov 16, 2018 at 7:39 PM Ahmet Altay <altay@google.com
>> >     <ma...@google.com>> wrote:
>> >      >>>
>> >      >>> Thank you for bringing this discussion back to the mailing
>> list.
>> >      >>>
>> >      >>> On Fri, Nov 16, 2018 at 6:49 PM, Thomas Weise <thw@apache.org
>> >     <ma...@apache.org>> wrote:
>> >      >>>>
>> >      >>>> We have observed instances of changes being reverted in master
>> >     that have been authored following the contributor guidelines and
>> >     pass all tests (post commit). While we generally seem to have quite
>> >     a bit of revert action happening [1], this thread is about those
>> >     instances that are outside of our documented policies.
>> >      >>>>
>> >      >>>> For a contributor, it isn't a good experience to see reverts
>> >     (especially not out of the blue) after a PR has been reviewed, all
>> >     tests pass and generally care has been taken to do the right things.
>> >      >>>
>> >      >>>
>> >      >>> I completely agree. Everyone involved needs to have the context
>> >     about why a change is being reverted. A JIRA with information is
>> >     probably a good way to do it, similar to the any other issue we
>> track.
>> >      >>>
>> >      >>>>
>> >      >>>>
>> >      >>>> Changes can have unforeseen effects downstream. Usually
>> >     releases provide the opportunity to mitigate such issues, not
>> >     necessarily by a revert, but in many cases by another change that
>> >     keeps everyone happy. Unexpected reverts can break someone else and
>> >     are disruptive.
>> >      >>>
>> >      >>>
>> >      >>> I disagree about waiting until release to resolve an identified
>> >     issue. Let's say we become aware of an issue through channels other
>> >     than Beam tests (e.g. user reports, a contributor running into
>> >     something not captured in Beam tests) and we know that it is a
>> >     credible issue that will block the release anyway. Addressing the
>> >     issue sooner will be less painful than addressing later. (I am not
>> >     suggesting addressing every such issue similar to we do not block
>> >     releases on every open issue. There needs to be a due diligence to
>> >     understand the severity and the impact.)
>> >      >>>
>> >      >>> We can improve on the above process. If we end up reverting a
>> >     change as a result of a report that is not covered by existing Beam
>> >     tests, we could expand the tests to catch same class of problems
>> >     even earlier by means of Beam's own tests.
>> >      >>
>> >      >>
>> >      >> I'm referring to the release process as an example how such
>> >     issues can be addressed. I'm not suggesting to wait until a release
>> >     to address issues; as you say the sooner they are identified the
>> better.
>> >      >>
>> >      >> However, I don't agree with taking out the revert hammer on the
>> >     slightest sign that there could be a downstream problem. There are
>> >     better ways to handle this. First of all, I think that these issues
>> >     should be visible on the dev mailing list so that there is more
>> >     awareness overall (which will lead to better test coverage and
>> >     useful feedback in general). Second, we should make an effort to
>> >     resolve issues in a forward looking way. If after all it turns out
>> >     that a revert is most appropriate for the situation, it should
>> >     follow explicit agreement.
>> >      >>
>> >      >>>
>> >      >>>
>> >      >>>>
>> >      >>>>
>> >      >>>> Some discussion already took place on one specific PR [3], but
>> >     that is just an example and by no means an isolated incident.
>> >      >>>>
>> >      >>>> On some of these reverts, "internal" problems
>> >      >>>
>> >      >>>
>> >      >>> This is a communication issue that needs be addressed. Over
>> >     time we are getting new contributors and that is great, but it also
>> >     means these new folks need to understand operating conditions of
>> >     this community. Giving direct feedback would generally be most
>> >     efficient here.
>> >      >>>
>> >      >>>>
>> >      >>>> with an important runner are cited, with little to no
>> >     explanation. It would be nice if folks with more insight can shed
>> >     more light on this.
>> >      >>>
>> >      >>>
>> >      >>> All runners we accepted at out master branch and include in our
>> >     releases is important. I do not find a reference to an important
>> >     runner in the examples other than a question about what would the
>> >     outcome be for a runner other than Dataflow. I think the outcome
>> >     would be the same, and in my opinion it should be. We need to be
>> >     careful not breaking any runner we support.
>> >      >>>
>> >      >>>>
>> >      >>>> I hope that as outcome of this discussion, we can arrive at a
>> >     better understanding of why and when such reverts were necessary and
>> >     possibly find ways to largely avoid them going forward.
>> >      >>>>
>> >      >>>> Thanks!
>> >      >>>> Thomas
>> >      >>>>
>> >      >>>>
>> >      >>>> [1]
>> >
>> https://github.com/apache/beam/search?o=desc&q=Revert&s=committer-date&type=Commits
>> >      >>>> [2] https://beam.apache.org/contribute/postcommits-policies/
>> >      >>>> [3] https://github.com/apache/beam/pull/7012
>> >      >>>>
>> >      >>>
>> >
>>
>
>
> --
> ================
> Ruoyun  Huang
>
>

Re: [DISCUSS] Reverting commits on green post-commit status

Posted by Ruoyun Huang <ru...@google.com>.
The instructions in the post-commit policies page [1] is helpful, by not
clear enough regarding what 'Rollback First' exactly means in the case of
breaking a downstream project.  The discussions in this thread makes things
more well defined.  I summarized things as an updating PR
<https://github.com/apache/beam/pull/7095> [2], please suggest if I missed
anything, or whether we need that update at all.

Also, one idea is we can have a time limit finding alternatives instead of
rollbacks. That makes the steps more explicit and actionable, thus less
likely anyone gets bad feelings, which then partially solves the concern
that Maximilian brought up. Something like:  (4) If coulnd't figure out
root cause within 2 hours, a rollback automatically becomes the best
option.   Though that also depends on how much urgency such situation
requires.

[1]
https://beam.apache.org/contribute/postcommits-policies-details/index.html#rollback_first
[2] PR draft:  https://github.com/apache/beam/pull/7095

On Mon, Nov 19, 2018 at 1:49 PM Maximilian Michels <mx...@apache.org> wrote:

> The way I read Thomas' original email is that it's generally not a nice
> sign for a contributor if her work gets reverted. We all come from
> different backgrounds. For some, reverting is just a tool to get the job
> done, for others it might come across as offensive.
>
> I know of communities where reverting is the absolute last resort. Now,
> Beam needs to find its own way. I think there are definitive advantages
> to reverting quickly.
>
> In the most obvious case, when our tests are broken and a fix is not
> viable, reverting unblocks other contributors to test their code. I
> think this has been working fine in the past.
>
> In the less obvious case, an external Runner or system is broken due to
> an update in the master. IMHO this does not warrant an immediate revert
> on its own. As already mentioned, there should be some justification for
> a rollback. This is not to make people's life harder but to figure out
> whether the problem can be solved upstream or downstream, or with a
> combination of both.
>
> I think Thomas wanted to address this latter case. It seems like we're
> all more or less on the same page. The core problem is more related to
> communicating reverts in a way that helps contributors to save face and
> the community to work efficiently.
>
> Thanks,
> Max
>
> On 19.11.18 10:51, Robert Bradshaw wrote:
> > If something breaks Beam's post (or especially pre) commit tests, I
> > agree that rollback is typically the best option and can be done
> > quickly. The situation is totally different if it breaks downstream
> > projects in which Kenn's three points are good criteria for determining
> > if we should rollback, which should not be assumed to be the default
> option.
> >
> > I would say the root cause of the problem is insufficient visibility and
> > testing. If external-to-beam tests (or production jobs) are broken in
> > such a way that rollback is desired, I would say the onus (maybe not a
> > hard requirement, but a high bar for exceptions) is on whoever is asking
> > for the rollback to create and submit an external test that demonstrates
> > the issue. It is their choice whether this is easier than rolling
> > forward or otherwise working around the breakage. This seems like the
> > only long-term sustainable option and should get us out of this bad
> > situation.
> >
> > (As an aside, the bar for rolling back a runner-specific PR that brake
> > that runner may be lower, though still not automatic as other changes
> > may depend on it.)
> >
> > - Robert
> >
> > On Sat, Nov 17, 2018 at 7:35 PM Kenneth Knowles <kenn@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     Just adapting my PR commentary to this thread:
> >
> >     Our rollback first policy cannot apply to a change that passes all
> >     of Beam's postcommit tests. It *does* apply to Beam's postcommit
> >     suites for each and every runner; they are equally important in this
> >     regard.
> >
> >     The purpose of rapid rollback without discussion is foremost to
> >     restore test signal and not to disrupt the work of other
> >     contributors, that is why it is OK to roll back before figuring out
> >     if the change was actually bad. If that isn't at stake, the policy
> >     doesn't make sense to apply.
> >
> >     But...
> >
> >       - We have at least three examples of runners where there are
> >     probably tests outside the Beam repo: Dataflow, Samza runner, and
> >     IBM Streams.
> >       - We also may have users that try running their production loads
> >     against Beam master branch to learn early whether the next release
> >     will break them.
> >
> >     These are success stories for Beam. We should respect these other
> >     sources of information for what they are: users and vendors giving
> >     us a heads up about changes that will be a problem if we release
> them.
> >
> >     Often rollback is still a good option but IMO it is no longer
> >     automatically the best option, and may not even be OK. I would say
> >     that the case must be made clearly and *publicly* that
> >
> >     (1) something is actually broken
> >     (2) the revert fixes the problem
> >     (3) revert is the best option
> >
> >     In this scenario there is time to consider. An important and common
> >     case is that a perfectly fine change exposes something already
> >     broken, so the best option may be sickbaying downstream or pinning
> >     their version/commit of Beam until they can fix.
> >
> >     Kenn
> >
> >
> >     On Fri, Nov 16, 2018 at 8:15 PM Ahmet Altay <altay@google.com
> >     <ma...@google.com>> wrote:
> >      >
> >      > It sounds like we are in agreement that addressing issues sooner
> >     is better. I think reverting is in general the less stressful option
> >     because it allows a solution to be developed in parallel. Even with
> >     that, it is not the only option we have and based on the severity
> >     and the complexity of the problem we can consider other options.
> >     Fixing forward might be feasible in some cases.
> >      >
> >      > We can bring issues back to the mailing list. This would be akin
> >     to bringing any issues to the mailing list. I think JIRA is a better
> >     tool for that. These reverts are happening because of an issue, and
> >     JIRA allows informing all involved parties, creates emails to the
> >     issues list for later searching through mailing archives, and
> >     creates a record of things in structured way with components.
> >      >
> >      > We could establish general policies about for all reverts to have
> >     an issue (which we already do because they are regular PRs),
> >     including all people in the discussion (including the author and
> >     reviewers) and follow up with new tests to expands Beam's test
> coverage.
> >      >
> >      > On Fri, Nov 16, 2018 at 7:55 PM, Thomas Weise
> >     <thomas.weise@gmail.com <ma...@gmail.com>> wrote:
> >      >>
> >      >>
> >      >>
> >      >> On Fri, Nov 16, 2018 at 7:39 PM Ahmet Altay <altay@google.com
> >     <ma...@google.com>> wrote:
> >      >>>
> >      >>> Thank you for bringing this discussion back to the mailing list.
> >      >>>
> >      >>> On Fri, Nov 16, 2018 at 6:49 PM, Thomas Weise <thw@apache.org
> >     <ma...@apache.org>> wrote:
> >      >>>>
> >      >>>> We have observed instances of changes being reverted in master
> >     that have been authored following the contributor guidelines and
> >     pass all tests (post commit). While we generally seem to have quite
> >     a bit of revert action happening [1], this thread is about those
> >     instances that are outside of our documented policies.
> >      >>>>
> >      >>>> For a contributor, it isn't a good experience to see reverts
> >     (especially not out of the blue) after a PR has been reviewed, all
> >     tests pass and generally care has been taken to do the right things.
> >      >>>
> >      >>>
> >      >>> I completely agree. Everyone involved needs to have the context
> >     about why a change is being reverted. A JIRA with information is
> >     probably a good way to do it, similar to the any other issue we
> track.
> >      >>>
> >      >>>>
> >      >>>>
> >      >>>> Changes can have unforeseen effects downstream. Usually
> >     releases provide the opportunity to mitigate such issues, not
> >     necessarily by a revert, but in many cases by another change that
> >     keeps everyone happy. Unexpected reverts can break someone else and
> >     are disruptive.
> >      >>>
> >      >>>
> >      >>> I disagree about waiting until release to resolve an identified
> >     issue. Let's say we become aware of an issue through channels other
> >     than Beam tests (e.g. user reports, a contributor running into
> >     something not captured in Beam tests) and we know that it is a
> >     credible issue that will block the release anyway. Addressing the
> >     issue sooner will be less painful than addressing later. (I am not
> >     suggesting addressing every such issue similar to we do not block
> >     releases on every open issue. There needs to be a due diligence to
> >     understand the severity and the impact.)
> >      >>>
> >      >>> We can improve on the above process. If we end up reverting a
> >     change as a result of a report that is not covered by existing Beam
> >     tests, we could expand the tests to catch same class of problems
> >     even earlier by means of Beam's own tests.
> >      >>
> >      >>
> >      >> I'm referring to the release process as an example how such
> >     issues can be addressed. I'm not suggesting to wait until a release
> >     to address issues; as you say the sooner they are identified the
> better.
> >      >>
> >      >> However, I don't agree with taking out the revert hammer on the
> >     slightest sign that there could be a downstream problem. There are
> >     better ways to handle this. First of all, I think that these issues
> >     should be visible on the dev mailing list so that there is more
> >     awareness overall (which will lead to better test coverage and
> >     useful feedback in general). Second, we should make an effort to
> >     resolve issues in a forward looking way. If after all it turns out
> >     that a revert is most appropriate for the situation, it should
> >     follow explicit agreement.
> >      >>
> >      >>>
> >      >>>
> >      >>>>
> >      >>>>
> >      >>>> Some discussion already took place on one specific PR [3], but
> >     that is just an example and by no means an isolated incident.
> >      >>>>
> >      >>>> On some of these reverts, "internal" problems
> >      >>>
> >      >>>
> >      >>> This is a communication issue that needs be addressed. Over
> >     time we are getting new contributors and that is great, but it also
> >     means these new folks need to understand operating conditions of
> >     this community. Giving direct feedback would generally be most
> >     efficient here.
> >      >>>
> >      >>>>
> >      >>>> with an important runner are cited, with little to no
> >     explanation. It would be nice if folks with more insight can shed
> >     more light on this.
> >      >>>
> >      >>>
> >      >>> All runners we accepted at out master branch and include in our
> >     releases is important. I do not find a reference to an important
> >     runner in the examples other than a question about what would the
> >     outcome be for a runner other than Dataflow. I think the outcome
> >     would be the same, and in my opinion it should be. We need to be
> >     careful not breaking any runner we support.
> >      >>>
> >      >>>>
> >      >>>> I hope that as outcome of this discussion, we can arrive at a
> >     better understanding of why and when such reverts were necessary and
> >     possibly find ways to largely avoid them going forward.
> >      >>>>
> >      >>>> Thanks!
> >      >>>> Thomas
> >      >>>>
> >      >>>>
> >      >>>> [1]
> >
> https://github.com/apache/beam/search?o=desc&q=Revert&s=committer-date&type=Commits
> >      >>>> [2] https://beam.apache.org/contribute/postcommits-policies/
> >      >>>> [3] https://github.com/apache/beam/pull/7012
> >      >>>>
> >      >>>
> >
>


-- 
================
Ruoyun  Huang

Re: [DISCUSS] Reverting commits on green post-commit status

Posted by Maximilian Michels <mx...@apache.org>.
The way I read Thomas' original email is that it's generally not a nice 
sign for a contributor if her work gets reverted. We all come from 
different backgrounds. For some, reverting is just a tool to get the job 
done, for others it might come across as offensive.

I know of communities where reverting is the absolute last resort. Now, 
Beam needs to find its own way. I think there are definitive advantages 
to reverting quickly.

In the most obvious case, when our tests are broken and a fix is not 
viable, reverting unblocks other contributors to test their code. I 
think this has been working fine in the past.

In the less obvious case, an external Runner or system is broken due to 
an update in the master. IMHO this does not warrant an immediate revert 
on its own. As already mentioned, there should be some justification for 
a rollback. This is not to make people's life harder but to figure out 
whether the problem can be solved upstream or downstream, or with a 
combination of both.

I think Thomas wanted to address this latter case. It seems like we're 
all more or less on the same page. The core problem is more related to 
communicating reverts in a way that helps contributors to save face and 
the community to work efficiently.

Thanks,
Max

On 19.11.18 10:51, Robert Bradshaw wrote:
> If something breaks Beam's post (or especially pre) commit tests, I 
> agree that rollback is typically the best option and can be done 
> quickly. The situation is totally different if it breaks downstream 
> projects in which Kenn's three points are good criteria for determining 
> if we should rollback, which should not be assumed to be the default option.
> 
> I would say the root cause of the problem is insufficient visibility and 
> testing. If external-to-beam tests (or production jobs) are broken in 
> such a way that rollback is desired, I would say the onus (maybe not a 
> hard requirement, but a high bar for exceptions) is on whoever is asking 
> for the rollback to create and submit an external test that demonstrates 
> the issue. It is their choice whether this is easier than rolling 
> forward or otherwise working around the breakage. This seems like the 
> only long-term sustainable option and should get us out of this bad 
> situation.
> 
> (As an aside, the bar for rolling back a runner-specific PR that brake 
> that runner may be lower, though still not automatic as other changes 
> may depend on it.)
> 
> - Robert
> 
> On Sat, Nov 17, 2018 at 7:35 PM Kenneth Knowles <kenn@apache.org 
> <ma...@apache.org>> wrote:
> 
>     Just adapting my PR commentary to this thread:
> 
>     Our rollback first policy cannot apply to a change that passes all
>     of Beam's postcommit tests. It *does* apply to Beam's postcommit
>     suites for each and every runner; they are equally important in this
>     regard.
> 
>     The purpose of rapid rollback without discussion is foremost to
>     restore test signal and not to disrupt the work of other
>     contributors, that is why it is OK to roll back before figuring out
>     if the change was actually bad. If that isn't at stake, the policy
>     doesn't make sense to apply.
> 
>     But...
> 
>       - We have at least three examples of runners where there are
>     probably tests outside the Beam repo: Dataflow, Samza runner, and
>     IBM Streams.
>       - We also may have users that try running their production loads
>     against Beam master branch to learn early whether the next release
>     will break them.
> 
>     These are success stories for Beam. We should respect these other
>     sources of information for what they are: users and vendors giving
>     us a heads up about changes that will be a problem if we release them.
> 
>     Often rollback is still a good option but IMO it is no longer
>     automatically the best option, and may not even be OK. I would say
>     that the case must be made clearly and *publicly* that
> 
>     (1) something is actually broken
>     (2) the revert fixes the problem
>     (3) revert is the best option
> 
>     In this scenario there is time to consider. An important and common
>     case is that a perfectly fine change exposes something already
>     broken, so the best option may be sickbaying downstream or pinning
>     their version/commit of Beam until they can fix.
> 
>     Kenn
> 
> 
>     On Fri, Nov 16, 2018 at 8:15 PM Ahmet Altay <altay@google.com
>     <ma...@google.com>> wrote:
>      >
>      > It sounds like we are in agreement that addressing issues sooner
>     is better. I think reverting is in general the less stressful option
>     because it allows a solution to be developed in parallel. Even with
>     that, it is not the only option we have and based on the severity
>     and the complexity of the problem we can consider other options.
>     Fixing forward might be feasible in some cases.
>      >
>      > We can bring issues back to the mailing list. This would be akin
>     to bringing any issues to the mailing list. I think JIRA is a better
>     tool for that. These reverts are happening because of an issue, and
>     JIRA allows informing all involved parties, creates emails to the
>     issues list for later searching through mailing archives, and
>     creates a record of things in structured way with components.
>      >
>      > We could establish general policies about for all reverts to have
>     an issue (which we already do because they are regular PRs),
>     including all people in the discussion (including the author and
>     reviewers) and follow up with new tests to expands Beam's test coverage.
>      >
>      > On Fri, Nov 16, 2018 at 7:55 PM, Thomas Weise
>     <thomas.weise@gmail.com <ma...@gmail.com>> wrote:
>      >>
>      >>
>      >>
>      >> On Fri, Nov 16, 2018 at 7:39 PM Ahmet Altay <altay@google.com
>     <ma...@google.com>> wrote:
>      >>>
>      >>> Thank you for bringing this discussion back to the mailing list.
>      >>>
>      >>> On Fri, Nov 16, 2018 at 6:49 PM, Thomas Weise <thw@apache.org
>     <ma...@apache.org>> wrote:
>      >>>>
>      >>>> We have observed instances of changes being reverted in master
>     that have been authored following the contributor guidelines and
>     pass all tests (post commit). While we generally seem to have quite
>     a bit of revert action happening [1], this thread is about those
>     instances that are outside of our documented policies.
>      >>>>
>      >>>> For a contributor, it isn't a good experience to see reverts
>     (especially not out of the blue) after a PR has been reviewed, all
>     tests pass and generally care has been taken to do the right things.
>      >>>
>      >>>
>      >>> I completely agree. Everyone involved needs to have the context
>     about why a change is being reverted. A JIRA with information is
>     probably a good way to do it, similar to the any other issue we track.
>      >>>
>      >>>>
>      >>>>
>      >>>> Changes can have unforeseen effects downstream. Usually
>     releases provide the opportunity to mitigate such issues, not
>     necessarily by a revert, but in many cases by another change that
>     keeps everyone happy. Unexpected reverts can break someone else and
>     are disruptive.
>      >>>
>      >>>
>      >>> I disagree about waiting until release to resolve an identified
>     issue. Let's say we become aware of an issue through channels other
>     than Beam tests (e.g. user reports, a contributor running into
>     something not captured in Beam tests) and we know that it is a
>     credible issue that will block the release anyway. Addressing the
>     issue sooner will be less painful than addressing later. (I am not
>     suggesting addressing every such issue similar to we do not block
>     releases on every open issue. There needs to be a due diligence to
>     understand the severity and the impact.)
>      >>>
>      >>> We can improve on the above process. If we end up reverting a
>     change as a result of a report that is not covered by existing Beam
>     tests, we could expand the tests to catch same class of problems
>     even earlier by means of Beam's own tests.
>      >>
>      >>
>      >> I'm referring to the release process as an example how such
>     issues can be addressed. I'm not suggesting to wait until a release
>     to address issues; as you say the sooner they are identified the better.
>      >>
>      >> However, I don't agree with taking out the revert hammer on the
>     slightest sign that there could be a downstream problem. There are
>     better ways to handle this. First of all, I think that these issues
>     should be visible on the dev mailing list so that there is more
>     awareness overall (which will lead to better test coverage and
>     useful feedback in general). Second, we should make an effort to
>     resolve issues in a forward looking way. If after all it turns out
>     that a revert is most appropriate for the situation, it should
>     follow explicit agreement.
>      >>
>      >>>
>      >>>
>      >>>>
>      >>>>
>      >>>> Some discussion already took place on one specific PR [3], but
>     that is just an example and by no means an isolated incident.
>      >>>>
>      >>>> On some of these reverts, "internal" problems
>      >>>
>      >>>
>      >>> This is a communication issue that needs be addressed. Over
>     time we are getting new contributors and that is great, but it also
>     means these new folks need to understand operating conditions of
>     this community. Giving direct feedback would generally be most
>     efficient here.
>      >>>
>      >>>>
>      >>>> with an important runner are cited, with little to no
>     explanation. It would be nice if folks with more insight can shed
>     more light on this.
>      >>>
>      >>>
>      >>> All runners we accepted at out master branch and include in our
>     releases is important. I do not find a reference to an important
>     runner in the examples other than a question about what would the
>     outcome be for a runner other than Dataflow. I think the outcome
>     would be the same, and in my opinion it should be. We need to be
>     careful not breaking any runner we support.
>      >>>
>      >>>>
>      >>>> I hope that as outcome of this discussion, we can arrive at a
>     better understanding of why and when such reverts were necessary and
>     possibly find ways to largely avoid them going forward.
>      >>>>
>      >>>> Thanks!
>      >>>> Thomas
>      >>>>
>      >>>>
>      >>>> [1]
>     https://github.com/apache/beam/search?o=desc&q=Revert&s=committer-date&type=Commits
>      >>>> [2] https://beam.apache.org/contribute/postcommits-policies/
>      >>>> [3] https://github.com/apache/beam/pull/7012
>      >>>>
>      >>>
> 

Re: [DISCUSS] Reverting commits on green post-commit status

Posted by Robert Bradshaw <ro...@google.com>.
If something breaks Beam's post (or especially pre) commit tests, I agree
that rollback is typically the best option and can be done quickly. The
situation is totally different if it breaks downstream projects in which
Kenn's three points are good criteria for determining if we should
rollback, which should not be assumed to be the default option.

I would say the root cause of the problem is insufficient visibility and
testing. If external-to-beam tests (or production jobs) are broken in such
a way that rollback is desired, I would say the onus (maybe not a hard
requirement, but a high bar for exceptions) is on whoever is asking for the
rollback to create and submit an external test that demonstrates the issue.
It is their choice whether this is easier than rolling forward or otherwise
working around the breakage. This seems like the only long-term sustainable
option and should get us out of this bad situation.

(As an aside, the bar for rolling back a runner-specific PR that brake that
runner may be lower, though still not automatic as other changes may depend
on it.)

- Robert

On Sat, Nov 17, 2018 at 7:35 PM Kenneth Knowles <ke...@apache.org> wrote:

> Just adapting my PR commentary to this thread:
>
> Our rollback first policy cannot apply to a change that passes all of
> Beam's postcommit tests. It *does* apply to Beam's postcommit suites for
> each and every runner; they are equally important in this regard.
>
> The purpose of rapid rollback without discussion is foremost to restore
> test signal and not to disrupt the work of other contributors, that is why
> it is OK to roll back before figuring out if the change was actually bad.
> If that isn't at stake, the policy doesn't make sense to apply.
>
> But...
>
>  - We have at least three examples of runners where there are probably
> tests outside the Beam repo: Dataflow, Samza runner, and IBM Streams.
>  - We also may have users that try running their production loads against
> Beam master branch to learn early whether the next release will break them.
>
> These are success stories for Beam. We should respect these other sources
> of information for what they are: users and vendors giving us a heads up
> about changes that will be a problem if we release them.
>
> Often rollback is still a good option but IMO it is no longer
> automatically the best option, and may not even be OK. I would say that the
> case must be made clearly and *publicly* that
>
> (1) something is actually broken
> (2) the revert fixes the problem
> (3) revert is the best option
>
> In this scenario there is time to consider. An important and common case
> is that a perfectly fine change exposes something already broken, so the
> best option may be sickbaying downstream or pinning their version/commit of
> Beam until they can fix.
>
> Kenn
>
>
> On Fri, Nov 16, 2018 at 8:15 PM Ahmet Altay <al...@google.com> wrote:
> >
> > It sounds like we are in agreement that addressing issues sooner is
> better. I think reverting is in general the less stressful option because
> it allows a solution to be developed in parallel. Even with that, it is not
> the only option we have and based on the severity and the complexity of the
> problem we can consider other options. Fixing forward might be feasible in
> some cases.
> >
> > We can bring issues back to the mailing list. This would be akin to
> bringing any issues to the mailing list. I think JIRA is a better tool for
> that. These reverts are happening because of an issue, and JIRA allows
> informing all involved parties, creates emails to the issues list for later
> searching through mailing archives, and creates a record of things in
> structured way with components.
> >
> > We could establish general policies about for all reverts to have an
> issue (which we already do because they are regular PRs), including all
> people in the discussion (including the author and reviewers) and follow up
> with new tests to expands Beam's test coverage.
> >
> > On Fri, Nov 16, 2018 at 7:55 PM, Thomas Weise <th...@gmail.com>
> wrote:
> >>
> >>
> >>
> >> On Fri, Nov 16, 2018 at 7:39 PM Ahmet Altay <al...@google.com> wrote:
> >>>
> >>> Thank you for bringing this discussion back to the mailing list.
> >>>
> >>> On Fri, Nov 16, 2018 at 6:49 PM, Thomas Weise <th...@apache.org> wrote:
> >>>>
> >>>> We have observed instances of changes being reverted in master that
> have been authored following the contributor guidelines and pass all tests
> (post commit). While we generally seem to have quite a bit of revert action
> happening [1], this thread is about those instances that are outside of our
> documented policies.
> >>>>
> >>>> For a contributor, it isn't a good experience to see reverts
> (especially not out of the blue) after a PR has been reviewed, all tests
> pass and generally care has been taken to do the right things.
> >>>
> >>>
> >>> I completely agree. Everyone involved needs to have the context about
> why a change is being reverted. A JIRA with information is probably a good
> way to do it, similar to the any other issue we track.
> >>>
> >>>>
> >>>>
> >>>> Changes can have unforeseen effects downstream. Usually releases
> provide the opportunity to mitigate such issues, not necessarily by a
> revert, but in many cases by another change that keeps everyone happy.
> Unexpected reverts can break someone else and are disruptive.
> >>>
> >>>
> >>> I disagree about waiting until release to resolve an identified issue.
> Let's say we become aware of an issue through channels other than Beam
> tests (e.g. user reports, a contributor running into something not captured
> in Beam tests) and we know that it is a credible issue that will block the
> release anyway. Addressing the issue sooner will be less painful than
> addressing later. (I am not suggesting addressing every such issue similar
> to we do not block releases on every open issue. There needs to be a due
> diligence to understand the severity and the impact.)
> >>>
> >>> We can improve on the above process. If we end up reverting a change
> as a result of a report that is not covered by existing Beam tests, we
> could expand the tests to catch same class of problems even earlier by
> means of Beam's own tests.
> >>
> >>
> >> I'm referring to the release process as an example how such issues can
> be addressed. I'm not suggesting to wait until a release to address issues;
> as you say the sooner they are identified the better.
> >>
> >> However, I don't agree with taking out the revert hammer on the
> slightest sign that there could be a downstream problem. There are better
> ways to handle this. First of all, I think that these issues should be
> visible on the dev mailing list so that there is more awareness overall
> (which will lead to better test coverage and useful feedback in general).
> Second, we should make an effort to resolve issues in a forward looking
> way. If after all it turns out that a revert is most appropriate for the
> situation, it should follow explicit agreement.
> >>
> >>>
> >>>
> >>>>
> >>>>
> >>>> Some discussion already took place on one specific PR [3], but that
> is just an example and by no means an isolated incident.
> >>>>
> >>>> On some of these reverts, "internal" problems
> >>>
> >>>
> >>> This is a communication issue that needs be addressed. Over time we
> are getting new contributors and that is great, but it also means these new
> folks need to understand operating conditions of this community. Giving
> direct feedback would generally be most efficient here.
> >>>
> >>>>
> >>>> with an important runner are cited, with little to no explanation. It
> would be nice if folks with more insight can shed more light on this.
> >>>
> >>>
> >>> All runners we accepted at out master branch and include in our
> releases is important. I do not find a reference to an important runner in
> the examples other than a question about what would the outcome be for a
> runner other than Dataflow. I think the outcome would be the same, and in
> my opinion it should be. We need to be careful not breaking any runner we
> support.
> >>>
> >>>>
> >>>> I hope that as outcome of this discussion, we can arrive at a better
> understanding of why and when such reverts were necessary and possibly find
> ways to largely avoid them going forward.
> >>>>
> >>>> Thanks!
> >>>> Thomas
> >>>>
> >>>>
> >>>> [1]
> https://github.com/apache/beam/search?o=desc&q=Revert&s=committer-date&type=Commits
> >>>> [2] https://beam.apache.org/contribute/postcommits-policies/
> >>>> [3] https://github.com/apache/beam/pull/7012
> >>>>
> >>>
>
>

Re: [DISCUSS] Reverting commits on green post-commit status

Posted by Kenneth Knowles <ke...@apache.org>.
Just adapting my PR commentary to this thread:

Our rollback first policy cannot apply to a change that passes all of
Beam's postcommit tests. It *does* apply to Beam's postcommit suites for
each and every runner; they are equally important in this regard.

The purpose of rapid rollback without discussion is foremost to restore
test signal and not to disrupt the work of other contributors, that is why
it is OK to roll back before figuring out if the change was actually bad.
If that isn't at stake, the policy doesn't make sense to apply.

But...

 - We have at least three examples of runners where there are probably
tests outside the Beam repo: Dataflow, Samza runner, and IBM Streams.
 - We also may have users that try running their production loads against
Beam master branch to learn early whether the next release will break them.

These are success stories for Beam. We should respect these other sources
of information for what they are: users and vendors giving us a heads up
about changes that will be a problem if we release them.

Often rollback is still a good option but IMO it is no longer automatically
the best option, and may not even be OK. I would say that the case must be
made clearly and *publicly* that

(1) something is actually broken
(2) the revert fixes the problem
(3) revert is the best option

In this scenario there is time to consider. An important and common case is
that a perfectly fine change exposes something already broken, so the best
option may be sickbaying downstream or pinning their version/commit of Beam
until they can fix.

Kenn


On Fri, Nov 16, 2018 at 8:15 PM Ahmet Altay <al...@google.com> wrote:
>
> It sounds like we are in agreement that addressing issues sooner is
better. I think reverting is in general the less stressful option because
it allows a solution to be developed in parallel. Even with that, it is not
the only option we have and based on the severity and the complexity of the
problem we can consider other options. Fixing forward might be feasible in
some cases.
>
> We can bring issues back to the mailing list. This would be akin to
bringing any issues to the mailing list. I think JIRA is a better tool for
that. These reverts are happening because of an issue, and JIRA allows
informing all involved parties, creates emails to the issues list for later
searching through mailing archives, and creates a record of things in
structured way with components.
>
> We could establish general policies about for all reverts to have an
issue (which we already do because they are regular PRs), including all
people in the discussion (including the author and reviewers) and follow up
with new tests to expands Beam's test coverage.
>
> On Fri, Nov 16, 2018 at 7:55 PM, Thomas Weise <th...@gmail.com>
wrote:
>>
>>
>>
>> On Fri, Nov 16, 2018 at 7:39 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>> Thank you for bringing this discussion back to the mailing list.
>>>
>>> On Fri, Nov 16, 2018 at 6:49 PM, Thomas Weise <th...@apache.org> wrote:
>>>>
>>>> We have observed instances of changes being reverted in master that
have been authored following the contributor guidelines and pass all tests
(post commit). While we generally seem to have quite a bit of revert action
happening [1], this thread is about those instances that are outside of our
documented policies.
>>>>
>>>> For a contributor, it isn't a good experience to see reverts
(especially not out of the blue) after a PR has been reviewed, all tests
pass and generally care has been taken to do the right things.
>>>
>>>
>>> I completely agree. Everyone involved needs to have the context about
why a change is being reverted. A JIRA with information is probably a good
way to do it, similar to the any other issue we track.
>>>
>>>>
>>>>
>>>> Changes can have unforeseen effects downstream. Usually releases
provide the opportunity to mitigate such issues, not necessarily by a
revert, but in many cases by another change that keeps everyone happy.
Unexpected reverts can break someone else and are disruptive.
>>>
>>>
>>> I disagree about waiting until release to resolve an identified issue.
Let's say we become aware of an issue through channels other than Beam
tests (e.g. user reports, a contributor running into something not captured
in Beam tests) and we know that it is a credible issue that will block the
release anyway. Addressing the issue sooner will be less painful than
addressing later. (I am not suggesting addressing every such issue similar
to we do not block releases on every open issue. There needs to be a due
diligence to understand the severity and the impact.)
>>>
>>> We can improve on the above process. If we end up reverting a change as
a result of a report that is not covered by existing Beam tests, we could
expand the tests to catch same class of problems even earlier by means of
Beam's own tests.
>>
>>
>> I'm referring to the release process as an example how such issues can
be addressed. I'm not suggesting to wait until a release to address issues;
as you say the sooner they are identified the better.
>>
>> However, I don't agree with taking out the revert hammer on the
slightest sign that there could be a downstream problem. There are better
ways to handle this. First of all, I think that these issues should be
visible on the dev mailing list so that there is more awareness overall
(which will lead to better test coverage and useful feedback in general).
Second, we should make an effort to resolve issues in a forward looking
way. If after all it turns out that a revert is most appropriate for the
situation, it should follow explicit agreement.
>>
>>>
>>>
>>>>
>>>>
>>>> Some discussion already took place on one specific PR [3], but that is
just an example and by no means an isolated incident.
>>>>
>>>> On some of these reverts, "internal" problems
>>>
>>>
>>> This is a communication issue that needs be addressed. Over time we are
getting new contributors and that is great, but it also means these new
folks need to understand operating conditions of this community. Giving
direct feedback would generally be most efficient here.
>>>
>>>>
>>>> with an important runner are cited, with little to no explanation. It
would be nice if folks with more insight can shed more light on this.
>>>
>>>
>>> All runners we accepted at out master branch and include in our
releases is important. I do not find a reference to an important runner in
the examples other than a question about what would the outcome be for a
runner other than Dataflow. I think the outcome would be the same, and in
my opinion it should be. We need to be careful not breaking any runner we
support.
>>>
>>>>
>>>> I hope that as outcome of this discussion, we can arrive at a better
understanding of why and when such reverts were necessary and possibly find
ways to largely avoid them going forward.
>>>>
>>>> Thanks!
>>>> Thomas
>>>>
>>>>
>>>> [1]
https://github.com/apache/beam/search?o=desc&q=Revert&s=committer-date&type=Commits
>>>> [2] https://beam.apache.org/contribute/postcommits-policies/
>>>> [3] https://github.com/apache/beam/pull/7012
>>>>
>>>

Re: [DISCUSS] Reverting commits on green post-commit status

Posted by Ahmet Altay <al...@google.com>.
It sounds like we are in agreement that addressing issues sooner is better.
I think reverting is in general the less stressful option because it allows
a solution to be developed in parallel. Even with that, it is not the only
option we have and based on the severity and the complexity of the problem
we can consider other options. Fixing forward might be feasible in some
cases.

We can bring issues back to the mailing list. This would be akin to
bringing any issues to the mailing list. I think JIRA is a better tool for
that. These reverts are happening because of an issue, and JIRA allows
informing all involved parties, creates emails to the issues list for later
searching through mailing archives, and creates a record of things in
structured way with components.

We could establish general policies about for all reverts to have an issue
(which we already do because they are regular PRs), including all people in
the discussion (including the author and reviewers) and follow up with new
tests to expands Beam's test coverage.

On Fri, Nov 16, 2018 at 7:55 PM, Thomas Weise <th...@gmail.com>
wrote:

>
>
> On Fri, Nov 16, 2018 at 7:39 PM Ahmet Altay <al...@google.com> wrote:
>
>> Thank you for bringing this discussion back to the mailing list.
>>
>> On Fri, Nov 16, 2018 at 6:49 PM, Thomas Weise <th...@apache.org> wrote:
>>
>>> We have observed instances of changes being reverted in master that have
>>> been authored following the contributor guidelines and pass all tests (post
>>> commit). While we generally seem to have quite a bit of revert action
>>> happening [1], this thread is about those instances that are outside of our
>>> documented policies.
>>>
>>> For a contributor, it isn't a good experience to see reverts (especially
>>> not out of the blue) after a PR has been reviewed, all tests pass and
>>> generally care has been taken to do the right things.
>>>
>>
>> I completely agree. Everyone involved needs to have the context about why
>> a change is being reverted. A JIRA with information is probably a good way
>> to do it, similar to the any other issue we track.
>>
>>
>>>
>>> Changes can have unforeseen effects downstream. Usually releases provide
>>> the opportunity to mitigate such issues, not necessarily by a revert, but
>>> in many cases by another change that keeps everyone happy. Unexpected
>>> reverts can break someone else and are disruptive.
>>>
>>
>> I disagree about waiting until release to resolve an identified issue.
>> Let's say we become aware of an issue through channels other than Beam
>> tests (e.g. user reports, a contributor running into something not captured
>> in Beam tests) and we know that it is a credible issue that will block the
>> release anyway. Addressing the issue sooner will be less painful than
>> addressing later. (I am not suggesting addressing every such issue similar
>> to we do not block releases on every open issue. There needs to be a due
>> diligence to understand the severity and the impact.)
>>
>> We can improve on the above process. If we end up reverting a change as a
>> result of a report that is not covered by existing Beam tests, we could
>> expand the tests to catch same class of problems even earlier by means of
>> Beam's own tests.
>>
>
> I'm referring to the release process as an example how such issues can be
> addressed. I'm not suggesting to wait until a release to address issues; as
> you say the sooner they are identified the better.
>
> However, I don't agree with taking out the revert hammer on the slightest
> sign that there could be a downstream problem. There are better ways to
> handle this. First of all, I think that these issues should be visible on
> the dev mailing list so that there is more awareness overall (which will
> lead to better test coverage and useful feedback in general). Second, we
> should make an effort to resolve issues in a forward looking way. If after
> all it turns out that a revert is most appropriate for the situation, it
> should follow explicit agreement.
>
>
>>
>>
>>>
>>> Some discussion already took place on one specific PR [3], but that is
>>> just an example and by no means an isolated incident.
>>>
>>> On some of these reverts, "internal" problems
>>>
>>
>> This is a communication issue that needs be addressed. Over time we are
>> getting new contributors and that is great, but it also means these new
>> folks need to understand operating conditions of this community. Giving
>> direct feedback would generally be most efficient here.
>>
>>
>>> with an important runner are cited, with little to no explanation. It
>>> would be nice if folks with more insight can shed more light on this.
>>>
>>
>> All runners we accepted at out master branch and include in our releases
>> is important. I do not find a reference to an important runner in the
>> examples other than a question about what would the outcome be for a runner
>> other than Dataflow. I think the outcome would be the same, and in my
>> opinion it should be. We need to be careful not breaking any runner we
>> support.
>>
>>
>>> I hope that as outcome of this discussion, we can arrive at a better
>>> understanding of why and when such reverts were necessary and possibly find
>>> ways to largely avoid them going forward.
>>>
>>> Thanks!
>>> Thomas
>>>
>>>
>>> [1] https://github.com/apache/beam/search?o=desc&q=Revert&s=comm
>>> itter-date&type=Commits
>>> [2] https://beam.apache.org/contribute/postcommits-policies/
>>> [3] https://github.com/apache/beam/pull/7012
>>>
>>>
>>

Re: [DISCUSS] Reverting commits on green post-commit status

Posted by Thomas Weise <th...@gmail.com>.
On Fri, Nov 16, 2018 at 7:39 PM Ahmet Altay <al...@google.com> wrote:

> Thank you for bringing this discussion back to the mailing list.
>
> On Fri, Nov 16, 2018 at 6:49 PM, Thomas Weise <th...@apache.org> wrote:
>
>> We have observed instances of changes being reverted in master that have
>> been authored following the contributor guidelines and pass all tests (post
>> commit). While we generally seem to have quite a bit of revert action
>> happening [1], this thread is about those instances that are outside of our
>> documented policies.
>>
>> For a contributor, it isn't a good experience to see reverts (especially
>> not out of the blue) after a PR has been reviewed, all tests pass and
>> generally care has been taken to do the right things.
>>
>
> I completely agree. Everyone involved needs to have the context about why
> a change is being reverted. A JIRA with information is probably a good way
> to do it, similar to the any other issue we track.
>
>
>>
>> Changes can have unforeseen effects downstream. Usually releases provide
>> the opportunity to mitigate such issues, not necessarily by a revert, but
>> in many cases by another change that keeps everyone happy. Unexpected
>> reverts can break someone else and are disruptive.
>>
>
> I disagree about waiting until release to resolve an identified issue.
> Let's say we become aware of an issue through channels other than Beam
> tests (e.g. user reports, a contributor running into something not captured
> in Beam tests) and we know that it is a credible issue that will block the
> release anyway. Addressing the issue sooner will be less painful than
> addressing later. (I am not suggesting addressing every such issue similar
> to we do not block releases on every open issue. There needs to be a due
> diligence to understand the severity and the impact.)
>
> We can improve on the above process. If we end up reverting a change as a
> result of a report that is not covered by existing Beam tests, we could
> expand the tests to catch same class of problems even earlier by means of
> Beam's own tests.
>

I'm referring to the release process as an example how such issues can be
addressed. I'm not suggesting to wait until a release to address issues; as
you say the sooner they are identified the better.

However, I don't agree with taking out the revert hammer on the slightest
sign that there could be a downstream problem. There are better ways to
handle this. First of all, I think that these issues should be visible on
the dev mailing list so that there is more awareness overall (which will
lead to better test coverage and useful feedback in general). Second, we
should make an effort to resolve issues in a forward looking way. If after
all it turns out that a revert is most appropriate for the situation, it
should follow explicit agreement.


>
>
>>
>> Some discussion already took place on one specific PR [3], but that is
>> just an example and by no means an isolated incident.
>>
>> On some of these reverts, "internal" problems
>>
>
> This is a communication issue that needs be addressed. Over time we are
> getting new contributors and that is great, but it also means these new
> folks need to understand operating conditions of this community. Giving
> direct feedback would generally be most efficient here.
>
>
>> with an important runner are cited, with little to no explanation. It
>> would be nice if folks with more insight can shed more light on this.
>>
>
> All runners we accepted at out master branch and include in our releases
> is important. I do not find a reference to an important runner in the
> examples other than a question about what would the outcome be for a runner
> other than Dataflow. I think the outcome would be the same, and in my
> opinion it should be. We need to be careful not breaking any runner we
> support.
>
>
>> I hope that as outcome of this discussion, we can arrive at a better
>> understanding of why and when such reverts were necessary and possibly find
>> ways to largely avoid them going forward.
>>
>> Thanks!
>> Thomas
>>
>>
>> [1]
>> https://github.com/apache/beam/search?o=desc&q=Revert&s=committer-date&type=Commits
>> [2] https://beam.apache.org/contribute/postcommits-policies/
>> [3] https://github.com/apache/beam/pull/7012
>>
>>
>

Re: [DISCUSS] Reverting commits on green post-commit status

Posted by Ahmet Altay <al...@google.com>.
Thank you for bringing this discussion back to the mailing list.

On Fri, Nov 16, 2018 at 6:49 PM, Thomas Weise <th...@apache.org> wrote:

> We have observed instances of changes being reverted in master that have
> been authored following the contributor guidelines and pass all tests (post
> commit). While we generally seem to have quite a bit of revert action
> happening [1], this thread is about those instances that are outside of our
> documented policies.
>
> For a contributor, it isn't a good experience to see reverts (especially
> not out of the blue) after a PR has been reviewed, all tests pass and
> generally care has been taken to do the right things.
>

I completely agree. Everyone involved needs to have the context about why a
change is being reverted. A JIRA with information is probably a good way to
do it, similar to the any other issue we track.


>
> Changes can have unforeseen effects downstream. Usually releases provide
> the opportunity to mitigate such issues, not necessarily by a revert, but
> in many cases by another change that keeps everyone happy. Unexpected
> reverts can break someone else and are disruptive.
>

I disagree about waiting until release to resolve an identified issue.
Let's say we become aware of an issue through channels other than Beam
tests (e.g. user reports, a contributor running into something not captured
in Beam tests) and we know that it is a credible issue that will block the
release anyway. Addressing the issue sooner will be less painful than
addressing later. (I am not suggesting addressing every such issue similar
to we do not block releases on every open issue. There needs to be a due
diligence to understand the severity and the impact.)

We can improve on the above process. If we end up reverting a change as a
result of a report that is not covered by existing Beam tests, we could
expand the tests to catch same class of problems even earlier by means of
Beam's own tests.


>
> Some discussion already took place on one specific PR [3], but that is
> just an example and by no means an isolated incident.
>
> On some of these reverts, "internal" problems
>

This is a communication issue that needs be addressed. Over time we are
getting new contributors and that is great, but it also means these new
folks need to understand operating conditions of this community. Giving
direct feedback would generally be most efficient here.


> with an important runner are cited, with little to no explanation. It
> would be nice if folks with more insight can shed more light on this.
>

All runners we accepted at out master branch and include in our releases is
important. I do not find a reference to an important runner in the examples
other than a question about what would the outcome be for a runner other
than Dataflow. I think the outcome would be the same, and in my opinion it
should be. We need to be careful not breaking any runner we support.


> I hope that as outcome of this discussion, we can arrive at a better
> understanding of why and when such reverts were necessary and possibly find
> ways to largely avoid them going forward.
>
> Thanks!
> Thomas
>
>
> [1] https://github.com/apache/beam/search?o=desc&q=Revert&s=comm
> itter-date&type=Commits
> [2] https://beam.apache.org/contribute/postcommits-policies/
> [3] https://github.com/apache/beam/pull/7012
>
>