You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@flink.apache.org by Xintong Song <to...@gmail.com> on 2021/06/22 09:12:07 UTC

[DISCUSS] Do not merge PRs with "unrelated" test failures.

Hi everyone,

In the past a couple of weeks, I've observed several times that PRs are
merged without a green light from the CI tests, where failure cases are
considered *unrelated*. This may not always cause problems, but would
increase the chance of breaking our code base. In fact, it has occurred to
me twice in the past few weeks that I had to revert a commit which breaks
the master branch due to this.

I think it would be nicer to enforce a stricter rule, that no PRs should be
merged without passing CI.

The problems of merging PRs with "unrelated" test failures are:
- It's not always straightforward to tell whether a test failures are
related or not.
- It prevents subsequent test cases from being executed, which may fail
relating to the PR changes.

To make things easier for the committers, the following exceptions might be
considered acceptable.
- The PR has passed CI in the contributor's personal workspace. Please post
the link in such cases.
- The CI tests have been triggered multiple times, on the same commit, and
each stage has at least passed for once. Please also comment in such cases.

If we all agree on this, I'd update the community guidelines for merging
PRs wrt. this proposal. [1]

Please let me know what do you think.

Thank you~

Xintong Song


[1] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Xintong Song <to...@gmail.com>.

Thanks, Etienne. Comments addressed.

Thank you~

Xintong Song



On Wed, Jul 7, 2021 at 6:41 PM Etienne Chauchot <ec...@apache.org>
wrote:

> Hi,
>
> Thanks Xintong for the very good doc ! I added 2 comments to it.
>
> Best,
>
> Etienne
>
> On 02/07/2021 05:57, Xintong Song wrote:
> > Thanks all for the positive feedback.
> >
> > I have updated the wiki page [1], and will send an announcement in a
> > separate thread, to draw more committers' attention.
> >
> > Moreover, I've opened FLINK-23212 where we can continue with the
> discussion
> > around pure documentation changing PRs.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1]
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> >
> > On Wed, Jun 30, 2021 at 5:26 PM Xintong Song <to...@gmail.com>
> wrote:
> >
> >> I second Tison's opinion.
> >>
> >> Similar to how we skip docs_404_check for PRs that do not touch the
> >> documentation, we can skip other stages for PRs that only contain
> >> documentation changes.
> >>
> >> In addition to making merging documentation PRs easier, we can also
> reduce
> >> the workload on CI workers. Especially during the last days of a release
> >> cycle, which is usually the most busy time for the CI workers, and is
> also
> >> where most documentation efforts take place.
> >>
> >> Thank you~
> >>
> >> Xintong Song
> >>
> >>
> >>
> >> On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann <tr...@apache.org>
> >> wrote:
> >>
> >>> I think you are right Tison that docs are a special case and they only
> >>> require flink-docs to pass. What I am wondering is how much of a
> problem
> >>> this will be (assuming that we have a decent build stability). The more
> >>> exceptions we add, the harder it will be to properly follow the
> >>> guidelines.
> >>> Maybe we can observe how many docs PRs get delayed/not merged because
> of
> >>> this and then revisit this discussion if needed.
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>> On Wed, Jun 30, 2021 at 8:30 AM tison <wa...@gmail.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> There are a number of PRs modifying docs only, but we still require
> all
> >>>> tests passed on that.
> >>>>
> >>>> It is a good proposal we avoid merge PR with "unrelated" failure, but
> >>> can
> >>>> we improve the case where the contributor only works for docs?
> >>>>
> >>>> For example, base on the file change set, run doc tests only.
> >>>>
> >>>> Best,
> >>>> tison.
> >>>>
> >>>>
> >>>> godfrey he <go...@gmail.com> 于2021年6月30日周三 下午2:17写道：
> >>>>
> >>>>> +1 for the proposal. Thanks Xintong!
> >>>>>
> >>>>> Best,
> >>>>> Godfrey
> >>>>>
> >>>>>
> >>>>>
> >>>>> Rui Li <li...@gmail.com> 于2021年6月30日周三 上午11:36写道：
> >>>>>
> >>>>>> Thanks Xintong. +1 to the proposal.
> >>>>>>
> >>>>>> On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 <li...@gmail.com>
> >>>> wrote:
> >>>>>>> +1 for the proposal. Since the test time is long and environment
> >>> may
> >>>>>> vary,
> >>>>>>> unstable tests are really annoying for developers. The solution is
> >>>>>> welcome.
> >>>>>>> Best
> >>>>>>> liujiangang
> >>>>>>>
> >>>>>>> Jingsong Li <ji...@gmail.com> 于2021年6月29日周二 上午10:31写道：
> >>>>>>>
> >>>>>>>> +1 Thanks Xintong for the update!
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jingsong
> >>>>>>>>
> >>>>>>>> On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <
> >>>> trohrmann@apache.org>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> +1, thanks for updating the guidelines Xintong!
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> Till
> >>>>>>>>>
> >>>>>>>>> On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <
> >>> karmagyz@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>>> +1
> >>>>>>>>>>
> >>>>>>>>>> Thanks Xintong for drafting this doc.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Yangze Guo
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <
> >>>> beyond1920@gmail.com
> >>>>>>>> wrote:
> >>>>>>>>>>> Thanks Xintong for giving detailed documentation.
> >>>>>>>>>>>
> >>>>>>>>>>> The best practice for handling test failure is very
> >>> detailed,
> >>>>>> it's
> >>>>>>> a
> >>>>>>>>> good
> >>>>>>>>>>> guidelines document with clear action steps.
> >>>>>>>>>>>
> >>>>>>>>>>> +1 to Xintong's proposal.
> >>>>>>>>>>>
> >>>>>>>>>>> Xintong Song <to...@gmail.com> 于2021年6月28日周一
> >>> 下午4:07写道：
> >>>>>>>>>>>> Thanks all for the discussion.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Based on the opinions so far, I've drafted the new
> >>>> guidelines
> >>>>>>> [1],
> >>>>>>>>> as a
> >>>>>>>>>>>> potential replacement of the original wiki page [2].
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hopefully this draft has covered the most opinions
> >>>> discussed
> >>>>>> and
> >>>>>>>>>> consensus
> >>>>>>>>>>>> made in this discussion thread.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Looking forward to your feedback.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thank you~
> >>>>>>>>>>>>
> >>>>>>>>>>>> Xintong Song
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> >>>>>>>>>>>> [2]
> >>>>>>>>>>>>
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> >>>>>>>>> pnowojski@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for the clarification Till. +1 for what you
> >>> have
> >>>>>>> written.
> >>>>>>>>>>>>> Piotrek
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> pt., 25 cze 2021 o 16:00 Till Rohrmann <
> >>>>> trohrmann@apache.org
> >>>>>>>>>>>> napisał(a):
> >>>>>>>>>>>>>> One quick note for clarification. I don't have
> >>> anything
> >>>>>>> against
> >>>>>>>>>> builds
> >>>>>>>>>>>>>> running on your personal Azure account and this is
> >>> not
> >>>>>> what I
> >>>>>>>>>>>> understood
> >>>>>>>>>>>>>> under "local environment". For me "local
> >>> environment"
> >>>>> means
> >>>>>>>> that
> >>>>>>>>>>>> someone
> >>>>>>>>>>>>>> runs the test locally on his machine and then says
> >>> that
> >>>>> the
> >>>>>>>>>>>>>> tests have passed locally.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I do agree that there might be a conflict of
> >>> interests
> >>>>> if a
> >>>>>>> PR
> >>>>>>>>>> author
> >>>>>>>>>>>>>> disables tests. Here I would argue that we don't
> >>> have
> >>>>>>> malignant
> >>>>>>>>>>>>> committers
> >>>>>>>>>>>>>> which means that every committer will probably first
> >>>>> check
> >>>>>>> the
> >>>>>>>>>>>> respective
> >>>>>>>>>>>>>> ticket for how often the test failed. Then I guess
> >>> the
> >>>>> next
> >>>>>>>> step
> >>>>>>>>>> would
> >>>>>>>>>>>> be
> >>>>>>>>>>>>>> to discuss on the ticket whether to disable it or
> >>> not.
> >>>>> And
> >>>>>>>>> finally,
> >>>>>>>>>>>> after
> >>>>>>>>>>>>>> reaching a consensus, it will be disabled. If we see
> >>>>>> someone
> >>>>>>>>>> abusing
> >>>>>>>>>>>> this
> >>>>>>>>>>>>>> policy, then we can still think about how to guard
> >>>>> against
> >>>>>>> it.
> >>>>>>>>> But,
> >>>>>>>>>>>>>> honestly, I have very rarely seen such a case. I am
> >>>> also
> >>>>> ok
> >>>>>>> to
> >>>>>>>>>> pull in
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>> release manager to make the final call if this
> >>> resolves
> >>>>>>>> concerns.
> >>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>> Till
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> >>>>>>>>>> pnowojski@apache.org>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> +1 for the general idea, however I have concerns
> >>>> about
> >>>>> a
> >>>>>>>> couple
> >>>>>>>>>> of
> >>>>>>>>>>>>>> details.
> >>>>>>>>>>>>>>>> I would first try to not introduce the exception
> >>>> for
> >>>>>>> local
> >>>>>>>>>> builds.
> >>>>>>>>>>>>>>>> It makes it quite hard for others to verify the
> >>>> build
> >>>>>> and
> >>>>>>>> to
> >>>>>>>>>> make
> >>>>>>>>>>>>> sure
> >>>>>>>>>>>>>>> that the right things were executed.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I would counter Till's proposal to ignore local
> >>> green
> >>>>>>> builds.
> >>>>>>>>> If
> >>>>>>>>>>>>>> committer
> >>>>>>>>>>>>>>> is merging and closing a PR, with official azure
> >>>>> failure,
> >>>>>>> but
> >>>>>>>>>> there
> >>>>>>>>>>>>> was a
> >>>>>>>>>>>>>>> green build before or in local azure it's IMO
> >>> enough
> >>>> to
> >>>>>>> leave
> >>>>>>>>> the
> >>>>>>>>>>>>>> message:
> >>>>>>>>>>>>>>>> Latest build failure is a known issue:
> >>> FLINK-12345
> >>>>>>>>>>>>>>>> Green local build: URL
> >>>>>>>>>>>>>>> This should address Till's concern about
> >>>> verification.
> >>>>>>>>>>>>>>> On the other hand I have concerns about disabling
> >>>>> tests.*
> >>>>>>> It
> >>>>>>>>>>>> shouldn't
> >>>>>>>>>>>>> be
> >>>>>>>>>>>>>>> the PR author/committer that's disabling a test on
> >>>> his
> >>>>>> own,
> >>>>>>>> as
> >>>>>>>>>>>> that's a
> >>>>>>>>>>>>>>> conflict of interests*. I have however no problems
> >>>> with
> >>>>>>>>> disabling
> >>>>>>>>>>>> test
> >>>>>>>>>>>>>>> instabilities that were marked as "blockers"
> >>> though,
> >>>>> that
> >>>>>>>>> should
> >>>>>>>>>> work
> >>>>>>>>>>>>>>> pretty well. But the important thing here is to
> >>>>> correctly
> >>>>>>>> judge
> >>>>>>>>>>>> bumping
> >>>>>>>>>>>>>>> priorities of test instabilities based on their
> >>>>> frequency
> >>>>>>> and
> >>>>>>>>>> current
> >>>>>>>>>>>>>>> general health of the system. I believe that
> >>> release
> >>>>>>> managers
> >>>>>>>>>> should
> >>>>>>>>>>>> be
> >>>>>>>>>>>>>>> playing a big role here in deciding on the
> >>> guidelines
> >>>>> of
> >>>>>>> what
> >>>>>>>>>> should
> >>>>>>>>>>>>> be a
> >>>>>>>>>>>>>>> priority of certain test instabilities.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> What I mean by that is two example scenarios:
> >>>>>>>>>>>>>>> 1. if we have a handful of very frequently failing
> >>>>> tests
> >>>>>>> and
> >>>>>>>> a
> >>>>>>>>>>>> handful
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>> very rarely failing tests (like one reported
> >>> failure
> >>>>> and
> >>>>>> no
> >>>>>>>>>> another
> >>>>>>>>>>>>>>> occurrence in many months, and let's even say that
> >>>> the
> >>>>>>>> failure
> >>>>>>>>>> looks
> >>>>>>>>>>>>> like
> >>>>>>>>>>>>>>> infrastructure/network timeout), we should focus
> >>> on
> >>>> the
> >>>>>>>>>> frequently
> >>>>>>>>>>>>>> failing
> >>>>>>>>>>>>>>> ones, and probably we are safe to ignore for the
> >>> time
> >>>>>> being
> >>>>>>>> the
> >>>>>>>>>> rare
> >>>>>>>>>>>>>> issues
> >>>>>>>>>>>>>>> - at least until we deal with the most pressing
> >>> ones.
> >>>>>>>>>>>>>>> 2. If we have tons of rarely failing test
> >>>>> instabilities,
> >>>>>> we
> >>>>>>>>>> should
> >>>>>>>>>>>>>> probably
> >>>>>>>>>>>>>>> start addressing them as blockers.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'm using my own conscious and my best judgement
> >>> when
> >>>>> I'm
> >>>>>>>>>>>>>>> bumping/decreasing priorities of test
> >>> instabilities
> >>>>> (and
> >>>>>>>> bugs),
> >>>>>>>>>> but
> >>>>>>>>>>>> as
> >>>>>>>>>>>>>>> individual committer I don't have the full
> >>> picture.
> >>>> As
> >>>>> I
> >>>>>>>> wrote
> >>>>>>>>>>>> above, I
> >>>>>>>>>>>>>>> think release managers are in a much better
> >>> position
> >>>> to
> >>>>>>> keep
> >>>>>>>>>>>> adjusting
> >>>>>>>>>>>>>>> those kind of guidelines.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best, Piotrek
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com>
> >>>>>>>> napisał(a):
> >>>>>>>>>>>>>>>> +1 for Xintong's proposal.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For me, resolving problems directly (fixing the
> >>>>>>>>> infrastructure
> >>>>>>>>>>>> issue,
> >>>>>>>>>>>>>>>> disabling unstable tests and creating blocker
> >>> JIRAs
> >>>>> to
> >>>>>>>> track
> >>>>>>>>>> the
> >>>>>>>>>>>> fix
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> re-enable them asap, etc.) is (in most cases)
> >>>> better
> >>>>>> than
> >>>>>>>>>> working
> >>>>>>>>>>>>>> around
> >>>>>>>>>>>>>>>> them (verify locally, manually check and judge
> >>> the
> >>>>>>> failure
> >>>>>>>> as
> >>>>>>>>>>>>>>> "unrelated",
> >>>>>>>>>>>>>>>> etc.), and I believe the proposal could help us
> >>>>> pushing
> >>>>>>>> those
> >>>>>>>>>> more
> >>>>>>>>>>>>>> "real"
> >>>>>>>>>>>>>>>> solutions forward.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best Regards,
> >>>>>>>>>>>>>>>> Yu
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, 25 Jun 2021 at 10:58, Yangze Guo <
> >>>>>>>> karmagyz@gmail.com
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>> Creating a blocker issue for the manually
> >>>> disabled
> >>>>>>> tests
> >>>>>>>>>> sounds
> >>>>>>>>>>>>> good
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> me.
> >>>>>>>>>>>>>>>>> Minor: I'm still a bit worried about the
> >>> commits
> >>>>>> merged
> >>>>>>>>>> before we
> >>>>>>>>>>>>> fix
> >>>>>>>>>>>>>>>>> the unstable tests can also break those tests.
> >>>>>> Instead
> >>>>>>> of
> >>>>>>>>>> letting
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> assigners keep a look at all potentially
> >>> related
> >>>>>>> commits,
> >>>>>>>>>> they
> >>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>> maintain a branch that is periodically synced
> >>>> with
> >>>>>> the
> >>>>>>>>> master
> >>>>>>>>>>>>> branch
> >>>>>>>>>>>>>>>>> while enabling the unstable test. So that they
> >>>> can
> >>>>>>> catch
> >>>>>>>>> the
> >>>>>>>>>>>>> breaking
> >>>>>>>>>>>>>>>>> changes asap.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Yangze Guo
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann
> >>> <
> >>>>>>>>>>>>> trohrmann@apache.org>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>> I like the idea of creating a blocker issue
> >>>> for a
> >>>>>>>>> disabled
> >>>>>>>>>>>> test.
> >>>>>>>>>>>>>> This
> >>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>> force us to resolve it in a timely manner
> >>> and
> >>>> it
> >>>>>>> won't
> >>>>>>>>> fall
> >>>>>>>>>>>>> through
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> cracks.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>> Till
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li
> >>> <
> >>>>>>>>>>>>>> jingsonglee0@gmail.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>> +1 to Xintong's proposal
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I also have some concerns about unstable
> >>>> cases.
> >>>>>>>>>>>>>>>>>>> I think unstable cases can be divided into
> >>>>> these
> >>>>>>>> types:
> >>>>>>>>>>>>>>>>>>> - Force majeure: For example, network
> >>>> timeout,
> >>>>>>> sudden
> >>>>>>>>>>>>>> environmental
> >>>>>>>>>>>>>>>>>>> collapse, they are accidental and can
> >>> always
> >>>> be
> >>>>>>>> solved
> >>>>>>>>> by
> >>>>>>>>>>>>>>> triggering
> >>>>>>>>>>>>>>>>> azure
> >>>>>>>>>>>>>>>>>>> again. Committers should wait for the next
> >>>>> green
> >>>>>>>> azure.
> >>>>>>>>>>>>>>>>>>> - Obvious mistakes: For example, some
> >>> errors
> >>>>>> caused
> >>>>>>>> by
> >>>>>>>>>>>> obvious
> >>>>>>>>>>>>>>>> reasons
> >>>>>>>>>>>>>>>>> may
> >>>>>>>>>>>>>>>>>>> be repaired quickly. At this time, do we
> >>> need
> >>>>> to
> >>>>>>>> wait,
> >>>>>>>>>> or not
> >>>>>>>>>>>>>> wait
> >>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>>>> ignore?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> - Difficult questions: These problems are
> >>>> very
> >>>>>>>>> difficult
> >>>>>>>>>> to
> >>>>>>>>>>>>> find.
> >>>>>>>>>>>>>>>> There
> >>>>>>>>>>>>>>>>>>> will be no solution for a while and a
> >>> half.
> >>>> We
> >>>>>>> don't
> >>>>>>>>> even
> >>>>>>>>>>>> know
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> reason.
> >>>>>>>>>>>>>>>>>>> At this time, we should ignore it. (Maybe
> >>>> it's
> >>>>>>> judged
> >>>>>>>>> by
> >>>>>>>>>> the
> >>>>>>>>>>>>>> author
> >>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> case. But what about the old case whose
> >>>> author
> >>>>>>> can't
> >>>>>>>> be
> >>>>>>>>>>>> found?)
> >>>>>>>>>>>>>>>>>>> So, the ignored cases should be the block
> >>> of
> >>>>> the
> >>>>>>> next
> >>>>>>>>>> release
> >>>>>>>>>>>>>> until
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> reason is found or the case is fixed?  We
> >>>> need
> >>>>> to
> >>>>>>>>> ensure
> >>>>>>>>>> that
> >>>>>>>>>>>>>>> someone
> >>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>> take care of these cases, because there
> >>> is no
> >>>>>>>> deepening
> >>>>>>>>>> of
> >>>>>>>>>>>>> failed
> >>>>>>>>>>>>>>>>> tests, no
> >>>>>>>>>>>>>>>>>>> one may continue to pay attention to these
> >>>>> cases.
> >>>>>>>>>>>>>>>>>>> I think this guideline should consider
> >>> these
> >>>>>>>>> situations,
> >>>>>>>>>> and
> >>>>>>>>>>>>> show
> >>>>>>>>>>>>>>> how
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> solve them.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>> Jingsong
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <
> >>>>>>>>>> imjark@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>> Thanks to Xintong for bringing up this
> >>>> topic,
> >>>>>> I'm
> >>>>>>>> +1
> >>>>>>>>> in
> >>>>>>>>>>>>>> general.
> >>>>>>>>>>>>>>>>>>>> However, I think it's still not very
> >>> clear
> >>>>> how
> >>>>>> we
> >>>>>>>>>> address
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> unstable
> >>>>>>>>>>>>>>>>>>>> tests.
> >>>>>>>>>>>>>>>>>>>> I think this is a very important part of
> >>>> this
> >>>>>> new
> >>>>>>>>>>>> guideline.
> >>>>>>>>>>>>>>>>>>>> According to the discussion above, if
> >>> some
> >>>>>> tests
> >>>>>>>> are
> >>>>>>>>>>>>> unstable,
> >>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>> manually disable it.
> >>>>>>>>>>>>>>>>>>>> But I have some questions in my mind:
> >>>>>>>>>>>>>>>>>>>> 1) Is the instability judged by the
> >>>> committer
> >>>>>>>>>> themselves or
> >>>>>>>>>>>>> by
> >>>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>>>>> metrics?
> >>>>>>>>>>>>>>>>>>>> 2) Should we log the disable commit in
> >>> the
> >>>>>>>>>> corresponding
> >>>>>>>>>>>>> issue
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> increase
> >>>>>>>>>>>>>>>>>>>> the priority?
> >>>>>>>>>>>>>>>>>>>> 3) What if nobody looks into this issue
> >>> and
> >>>>>> this
> >>>>>>>>>> becomes
> >>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>> potential
> >>>>>>>>>>>>>>>>>>>> bugs released with the new version?
> >>>>>>>>>>>>>>>>>>>> 4) If no person is actively working on
> >>> the
> >>>>>> issue,
> >>>>>>>> who
> >>>>>>>>>>>> should
> >>>>>>>>>>>>>>>>> re-enable
> >>>>>>>>>>>>>>>>>>> it?
> >>>>>>>>>>>>>>>>>>>> Would it block PRs again?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Thu, 24 Jun 2021 at 10:04, Xintong
> >>> Song
> >>>> <
> >>>>>>>>>>>>>>> tonysong820@gmail.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>> Thanks all for the feedback.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> @Till @Yangze
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I'm also not convinced by the idea of
> >>>>> having
> >>>>>> an
> >>>>>>>>>> exception
> >>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>> local
> >>>>>>>>>>>>>>>>>>>> builds.
> >>>>>>>>>>>>>>>>>>>>> We need to execute the entire build
> >>> (or
> >>>> at
> >>>>>>> least
> >>>>>>>>> the
> >>>>>>>>>>>>> failing
> >>>>>>>>>>>>>>>> stage)
> >>>>>>>>>>>>>>>>>>>>> locally, to make sure subsequent test
> >>>> cases
> >>>>>>>>>> prevented by
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> failure
> >>>>>>>>>>>>>>>>>>> one
> >>>>>>>>>>>>>>>>>>>>> are all executed. In that case, it's
> >>>>> probably
> >>>>>>>>> easier
> >>>>>>>>>> to
> >>>>>>>>>>>>> rerun
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> build
> >>>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>> azure than locally.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Concerning disabling unstable test
> >>> cases
> >>>>> that
> >>>>>>>>>> regularly
> >>>>>>>>>>>>> block
> >>>>>>>>>>>>>>> PRs
> >>>>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>>>>>> merging, maybe we can say that such
> >>> cases
> >>>>> can
> >>>>>>>> only
> >>>>>>>>> be
> >>>>>>>>>>>>>> disabled
> >>>>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>>>>>>> someone
> >>>>>>>>>>>>>>>>>>>>> is actively looking into it, likely
> >>> the
> >>>>>> person
> >>>>>>>> who
> >>>>>>>>>>>> disabled
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> case.
> >>>>>>>>>>>>>>>>>>> If
> >>>>>>>>>>>>>>>>>>>>> this person is no longer actively
> >>> working
> >>>>> on
> >>>>>>> it,
> >>>>>>>>>> he/she
> >>>>>>>>>>>>>> should
> >>>>>>>>>>>>>>>>> enable
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> case again no matter if it is fixed or
> >>>> not.
> >>>>>>>>>>>>>>>>>>>>> @Jing
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks for the suggestions.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> +1 to provide guidelines on handling
> >>> test
> >>>>>>>> failures.
> >>>>>>>>>>>>>>>>>>>>> 1. Report the test failures in the
> >>> JIRA.
> >>>>>>>>>>>>>>>>>>>>> +1 on this. Currently, the release
> >>>> managers
> >>>>>> are
> >>>>>>>>>>>> monitoring
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> ci
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> cron
> >>>>>>>>>>>>>>>>>>>>> build instabilities and reporting
> >>> them on
> >>>>>> JIRA.
> >>>>>>>> We
> >>>>>>>>>> should
> >>>>>>>>>>>>>> also
> >>>>>>>>>>>>>>>>>>> encourage
> >>>>>>>>>>>>>>>>>>>>> other contributors to do that for PRs.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 2. Set a deadline to find out the root
> >>>>> cause
> >>>>>>> and
> >>>>>>>>>> solve
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> failure
> >>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> new created JIRA  because we could
> >>> not
> >>>>>> block
> >>>>>>>>> other
> >>>>>>>>>>>> commit
> >>>>>>>>>>>>>>>> merges
> >>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>> long
> >>>>>>>>>>>>>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 3. What to do if the JIRA has not made
> >>>>>>>> significant
> >>>>>>>>>>>> progress
> >>>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>>>>>> reached
> >>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> the deadline time?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I'm not sure about these two. It
> >>> feels a
> >>>>> bit
> >>>>>>>>> against
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>> voluntary
> >>>>>>>>>>>>>>>>>>> nature
> >>>>>>>>>>>>>>>>>>>>> of open source projects.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> IMHO, frequent instabilities are more
> >>>>> likely
> >>>>>> to
> >>>>>>>> be
> >>>>>>>>>>>> upgraded
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> critical
> >>>>>>>>>>>>>>>>>>>>> / blocker priority, receive more
> >>>> attention
> >>>>>> and
> >>>>>>>>>> eventually
> >>>>>>>>>>>>> get
> >>>>>>>>>>>>>>>>> fixed.
> >>>>>>>>>>>>>>>>>>>>> Release managers are also responsible
> >>> for
> >>>>>>> looking
> >>>>>>>>> for
> >>>>>>>>>>>>>> assignees
> >>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>> such
> >>>>>>>>>>>>>>>>>>>>> issues. If a case is still not fixed
> >>>>> soonish,
> >>>>>>>> even
> >>>>>>>>>> with
> >>>>>>>>>>>> all
> >>>>>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>>>>> efforts,
> >>>>>>>>>>>>>>>>>>>>> I'm not sure how setting a deadline
> >>> can
> >>>>> help
> >>>>>>>> this.
> >>>>>>>>>>>>>>>>>>>>> 4. If we disable the respective tests
> >>>>>>>> temporarily,
> >>>>>>>>> we
> >>>>>>>>>>>> also
> >>>>>>>>>>>>>>> need a
> >>>>>>>>>>>>>>>>>>>> mechanism
> >>>>>>>>>>>>>>>>>>>>>> to ensure the issue would be
> >>> continued
> >>>> to
> >>>>>> be
> >>>>>>>>>>>> investigated
> >>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> future.
> >>>>>>>>>>>>>>>>>>>>> +1. As mentioned above, we may
> >>> consider
> >>>>>>> disabling
> >>>>>>>>>> such
> >>>>>>>>>>>>> tests
> >>>>>>>>>>>>>>> iff
> >>>>>>>>>>>>>>>>>>> someone
> >>>>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>> actively working on it.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thank you~
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Xintong Song
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Wed, Jun 23, 2021 at 9:56 PM JING
> >>>> ZHANG
> >>>>> <
> >>>>>>>>>>>>>>> beyond1920@gmail.com
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>> Hi Xintong,
> >>>>>>>>>>>>>>>>>>>>>> +1 to the proposal.
> >>>>>>>>>>>>>>>>>>>>>> In order to better comply with the
> >>>> rule,
> >>>>> it
> >>>>>>> is
> >>>>>>>>>>>> necessary
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> describe
> >>>>>>>>>>>>>>>>>>>>> what's
> >>>>>>>>>>>>>>>>>>>>>> best practice if encountering test
> >>>>> failure
> >>>>>>>> which
> >>>>>>>>>> seems
> >>>>>>>>>>>>>>>> unrelated
> >>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> current commits.
> >>>>>>>>>>>>>>>>>>>>>> How to avoid merging PR with test
> >>>>> failures
> >>>>>>> and
> >>>>>>>>> not
> >>>>>>>>>>>>> blocking
> >>>>>>>>>>>>>>>> code
> >>>>>>>>>>>>>>>>>>>> merging
> >>>>>>>>>>>>>>>>>>>>>> for a long time?
> >>>>>>>>>>>>>>>>>>>>>> I tried to think about the possible
> >>>>> steps,
> >>>>>>> and
> >>>>>>>>>> found
> >>>>>>>>>>>>> there
> >>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>>>>>>> detailed problems that need to be
> >>>>> discussed
> >>>>>>> in
> >>>>>>>> a
> >>>>>>>>>> step
> >>>>>>>>>>>>>>> further:
> >>>>>>>>>>>>>>>>>>>>>> 1. Report the test failures in the
> >>>> JIRA.
> >>>>>>>>>>>>>>>>>>>>>> 2. Set a deadline to find out the
> >>> root
> >>>>>> cause
> >>>>>>>> and
> >>>>>>>>>> solve
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> failure
> >>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> new created JIRA  because we could
> >>> not
> >>>>>> block
> >>>>>>>>> other
> >>>>>>>>>>>> commit
> >>>>>>>>>>>>>>>> merges
> >>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>> long
> >>>>>>>>>>>>>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>>>>>>      When is a reasonable deadline
> >>> here?
> >>>>>>>>>>>>>>>>>>>>>> 3. What to do if the JIRA has not
> >>> made
> >>>>>>>>> significant
> >>>>>>>>>>>>> progress
> >>>>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>>>>>>> reached
> >>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> the deadline time?
> >>>>>>>>>>>>>>>>>>>>>>      There are several situations as
> >>>>>> follows,
> >>>>>>>>> maybe
> >>>>>>>>>>>>>> different
> >>>>>>>>>>>>>>>>> cases
> >>>>>>>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>>>>>>>>> different approaches.
> >>>>>>>>>>>>>>>>>>>>>>      1. the JIRA is non-assigned yet
> >>>>>>>>>>>>>>>>>>>>>>      2. not found the root cause yet
> >>>>>>>>>>>>>>>>>>>>>>      3. not found a good solution,
> >>> but
> >>>>>> already
> >>>>>>>>>> found the
> >>>>>>>>>>>>>> root
> >>>>>>>>>>>>>>>>> cause
> >>>>>>>>>>>>>>>>>>>>>>      4. found a solution, but it
> >>> needs
> >>>>> more
> >>>>>>> time
> >>>>>>>>> to
> >>>>>>>>>> be
> >>>>>>>>>>>>> done.
> >>>>>>>>>>>>>>>>>>>>>> 4. If we disable the respective
> >>> tests
> >>>>>>>>> temporarily,
> >>>>>>>>>> we
> >>>>>>>>>>>>> also
> >>>>>>>>>>>>>>>> need a
> >>>>>>>>>>>>>>>>>>>>> mechanism
> >>>>>>>>>>>>>>>>>>>>>> to ensure the issue would be
> >>> continued
> >>>> to
> >>>>>> be
> >>>>>>>>>>>> investigated
> >>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> future.
> >>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>> JING ZHANG
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org>
> >>>>>>> 于2021年6月23日周三
> >>>>>>>>>>>> 下午8:16写道：
> >>>>>>>>>>>>>>>>>>>>>>> +1 to Xintong's proposal
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 23, 2021 at 1:53 PM
> >>> Till
> >>>>>>>> Rohrmann <
> >>>>>>>>>>>>>>>>>>> trohrmann@apache.org>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I would first try to not
> >>> introduce
> >>>>> the
> >>>>>>>>>> exception
> >>>>>>>>>>>> for
> >>>>>>>>>>>>>>> local
> >>>>>>>>>>>>>>>>>>> builds.
> >>>>>>>>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>>>>>>>>>> makes
> >>>>>>>>>>>>>>>>>>>>>>>> it quite hard for others to
> >>> verify
> >>>>> the
> >>>>>>>> build
> >>>>>>>>>> and to
> >>>>>>>>>>>>>> make
> >>>>>>>>>>>>>>>> sure
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> right things were executed. If
> >>> we
> >>>> see
> >>>>>>> that
> >>>>>>>>> this
> >>>>>>>>>>>>> becomes
> >>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>> issue
> >>>>>>>>>>>>>>>>>>>> then
> >>>>>>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>>>> can revisit this idea.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>>> Till
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 23, 2021 at 4:19 AM
> >>>>> Yangze
> >>>>>>> Guo
> >>>>>>>> <
> >>>>>>>>>>>>>>>>> karmagyz@gmail.com>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>> +1 for appending this to
> >>>> community
> >>>>>>>>>> guidelines for
> >>>>>>>>>>>>>>> merging
> >>>>>>>>>>>>>>>>> PRs.
> >>>>>>>>>>>>>>>>>>>>>>>>> @Till Rohrmann
> >>>>>>>>>>>>>>>>>>>>>>>>> I agree that with this
> >>> approach
> >>>>>>> unstable
> >>>>>>>>>> tests
> >>>>>>>>>>>> will
> >>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>> block
> >>>>>>>>>>>>>>>>>>>> other
> >>>>>>>>>>>>>>>>>>>>>>>>> commit merges. However, it
> >>> might
> >>>> be
> >>>>>>> hard
> >>>>>>>> to
> >>>>>>>>>>>> prevent
> >>>>>>>>>>>>>>>> merging
> >>>>>>>>>>>>>>>>>>>> commits
> >>>>>>>>>>>>>>>>>>>>>>>>> that are related to those
> >>> tests
> >>>> and
> >>>>>>>> should
> >>>>>>>>>> have
> >>>>>>>>>>>>> been
> >>>>>>>>>>>>>>>> passed
> >>>>>>>>>>>>>>>>>>> them.
> >>>>>>>>>>>>>>>>>>>>>> It's
> >>>>>>>>>>>>>>>>>>>>>>>>> true that this judgment can be
> >>>> made
> >>>>>> by
> >>>>>>>> the
> >>>>>>>>>>>>>> committers,
> >>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>> no
> >>>>>>>>>>>>>>>>>>> one
> >>>>>>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>>>>>> ensure the judgment is always
> >>>>> precise
> >>>>>>> and
> >>>>>>>>> so
> >>>>>>>>>> that
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>>> discussion thread.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Regarding the unstable tests,
> >>> how
> >>>>>> about
> >>>>>>>>>> adding
> >>>>>>>>>>>>>> another
> >>>>>>>>>>>>>>>>>>> exception:
> >>>>>>>>>>>>>>>>>>>>>>>>> committers verify it in their
> >>>> local
> >>>>>>>>>> environment
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> comment in
> >>>>>>>>>>>>>>>>>>>> such
> >>>>>>>>>>>>>>>>>>>>>>>>> cases?
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>> Yangze Guo
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 22, 2021 at 8:23
> >>> PM
> >>>>> 刘建刚 <
> >>>>>>>>>>>>>>>>> liujiangangpeng@gmail.com
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>> It is a good principle to
> >>> run
> >>>> all
> >>>>>>> tests
> >>>>>>>>>>>>>> successfully
> >>>>>>>>>>>>>>>>> with any
> >>>>>>>>>>>>>>>>>>>>>> change.
> >>>>>>>>>>>>>>>>>>>>>>>>> This
> >>>>>>>>>>>>>>>>>>>>>>>>>> means a lot for project's
> >>>>> stability
> >>>>>>> and
> >>>>>>>>>>>>>> development.
> >>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>> am big
> >>>>>>>>>>>>>>>>>>>> +1
> >>>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>>>> proposal.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Best
> >>>>>>>>>>>>>>>>>>>>>>>>>> liujiangang
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Till Rohrmann <
> >>>>>> trohrmann@apache.org>
> >>>>>>>>>>>>> 于2021年6月22日周二
> >>>>>>>>>>>>>>>>> 下午6:36写道：
> >>>>>>>>>>>>>>>>>>>>>>>>>>> One way to address the
> >>>> problem
> >>>>> of
> >>>>>>>>>> regularly
> >>>>>>>>>>>>>> failing
> >>>>>>>>>>>>>>>>> tests
> >>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>> block
> >>>>>>>>>>>>>>>>>>>>>>>>>>> merging of PRs is to
> >>> disable
> >>>>> the
> >>>>>>>>>> respective
> >>>>>>>>>>>>> tests
> >>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>>>>>>> being.
> >>>>>>>>>>>>>>>>>>>>>>>>> Of
> >>>>>>>>>>>>>>>>>>>>>>>>>>> course, the failing test
> >>> then
> >>>>>> needs
> >>>>>>>> to
> >>>>>>>>> be
> >>>>>>>>>>>>> fixed.
> >>>>>>>>>>>>>>> But
> >>>>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>>>>> least
> >>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>> way
> >>>>>>>>>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>>>>>>> would not block everyone
> >>> from
> >>>>>>> making
> >>>>>>>>>>>> progress.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Till
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 22, 2021 at
> >>> 12:00
> >>>>> PM
> >>>>>>>> Arvid
> >>>>>>>>>> Heise
> >>>>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>> arvid@apache.org
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I think this is overall
> >>> a
> >>>>> good
> >>>>>>>> idea.
> >>>>>>>>>> So +1
> >>>>>>>>>>>>> from
> >>>>>>>>>>>>>>> my
> >>>>>>>>>>>>>>>>> side.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> However, I'd like to
> >>> put a
> >>>>>> higher
> >>>>>>>>>> priority
> >>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>> infrastructure
> >>>>>>>>>>>>>>>>>>>>>>> then,
> >>>>>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> particular docker
> >>>>>> image/artifact
> >>>>>>>>>> caches.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 22, 2021 at
> >>>> 11:50
> >>>>>> AM
> >>>>>>>> Till
> >>>>>>>>>>>>> Rohrmann
> >>>>>>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>>>>>> trohrmann@apache.org
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for bringing
> >>> this
> >>>>>> topic
> >>>>>>> to
> >>>>>>>>> our
> >>>>>>>>>>>>>> attention
> >>>>>>>>>>>>>>>>>>> Xintong.
> >>>>>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>>>>>>>>> your
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> proposal makes a lot
> >>> of
> >>>>> sense
> >>>>>>> and
> >>>>>>>>> we
> >>>>>>>>>>>> should
> >>>>>>>>>>>>>>>> follow
> >>>>>>>>>>>>>>>>> it.
> >>>>>>>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>>>>>>>> give us
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> confidence that our
> >>>> changes
> >>>>>> are
> >>>>>>>>>> working
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>> might
> >>>>>>>>>>>>>>>>>>> be a
> >>>>>>>>>>>>>>>>>>>>>> good
> >>>>>>>>>>>>>>>>>>>>>>>>>>> incentive
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> quickly fix build
> >>>>>>> instabilities.
> >>>>>>>>>> Hence,
> >>>>>>>>>>>> +1.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Till
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 22, 2021
> >>> at
> >>>>> 11:12
> >>>>>>> AM
> >>>>>>>>>> Xintong
> >>>>>>>>>>>>>> Song <
> >>>>>>>>>>>>>>>>>>>>>>>>> tonysong820@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the past a
> >>> couple of
> >>>>>>> weeks,
> >>>>>>>>> I've
> >>>>>>>>>>>>>> observed
> >>>>>>>>>>>>>>>>> several
> >>>>>>>>>>>>>>>>>>>>> times
> >>>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>> PRs
> >>>>>>>>>>>>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> merged without a
> >>> green
> >>>>>> light
> >>>>>>>> from
> >>>>>>>>>> the
> >>>>>>>>>>>> CI
> >>>>>>>>>>>>>>> tests,
> >>>>>>>>>>>>>>>>> where
> >>>>>>>>>>>>>>>>>>>>>> failure
> >>>>>>>>>>>>>>>>>>>>>>>>> cases
> >>>>>>>>>>>>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> considered
> >>> *unrelated*.
> >>>>>> This
> >>>>>>>> may
> >>>>>>>>>> not
> >>>>>>>>>>>>> always
> >>>>>>>>>>>>>>>> cause
> >>>>>>>>>>>>>>>>>>>>> problems,
> >>>>>>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> increase the chance
> >>> of
> >>>>>>> breaking
> >>>>>>>>> our
> >>>>>>>>>>>> code
> >>>>>>>>>>>>>>> base.
> >>>>>>>>>>>>>>>> In
> >>>>>>>>>>>>>>>>>>> fact,
> >>>>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>>>>>>>>>> occurred
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me twice in the past
> >>>> few
> >>>>>>> weeks
> >>>>>>>>>> that I
> >>>>>>>>>>>> had
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> revert a
> >>>>>>>>>>>>>>>>>>>>>> commit
> >>>>>>>>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> breaks
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the master branch
> >>> due
> >>>> to
> >>>>>>> this.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think it would be
> >>>> nicer
> >>>>>> to
> >>>>>>>>>> enforce a
> >>>>>>>>>>>>>>> stricter
> >>>>>>>>>>>>>>>>> rule,
> >>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>> no
> >>>>>>>>>>>>>>>>>>>>>>>> PRs
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> should
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> merged without
> >>> passing
> >>>>> CI.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The problems of
> >>> merging
> >>>>> PRs
> >>>>>>>> with
> >>>>>>>>>>>>>> "unrelated"
> >>>>>>>>>>>>>>>> test
> >>>>>>>>>>>>>>>>>>>>> failures
> >>>>>>>>>>>>>>>>>>>>>>> are:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - It's not always
> >>>>>>>> straightforward
> >>>>>>>>>> to
> >>>>>>>>>>>> tell
> >>>>>>>>>>>>>>>>> whether a
> >>>>>>>>>>>>>>>>>>>> test
> >>>>>>>>>>>>>>>>>>>>>>>>> failures are
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> related or not.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - It prevents
> >>>> subsequent
> >>>>>> test
> >>>>>>>>> cases
> >>>>>>>>>>>> from
> >>>>>>>>>>>>>>> being
> >>>>>>>>>>>>>>>>>>>> executed,
> >>>>>>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>> may
> >>>>>>>>>>>>>>>>>>>>>>>>>>> fail
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> relating to the PR
> >>>>> changes.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To make things
> >>> easier
> >>>> for
> >>>>>> the
> >>>>>>>>>>>> committers,
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> following
> >>>>>>>>>>>>>>>>>>>>>>>>> exceptions
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> might
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> considered
> >>> acceptable.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - The PR has passed
> >>> CI
> >>>> in
> >>>>>> the
> >>>>>>>>>>>>> contributor's
> >>>>>>>>>>>>>>>>> personal
> >>>>>>>>>>>>>>>>>>>>>>> workspace.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Please
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> post
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the link in such
> >>> cases.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - The CI tests have
> >>>> been
> >>>>>>>>> triggered
> >>>>>>>>>>>>> multiple
> >>>>>>>>>>>>>>>>> times, on
> >>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>> same
> >>>>>>>>>>>>>>>>>>>>>>>>>>> commit,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> each stage has at
> >>> least
> >>>>>>> passed
> >>>>>>>>> for
> >>>>>>>>>>>> once.
> >>>>>>>>>>>>>>> Please
> >>>>>>>>>>>>>>>>> also
> >>>>>>>>>>>>>>>>>>>>>> comment
> >>>>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>> such
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> cases.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If we all agree on
> >>>> this,
> >>>>>> I'd
> >>>>>>>>>> update the
> >>>>>>>>>>>>>>>> community
> >>>>>>>>>>>>>>>>>>>>>> guidelines
> >>>>>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> merging
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PRs wrt. this
> >>> proposal.
> >>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know
> >>> what
> >>>>> do
> >>>>>>> you
> >>>>>>>>>> think.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you~
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Xintong Song
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>> Best, Jingsong Lee
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Best, Jingsong Lee
> >>>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Best regards!
> >>>>>> Rui Li
> >>>>>>
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Etienne Chauchot <ec...@apache.org>.

Hi,

Thanks Xintong for the very good doc ! I added 2 comments to it.

Best,

Etienne

On 02/07/2021 05:57, Xintong Song wrote:
> Thanks all for the positive feedback.
>
> I have updated the wiki page [1], and will send an announcement in a
> separate thread, to draw more committers' attention.
>
> Moreover, I've opened FLINK-23212 where we can continue with the discussion
> around pure documentation changing PRs.
>
> Thank you~
>
> Xintong Song
>
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
>
> On Wed, Jun 30, 2021 at 5:26 PM Xintong Song <to...@gmail.com> wrote:
>
>> I second Tison's opinion.
>>
>> Similar to how we skip docs_404_check for PRs that do not touch the
>> documentation, we can skip other stages for PRs that only contain
>> documentation changes.
>>
>> In addition to making merging documentation PRs easier, we can also reduce
>> the workload on CI workers. Especially during the last days of a release
>> cycle, which is usually the most busy time for the CI workers, and is also
>> where most documentation efforts take place.
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>> On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann <tr...@apache.org>
>> wrote:
>>
>>> I think you are right Tison that docs are a special case and they only
>>> require flink-docs to pass. What I am wondering is how much of a problem
>>> this will be (assuming that we have a decent build stability). The more
>>> exceptions we add, the harder it will be to properly follow the
>>> guidelines.
>>> Maybe we can observe how many docs PRs get delayed/not merged because of
>>> this and then revisit this discussion if needed.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Wed, Jun 30, 2021 at 8:30 AM tison <wa...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> There are a number of PRs modifying docs only, but we still require all
>>>> tests passed on that.
>>>>
>>>> It is a good proposal we avoid merge PR with "unrelated" failure, but
>>> can
>>>> we improve the case where the contributor only works for docs?
>>>>
>>>> For example, base on the file change set, run doc tests only.
>>>>
>>>> Best,
>>>> tison.
>>>>
>>>>
>>>> godfrey he <go...@gmail.com> 于2021年6月30日周三 下午2:17写道：
>>>>
>>>>> +1 for the proposal. Thanks Xintong!
>>>>>
>>>>> Best,
>>>>> Godfrey
>>>>>
>>>>>
>>>>>
>>>>> Rui Li <li...@gmail.com> 于2021年6月30日周三 上午11:36写道：
>>>>>
>>>>>> Thanks Xintong. +1 to the proposal.
>>>>>>
>>>>>> On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 <li...@gmail.com>
>>>> wrote:
>>>>>>> +1 for the proposal. Since the test time is long and environment
>>> may
>>>>>> vary,
>>>>>>> unstable tests are really annoying for developers. The solution is
>>>>>> welcome.
>>>>>>> Best
>>>>>>> liujiangang
>>>>>>>
>>>>>>> Jingsong Li <ji...@gmail.com> 于2021年6月29日周二 上午10:31写道：
>>>>>>>
>>>>>>>> +1 Thanks Xintong for the update!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jingsong
>>>>>>>>
>>>>>>>> On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <
>>>> trohrmann@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +1, thanks for updating the guidelines Xintong!
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Till
>>>>>>>>>
>>>>>>>>> On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <
>>> karmagyz@gmail.com>
>>>>>>> wrote:
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Thanks Xintong for drafting this doc.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Yangze Guo
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <
>>>> beyond1920@gmail.com
>>>>>>>> wrote:
>>>>>>>>>>> Thanks Xintong for giving detailed documentation.
>>>>>>>>>>>
>>>>>>>>>>> The best practice for handling test failure is very
>>> detailed,
>>>>>> it's
>>>>>>> a
>>>>>>>>> good
>>>>>>>>>>> guidelines document with clear action steps.
>>>>>>>>>>>
>>>>>>>>>>> +1 to Xintong's proposal.
>>>>>>>>>>>
>>>>>>>>>>> Xintong Song <to...@gmail.com> 于2021年6月28日周一
>>> 下午4:07写道：
>>>>>>>>>>>> Thanks all for the discussion.
>>>>>>>>>>>>
>>>>>>>>>>>> Based on the opinions so far, I've drafted the new
>>>> guidelines
>>>>>>> [1],
>>>>>>>>> as a
>>>>>>>>>>>> potential replacement of the original wiki page [2].
>>>>>>>>>>>>
>>>>>>>>>>>> Hopefully this draft has covered the most opinions
>>>> discussed
>>>>>> and
>>>>>>>>>> consensus
>>>>>>>>>>>> made in this discussion thread.
>>>>>>>>>>>>
>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you~
>>>>>>>>>>>>
>>>>>>>>>>>> Xintong Song
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
>>>>>>>>>>>> [2]
>>>>>>>>>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
>>>>>>>>> pnowojski@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the clarification Till. +1 for what you
>>> have
>>>>>>> written.
>>>>>>>>>>>>> Piotrek
>>>>>>>>>>>>>
>>>>>>>>>>>>> pt., 25 cze 2021 o 16:00 Till Rohrmann <
>>>>> trohrmann@apache.org
>>>>>>>>>>>> napisał(a):
>>>>>>>>>>>>>> One quick note for clarification. I don't have
>>> anything
>>>>>>> against
>>>>>>>>>> builds
>>>>>>>>>>>>>> running on your personal Azure account and this is
>>> not
>>>>>> what I
>>>>>>>>>>>> understood
>>>>>>>>>>>>>> under "local environment". For me "local
>>> environment"
>>>>> means
>>>>>>>> that
>>>>>>>>>>>> someone
>>>>>>>>>>>>>> runs the test locally on his machine and then says
>>> that
>>>>> the
>>>>>>>>>>>>>> tests have passed locally.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I do agree that there might be a conflict of
>>> interests
>>>>> if a
>>>>>>> PR
>>>>>>>>>> author
>>>>>>>>>>>>>> disables tests. Here I would argue that we don't
>>> have
>>>>>>> malignant
>>>>>>>>>>>>> committers
>>>>>>>>>>>>>> which means that every committer will probably first
>>>>> check
>>>>>>> the
>>>>>>>>>>>> respective
>>>>>>>>>>>>>> ticket for how often the test failed. Then I guess
>>> the
>>>>> next
>>>>>>>> step
>>>>>>>>>> would
>>>>>>>>>>>> be
>>>>>>>>>>>>>> to discuss on the ticket whether to disable it or
>>> not.
>>>>> And
>>>>>>>>> finally,
>>>>>>>>>>>> after
>>>>>>>>>>>>>> reaching a consensus, it will be disabled. If we see
>>>>>> someone
>>>>>>>>>> abusing
>>>>>>>>>>>> this
>>>>>>>>>>>>>> policy, then we can still think about how to guard
>>>>> against
>>>>>>> it.
>>>>>>>>> But,
>>>>>>>>>>>>>> honestly, I have very rarely seen such a case. I am
>>>> also
>>>>> ok
>>>>>>> to
>>>>>>>>>> pull in
>>>>>>>>>>>>> the
>>>>>>>>>>>>>> release manager to make the final call if this
>>> resolves
>>>>>>>> concerns.
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
>>>>>>>>>> pnowojski@apache.org>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1 for the general idea, however I have concerns
>>>> about
>>>>> a
>>>>>>>> couple
>>>>>>>>>> of
>>>>>>>>>>>>>> details.
>>>>>>>>>>>>>>>> I would first try to not introduce the exception
>>>> for
>>>>>>> local
>>>>>>>>>> builds.
>>>>>>>>>>>>>>>> It makes it quite hard for others to verify the
>>>> build
>>>>>> and
>>>>>>>> to
>>>>>>>>>> make
>>>>>>>>>>>>> sure
>>>>>>>>>>>>>>> that the right things were executed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would counter Till's proposal to ignore local
>>> green
>>>>>>> builds.
>>>>>>>>> If
>>>>>>>>>>>>>> committer
>>>>>>>>>>>>>>> is merging and closing a PR, with official azure
>>>>> failure,
>>>>>>> but
>>>>>>>>>> there
>>>>>>>>>>>>> was a
>>>>>>>>>>>>>>> green build before or in local azure it's IMO
>>> enough
>>>> to
>>>>>>> leave
>>>>>>>>> the
>>>>>>>>>>>>>> message:
>>>>>>>>>>>>>>>> Latest build failure is a known issue:
>>> FLINK-12345
>>>>>>>>>>>>>>>> Green local build: URL
>>>>>>>>>>>>>>> This should address Till's concern about
>>>> verification.
>>>>>>>>>>>>>>> On the other hand I have concerns about disabling
>>>>> tests.*
>>>>>>> It
>>>>>>>>>>>> shouldn't
>>>>>>>>>>>>> be
>>>>>>>>>>>>>>> the PR author/committer that's disabling a test on
>>>> his
>>>>>> own,
>>>>>>>> as
>>>>>>>>>>>> that's a
>>>>>>>>>>>>>>> conflict of interests*. I have however no problems
>>>> with
>>>>>>>>> disabling
>>>>>>>>>>>> test
>>>>>>>>>>>>>>> instabilities that were marked as "blockers"
>>> though,
>>>>> that
>>>>>>>>> should
>>>>>>>>>> work
>>>>>>>>>>>>>>> pretty well. But the important thing here is to
>>>>> correctly
>>>>>>>> judge
>>>>>>>>>>>> bumping
>>>>>>>>>>>>>>> priorities of test instabilities based on their
>>>>> frequency
>>>>>>> and
>>>>>>>>>> current
>>>>>>>>>>>>>>> general health of the system. I believe that
>>> release
>>>>>>> managers
>>>>>>>>>> should
>>>>>>>>>>>> be
>>>>>>>>>>>>>>> playing a big role here in deciding on the
>>> guidelines
>>>>> of
>>>>>>> what
>>>>>>>>>> should
>>>>>>>>>>>>> be a
>>>>>>>>>>>>>>> priority of certain test instabilities.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What I mean by that is two example scenarios:
>>>>>>>>>>>>>>> 1. if we have a handful of very frequently failing
>>>>> tests
>>>>>>> and
>>>>>>>> a
>>>>>>>>>>>> handful
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> very rarely failing tests (like one reported
>>> failure
>>>>> and
>>>>>> no
>>>>>>>>>> another
>>>>>>>>>>>>>>> occurrence in many months, and let's even say that
>>>> the
>>>>>>>> failure
>>>>>>>>>> looks
>>>>>>>>>>>>> like
>>>>>>>>>>>>>>> infrastructure/network timeout), we should focus
>>> on
>>>> the
>>>>>>>>>> frequently
>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>> ones, and probably we are safe to ignore for the
>>> time
>>>>>> being
>>>>>>>> the
>>>>>>>>>> rare
>>>>>>>>>>>>>> issues
>>>>>>>>>>>>>>> - at least until we deal with the most pressing
>>> ones.
>>>>>>>>>>>>>>> 2. If we have tons of rarely failing test
>>>>> instabilities,
>>>>>> we
>>>>>>>>>> should
>>>>>>>>>>>>>> probably
>>>>>>>>>>>>>>> start addressing them as blockers.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm using my own conscious and my best judgement
>>> when
>>>>> I'm
>>>>>>>>>>>>>>> bumping/decreasing priorities of test
>>> instabilities
>>>>> (and
>>>>>>>> bugs),
>>>>>>>>>> but
>>>>>>>>>>>> as
>>>>>>>>>>>>>>> individual committer I don't have the full
>>> picture.
>>>> As
>>>>> I
>>>>>>>> wrote
>>>>>>>>>>>> above, I
>>>>>>>>>>>>>>> think release managers are in a much better
>>> position
>>>> to
>>>>>>> keep
>>>>>>>>>>>> adjusting
>>>>>>>>>>>>>>> those kind of guidelines.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best, Piotrek
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com>
>>>>>>>> napisał(a):
>>>>>>>>>>>>>>>> +1 for Xintong's proposal.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For me, resolving problems directly (fixing the
>>>>>>>>> infrastructure
>>>>>>>>>>>> issue,
>>>>>>>>>>>>>>>> disabling unstable tests and creating blocker
>>> JIRAs
>>>>> to
>>>>>>>> track
>>>>>>>>>> the
>>>>>>>>>>>> fix
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> re-enable them asap, etc.) is (in most cases)
>>>> better
>>>>>> than
>>>>>>>>>> working
>>>>>>>>>>>>>> around
>>>>>>>>>>>>>>>> them (verify locally, manually check and judge
>>> the
>>>>>>> failure
>>>>>>>> as
>>>>>>>>>>>>>>> "unrelated",
>>>>>>>>>>>>>>>> etc.), and I believe the proposal could help us
>>>>> pushing
>>>>>>>> those
>>>>>>>>>> more
>>>>>>>>>>>>>> "real"
>>>>>>>>>>>>>>>> solutions forward.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>> Yu
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, 25 Jun 2021 at 10:58, Yangze Guo <
>>>>>>>> karmagyz@gmail.com
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> Creating a blocker issue for the manually
>>>> disabled
>>>>>>> tests
>>>>>>>>>> sounds
>>>>>>>>>>>>> good
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> me.
>>>>>>>>>>>>>>>>> Minor: I'm still a bit worried about the
>>> commits
>>>>>> merged
>>>>>>>>>> before we
>>>>>>>>>>>>> fix
>>>>>>>>>>>>>>>>> the unstable tests can also break those tests.
>>>>>> Instead
>>>>>>> of
>>>>>>>>>> letting
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> assigners keep a look at all potentially
>>> related
>>>>>>> commits,
>>>>>>>>>> they
>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>> maintain a branch that is periodically synced
>>>> with
>>>>>> the
>>>>>>>>> master
>>>>>>>>>>>>> branch
>>>>>>>>>>>>>>>>> while enabling the unstable test. So that they
>>>> can
>>>>>>> catch
>>>>>>>>> the
>>>>>>>>>>>>> breaking
>>>>>>>>>>>>>>>>> changes asap.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Yangze Guo
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann
>>> <
>>>>>>>>>>>>> trohrmann@apache.org>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> I like the idea of creating a blocker issue
>>>> for a
>>>>>>>>> disabled
>>>>>>>>>>>> test.
>>>>>>>>>>>>>> This
>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>> force us to resolve it in a timely manner
>>> and
>>>> it
>>>>>>> won't
>>>>>>>>> fall
>>>>>>>>>>>>> through
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> cracks.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li
>>> <
>>>>>>>>>>>>>> jingsonglee0@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> +1 to Xintong's proposal
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I also have some concerns about unstable
>>>> cases.
>>>>>>>>>>>>>>>>>>> I think unstable cases can be divided into
>>>>> these
>>>>>>>> types:
>>>>>>>>>>>>>>>>>>> - Force majeure: For example, network
>>>> timeout,
>>>>>>> sudden
>>>>>>>>>>>>>> environmental
>>>>>>>>>>>>>>>>>>> collapse, they are accidental and can
>>> always
>>>> be
>>>>>>>> solved
>>>>>>>>> by
>>>>>>>>>>>>>>> triggering
>>>>>>>>>>>>>>>>> azure
>>>>>>>>>>>>>>>>>>> again. Committers should wait for the next
>>>>> green
>>>>>>>> azure.
>>>>>>>>>>>>>>>>>>> - Obvious mistakes: For example, some
>>> errors
>>>>>> caused
>>>>>>>> by
>>>>>>>>>>>> obvious
>>>>>>>>>>>>>>>> reasons
>>>>>>>>>>>>>>>>> may
>>>>>>>>>>>>>>>>>>> be repaired quickly. At this time, do we
>>> need
>>>>> to
>>>>>>>> wait,
>>>>>>>>>> or not
>>>>>>>>>>>>>> wait
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>> ignore?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Difficult questions: These problems are
>>>> very
>>>>>>>>> difficult
>>>>>>>>>> to
>>>>>>>>>>>>> find.
>>>>>>>>>>>>>>>> There
>>>>>>>>>>>>>>>>>>> will be no solution for a while and a
>>> half.
>>>> We
>>>>>>> don't
>>>>>>>>> even
>>>>>>>>>>>> know
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> reason.
>>>>>>>>>>>>>>>>>>> At this time, we should ignore it. (Maybe
>>>> it's
>>>>>>> judged
>>>>>>>>> by
>>>>>>>>>> the
>>>>>>>>>>>>>> author
>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> case. But what about the old case whose
>>>> author
>>>>>>> can't
>>>>>>>> be
>>>>>>>>>>>> found?)
>>>>>>>>>>>>>>>>>>> So, the ignored cases should be the block
>>> of
>>>>> the
>>>>>>> next
>>>>>>>>>> release
>>>>>>>>>>>>>> until
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> reason is found or the case is fixed?  We
>>>> need
>>>>> to
>>>>>>>>> ensure
>>>>>>>>>> that
>>>>>>>>>>>>>>> someone
>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>> take care of these cases, because there
>>> is no
>>>>>>>> deepening
>>>>>>>>>> of
>>>>>>>>>>>>> failed
>>>>>>>>>>>>>>>>> tests, no
>>>>>>>>>>>>>>>>>>> one may continue to pay attention to these
>>>>> cases.
>>>>>>>>>>>>>>>>>>> I think this guideline should consider
>>> these
>>>>>>>>> situations,
>>>>>>>>>> and
>>>>>>>>>>>>> show
>>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> solve them.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Jingsong
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <
>>>>>>>>>> imjark@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> Thanks to Xintong for bringing up this
>>>> topic,
>>>>>> I'm
>>>>>>>> +1
>>>>>>>>> in
>>>>>>>>>>>>>> general.
>>>>>>>>>>>>>>>>>>>> However, I think it's still not very
>>> clear
>>>>> how
>>>>>> we
>>>>>>>>>> address
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> unstable
>>>>>>>>>>>>>>>>>>>> tests.
>>>>>>>>>>>>>>>>>>>> I think this is a very important part of
>>>> this
>>>>>> new
>>>>>>>>>>>> guideline.
>>>>>>>>>>>>>>>>>>>> According to the discussion above, if
>>> some
>>>>>> tests
>>>>>>>> are
>>>>>>>>>>>>> unstable,
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>> manually disable it.
>>>>>>>>>>>>>>>>>>>> But I have some questions in my mind:
>>>>>>>>>>>>>>>>>>>> 1) Is the instability judged by the
>>>> committer
>>>>>>>>>> themselves or
>>>>>>>>>>>>> by
>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>> metrics?
>>>>>>>>>>>>>>>>>>>> 2) Should we log the disable commit in
>>> the
>>>>>>>>>> corresponding
>>>>>>>>>>>>> issue
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> increase
>>>>>>>>>>>>>>>>>>>> the priority?
>>>>>>>>>>>>>>>>>>>> 3) What if nobody looks into this issue
>>> and
>>>>>> this
>>>>>>>>>> becomes
>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>> potential
>>>>>>>>>>>>>>>>>>>> bugs released with the new version?
>>>>>>>>>>>>>>>>>>>> 4) If no person is actively working on
>>> the
>>>>>> issue,
>>>>>>>> who
>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>> re-enable
>>>>>>>>>>>>>>>>>>> it?
>>>>>>>>>>>>>>>>>>>> Would it block PRs again?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 24 Jun 2021 at 10:04, Xintong
>>> Song
>>>> <
>>>>>>>>>>>>>>> tonysong820@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> Thanks all for the feedback.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> @Till @Yangze
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm also not convinced by the idea of
>>>>> having
>>>>>> an
>>>>>>>>>> exception
>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> local
>>>>>>>>>>>>>>>>>>>> builds.
>>>>>>>>>>>>>>>>>>>>> We need to execute the entire build
>>> (or
>>>> at
>>>>>>> least
>>>>>>>>> the
>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>> stage)
>>>>>>>>>>>>>>>>>>>>> locally, to make sure subsequent test
>>>> cases
>>>>>>>>>> prevented by
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> failure
>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>> are all executed. In that case, it's
>>>>> probably
>>>>>>>>> easier
>>>>>>>>>> to
>>>>>>>>>>>>> rerun
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> build
>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>> azure than locally.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Concerning disabling unstable test
>>> cases
>>>>> that
>>>>>>>>>> regularly
>>>>>>>>>>>>> block
>>>>>>>>>>>>>>> PRs
>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>> merging, maybe we can say that such
>>> cases
>>>>> can
>>>>>>>> only
>>>>>>>>> be
>>>>>>>>>>>>>> disabled
>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>> someone
>>>>>>>>>>>>>>>>>>>>> is actively looking into it, likely
>>> the
>>>>>> person
>>>>>>>> who
>>>>>>>>>>>> disabled
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> case.
>>>>>>>>>>>>>>>>>>> If
>>>>>>>>>>>>>>>>>>>>> this person is no longer actively
>>> working
>>>>> on
>>>>>>> it,
>>>>>>>>>> he/she
>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>> enable
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> case again no matter if it is fixed or
>>>> not.
>>>>>>>>>>>>>>>>>>>>> @Jing
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks for the suggestions.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> +1 to provide guidelines on handling
>>> test
>>>>>>>> failures.
>>>>>>>>>>>>>>>>>>>>> 1. Report the test failures in the
>>> JIRA.
>>>>>>>>>>>>>>>>>>>>> +1 on this. Currently, the release
>>>> managers
>>>>>> are
>>>>>>>>>>>> monitoring
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> ci
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> cron
>>>>>>>>>>>>>>>>>>>>> build instabilities and reporting
>>> them on
>>>>>> JIRA.
>>>>>>>> We
>>>>>>>>>> should
>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>>>>> encourage
>>>>>>>>>>>>>>>>>>>>> other contributors to do that for PRs.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2. Set a deadline to find out the root
>>>>> cause
>>>>>>> and
>>>>>>>>>> solve
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> failure
>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> new created JIRA  because we could
>>> not
>>>>>> block
>>>>>>>>> other
>>>>>>>>>>>> commit
>>>>>>>>>>>>>>>> merges
>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> long
>>>>>>>>>>>>>>>>>>>>>> time
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 3. What to do if the JIRA has not made
>>>>>>>> significant
>>>>>>>>>>>> progress
>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>> reached
>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>> the deadline time?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm not sure about these two. It
>>> feels a
>>>>> bit
>>>>>>>>> against
>>>>>>>>>> the
>>>>>>>>>>>>>>>> voluntary
>>>>>>>>>>>>>>>>>>> nature
>>>>>>>>>>>>>>>>>>>>> of open source projects.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> IMHO, frequent instabilities are more
>>>>> likely
>>>>>> to
>>>>>>>> be
>>>>>>>>>>>> upgraded
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> critical
>>>>>>>>>>>>>>>>>>>>> / blocker priority, receive more
>>>> attention
>>>>>> and
>>>>>>>>>> eventually
>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>> fixed.
>>>>>>>>>>>>>>>>>>>>> Release managers are also responsible
>>> for
>>>>>>> looking
>>>>>>>>> for
>>>>>>>>>>>>>> assignees
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>> such
>>>>>>>>>>>>>>>>>>>>> issues. If a case is still not fixed
>>>>> soonish,
>>>>>>>> even
>>>>>>>>>> with
>>>>>>>>>>>> all
>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>> efforts,
>>>>>>>>>>>>>>>>>>>>> I'm not sure how setting a deadline
>>> can
>>>>> help
>>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>> 4. If we disable the respective tests
>>>>>>>> temporarily,
>>>>>>>>> we
>>>>>>>>>>>> also
>>>>>>>>>>>>>>> need a
>>>>>>>>>>>>>>>>>>>> mechanism
>>>>>>>>>>>>>>>>>>>>>> to ensure the issue would be
>>> continued
>>>> to
>>>>>> be
>>>>>>>>>>>> investigated
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> future.
>>>>>>>>>>>>>>>>>>>>> +1. As mentioned above, we may
>>> consider
>>>>>>> disabling
>>>>>>>>>> such
>>>>>>>>>>>>> tests
>>>>>>>>>>>>>>> iff
>>>>>>>>>>>>>>>>>>> someone
>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>> actively working on it.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank you~
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Xintong Song
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 23, 2021 at 9:56 PM JING
>>>> ZHANG
>>>>> <
>>>>>>>>>>>>>>> beyond1920@gmail.com
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Xintong,
>>>>>>>>>>>>>>>>>>>>>> +1 to the proposal.
>>>>>>>>>>>>>>>>>>>>>> In order to better comply with the
>>>> rule,
>>>>> it
>>>>>>> is
>>>>>>>>>>>> necessary
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> describe
>>>>>>>>>>>>>>>>>>>>> what's
>>>>>>>>>>>>>>>>>>>>>> best practice if encountering test
>>>>> failure
>>>>>>>> which
>>>>>>>>>> seems
>>>>>>>>>>>>>>>> unrelated
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> current commits.
>>>>>>>>>>>>>>>>>>>>>> How to avoid merging PR with test
>>>>> failures
>>>>>>> and
>>>>>>>>> not
>>>>>>>>>>>>> blocking
>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>> merging
>>>>>>>>>>>>>>>>>>>>>> for a long time?
>>>>>>>>>>>>>>>>>>>>>> I tried to think about the possible
>>>>> steps,
>>>>>>> and
>>>>>>>>>> found
>>>>>>>>>>>>> there
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>>>> detailed problems that need to be
>>>>> discussed
>>>>>>> in
>>>>>>>> a
>>>>>>>>>> step
>>>>>>>>>>>>>>> further:
>>>>>>>>>>>>>>>>>>>>>> 1. Report the test failures in the
>>>> JIRA.
>>>>>>>>>>>>>>>>>>>>>> 2. Set a deadline to find out the
>>> root
>>>>>> cause
>>>>>>>> and
>>>>>>>>>> solve
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> failure
>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> new created JIRA  because we could
>>> not
>>>>>> block
>>>>>>>>> other
>>>>>>>>>>>> commit
>>>>>>>>>>>>>>>> merges
>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> long
>>>>>>>>>>>>>>>>>>>>>> time
>>>>>>>>>>>>>>>>>>>>>>      When is a reasonable deadline
>>> here?
>>>>>>>>>>>>>>>>>>>>>> 3. What to do if the JIRA has not
>>> made
>>>>>>>>> significant
>>>>>>>>>>>>> progress
>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>> reached
>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>> the deadline time?
>>>>>>>>>>>>>>>>>>>>>>      There are several situations as
>>>>>> follows,
>>>>>>>>> maybe
>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>> cases
>>>>>>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>>>>>> different approaches.
>>>>>>>>>>>>>>>>>>>>>>      1. the JIRA is non-assigned yet
>>>>>>>>>>>>>>>>>>>>>>      2. not found the root cause yet
>>>>>>>>>>>>>>>>>>>>>>      3. not found a good solution,
>>> but
>>>>>> already
>>>>>>>>>> found the
>>>>>>>>>>>>>> root
>>>>>>>>>>>>>>>>> cause
>>>>>>>>>>>>>>>>>>>>>>      4. found a solution, but it
>>> needs
>>>>> more
>>>>>>> time
>>>>>>>>> to
>>>>>>>>>> be
>>>>>>>>>>>>> done.
>>>>>>>>>>>>>>>>>>>>>> 4. If we disable the respective
>>> tests
>>>>>>>>> temporarily,
>>>>>>>>>> we
>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>> need a
>>>>>>>>>>>>>>>>>>>>> mechanism
>>>>>>>>>>>>>>>>>>>>>> to ensure the issue would be
>>> continued
>>>> to
>>>>>> be
>>>>>>>>>>>> investigated
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> future.
>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>> JING ZHANG
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org>
>>>>>>> 于2021年6月23日周三
>>>>>>>>>>>> 下午8:16写道：
>>>>>>>>>>>>>>>>>>>>>>> +1 to Xintong's proposal
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 23, 2021 at 1:53 PM
>>> Till
>>>>>>>> Rohrmann <
>>>>>>>>>>>>>>>>>>> trohrmann@apache.org>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I would first try to not
>>> introduce
>>>>> the
>>>>>>>>>> exception
>>>>>>>>>>>> for
>>>>>>>>>>>>>>> local
>>>>>>>>>>>>>>>>>>> builds.
>>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>>>> makes
>>>>>>>>>>>>>>>>>>>>>>>> it quite hard for others to
>>> verify
>>>>> the
>>>>>>>> build
>>>>>>>>>> and to
>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>>> sure
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> right things were executed. If
>>> we
>>>> see
>>>>>>> that
>>>>>>>>> this
>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>> issue
>>>>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>>>> can revisit this idea.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 23, 2021 at 4:19 AM
>>>>> Yangze
>>>>>>> Guo
>>>>>>>> <
>>>>>>>>>>>>>>>>> karmagyz@gmail.com>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> +1 for appending this to
>>>> community
>>>>>>>>>> guidelines for
>>>>>>>>>>>>>>> merging
>>>>>>>>>>>>>>>>> PRs.
>>>>>>>>>>>>>>>>>>>>>>>>> @Till Rohrmann
>>>>>>>>>>>>>>>>>>>>>>>>> I agree that with this
>>> approach
>>>>>>> unstable
>>>>>>>>>> tests
>>>>>>>>>>>> will
>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>> block
>>>>>>>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>>>>>>>>>>>> commit merges. However, it
>>> might
>>>> be
>>>>>>> hard
>>>>>>>> to
>>>>>>>>>>>> prevent
>>>>>>>>>>>>>>>> merging
>>>>>>>>>>>>>>>>>>>> commits
>>>>>>>>>>>>>>>>>>>>>>>>> that are related to those
>>> tests
>>>> and
>>>>>>>> should
>>>>>>>>>> have
>>>>>>>>>>>>> been
>>>>>>>>>>>>>>>> passed
>>>>>>>>>>>>>>>>>>> them.
>>>>>>>>>>>>>>>>>>>>>> It's
>>>>>>>>>>>>>>>>>>>>>>>>> true that this judgment can be
>>>> made
>>>>>> by
>>>>>>>> the
>>>>>>>>>>>>>> committers,
>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>>>>>>> ensure the judgment is always
>>>>> precise
>>>>>>> and
>>>>>>>>> so
>>>>>>>>>> that
>>>>>>>>>>>>> we
>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>>> discussion thread.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Regarding the unstable tests,
>>> how
>>>>>> about
>>>>>>>>>> adding
>>>>>>>>>>>>>> another
>>>>>>>>>>>>>>>>>>> exception:
>>>>>>>>>>>>>>>>>>>>>>>>> committers verify it in their
>>>> local
>>>>>>>>>> environment
>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> comment in
>>>>>>>>>>>>>>>>>>>> such
>>>>>>>>>>>>>>>>>>>>>>>>> cases?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>> Yangze Guo
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 22, 2021 at 8:23
>>> PM
>>>>> 刘建刚 <
>>>>>>>>>>>>>>>>> liujiangangpeng@gmail.com
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> It is a good principle to
>>> run
>>>> all
>>>>>>> tests
>>>>>>>>>>>>>> successfully
>>>>>>>>>>>>>>>>> with any
>>>>>>>>>>>>>>>>>>>>>> change.
>>>>>>>>>>>>>>>>>>>>>>>>> This
>>>>>>>>>>>>>>>>>>>>>>>>>> means a lot for project's
>>>>> stability
>>>>>>> and
>>>>>>>>>>>>>> development.
>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>> am big
>>>>>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>>>> proposal.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>>>>>>>>>>>> liujiangang
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Till Rohrmann <
>>>>>> trohrmann@apache.org>
>>>>>>>>>>>>> 于2021年6月22日周二
>>>>>>>>>>>>>>>>> 下午6:36写道：
>>>>>>>>>>>>>>>>>>>>>>>>>>> One way to address the
>>>> problem
>>>>> of
>>>>>>>>>> regularly
>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>> tests
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>> block
>>>>>>>>>>>>>>>>>>>>>>>>>>> merging of PRs is to
>>> disable
>>>>> the
>>>>>>>>>> respective
>>>>>>>>>>>>> tests
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> time
>>>>>>>>>>>>>>>>>>>>>>> being.
>>>>>>>>>>>>>>>>>>>>>>>>> Of
>>>>>>>>>>>>>>>>>>>>>>>>>>> course, the failing test
>>> then
>>>>>> needs
>>>>>>>> to
>>>>>>>>> be
>>>>>>>>>>>>> fixed.
>>>>>>>>>>>>>>> But
>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>> least
>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>> way
>>>>>>>>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>>>>>>> would not block everyone
>>> from
>>>>>>> making
>>>>>>>>>>>> progress.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 22, 2021 at
>>> 12:00
>>>>> PM
>>>>>>>> Arvid
>>>>>>>>>> Heise
>>>>>>>>>>>> <
>>>>>>>>>>>>>>>>>>>> arvid@apache.org
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think this is overall
>>> a
>>>>> good
>>>>>>>> idea.
>>>>>>>>>> So +1
>>>>>>>>>>>>> from
>>>>>>>>>>>>>>> my
>>>>>>>>>>>>>>>>> side.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> However, I'd like to
>>> put a
>>>>>> higher
>>>>>>>>>> priority
>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>> infrastructure
>>>>>>>>>>>>>>>>>>>>>>> then,
>>>>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>>>>> particular docker
>>>>>> image/artifact
>>>>>>>>>> caches.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 22, 2021 at
>>>> 11:50
>>>>>> AM
>>>>>>>> Till
>>>>>>>>>>>>> Rohrmann
>>>>>>>>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>> trohrmann@apache.org
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for bringing
>>> this
>>>>>> topic
>>>>>>> to
>>>>>>>>> our
>>>>>>>>>>>>>> attention
>>>>>>>>>>>>>>>>>>> Xintong.
>>>>>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>>>>>>>>> your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proposal makes a lot
>>> of
>>>>> sense
>>>>>>> and
>>>>>>>>> we
>>>>>>>>>>>> should
>>>>>>>>>>>>>>>> follow
>>>>>>>>>>>>>>>>> it.
>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>>>>>>>> give us
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> confidence that our
>>>> changes
>>>>>> are
>>>>>>>>>> working
>>>>>>>>>>>> and
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>>> be a
>>>>>>>>>>>>>>>>>>>>>> good
>>>>>>>>>>>>>>>>>>>>>>>>>>> incentive
>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> quickly fix build
>>>>>>> instabilities.
>>>>>>>>>> Hence,
>>>>>>>>>>>> +1.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 22, 2021
>>> at
>>>>> 11:12
>>>>>>> AM
>>>>>>>>>> Xintong
>>>>>>>>>>>>>> Song <
>>>>>>>>>>>>>>>>>>>>>>>>> tonysong820@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the past a
>>> couple of
>>>>>>> weeks,
>>>>>>>>> I've
>>>>>>>>>>>>>> observed
>>>>>>>>>>>>>>>>> several
>>>>>>>>>>>>>>>>>>>>> times
>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>> PRs
>>>>>>>>>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> merged without a
>>> green
>>>>>> light
>>>>>>>> from
>>>>>>>>>> the
>>>>>>>>>>>> CI
>>>>>>>>>>>>>>> tests,
>>>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>>>>>>> failure
>>>>>>>>>>>>>>>>>>>>>>>>> cases
>>>>>>>>>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> considered
>>> *unrelated*.
>>>>>> This
>>>>>>>> may
>>>>>>>>>> not
>>>>>>>>>>>>> always
>>>>>>>>>>>>>>>> cause
>>>>>>>>>>>>>>>>>>>>> problems,
>>>>>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> increase the chance
>>> of
>>>>>>> breaking
>>>>>>>>> our
>>>>>>>>>>>> code
>>>>>>>>>>>>>>> base.
>>>>>>>>>>>>>>>> In
>>>>>>>>>>>>>>>>>>> fact,
>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>>>> occurred
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me twice in the past
>>>> few
>>>>>>> weeks
>>>>>>>>>> that I
>>>>>>>>>>>> had
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> revert a
>>>>>>>>>>>>>>>>>>>>>> commit
>>>>>>>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>>>>>> breaks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the master branch
>>> due
>>>> to
>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think it would be
>>>> nicer
>>>>>> to
>>>>>>>>>> enforce a
>>>>>>>>>>>>>>> stricter
>>>>>>>>>>>>>>>>> rule,
>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>>> PRs
>>>>>>>>>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> merged without
>>> passing
>>>>> CI.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The problems of
>>> merging
>>>>> PRs
>>>>>>>> with
>>>>>>>>>>>>>> "unrelated"
>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>>>> failures
>>>>>>>>>>>>>>>>>>>>>>> are:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - It's not always
>>>>>>>> straightforward
>>>>>>>>>> to
>>>>>>>>>>>> tell
>>>>>>>>>>>>>>>>> whether a
>>>>>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>>>>>>>> failures are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> related or not.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - It prevents
>>>> subsequent
>>>>>> test
>>>>>>>>> cases
>>>>>>>>>>>> from
>>>>>>>>>>>>>>> being
>>>>>>>>>>>>>>>>>>>> executed,
>>>>>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>>> may
>>>>>>>>>>>>>>>>>>>>>>>>>>> fail
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> relating to the PR
>>>>> changes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To make things
>>> easier
>>>> for
>>>>>> the
>>>>>>>>>>>> committers,
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> following
>>>>>>>>>>>>>>>>>>>>>>>>> exceptions
>>>>>>>>>>>>>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> considered
>>> acceptable.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - The PR has passed
>>> CI
>>>> in
>>>>>> the
>>>>>>>>>>>>> contributor's
>>>>>>>>>>>>>>>>> personal
>>>>>>>>>>>>>>>>>>>>>>> workspace.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Please
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> post
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the link in such
>>> cases.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - The CI tests have
>>>> been
>>>>>>>>> triggered
>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>>>>> times, on
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>>>>>>>>>>>>> commit,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> each stage has at
>>> least
>>>>>>> passed
>>>>>>>>> for
>>>>>>>>>>>> once.
>>>>>>>>>>>>>>> Please
>>>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>>>>>>>> comment
>>>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>> such
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cases.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If we all agree on
>>>> this,
>>>>>> I'd
>>>>>>>>>> update the
>>>>>>>>>>>>>>>> community
>>>>>>>>>>>>>>>>>>>>>> guidelines
>>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>>>> merging
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PRs wrt. this
>>> proposal.
>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know
>>> what
>>>>> do
>>>>>>> you
>>>>>>>>>> think.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you~
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Xintong Song
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Best, Jingsong Lee
>>>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best, Jingsong Lee
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards!
>>>>>> Rui Li
>>>>>>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Xintong Song <to...@gmail.com>.

Thanks all for the positive feedback.

I have updated the wiki page [1], and will send an announcement in a
separate thread, to draw more committers' attention.

Moreover, I've opened FLINK-23212 where we can continue with the discussion
around pure documentation changing PRs.

Thank you~

Xintong Song


[1] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests

On Wed, Jun 30, 2021 at 5:26 PM Xintong Song <to...@gmail.com> wrote:

> I second Tison's opinion.
>
> Similar to how we skip docs_404_check for PRs that do not touch the
> documentation, we can skip other stages for PRs that only contain
> documentation changes.
>
> In addition to making merging documentation PRs easier, we can also reduce
> the workload on CI workers. Especially during the last days of a release
> cycle, which is usually the most busy time for the CI workers, and is also
> where most documentation efforts take place.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann <tr...@apache.org>
> wrote:
>
>> I think you are right Tison that docs are a special case and they only
>> require flink-docs to pass. What I am wondering is how much of a problem
>> this will be (assuming that we have a decent build stability). The more
>> exceptions we add, the harder it will be to properly follow the
>> guidelines.
>> Maybe we can observe how many docs PRs get delayed/not merged because of
>> this and then revisit this discussion if needed.
>>
>> Cheers,
>> Till
>>
>> On Wed, Jun 30, 2021 at 8:30 AM tison <wa...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > There are a number of PRs modifying docs only, but we still require all
>> > tests passed on that.
>> >
>> > It is a good proposal we avoid merge PR with "unrelated" failure, but
>> can
>> > we improve the case where the contributor only works for docs?
>> >
>> > For example, base on the file change set, run doc tests only.
>> >
>> > Best,
>> > tison.
>> >
>> >
>> > godfrey he <go...@gmail.com> 于2021年6月30日周三 下午2:17写道：
>> >
>> > > +1 for the proposal. Thanks Xintong!
>> > >
>> > > Best,
>> > > Godfrey
>> > >
>> > >
>> > >
>> > > Rui Li <li...@gmail.com> 于2021年6月30日周三 上午11:36写道：
>> > >
>> > > > Thanks Xintong. +1 to the proposal.
>> > > >
>> > > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 <li...@gmail.com>
>> > wrote:
>> > > >
>> > > > > +1 for the proposal. Since the test time is long and environment
>> may
>> > > > vary,
>> > > > > unstable tests are really annoying for developers. The solution is
>> > > > welcome.
>> > > > >
>> > > > > Best
>> > > > > liujiangang
>> > > > >
>> > > > > Jingsong Li <ji...@gmail.com> 于2021年6月29日周二 上午10:31写道：
>> > > > >
>> > > > > > +1 Thanks Xintong for the update!
>> > > > > >
>> > > > > > Best,
>> > > > > > Jingsong
>> > > > > >
>> > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <
>> > trohrmann@apache.org>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > +1, thanks for updating the guidelines Xintong!
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > > Till
>> > > > > > >
>> > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <
>> karmagyz@gmail.com>
>> > > > > wrote:
>> > > > > > >
>> > > > > > > > +1
>> > > > > > > >
>> > > > > > > > Thanks Xintong for drafting this doc.
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > Yangze Guo
>> > > > > > > >
>> > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <
>> > beyond1920@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > Thanks Xintong for giving detailed documentation.
>> > > > > > > > >
>> > > > > > > > > The best practice for handling test failure is very
>> detailed,
>> > > > it's
>> > > > > a
>> > > > > > > good
>> > > > > > > > > guidelines document with clear action steps.
>> > > > > > > > >
>> > > > > > > > > +1 to Xintong's proposal.
>> > > > > > > > >
>> > > > > > > > > Xintong Song <to...@gmail.com> 于2021年6月28日周一
>> 下午4:07写道：
>> > > > > > > > >
>> > > > > > > > > > Thanks all for the discussion.
>> > > > > > > > > >
>> > > > > > > > > > Based on the opinions so far, I've drafted the new
>> > guidelines
>> > > > > [1],
>> > > > > > > as a
>> > > > > > > > > > potential replacement of the original wiki page [2].
>> > > > > > > > > >
>> > > > > > > > > > Hopefully this draft has covered the most opinions
>> > discussed
>> > > > and
>> > > > > > > > consensus
>> > > > > > > > > > made in this discussion thread.
>> > > > > > > > > >
>> > > > > > > > > > Looking forward to your feedback.
>> > > > > > > > > >
>> > > > > > > > > > Thank you~
>> > > > > > > > > >
>> > > > > > > > > > Xintong Song
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > [1]
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
>> > > > > > > > > >
>> > > > > > > > > > [2]
>> > > > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > >
>> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
>> > > > > > > pnowojski@apache.org>
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Thanks for the clarification Till. +1 for what you
>> have
>> > > > > written.
>> > > > > > > > > > >
>> > > > > > > > > > > Piotrek
>> > > > > > > > > > >
>> > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann <
>> > > trohrmann@apache.org
>> > > > >
>> > > > > > > > > > napisał(a):
>> > > > > > > > > > >
>> > > > > > > > > > > > One quick note for clarification. I don't have
>> anything
>> > > > > against
>> > > > > > > > builds
>> > > > > > > > > > > > running on your personal Azure account and this is
>> not
>> > > > what I
>> > > > > > > > > > understood
>> > > > > > > > > > > > under "local environment". For me "local
>> environment"
>> > > means
>> > > > > > that
>> > > > > > > > > > someone
>> > > > > > > > > > > > runs the test locally on his machine and then says
>> that
>> > > the
>> > > > > > > > > > > > tests have passed locally.
>> > > > > > > > > > > >
>> > > > > > > > > > > > I do agree that there might be a conflict of
>> interests
>> > > if a
>> > > > > PR
>> > > > > > > > author
>> > > > > > > > > > > > disables tests. Here I would argue that we don't
>> have
>> > > > > malignant
>> > > > > > > > > > > committers
>> > > > > > > > > > > > which means that every committer will probably first
>> > > check
>> > > > > the
>> > > > > > > > > > respective
>> > > > > > > > > > > > ticket for how often the test failed. Then I guess
>> the
>> > > next
>> > > > > > step
>> > > > > > > > would
>> > > > > > > > > > be
>> > > > > > > > > > > > to discuss on the ticket whether to disable it or
>> not.
>> > > And
>> > > > > > > finally,
>> > > > > > > > > > after
>> > > > > > > > > > > > reaching a consensus, it will be disabled. If we see
>> > > > someone
>> > > > > > > > abusing
>> > > > > > > > > > this
>> > > > > > > > > > > > policy, then we can still think about how to guard
>> > > against
>> > > > > it.
>> > > > > > > But,
>> > > > > > > > > > > > honestly, I have very rarely seen such a case. I am
>> > also
>> > > ok
>> > > > > to
>> > > > > > > > pull in
>> > > > > > > > > > > the
>> > > > > > > > > > > > release manager to make the final call if this
>> resolves
>> > > > > > concerns.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Cheers,
>> > > > > > > > > > > > Till
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
>> > > > > > > > pnowojski@apache.org>
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > +1 for the general idea, however I have concerns
>> > about
>> > > a
>> > > > > > couple
>> > > > > > > > of
>> > > > > > > > > > > > details.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > I would first try to not introduce the exception
>> > for
>> > > > > local
>> > > > > > > > builds.
>> > > > > > > > > > > > > > It makes it quite hard for others to verify the
>> > build
>> > > > and
>> > > > > > to
>> > > > > > > > make
>> > > > > > > > > > > sure
>> > > > > > > > > > > > > that the right things were executed.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > I would counter Till's proposal to ignore local
>> green
>> > > > > builds.
>> > > > > > > If
>> > > > > > > > > > > > committer
>> > > > > > > > > > > > > is merging and closing a PR, with official azure
>> > > failure,
>> > > > > but
>> > > > > > > > there
>> > > > > > > > > > > was a
>> > > > > > > > > > > > > green build before or in local azure it's IMO
>> enough
>> > to
>> > > > > leave
>> > > > > > > the
>> > > > > > > > > > > > message:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Latest build failure is a known issue:
>> FLINK-12345
>> > > > > > > > > > > > > > Green local build: URL
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > This should address Till's concern about
>> > verification.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On the other hand I have concerns about disabling
>> > > tests.*
>> > > > > It
>> > > > > > > > > > shouldn't
>> > > > > > > > > > > be
>> > > > > > > > > > > > > the PR author/committer that's disabling a test on
>> > his
>> > > > own,
>> > > > > > as
>> > > > > > > > > > that's a
>> > > > > > > > > > > > > conflict of interests*. I have however no problems
>> > with
>> > > > > > > disabling
>> > > > > > > > > > test
>> > > > > > > > > > > > > instabilities that were marked as "blockers"
>> though,
>> > > that
>> > > > > > > should
>> > > > > > > > work
>> > > > > > > > > > > > > pretty well. But the important thing here is to
>> > > correctly
>> > > > > > judge
>> > > > > > > > > > bumping
>> > > > > > > > > > > > > priorities of test instabilities based on their
>> > > frequency
>> > > > > and
>> > > > > > > > current
>> > > > > > > > > > > > > general health of the system. I believe that
>> release
>> > > > > managers
>> > > > > > > > should
>> > > > > > > > > > be
>> > > > > > > > > > > > > playing a big role here in deciding on the
>> guidelines
>> > > of
>> > > > > what
>> > > > > > > > should
>> > > > > > > > > > > be a
>> > > > > > > > > > > > > priority of certain test instabilities.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > What I mean by that is two example scenarios:
>> > > > > > > > > > > > > 1. if we have a handful of very frequently failing
>> > > tests
>> > > > > and
>> > > > > > a
>> > > > > > > > > > handful
>> > > > > > > > > > > of
>> > > > > > > > > > > > > very rarely failing tests (like one reported
>> failure
>> > > and
>> > > > no
>> > > > > > > > another
>> > > > > > > > > > > > > occurrence in many months, and let's even say that
>> > the
>> > > > > > failure
>> > > > > > > > looks
>> > > > > > > > > > > like
>> > > > > > > > > > > > > infrastructure/network timeout), we should focus
>> on
>> > the
>> > > > > > > > frequently
>> > > > > > > > > > > > failing
>> > > > > > > > > > > > > ones, and probably we are safe to ignore for the
>> time
>> > > > being
>> > > > > > the
>> > > > > > > > rare
>> > > > > > > > > > > > issues
>> > > > > > > > > > > > > - at least until we deal with the most pressing
>> ones.
>> > > > > > > > > > > > > 2. If we have tons of rarely failing test
>> > > instabilities,
>> > > > we
>> > > > > > > > should
>> > > > > > > > > > > > probably
>> > > > > > > > > > > > > start addressing them as blockers.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > I'm using my own conscious and my best judgement
>> when
>> > > I'm
>> > > > > > > > > > > > > bumping/decreasing priorities of test
>> instabilities
>> > > (and
>> > > > > > bugs),
>> > > > > > > > but
>> > > > > > > > > > as
>> > > > > > > > > > > > > individual committer I don't have the full
>> picture.
>> > As
>> > > I
>> > > > > > wrote
>> > > > > > > > > > above, I
>> > > > > > > > > > > > > think release managers are in a much better
>> position
>> > to
>> > > > > keep
>> > > > > > > > > > adjusting
>> > > > > > > > > > > > > those kind of guidelines.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Best, Piotrek
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com>
>> > > > > > napisał(a):
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > +1 for Xintong's proposal.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > For me, resolving problems directly (fixing the
>> > > > > > > infrastructure
>> > > > > > > > > > issue,
>> > > > > > > > > > > > > > disabling unstable tests and creating blocker
>> JIRAs
>> > > to
>> > > > > > track
>> > > > > > > > the
>> > > > > > > > > > fix
>> > > > > > > > > > > > and
>> > > > > > > > > > > > > > re-enable them asap, etc.) is (in most cases)
>> > better
>> > > > than
>> > > > > > > > working
>> > > > > > > > > > > > around
>> > > > > > > > > > > > > > them (verify locally, manually check and judge
>> the
>> > > > > failure
>> > > > > > as
>> > > > > > > > > > > > > "unrelated",
>> > > > > > > > > > > > > > etc.), and I believe the proposal could help us
>> > > pushing
>> > > > > > those
>> > > > > > > > more
>> > > > > > > > > > > > "real"
>> > > > > > > > > > > > > > solutions forward.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Best Regards,
>> > > > > > > > > > > > > > Yu
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <
>> > > > > > karmagyz@gmail.com
>> > > > > > > >
>> > > > > > > > > > wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Creating a blocker issue for the manually
>> > disabled
>> > > > > tests
>> > > > > > > > sounds
>> > > > > > > > > > > good
>> > > > > > > > > > > > to
>> > > > > > > > > > > > > > me.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Minor: I'm still a bit worried about the
>> commits
>> > > > merged
>> > > > > > > > before we
>> > > > > > > > > > > fix
>> > > > > > > > > > > > > > > the unstable tests can also break those tests.
>> > > > Instead
>> > > > > of
>> > > > > > > > letting
>> > > > > > > > > > > the
>> > > > > > > > > > > > > > > assigners keep a look at all potentially
>> related
>> > > > > commits,
>> > > > > > > > they
>> > > > > > > > > > can
>> > > > > > > > > > > > > > > maintain a branch that is periodically synced
>> > with
>> > > > the
>> > > > > > > master
>> > > > > > > > > > > branch
>> > > > > > > > > > > > > > > while enabling the unstable test. So that they
>> > can
>> > > > > catch
>> > > > > > > the
>> > > > > > > > > > > breaking
>> > > > > > > > > > > > > > > changes asap.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > > > Yangze Guo
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann
>> <
>> > > > > > > > > > > trohrmann@apache.org>
>> > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > I like the idea of creating a blocker issue
>> > for a
>> > > > > > > disabled
>> > > > > > > > > > test.
>> > > > > > > > > > > > This
>> > > > > > > > > > > > > > > will
>> > > > > > > > > > > > > > > > force us to resolve it in a timely manner
>> and
>> > it
>> > > > > won't
>> > > > > > > fall
>> > > > > > > > > > > through
>> > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > cracks.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Cheers,
>> > > > > > > > > > > > > > > > Till
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li
>> <
>> > > > > > > > > > > > jingsonglee0@gmail.com>
>> > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > +1 to Xintong's proposal
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > I also have some concerns about unstable
>> > cases.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > I think unstable cases can be divided into
>> > > these
>> > > > > > types:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > - Force majeure: For example, network
>> > timeout,
>> > > > > sudden
>> > > > > > > > > > > > environmental
>> > > > > > > > > > > > > > > > > collapse, they are accidental and can
>> always
>> > be
>> > > > > > solved
>> > > > > > > by
>> > > > > > > > > > > > > triggering
>> > > > > > > > > > > > > > > azure
>> > > > > > > > > > > > > > > > > again. Committers should wait for the next
>> > > green
>> > > > > > azure.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > - Obvious mistakes: For example, some
>> errors
>> > > > caused
>> > > > > > by
>> > > > > > > > > > obvious
>> > > > > > > > > > > > > > reasons
>> > > > > > > > > > > > > > > may
>> > > > > > > > > > > > > > > > > be repaired quickly. At this time, do we
>> need
>> > > to
>> > > > > > wait,
>> > > > > > > > or not
>> > > > > > > > > > > > wait
>> > > > > > > > > > > > > > and
>> > > > > > > > > > > > > > > just
>> > > > > > > > > > > > > > > > > ignore?
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > - Difficult questions: These problems are
>> > very
>> > > > > > > difficult
>> > > > > > > > to
>> > > > > > > > > > > find.
>> > > > > > > > > > > > > > There
>> > > > > > > > > > > > > > > > > will be no solution for a while and a
>> half.
>> > We
>> > > > > don't
>> > > > > > > even
>> > > > > > > > > > know
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > > reason.
>> > > > > > > > > > > > > > > > > At this time, we should ignore it. (Maybe
>> > it's
>> > > > > judged
>> > > > > > > by
>> > > > > > > > the
>> > > > > > > > > > > > author
>> > > > > > > > > > > > > > of
>> > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > case. But what about the old case whose
>> > author
>> > > > > can't
>> > > > > > be
>> > > > > > > > > > found?)
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > So, the ignored cases should be the block
>> of
>> > > the
>> > > > > next
>> > > > > > > > release
>> > > > > > > > > > > > until
>> > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > reason is found or the case is fixed?  We
>> > need
>> > > to
>> > > > > > > ensure
>> > > > > > > > that
>> > > > > > > > > > > > > someone
>> > > > > > > > > > > > > > > will
>> > > > > > > > > > > > > > > > > take care of these cases, because there
>> is no
>> > > > > > deepening
>> > > > > > > > of
>> > > > > > > > > > > failed
>> > > > > > > > > > > > > > > tests, no
>> > > > > > > > > > > > > > > > > one may continue to pay attention to these
>> > > cases.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > I think this guideline should consider
>> these
>> > > > > > > situations,
>> > > > > > > > and
>> > > > > > > > > > > show
>> > > > > > > > > > > > > how
>> > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > solve them.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > > > > > Jingsong
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <
>> > > > > > > > imjark@gmail.com>
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Thanks to Xintong for bringing up this
>> > topic,
>> > > > I'm
>> > > > > > +1
>> > > > > > > in
>> > > > > > > > > > > > general.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > However, I think it's still not very
>> clear
>> > > how
>> > > > we
>> > > > > > > > address
>> > > > > > > > > > the
>> > > > > > > > > > > > > > > unstable
>> > > > > > > > > > > > > > > > > > tests.
>> > > > > > > > > > > > > > > > > > I think this is a very important part of
>> > this
>> > > > new
>> > > > > > > > > > guideline.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > According to the discussion above, if
>> some
>> > > > tests
>> > > > > > are
>> > > > > > > > > > > unstable,
>> > > > > > > > > > > > we
>> > > > > > > > > > > > > > can
>> > > > > > > > > > > > > > > > > > manually disable it.
>> > > > > > > > > > > > > > > > > > But I have some questions in my mind:
>> > > > > > > > > > > > > > > > > > 1) Is the instability judged by the
>> > committer
>> > > > > > > > themselves or
>> > > > > > > > > > > by
>> > > > > > > > > > > > > some
>> > > > > > > > > > > > > > > > > > metrics?
>> > > > > > > > > > > > > > > > > > 2) Should we log the disable commit in
>> the
>> > > > > > > > corresponding
>> > > > > > > > > > > issue
>> > > > > > > > > > > > > and
>> > > > > > > > > > > > > > > > > increase
>> > > > > > > > > > > > > > > > > > the priority?
>> > > > > > > > > > > > > > > > > > 3) What if nobody looks into this issue
>> and
>> > > > this
>> > > > > > > > becomes
>> > > > > > > > > > some
>> > > > > > > > > > > > > > > potential
>> > > > > > > > > > > > > > > > > > bugs released with the new version?
>> > > > > > > > > > > > > > > > > > 4) If no person is actively working on
>> the
>> > > > issue,
>> > > > > > who
>> > > > > > > > > > should
>> > > > > > > > > > > > > > > re-enable
>> > > > > > > > > > > > > > > > > it?
>> > > > > > > > > > > > > > > > > > Would it block PRs again?
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > > > > > > Jark
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong
>> Song
>> > <
>> > > > > > > > > > > > > tonysong820@gmail.com>
>> > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Thanks all for the feedback.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > @Till @Yangze
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > I'm also not convinced by the idea of
>> > > having
>> > > > an
>> > > > > > > > exception
>> > > > > > > > > > > for
>> > > > > > > > > > > > > > local
>> > > > > > > > > > > > > > > > > > builds.
>> > > > > > > > > > > > > > > > > > > We need to execute the entire build
>> (or
>> > at
>> > > > > least
>> > > > > > > the
>> > > > > > > > > > > failing
>> > > > > > > > > > > > > > stage)
>> > > > > > > > > > > > > > > > > > > locally, to make sure subsequent test
>> > cases
>> > > > > > > > prevented by
>> > > > > > > > > > > the
>> > > > > > > > > > > > > > > failure
>> > > > > > > > > > > > > > > > > one
>> > > > > > > > > > > > > > > > > > > are all executed. In that case, it's
>> > > probably
>> > > > > > > easier
>> > > > > > > > to
>> > > > > > > > > > > rerun
>> > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > build
>> > > > > > > > > > > > > > > > > > on
>> > > > > > > > > > > > > > > > > > > azure than locally.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Concerning disabling unstable test
>> cases
>> > > that
>> > > > > > > > regularly
>> > > > > > > > > > > block
>> > > > > > > > > > > > > PRs
>> > > > > > > > > > > > > > > from
>> > > > > > > > > > > > > > > > > > > merging, maybe we can say that such
>> cases
>> > > can
>> > > > > > only
>> > > > > > > be
>> > > > > > > > > > > > disabled
>> > > > > > > > > > > > > > when
>> > > > > > > > > > > > > > > > > > someone
>> > > > > > > > > > > > > > > > > > > is actively looking into it, likely
>> the
>> > > > person
>> > > > > > who
>> > > > > > > > > > disabled
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > > case.
>> > > > > > > > > > > > > > > > > If
>> > > > > > > > > > > > > > > > > > > this person is no longer actively
>> working
>> > > on
>> > > > > it,
>> > > > > > > > he/she
>> > > > > > > > > > > > should
>> > > > > > > > > > > > > > > enable
>> > > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > case again no matter if it is fixed or
>> > not.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > @Jing
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Thanks for the suggestions.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > +1 to provide guidelines on handling
>> test
>> > > > > > failures.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > 1. Report the test failures in the
>> JIRA.
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > +1 on this. Currently, the release
>> > managers
>> > > > are
>> > > > > > > > > > monitoring
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > ci
>> > > > > > > > > > > > > > > and
>> > > > > > > > > > > > > > > > > > cron
>> > > > > > > > > > > > > > > > > > > build instabilities and reporting
>> them on
>> > > > JIRA.
>> > > > > > We
>> > > > > > > > should
>> > > > > > > > > > > > also
>> > > > > > > > > > > > > > > > > encourage
>> > > > > > > > > > > > > > > > > > > other contributors to do that for PRs.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > 2. Set a deadline to find out the root
>> > > cause
>> > > > > and
>> > > > > > > > solve
>> > > > > > > > > > the
>> > > > > > > > > > > > > > failure
>> > > > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > > new created JIRA  because we could
>> not
>> > > > block
>> > > > > > > other
>> > > > > > > > > > commit
>> > > > > > > > > > > > > > merges
>> > > > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > > > a
>> > > > > > > > > > > > > > > > > > > long
>> > > > > > > > > > > > > > > > > > > > time
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > 3. What to do if the JIRA has not made
>> > > > > > significant
>> > > > > > > > > > progress
>> > > > > > > > > > > > > when
>> > > > > > > > > > > > > > > > > reached
>> > > > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > > > > the deadline time?
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > I'm not sure about these two. It
>> feels a
>> > > bit
>> > > > > > > against
>> > > > > > > > the
>> > > > > > > > > > > > > > voluntary
>> > > > > > > > > > > > > > > > > nature
>> > > > > > > > > > > > > > > > > > > of open source projects.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > IMHO, frequent instabilities are more
>> > > likely
>> > > > to
>> > > > > > be
>> > > > > > > > > > upgraded
>> > > > > > > > > > > > to
>> > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > critical
>> > > > > > > > > > > > > > > > > > > / blocker priority, receive more
>> > attention
>> > > > and
>> > > > > > > > eventually
>> > > > > > > > > > > get
>> > > > > > > > > > > > > > > fixed.
>> > > > > > > > > > > > > > > > > > > Release managers are also responsible
>> for
>> > > > > looking
>> > > > > > > for
>> > > > > > > > > > > > assignees
>> > > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > > > such
>> > > > > > > > > > > > > > > > > > > issues. If a case is still not fixed
>> > > soonish,
>> > > > > > even
>> > > > > > > > with
>> > > > > > > > > > all
>> > > > > > > > > > > > > these
>> > > > > > > > > > > > > > > > > > efforts,
>> > > > > > > > > > > > > > > > > > > I'm not sure how setting a deadline
>> can
>> > > help
>> > > > > > this.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > 4. If we disable the respective tests
>> > > > > > temporarily,
>> > > > > > > we
>> > > > > > > > > > also
>> > > > > > > > > > > > > need a
>> > > > > > > > > > > > > > > > > > mechanism
>> > > > > > > > > > > > > > > > > > > > to ensure the issue would be
>> continued
>> > to
>> > > > be
>> > > > > > > > > > investigated
>> > > > > > > > > > > > in
>> > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > future.
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > +1. As mentioned above, we may
>> consider
>> > > > > disabling
>> > > > > > > > such
>> > > > > > > > > > > tests
>> > > > > > > > > > > > > iff
>> > > > > > > > > > > > > > > > > someone
>> > > > > > > > > > > > > > > > > > is
>> > > > > > > > > > > > > > > > > > > actively working on it.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Thank you~
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Xintong Song
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING
>> > ZHANG
>> > > <
>> > > > > > > > > > > > > beyond1920@gmail.com
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Hi Xintong,
>> > > > > > > > > > > > > > > > > > > > +1 to the proposal.
>> > > > > > > > > > > > > > > > > > > > In order to better comply with the
>> > rule,
>> > > it
>> > > > > is
>> > > > > > > > > > necessary
>> > > > > > > > > > > to
>> > > > > > > > > > > > > > > describe
>> > > > > > > > > > > > > > > > > > > what's
>> > > > > > > > > > > > > > > > > > > > best practice if encountering test
>> > > failure
>> > > > > > which
>> > > > > > > > seems
>> > > > > > > > > > > > > > unrelated
>> > > > > > > > > > > > > > > with
>> > > > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > > current commits.
>> > > > > > > > > > > > > > > > > > > > How to avoid merging PR with test
>> > > failures
>> > > > > and
>> > > > > > > not
>> > > > > > > > > > > blocking
>> > > > > > > > > > > > > > code
>> > > > > > > > > > > > > > > > > > merging
>> > > > > > > > > > > > > > > > > > > > for a long time?
>> > > > > > > > > > > > > > > > > > > > I tried to think about the possible
>> > > steps,
>> > > > > and
>> > > > > > > > found
>> > > > > > > > > > > there
>> > > > > > > > > > > > > are
>> > > > > > > > > > > > > > > some
>> > > > > > > > > > > > > > > > > > > > detailed problems that need to be
>> > > discussed
>> > > > > in
>> > > > > > a
>> > > > > > > > step
>> > > > > > > > > > > > > further:
>> > > > > > > > > > > > > > > > > > > > 1. Report the test failures in the
>> > JIRA.
>> > > > > > > > > > > > > > > > > > > > 2. Set a deadline to find out the
>> root
>> > > > cause
>> > > > > > and
>> > > > > > > > solve
>> > > > > > > > > > > the
>> > > > > > > > > > > > > > > failure
>> > > > > > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > > new created JIRA  because we could
>> not
>> > > > block
>> > > > > > > other
>> > > > > > > > > > commit
>> > > > > > > > > > > > > > merges
>> > > > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > > > a
>> > > > > > > > > > > > > > > > > > > long
>> > > > > > > > > > > > > > > > > > > > time
>> > > > > > > > > > > > > > > > > > > >     When is a reasonable deadline
>> here?
>> > > > > > > > > > > > > > > > > > > > 3. What to do if the JIRA has not
>> made
>> > > > > > > significant
>> > > > > > > > > > > progress
>> > > > > > > > > > > > > > when
>> > > > > > > > > > > > > > > > > > reached
>> > > > > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > > > > the deadline time?
>> > > > > > > > > > > > > > > > > > > >     There are several situations as
>> > > > follows,
>> > > > > > > maybe
>> > > > > > > > > > > > different
>> > > > > > > > > > > > > > > cases
>> > > > > > > > > > > > > > > > > need
>> > > > > > > > > > > > > > > > > > > > different approaches.
>> > > > > > > > > > > > > > > > > > > >     1. the JIRA is non-assigned yet
>> > > > > > > > > > > > > > > > > > > >     2. not found the root cause yet
>> > > > > > > > > > > > > > > > > > > >     3. not found a good solution,
>> but
>> > > > already
>> > > > > > > > found the
>> > > > > > > > > > > > root
>> > > > > > > > > > > > > > > cause
>> > > > > > > > > > > > > > > > > > > >     4. found a solution, but it
>> needs
>> > > more
>> > > > > time
>> > > > > > > to
>> > > > > > > > be
>> > > > > > > > > > > done.
>> > > > > > > > > > > > > > > > > > > > 4. If we disable the respective
>> tests
>> > > > > > > temporarily,
>> > > > > > > > we
>> > > > > > > > > > > also
>> > > > > > > > > > > > > > need a
>> > > > > > > > > > > > > > > > > > > mechanism
>> > > > > > > > > > > > > > > > > > > > to ensure the issue would be
>> continued
>> > to
>> > > > be
>> > > > > > > > > > investigated
>> > > > > > > > > > > > in
>> > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > future.
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Best regards,
>> > > > > > > > > > > > > > > > > > > > JING ZHANG
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Stephan Ewen <se...@apache.org>
>> > > > > 于2021年6月23日周三
>> > > > > > > > > > 下午8:16写道：
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > +1 to Xintong's proposal
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM
>> Till
>> > > > > > Rohrmann <
>> > > > > > > > > > > > > > > > > trohrmann@apache.org>
>> > > > > > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > I would first try to not
>> introduce
>> > > the
>> > > > > > > > exception
>> > > > > > > > > > for
>> > > > > > > > > > > > > local
>> > > > > > > > > > > > > > > > > builds.
>> > > > > > > > > > > > > > > > > > It
>> > > > > > > > > > > > > > > > > > > > > makes
>> > > > > > > > > > > > > > > > > > > > > > it quite hard for others to
>> verify
>> > > the
>> > > > > > build
>> > > > > > > > and to
>> > > > > > > > > > > > make
>> > > > > > > > > > > > > > sure
>> > > > > > > > > > > > > > > > > that
>> > > > > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > > > > right things were executed. If
>> we
>> > see
>> > > > > that
>> > > > > > > this
>> > > > > > > > > > > becomes
>> > > > > > > > > > > > > an
>> > > > > > > > > > > > > > > issue
>> > > > > > > > > > > > > > > > > > then
>> > > > > > > > > > > > > > > > > > > > we
>> > > > > > > > > > > > > > > > > > > > > > can revisit this idea.
>> > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > Cheers,
>> > > > > > > > > > > > > > > > > > > > > > Till
>> > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM
>> > > Yangze
>> > > > > Guo
>> > > > > > <
>> > > > > > > > > > > > > > > karmagyz@gmail.com>
>> > > > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > +1 for appending this to
>> > community
>> > > > > > > > guidelines for
>> > > > > > > > > > > > > merging
>> > > > > > > > > > > > > > > PRs.
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > @Till Rohrmann
>> > > > > > > > > > > > > > > > > > > > > > > I agree that with this
>> approach
>> > > > > unstable
>> > > > > > > > tests
>> > > > > > > > > > will
>> > > > > > > > > > > > not
>> > > > > > > > > > > > > > > block
>> > > > > > > > > > > > > > > > > > other
>> > > > > > > > > > > > > > > > > > > > > > > commit merges. However, it
>> might
>> > be
>> > > > > hard
>> > > > > > to
>> > > > > > > > > > prevent
>> > > > > > > > > > > > > > merging
>> > > > > > > > > > > > > > > > > > commits
>> > > > > > > > > > > > > > > > > > > > > > > that are related to those
>> tests
>> > and
>> > > > > > should
>> > > > > > > > have
>> > > > > > > > > > > been
>> > > > > > > > > > > > > > passed
>> > > > > > > > > > > > > > > > > them.
>> > > > > > > > > > > > > > > > > > > > It's
>> > > > > > > > > > > > > > > > > > > > > > > true that this judgment can be
>> > made
>> > > > by
>> > > > > > the
>> > > > > > > > > > > > committers,
>> > > > > > > > > > > > > > but
>> > > > > > > > > > > > > > > no
>> > > > > > > > > > > > > > > > > one
>> > > > > > > > > > > > > > > > > > > can
>> > > > > > > > > > > > > > > > > > > > > > > ensure the judgment is always
>> > > precise
>> > > > > and
>> > > > > > > so
>> > > > > > > > that
>> > > > > > > > > > > we
>> > > > > > > > > > > > > have
>> > > > > > > > > > > > > > > this
>> > > > > > > > > > > > > > > > > > > > > > > discussion thread.
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > Regarding the unstable tests,
>> how
>> > > > about
>> > > > > > > > adding
>> > > > > > > > > > > > another
>> > > > > > > > > > > > > > > > > exception:
>> > > > > > > > > > > > > > > > > > > > > > > committers verify it in their
>> > local
>> > > > > > > > environment
>> > > > > > > > > > and
>> > > > > > > > > > > > > > > comment in
>> > > > > > > > > > > > > > > > > > such
>> > > > > > > > > > > > > > > > > > > > > > > cases?
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > > > > > > > > > > > Yangze Guo
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23
>> PM
>> > > 刘建刚 <
>> > > > > > > > > > > > > > > liujiangangpeng@gmail.com
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > It is a good principle to
>> run
>> > all
>> > > > > tests
>> > > > > > > > > > > > successfully
>> > > > > > > > > > > > > > > with any
>> > > > > > > > > > > > > > > > > > > > change.
>> > > > > > > > > > > > > > > > > > > > > > > This
>> > > > > > > > > > > > > > > > > > > > > > > > means a lot for project's
>> > > stability
>> > > > > and
>> > > > > > > > > > > > development.
>> > > > > > > > > > > > > I
>> > > > > > > > > > > > > > > am big
>> > > > > > > > > > > > > > > > > > +1
>> > > > > > > > > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > > > > > > > > this
>> > > > > > > > > > > > > > > > > > > > > > > > proposal.
>> > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > Best
>> > > > > > > > > > > > > > > > > > > > > > > > liujiangang
>> > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > Till Rohrmann <
>> > > > trohrmann@apache.org>
>> > > > > > > > > > > 于2021年6月22日周二
>> > > > > > > > > > > > > > > 下午6:36写道：
>> > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > One way to address the
>> > problem
>> > > of
>> > > > > > > > regularly
>> > > > > > > > > > > > failing
>> > > > > > > > > > > > > > > tests
>> > > > > > > > > > > > > > > > > > that
>> > > > > > > > > > > > > > > > > > > > > block
>> > > > > > > > > > > > > > > > > > > > > > > > > merging of PRs is to
>> disable
>> > > the
>> > > > > > > > respective
>> > > > > > > > > > > tests
>> > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > time
>> > > > > > > > > > > > > > > > > > > > > being.
>> > > > > > > > > > > > > > > > > > > > > > > Of
>> > > > > > > > > > > > > > > > > > > > > > > > > course, the failing test
>> then
>> > > > needs
>> > > > > > to
>> > > > > > > be
>> > > > > > > > > > > fixed.
>> > > > > > > > > > > > > But
>> > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > least
>> > > > > > > > > > > > > > > > > > > > that
>> > > > > > > > > > > > > > > > > > > > > > way
>> > > > > > > > > > > > > > > > > > > > > > > we
>> > > > > > > > > > > > > > > > > > > > > > > > > would not block everyone
>> from
>> > > > > making
>> > > > > > > > > > progress.
>> > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > Cheers,
>> > > > > > > > > > > > > > > > > > > > > > > > > Till
>> > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at
>> 12:00
>> > > PM
>> > > > > > Arvid
>> > > > > > > > Heise
>> > > > > > > > > > <
>> > > > > > > > > > > > > > > > > > arvid@apache.org
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > I think this is overall
>> a
>> > > good
>> > > > > > idea.
>> > > > > > > > So +1
>> > > > > > > > > > > from
>> > > > > > > > > > > > > my
>> > > > > > > > > > > > > > > side.
>> > > > > > > > > > > > > > > > > > > > > > > > > > However, I'd like to
>> put a
>> > > > higher
>> > > > > > > > priority
>> > > > > > > > > > on
>> > > > > > > > > > > > > > > > > > infrastructure
>> > > > > > > > > > > > > > > > > > > > > then,
>> > > > > > > > > > > > > > > > > > > > > > in
>> > > > > > > > > > > > > > > > > > > > > > > > > > particular docker
>> > > > image/artifact
>> > > > > > > > caches.
>> > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at
>> > 11:50
>> > > > AM
>> > > > > > Till
>> > > > > > > > > > > Rohrmann
>> > > > > > > > > > > > <
>> > > > > > > > > > > > > > > > > > > > > > trohrmann@apache.org
>> > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for bringing
>> this
>> > > > topic
>> > > > > to
>> > > > > > > our
>> > > > > > > > > > > > attention
>> > > > > > > > > > > > > > > > > Xintong.
>> > > > > > > > > > > > > > > > > > I
>> > > > > > > > > > > > > > > > > > > > > think
>> > > > > > > > > > > > > > > > > > > > > > > your
>> > > > > > > > > > > > > > > > > > > > > > > > > > > proposal makes a lot
>> of
>> > > sense
>> > > > > and
>> > > > > > > we
>> > > > > > > > > > should
>> > > > > > > > > > > > > > follow
>> > > > > > > > > > > > > > > it.
>> > > > > > > > > > > > > > > > > It
>> > > > > > > > > > > > > > > > > > > > will
>> > > > > > > > > > > > > > > > > > > > > > > give us
>> > > > > > > > > > > > > > > > > > > > > > > > > > > confidence that our
>> > changes
>> > > > are
>> > > > > > > > working
>> > > > > > > > > > and
>> > > > > > > > > > > > it
>> > > > > > > > > > > > > > > might
>> > > > > > > > > > > > > > > > > be a
>> > > > > > > > > > > > > > > > > > > > good
>> > > > > > > > > > > > > > > > > > > > > > > > > incentive
>> > > > > > > > > > > > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > > > > > > > > > > > quickly fix build
>> > > > > instabilities.
>> > > > > > > > Hence,
>> > > > > > > > > > +1.
>> > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers,
>> > > > > > > > > > > > > > > > > > > > > > > > > > > Till
>> > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021
>> at
>> > > 11:12
>> > > > > AM
>> > > > > > > > Xintong
>> > > > > > > > > > > > Song <
>> > > > > > > > > > > > > > > > > > > > > > > tonysong820@gmail.com>
>> > > > > > > > > > > > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
>> > > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > In the past a
>> couple of
>> > > > > weeks,
>> > > > > > > I've
>> > > > > > > > > > > > observed
>> > > > > > > > > > > > > > > several
>> > > > > > > > > > > > > > > > > > > times
>> > > > > > > > > > > > > > > > > > > > > that
>> > > > > > > > > > > > > > > > > > > > > > > PRs
>> > > > > > > > > > > > > > > > > > > > > > > > > are
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > merged without a
>> green
>> > > > light
>> > > > > > from
>> > > > > > > > the
>> > > > > > > > > > CI
>> > > > > > > > > > > > > tests,
>> > > > > > > > > > > > > > > where
>> > > > > > > > > > > > > > > > > > > > failure
>> > > > > > > > > > > > > > > > > > > > > > > cases
>> > > > > > > > > > > > > > > > > > > > > > > > > are
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > considered
>> *unrelated*.
>> > > > This
>> > > > > > may
>> > > > > > > > not
>> > > > > > > > > > > always
>> > > > > > > > > > > > > > cause
>> > > > > > > > > > > > > > > > > > > problems,
>> > > > > > > > > > > > > > > > > > > > > but
>> > > > > > > > > > > > > > > > > > > > > > > would
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > increase the chance
>> of
>> > > > > breaking
>> > > > > > > our
>> > > > > > > > > > code
>> > > > > > > > > > > > > base.
>> > > > > > > > > > > > > > In
>> > > > > > > > > > > > > > > > > fact,
>> > > > > > > > > > > > > > > > > > > it
>> > > > > > > > > > > > > > > > > > > > > has
>> > > > > > > > > > > > > > > > > > > > > > > > > occurred
>> > > > > > > > > > > > > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > me twice in the past
>> > few
>> > > > > weeks
>> > > > > > > > that I
>> > > > > > > > > > had
>> > > > > > > > > > > > to
>> > > > > > > > > > > > > > > revert a
>> > > > > > > > > > > > > > > > > > > > commit
>> > > > > > > > > > > > > > > > > > > > > > > which
>> > > > > > > > > > > > > > > > > > > > > > > > > > breaks
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > the master branch
>> due
>> > to
>> > > > > this.
>> > > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > I think it would be
>> > nicer
>> > > > to
>> > > > > > > > enforce a
>> > > > > > > > > > > > > stricter
>> > > > > > > > > > > > > > > rule,
>> > > > > > > > > > > > > > > > > > > that
>> > > > > > > > > > > > > > > > > > > > no
>> > > > > > > > > > > > > > > > > > > > > > PRs
>> > > > > > > > > > > > > > > > > > > > > > > > > > should
>> > > > > > > > > > > > > > > > > > > > > > > > > > > be
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > merged without
>> passing
>> > > CI.
>> > > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > The problems of
>> merging
>> > > PRs
>> > > > > > with
>> > > > > > > > > > > > "unrelated"
>> > > > > > > > > > > > > > test
>> > > > > > > > > > > > > > > > > > > failures
>> > > > > > > > > > > > > > > > > > > > > are:
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > - It's not always
>> > > > > > straightforward
>> > > > > > > > to
>> > > > > > > > > > tell
>> > > > > > > > > > > > > > > whether a
>> > > > > > > > > > > > > > > > > > test
>> > > > > > > > > > > > > > > > > > > > > > > failures are
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > related or not.
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > - It prevents
>> > subsequent
>> > > > test
>> > > > > > > cases
>> > > > > > > > > > from
>> > > > > > > > > > > > > being
>> > > > > > > > > > > > > > > > > > executed,
>> > > > > > > > > > > > > > > > > > > > > which
>> > > > > > > > > > > > > > > > > > > > > > > may
>> > > > > > > > > > > > > > > > > > > > > > > > > fail
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > relating to the PR
>> > > changes.
>> > > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > To make things
>> easier
>> > for
>> > > > the
>> > > > > > > > > > committers,
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > following
>> > > > > > > > > > > > > > > > > > > > > > > exceptions
>> > > > > > > > > > > > > > > > > > > > > > > > > > might
>> > > > > > > > > > > > > > > > > > > > > > > > > > > be
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > considered
>> acceptable.
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > - The PR has passed
>> CI
>> > in
>> > > > the
>> > > > > > > > > > > contributor's
>> > > > > > > > > > > > > > > personal
>> > > > > > > > > > > > > > > > > > > > > workspace.
>> > > > > > > > > > > > > > > > > > > > > > > > > Please
>> > > > > > > > > > > > > > > > > > > > > > > > > > > post
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > the link in such
>> cases.
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > - The CI tests have
>> > been
>> > > > > > > triggered
>> > > > > > > > > > > multiple
>> > > > > > > > > > > > > > > times, on
>> > > > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > > > same
>> > > > > > > > > > > > > > > > > > > > > > > > > commit,
>> > > > > > > > > > > > > > > > > > > > > > > > > > > and
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > each stage has at
>> least
>> > > > > passed
>> > > > > > > for
>> > > > > > > > > > once.
>> > > > > > > > > > > > > Please
>> > > > > > > > > > > > > > > also
>> > > > > > > > > > > > > > > > > > > > comment
>> > > > > > > > > > > > > > > > > > > > > in
>> > > > > > > > > > > > > > > > > > > > > > > such
>> > > > > > > > > > > > > > > > > > > > > > > > > > > cases.
>> > > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > If we all agree on
>> > this,
>> > > > I'd
>> > > > > > > > update the
>> > > > > > > > > > > > > > community
>> > > > > > > > > > > > > > > > > > > > guidelines
>> > > > > > > > > > > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > > > > > > > > > > > > merging
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > PRs wrt. this
>> proposal.
>> > > [1]
>> > > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > Please let me know
>> what
>> > > do
>> > > > > you
>> > > > > > > > think.
>> > > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
>> > > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
>> > > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > > > [1]
>> > > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > >
>> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
>> > > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > --
>> > > > > > > > > > > > > > > > > Best, Jingsong Lee
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Best, Jingsong Lee
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Best regards!
>> > > > Rui Li
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Xintong Song <to...@gmail.com>.

I second Tison's opinion.

Similar to how we skip docs_404_check for PRs that do not touch the
documentation, we can skip other stages for PRs that only contain
documentation changes.

In addition to making merging documentation PRs easier, we can also reduce
the workload on CI workers. Especially during the last days of a release
cycle, which is usually the most busy time for the CI workers, and is also
where most documentation efforts take place.

Thank you~

Xintong Song



On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann <tr...@apache.org> wrote:

> I think you are right Tison that docs are a special case and they only
> require flink-docs to pass. What I am wondering is how much of a problem
> this will be (assuming that we have a decent build stability). The more
> exceptions we add, the harder it will be to properly follow the guidelines.
> Maybe we can observe how many docs PRs get delayed/not merged because of
> this and then revisit this discussion if needed.
>
> Cheers,
> Till
>
> On Wed, Jun 30, 2021 at 8:30 AM tison <wa...@gmail.com> wrote:
>
> > Hi,
> >
> > There are a number of PRs modifying docs only, but we still require all
> > tests passed on that.
> >
> > It is a good proposal we avoid merge PR with "unrelated" failure, but can
> > we improve the case where the contributor only works for docs?
> >
> > For example, base on the file change set, run doc tests only.
> >
> > Best,
> > tison.
> >
> >
> > godfrey he <go...@gmail.com> 于2021年6月30日周三 下午2:17写道：
> >
> > > +1 for the proposal. Thanks Xintong!
> > >
> > > Best,
> > > Godfrey
> > >
> > >
> > >
> > > Rui Li <li...@gmail.com> 于2021年6月30日周三 上午11:36写道：
> > >
> > > > Thanks Xintong. +1 to the proposal.
> > > >
> > > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 <li...@gmail.com>
> > wrote:
> > > >
> > > > > +1 for the proposal. Since the test time is long and environment
> may
> > > > vary,
> > > > > unstable tests are really annoying for developers. The solution is
> > > > welcome.
> > > > >
> > > > > Best
> > > > > liujiangang
> > > > >
> > > > > Jingsong Li <ji...@gmail.com> 于2021年6月29日周二 上午10:31写道：
> > > > >
> > > > > > +1 Thanks Xintong for the update!
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <
> > trohrmann@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > +1, thanks for updating the guidelines Xintong!
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <
> karmagyz@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > Thanks Xintong for drafting this doc.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <
> > beyond1920@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Thanks Xintong for giving detailed documentation.
> > > > > > > > >
> > > > > > > > > The best practice for handling test failure is very
> detailed,
> > > > it's
> > > > > a
> > > > > > > good
> > > > > > > > > guidelines document with clear action steps.
> > > > > > > > >
> > > > > > > > > +1 to Xintong's proposal.
> > > > > > > > >
> > > > > > > > > Xintong Song <to...@gmail.com> 于2021年6月28日周一
> 下午4:07写道：
> > > > > > > > >
> > > > > > > > > > Thanks all for the discussion.
> > > > > > > > > >
> > > > > > > > > > Based on the opinions so far, I've drafted the new
> > guidelines
> > > > > [1],
> > > > > > > as a
> > > > > > > > > > potential replacement of the original wiki page [2].
> > > > > > > > > >
> > > > > > > > > > Hopefully this draft has covered the most opinions
> > discussed
> > > > and
> > > > > > > > consensus
> > > > > > > > > > made in this discussion thread.
> > > > > > > > > >
> > > > > > > > > > Looking forward to your feedback.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > > > > > >
> > > > > > > > > > [2]
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > > > > > pnowojski@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for the clarification Till. +1 for what you have
> > > > > written.
> > > > > > > > > > >
> > > > > > > > > > > Piotrek
> > > > > > > > > > >
> > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann <
> > > trohrmann@apache.org
> > > > >
> > > > > > > > > > napisał(a):
> > > > > > > > > > >
> > > > > > > > > > > > One quick note for clarification. I don't have
> anything
> > > > > against
> > > > > > > > builds
> > > > > > > > > > > > running on your personal Azure account and this is
> not
> > > > what I
> > > > > > > > > > understood
> > > > > > > > > > > > under "local environment". For me "local environment"
> > > means
> > > > > > that
> > > > > > > > > > someone
> > > > > > > > > > > > runs the test locally on his machine and then says
> that
> > > the
> > > > > > > > > > > > tests have passed locally.
> > > > > > > > > > > >
> > > > > > > > > > > > I do agree that there might be a conflict of
> interests
> > > if a
> > > > > PR
> > > > > > > > author
> > > > > > > > > > > > disables tests. Here I would argue that we don't have
> > > > > malignant
> > > > > > > > > > > committers
> > > > > > > > > > > > which means that every committer will probably first
> > > check
> > > > > the
> > > > > > > > > > respective
> > > > > > > > > > > > ticket for how often the test failed. Then I guess
> the
> > > next
> > > > > > step
> > > > > > > > would
> > > > > > > > > > be
> > > > > > > > > > > > to discuss on the ticket whether to disable it or
> not.
> > > And
> > > > > > > finally,
> > > > > > > > > > after
> > > > > > > > > > > > reaching a consensus, it will be disabled. If we see
> > > > someone
> > > > > > > > abusing
> > > > > > > > > > this
> > > > > > > > > > > > policy, then we can still think about how to guard
> > > against
> > > > > it.
> > > > > > > But,
> > > > > > > > > > > > honestly, I have very rarely seen such a case. I am
> > also
> > > ok
> > > > > to
> > > > > > > > pull in
> > > > > > > > > > > the
> > > > > > > > > > > > release manager to make the final call if this
> resolves
> > > > > > concerns.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Till
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > > > > > > > pnowojski@apache.org>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > +1 for the general idea, however I have concerns
> > about
> > > a
> > > > > > couple
> > > > > > > > of
> > > > > > > > > > > > details.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I would first try to not introduce the exception
> > for
> > > > > local
> > > > > > > > builds.
> > > > > > > > > > > > > > It makes it quite hard for others to verify the
> > build
> > > > and
> > > > > > to
> > > > > > > > make
> > > > > > > > > > > sure
> > > > > > > > > > > > > that the right things were executed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I would counter Till's proposal to ignore local
> green
> > > > > builds.
> > > > > > > If
> > > > > > > > > > > > committer
> > > > > > > > > > > > > is merging and closing a PR, with official azure
> > > failure,
> > > > > but
> > > > > > > > there
> > > > > > > > > > > was a
> > > > > > > > > > > > > green build before or in local azure it's IMO
> enough
> > to
> > > > > leave
> > > > > > > the
> > > > > > > > > > > > message:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Latest build failure is a known issue:
> FLINK-12345
> > > > > > > > > > > > > > Green local build: URL
> > > > > > > > > > > > >
> > > > > > > > > > > > > This should address Till's concern about
> > verification.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On the other hand I have concerns about disabling
> > > tests.*
> > > > > It
> > > > > > > > > > shouldn't
> > > > > > > > > > > be
> > > > > > > > > > > > > the PR author/committer that's disabling a test on
> > his
> > > > own,
> > > > > > as
> > > > > > > > > > that's a
> > > > > > > > > > > > > conflict of interests*. I have however no problems
> > with
> > > > > > > disabling
> > > > > > > > > > test
> > > > > > > > > > > > > instabilities that were marked as "blockers"
> though,
> > > that
> > > > > > > should
> > > > > > > > work
> > > > > > > > > > > > > pretty well. But the important thing here is to
> > > correctly
> > > > > > judge
> > > > > > > > > > bumping
> > > > > > > > > > > > > priorities of test instabilities based on their
> > > frequency
> > > > > and
> > > > > > > > current
> > > > > > > > > > > > > general health of the system. I believe that
> release
> > > > > managers
> > > > > > > > should
> > > > > > > > > > be
> > > > > > > > > > > > > playing a big role here in deciding on the
> guidelines
> > > of
> > > > > what
> > > > > > > > should
> > > > > > > > > > > be a
> > > > > > > > > > > > > priority of certain test instabilities.
> > > > > > > > > > > > >
> > > > > > > > > > > > > What I mean by that is two example scenarios:
> > > > > > > > > > > > > 1. if we have a handful of very frequently failing
> > > tests
> > > > > and
> > > > > > a
> > > > > > > > > > handful
> > > > > > > > > > > of
> > > > > > > > > > > > > very rarely failing tests (like one reported
> failure
> > > and
> > > > no
> > > > > > > > another
> > > > > > > > > > > > > occurrence in many months, and let's even say that
> > the
> > > > > > failure
> > > > > > > > looks
> > > > > > > > > > > like
> > > > > > > > > > > > > infrastructure/network timeout), we should focus on
> > the
> > > > > > > > frequently
> > > > > > > > > > > > failing
> > > > > > > > > > > > > ones, and probably we are safe to ignore for the
> time
> > > > being
> > > > > > the
> > > > > > > > rare
> > > > > > > > > > > > issues
> > > > > > > > > > > > > - at least until we deal with the most pressing
> ones.
> > > > > > > > > > > > > 2. If we have tons of rarely failing test
> > > instabilities,
> > > > we
> > > > > > > > should
> > > > > > > > > > > > probably
> > > > > > > > > > > > > start addressing them as blockers.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm using my own conscious and my best judgement
> when
> > > I'm
> > > > > > > > > > > > > bumping/decreasing priorities of test instabilities
> > > (and
> > > > > > bugs),
> > > > > > > > but
> > > > > > > > > > as
> > > > > > > > > > > > > individual committer I don't have the full picture.
> > As
> > > I
> > > > > > wrote
> > > > > > > > > > above, I
> > > > > > > > > > > > > think release managers are in a much better
> position
> > to
> > > > > keep
> > > > > > > > > > adjusting
> > > > > > > > > > > > > those kind of guidelines.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best, Piotrek
> > > > > > > > > > > > >
> > > > > > > > > > > > > pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com>
> > > > > > napisał(a):
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 for Xintong's proposal.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For me, resolving problems directly (fixing the
> > > > > > > infrastructure
> > > > > > > > > > issue,
> > > > > > > > > > > > > > disabling unstable tests and creating blocker
> JIRAs
> > > to
> > > > > > track
> > > > > > > > the
> > > > > > > > > > fix
> > > > > > > > > > > > and
> > > > > > > > > > > > > > re-enable them asap, etc.) is (in most cases)
> > better
> > > > than
> > > > > > > > working
> > > > > > > > > > > > around
> > > > > > > > > > > > > > them (verify locally, manually check and judge
> the
> > > > > failure
> > > > > > as
> > > > > > > > > > > > > "unrelated",
> > > > > > > > > > > > > > etc.), and I believe the proposal could help us
> > > pushing
> > > > > > those
> > > > > > > > more
> > > > > > > > > > > > "real"
> > > > > > > > > > > > > > solutions forward.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best Regards,
> > > > > > > > > > > > > > Yu
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <
> > > > > > karmagyz@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Creating a blocker issue for the manually
> > disabled
> > > > > tests
> > > > > > > > sounds
> > > > > > > > > > > good
> > > > > > > > > > > > to
> > > > > > > > > > > > > > me.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Minor: I'm still a bit worried about the
> commits
> > > > merged
> > > > > > > > before we
> > > > > > > > > > > fix
> > > > > > > > > > > > > > > the unstable tests can also break those tests.
> > > > Instead
> > > > > of
> > > > > > > > letting
> > > > > > > > > > > the
> > > > > > > > > > > > > > > assigners keep a look at all potentially
> related
> > > > > commits,
> > > > > > > > they
> > > > > > > > > > can
> > > > > > > > > > > > > > > maintain a branch that is periodically synced
> > with
> > > > the
> > > > > > > master
> > > > > > > > > > > branch
> > > > > > > > > > > > > > > while enabling the unstable test. So that they
> > can
> > > > > catch
> > > > > > > the
> > > > > > > > > > > breaking
> > > > > > > > > > > > > > > changes asap.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <
> > > > > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I like the idea of creating a blocker issue
> > for a
> > > > > > > disabled
> > > > > > > > > > test.
> > > > > > > > > > > > This
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > force us to resolve it in a timely manner and
> > it
> > > > > won't
> > > > > > > fall
> > > > > > > > > > > through
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > cracks.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <
> > > > > > > > > > > > jingsonglee0@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I also have some concerns about unstable
> > cases.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I think unstable cases can be divided into
> > > these
> > > > > > types:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Force majeure: For example, network
> > timeout,
> > > > > sudden
> > > > > > > > > > > > environmental
> > > > > > > > > > > > > > > > > collapse, they are accidental and can
> always
> > be
> > > > > > solved
> > > > > > > by
> > > > > > > > > > > > > triggering
> > > > > > > > > > > > > > > azure
> > > > > > > > > > > > > > > > > again. Committers should wait for the next
> > > green
> > > > > > azure.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Obvious mistakes: For example, some
> errors
> > > > caused
> > > > > > by
> > > > > > > > > > obvious
> > > > > > > > > > > > > > reasons
> > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > be repaired quickly. At this time, do we
> need
> > > to
> > > > > > wait,
> > > > > > > > or not
> > > > > > > > > > > > wait
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > just
> > > > > > > > > > > > > > > > > ignore?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Difficult questions: These problems are
> > very
> > > > > > > difficult
> > > > > > > > to
> > > > > > > > > > > find.
> > > > > > > > > > > > > > There
> > > > > > > > > > > > > > > > > will be no solution for a while and a half.
> > We
> > > > > don't
> > > > > > > even
> > > > > > > > > > know
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > reason.
> > > > > > > > > > > > > > > > > At this time, we should ignore it. (Maybe
> > it's
> > > > > judged
> > > > > > > by
> > > > > > > > the
> > > > > > > > > > > > author
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > case. But what about the old case whose
> > author
> > > > > can't
> > > > > > be
> > > > > > > > > > found?)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > So, the ignored cases should be the block
> of
> > > the
> > > > > next
> > > > > > > > release
> > > > > > > > > > > > until
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > reason is found or the case is fixed?  We
> > need
> > > to
> > > > > > > ensure
> > > > > > > > that
> > > > > > > > > > > > > someone
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > take care of these cases, because there is
> no
> > > > > > deepening
> > > > > > > > of
> > > > > > > > > > > failed
> > > > > > > > > > > > > > > tests, no
> > > > > > > > > > > > > > > > > one may continue to pay attention to these
> > > cases.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I think this guideline should consider
> these
> > > > > > > situations,
> > > > > > > > and
> > > > > > > > > > > show
> > > > > > > > > > > > > how
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > solve them.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Jingsong
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <
> > > > > > > > imjark@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks to Xintong for bringing up this
> > topic,
> > > > I'm
> > > > > > +1
> > > > > > > in
> > > > > > > > > > > > general.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > However, I think it's still not very
> clear
> > > how
> > > > we
> > > > > > > > address
> > > > > > > > > > the
> > > > > > > > > > > > > > > unstable
> > > > > > > > > > > > > > > > > > tests.
> > > > > > > > > > > > > > > > > > I think this is a very important part of
> > this
> > > > new
> > > > > > > > > > guideline.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > According to the discussion above, if
> some
> > > > tests
> > > > > > are
> > > > > > > > > > > unstable,
> > > > > > > > > > > > we
> > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > manually disable it.
> > > > > > > > > > > > > > > > > > But I have some questions in my mind:
> > > > > > > > > > > > > > > > > > 1) Is the instability judged by the
> > committer
> > > > > > > > themselves or
> > > > > > > > > > > by
> > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > > metrics?
> > > > > > > > > > > > > > > > > > 2) Should we log the disable commit in
> the
> > > > > > > > corresponding
> > > > > > > > > > > issue
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > increase
> > > > > > > > > > > > > > > > > > the priority?
> > > > > > > > > > > > > > > > > > 3) What if nobody looks into this issue
> and
> > > > this
> > > > > > > > becomes
> > > > > > > > > > some
> > > > > > > > > > > > > > > potential
> > > > > > > > > > > > > > > > > > bugs released with the new version?
> > > > > > > > > > > > > > > > > > 4) If no person is actively working on
> the
> > > > issue,
> > > > > > who
> > > > > > > > > > should
> > > > > > > > > > > > > > > re-enable
> > > > > > > > > > > > > > > > > it?
> > > > > > > > > > > > > > > > > > Would it block PRs again?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > Jark
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong
> Song
> > <
> > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks all for the feedback.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > @Till @Yangze
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I'm also not convinced by the idea of
> > > having
> > > > an
> > > > > > > > exception
> > > > > > > > > > > for
> > > > > > > > > > > > > > local
> > > > > > > > > > > > > > > > > > builds.
> > > > > > > > > > > > > > > > > > > We need to execute the entire build (or
> > at
> > > > > least
> > > > > > > the
> > > > > > > > > > > failing
> > > > > > > > > > > > > > stage)
> > > > > > > > > > > > > > > > > > > locally, to make sure subsequent test
> > cases
> > > > > > > > prevented by
> > > > > > > > > > > the
> > > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > > > > are all executed. In that case, it's
> > > probably
> > > > > > > easier
> > > > > > > > to
> > > > > > > > > > > rerun
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > build
> > > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > > azure than locally.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Concerning disabling unstable test
> cases
> > > that
> > > > > > > > regularly
> > > > > > > > > > > block
> > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > > > > merging, maybe we can say that such
> cases
> > > can
> > > > > > only
> > > > > > > be
> > > > > > > > > > > > disabled
> > > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > someone
> > > > > > > > > > > > > > > > > > > is actively looking into it, likely the
> > > > person
> > > > > > who
> > > > > > > > > > disabled
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > case.
> > > > > > > > > > > > > > > > > If
> > > > > > > > > > > > > > > > > > > this person is no longer actively
> working
> > > on
> > > > > it,
> > > > > > > > he/she
> > > > > > > > > > > > should
> > > > > > > > > > > > > > > enable
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > case again no matter if it is fixed or
> > not.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > @Jing
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for the suggestions.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > +1 to provide guidelines on handling
> test
> > > > > > failures.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 1. Report the test failures in the
> JIRA.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > +1 on this. Currently, the release
> > managers
> > > > are
> > > > > > > > > > monitoring
> > > > > > > > > > > > the
> > > > > > > > > > > > > ci
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > cron
> > > > > > > > > > > > > > > > > > > build instabilities and reporting them
> on
> > > > JIRA.
> > > > > > We
> > > > > > > > should
> > > > > > > > > > > > also
> > > > > > > > > > > > > > > > > encourage
> > > > > > > > > > > > > > > > > > > other contributors to do that for PRs.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 2. Set a deadline to find out the root
> > > cause
> > > > > and
> > > > > > > > solve
> > > > > > > > > > the
> > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > new created JIRA  because we could
> not
> > > > block
> > > > > > > other
> > > > > > > > > > commit
> > > > > > > > > > > > > > merges
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 3. What to do if the JIRA has not made
> > > > > > significant
> > > > > > > > > > progress
> > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > reached
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I'm not sure about these two. It feels
> a
> > > bit
> > > > > > > against
> > > > > > > > the
> > > > > > > > > > > > > > voluntary
> > > > > > > > > > > > > > > > > nature
> > > > > > > > > > > > > > > > > > > of open source projects.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > IMHO, frequent instabilities are more
> > > likely
> > > > to
> > > > > > be
> > > > > > > > > > upgraded
> > > > > > > > > > > > to
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > critical
> > > > > > > > > > > > > > > > > > > / blocker priority, receive more
> > attention
> > > > and
> > > > > > > > eventually
> > > > > > > > > > > get
> > > > > > > > > > > > > > > fixed.
> > > > > > > > > > > > > > > > > > > Release managers are also responsible
> for
> > > > > looking
> > > > > > > for
> > > > > > > > > > > > assignees
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > issues. If a case is still not fixed
> > > soonish,
> > > > > > even
> > > > > > > > with
> > > > > > > > > > all
> > > > > > > > > > > > > these
> > > > > > > > > > > > > > > > > > efforts,
> > > > > > > > > > > > > > > > > > > I'm not sure how setting a deadline can
> > > help
> > > > > > this.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 4. If we disable the respective tests
> > > > > > temporarily,
> > > > > > > we
> > > > > > > > > > also
> > > > > > > > > > > > > need a
> > > > > > > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > > > > > > > to ensure the issue would be
> continued
> > to
> > > > be
> > > > > > > > > > investigated
> > > > > > > > > > > > in
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > future.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > +1. As mentioned above, we may consider
> > > > > disabling
> > > > > > > > such
> > > > > > > > > > > tests
> > > > > > > > > > > > > iff
> > > > > > > > > > > > > > > > > someone
> > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > actively working on it.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING
> > ZHANG
> > > <
> > > > > > > > > > > > > beyond1920@gmail.com
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi Xintong,
> > > > > > > > > > > > > > > > > > > > +1 to the proposal.
> > > > > > > > > > > > > > > > > > > > In order to better comply with the
> > rule,
> > > it
> > > > > is
> > > > > > > > > > necessary
> > > > > > > > > > > to
> > > > > > > > > > > > > > > describe
> > > > > > > > > > > > > > > > > > > what's
> > > > > > > > > > > > > > > > > > > > best practice if encountering test
> > > failure
> > > > > > which
> > > > > > > > seems
> > > > > > > > > > > > > > unrelated
> > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > current commits.
> > > > > > > > > > > > > > > > > > > > How to avoid merging PR with test
> > > failures
> > > > > and
> > > > > > > not
> > > > > > > > > > > blocking
> > > > > > > > > > > > > > code
> > > > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > > > for a long time?
> > > > > > > > > > > > > > > > > > > > I tried to think about the possible
> > > steps,
> > > > > and
> > > > > > > > found
> > > > > > > > > > > there
> > > > > > > > > > > > > are
> > > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > > > > detailed problems that need to be
> > > discussed
> > > > > in
> > > > > > a
> > > > > > > > step
> > > > > > > > > > > > > further:
> > > > > > > > > > > > > > > > > > > > 1. Report the test failures in the
> > JIRA.
> > > > > > > > > > > > > > > > > > > > 2. Set a deadline to find out the
> root
> > > > cause
> > > > > > and
> > > > > > > > solve
> > > > > > > > > > > the
> > > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > new created JIRA  because we could
> not
> > > > block
> > > > > > > other
> > > > > > > > > > commit
> > > > > > > > > > > > > > merges
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > > >     When is a reasonable deadline
> here?
> > > > > > > > > > > > > > > > > > > > 3. What to do if the JIRA has not
> made
> > > > > > > significant
> > > > > > > > > > > progress
> > > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > reached
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > > > > > > > > >     There are several situations as
> > > > follows,
> > > > > > > maybe
> > > > > > > > > > > > different
> > > > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > > > > different approaches.
> > > > > > > > > > > > > > > > > > > >     1. the JIRA is non-assigned yet
> > > > > > > > > > > > > > > > > > > >     2. not found the root cause yet
> > > > > > > > > > > > > > > > > > > >     3. not found a good solution, but
> > > > already
> > > > > > > > found the
> > > > > > > > > > > > root
> > > > > > > > > > > > > > > cause
> > > > > > > > > > > > > > > > > > > >     4. found a solution, but it needs
> > > more
> > > > > time
> > > > > > > to
> > > > > > > > be
> > > > > > > > > > > done.
> > > > > > > > > > > > > > > > > > > > 4. If we disable the respective tests
> > > > > > > temporarily,
> > > > > > > > we
> > > > > > > > > > > also
> > > > > > > > > > > > > > need a
> > > > > > > > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > > > > > > > to ensure the issue would be
> continued
> > to
> > > > be
> > > > > > > > > > investigated
> > > > > > > > > > > > in
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > future.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > > > > > > > JING ZHANG
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Stephan Ewen <se...@apache.org>
> > > > > 于2021年6月23日周三
> > > > > > > > > > 下午8:16写道：
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM
> Till
> > > > > > Rohrmann <
> > > > > > > > > > > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I would first try to not
> introduce
> > > the
> > > > > > > > exception
> > > > > > > > > > for
> > > > > > > > > > > > > local
> > > > > > > > > > > > > > > > > builds.
> > > > > > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > > > > > > makes
> > > > > > > > > > > > > > > > > > > > > > it quite hard for others to
> verify
> > > the
> > > > > > build
> > > > > > > > and to
> > > > > > > > > > > > make
> > > > > > > > > > > > > > sure
> > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > right things were executed. If we
> > see
> > > > > that
> > > > > > > this
> > > > > > > > > > > becomes
> > > > > > > > > > > > > an
> > > > > > > > > > > > > > > issue
> > > > > > > > > > > > > > > > > > then
> > > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > > can revisit this idea.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM
> > > Yangze
> > > > > Guo
> > > > > > <
> > > > > > > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > +1 for appending this to
> > community
> > > > > > > > guidelines for
> > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > PRs.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > @Till Rohrmann
> > > > > > > > > > > > > > > > > > > > > > > I agree that with this approach
> > > > > unstable
> > > > > > > > tests
> > > > > > > > > > will
> > > > > > > > > > > > not
> > > > > > > > > > > > > > > block
> > > > > > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > > > > > > > commit merges. However, it
> might
> > be
> > > > > hard
> > > > > > to
> > > > > > > > > > prevent
> > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > commits
> > > > > > > > > > > > > > > > > > > > > > > that are related to those tests
> > and
> > > > > > should
> > > > > > > > have
> > > > > > > > > > > been
> > > > > > > > > > > > > > passed
> > > > > > > > > > > > > > > > > them.
> > > > > > > > > > > > > > > > > > > > It's
> > > > > > > > > > > > > > > > > > > > > > > true that this judgment can be
> > made
> > > > by
> > > > > > the
> > > > > > > > > > > > committers,
> > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > > > > > ensure the judgment is always
> > > precise
> > > > > and
> > > > > > > so
> > > > > > > > that
> > > > > > > > > > > we
> > > > > > > > > > > > > have
> > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > > > discussion thread.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Regarding the unstable tests,
> how
> > > > about
> > > > > > > > adding
> > > > > > > > > > > > another
> > > > > > > > > > > > > > > > > exception:
> > > > > > > > > > > > > > > > > > > > > > > committers verify it in their
> > local
> > > > > > > > environment
> > > > > > > > > > and
> > > > > > > > > > > > > > > comment in
> > > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > > > cases?
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM
> > > 刘建刚 <
> > > > > > > > > > > > > > > liujiangangpeng@gmail.com
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > It is a good principle to run
> > all
> > > > > tests
> > > > > > > > > > > > successfully
> > > > > > > > > > > > > > > with any
> > > > > > > > > > > > > > > > > > > > change.
> > > > > > > > > > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > > > > > > > > means a lot for project's
> > > stability
> > > > > and
> > > > > > > > > > > > development.
> > > > > > > > > > > > > I
> > > > > > > > > > > > > > > am big
> > > > > > > > > > > > > > > > > > +1
> > > > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > > > > proposal.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Best
> > > > > > > > > > > > > > > > > > > > > > > > liujiangang
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Till Rohrmann <
> > > > trohrmann@apache.org>
> > > > > > > > > > > 于2021年6月22日周二
> > > > > > > > > > > > > > > 下午6:36写道：
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > One way to address the
> > problem
> > > of
> > > > > > > > regularly
> > > > > > > > > > > > failing
> > > > > > > > > > > > > > > tests
> > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > block
> > > > > > > > > > > > > > > > > > > > > > > > > merging of PRs is to
> disable
> > > the
> > > > > > > > respective
> > > > > > > > > > > tests
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > > > > being.
> > > > > > > > > > > > > > > > > > > > > > > Of
> > > > > > > > > > > > > > > > > > > > > > > > > course, the failing test
> then
> > > > needs
> > > > > > to
> > > > > > > be
> > > > > > > > > > > fixed.
> > > > > > > > > > > > > But
> > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > least
> > > > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > > > > > would not block everyone
> from
> > > > > making
> > > > > > > > > > progress.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at
> 12:00
> > > PM
> > > > > > Arvid
> > > > > > > > Heise
> > > > > > > > > > <
> > > > > > > > > > > > > > > > > > arvid@apache.org
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > I think this is overall a
> > > good
> > > > > > idea.
> > > > > > > > So +1
> > > > > > > > > > > from
> > > > > > > > > > > > > my
> > > > > > > > > > > > > > > side.
> > > > > > > > > > > > > > > > > > > > > > > > > > However, I'd like to put
> a
> > > > higher
> > > > > > > > priority
> > > > > > > > > > on
> > > > > > > > > > > > > > > > > > infrastructure
> > > > > > > > > > > > > > > > > > > > > then,
> > > > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > > > > particular docker
> > > > image/artifact
> > > > > > > > caches.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at
> > 11:50
> > > > AM
> > > > > > Till
> > > > > > > > > > > Rohrmann
> > > > > > > > > > > > <
> > > > > > > > > > > > > > > > > > > > > > trohrmann@apache.org
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for bringing
> this
> > > > topic
> > > > > to
> > > > > > > our
> > > > > > > > > > > > attention
> > > > > > > > > > > > > > > > > Xintong.
> > > > > > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > > > > > > > > > > > > proposal makes a lot of
> > > sense
> > > > > and
> > > > > > > we
> > > > > > > > > > should
> > > > > > > > > > > > > > follow
> > > > > > > > > > > > > > > it.
> > > > > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > > > > > give us
> > > > > > > > > > > > > > > > > > > > > > > > > > > confidence that our
> > changes
> > > > are
> > > > > > > > working
> > > > > > > > > > and
> > > > > > > > > > > > it
> > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > be a
> > > > > > > > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > > > > > > > > > incentive
> > > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > > > quickly fix build
> > > > > instabilities.
> > > > > > > > Hence,
> > > > > > > > > > +1.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at
> > > 11:12
> > > > > AM
> > > > > > > > Xintong
> > > > > > > > > > > > Song <
> > > > > > > > > > > > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > In the past a couple
> of
> > > > > weeks,
> > > > > > > I've
> > > > > > > > > > > > observed
> > > > > > > > > > > > > > > several
> > > > > > > > > > > > > > > > > > > times
> > > > > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > > > > > > > merged without a
> green
> > > > light
> > > > > > from
> > > > > > > > the
> > > > > > > > > > CI
> > > > > > > > > > > > > tests,
> > > > > > > > > > > > > > > where
> > > > > > > > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > > > > > > > considered
> *unrelated*.
> > > > This
> > > > > > may
> > > > > > > > not
> > > > > > > > > > > always
> > > > > > > > > > > > > > cause
> > > > > > > > > > > > > > > > > > > problems,
> > > > > > > > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > > > > > > > > increase the chance
> of
> > > > > breaking
> > > > > > > our
> > > > > > > > > > code
> > > > > > > > > > > > > base.
> > > > > > > > > > > > > > In
> > > > > > > > > > > > > > > > > fact,
> > > > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > > > > > > > occurred
> > > > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > me twice in the past
> > few
> > > > > weeks
> > > > > > > > that I
> > > > > > > > > > had
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > revert a
> > > > > > > > > > > > > > > > > > > > commit
> > > > > > > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > > > > > > > > > > > > > > the master branch due
> > to
> > > > > this.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > I think it would be
> > nicer
> > > > to
> > > > > > > > enforce a
> > > > > > > > > > > > > stricter
> > > > > > > > > > > > > > > rule,
> > > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > merged without
> passing
> > > CI.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > The problems of
> merging
> > > PRs
> > > > > > with
> > > > > > > > > > > > "unrelated"
> > > > > > > > > > > > > > test
> > > > > > > > > > > > > > > > > > > failures
> > > > > > > > > > > > > > > > > > > > > are:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - It's not always
> > > > > > straightforward
> > > > > > > > to
> > > > > > > > > > tell
> > > > > > > > > > > > > > > whether a
> > > > > > > > > > > > > > > > > > test
> > > > > > > > > > > > > > > > > > > > > > > failures are
> > > > > > > > > > > > > > > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - It prevents
> > subsequent
> > > > test
> > > > > > > cases
> > > > > > > > > > from
> > > > > > > > > > > > > being
> > > > > > > > > > > > > > > > > > executed,
> > > > > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > > > > > > > > > > > > > > relating to the PR
> > > changes.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > To make things easier
> > for
> > > > the
> > > > > > > > > > committers,
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > following
> > > > > > > > > > > > > > > > > > > > > > > exceptions
> > > > > > > > > > > > > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > considered
> acceptable.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - The PR has passed
> CI
> > in
> > > > the
> > > > > > > > > > > contributor's
> > > > > > > > > > > > > > > personal
> > > > > > > > > > > > > > > > > > > > > workspace.
> > > > > > > > > > > > > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > > > > > > > > > > > > > > the link in such
> cases.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - The CI tests have
> > been
> > > > > > > triggered
> > > > > > > > > > > multiple
> > > > > > > > > > > > > > > times, on
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > > > > > > > commit,
> > > > > > > > > > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > > > > > > each stage has at
> least
> > > > > passed
> > > > > > > for
> > > > > > > > > > once.
> > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > also
> > > > > > > > > > > > > > > > > > > > comment
> > > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > If we all agree on
> > this,
> > > > I'd
> > > > > > > > update the
> > > > > > > > > > > > > > community
> > > > > > > > > > > > > > > > > > > > guidelines
> > > > > > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > > > > > > > > > > > PRs wrt. this
> proposal.
> > > [1]
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Please let me know
> what
> > > do
> > > > > you
> > > > > > > > think.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > Best, Jingsong Lee
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best, Jingsong Lee
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards!
> > > > Rui Li
> > > >
> > >
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Till Rohrmann <tr...@apache.org>.

I think you are right Tison that docs are a special case and they only
require flink-docs to pass. What I am wondering is how much of a problem
this will be (assuming that we have a decent build stability). The more
exceptions we add, the harder it will be to properly follow the guidelines.
Maybe we can observe how many docs PRs get delayed/not merged because of
this and then revisit this discussion if needed.

Cheers,
Till

On Wed, Jun 30, 2021 at 8:30 AM tison <wa...@gmail.com> wrote:

> Hi,
>
> There are a number of PRs modifying docs only, but we still require all
> tests passed on that.
>
> It is a good proposal we avoid merge PR with "unrelated" failure, but can
> we improve the case where the contributor only works for docs?
>
> For example, base on the file change set, run doc tests only.
>
> Best,
> tison.
>
>
> godfrey he <go...@gmail.com> 于2021年6月30日周三 下午2:17写道：
>
> > +1 for the proposal. Thanks Xintong!
> >
> > Best,
> > Godfrey
> >
> >
> >
> > Rui Li <li...@gmail.com> 于2021年6月30日周三 上午11:36写道：
> >
> > > Thanks Xintong. +1 to the proposal.
> > >
> > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 <li...@gmail.com>
> wrote:
> > >
> > > > +1 for the proposal. Since the test time is long and environment may
> > > vary,
> > > > unstable tests are really annoying for developers. The solution is
> > > welcome.
> > > >
> > > > Best
> > > > liujiangang
> > > >
> > > > Jingsong Li <ji...@gmail.com> 于2021年6月29日周二 上午10:31写道：
> > > >
> > > > > +1 Thanks Xintong for the update!
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <
> trohrmann@apache.org>
> > > > > wrote:
> > > > >
> > > > > > +1, thanks for updating the guidelines Xintong!
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <ka...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Thanks Xintong for drafting this doc.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <
> beyond1920@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > Thanks Xintong for giving detailed documentation.
> > > > > > > >
> > > > > > > > The best practice for handling test failure is very detailed,
> > > it's
> > > > a
> > > > > > good
> > > > > > > > guidelines document with clear action steps.
> > > > > > > >
> > > > > > > > +1 to Xintong's proposal.
> > > > > > > >
> > > > > > > > Xintong Song <to...@gmail.com> 于2021年6月28日周一 下午4:07写道：
> > > > > > > >
> > > > > > > > > Thanks all for the discussion.
> > > > > > > > >
> > > > > > > > > Based on the opinions so far, I've drafted the new
> guidelines
> > > > [1],
> > > > > > as a
> > > > > > > > > potential replacement of the original wiki page [2].
> > > > > > > > >
> > > > > > > > > Hopefully this draft has covered the most opinions
> discussed
> > > and
> > > > > > > consensus
> > > > > > > > > made in this discussion thread.
> > > > > > > > >
> > > > > > > > > Looking forward to your feedback.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > > > > >
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > > > > pnowojski@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the clarification Till. +1 for what you have
> > > > written.
> > > > > > > > > >
> > > > > > > > > > Piotrek
> > > > > > > > > >
> > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann <
> > trohrmann@apache.org
> > > >
> > > > > > > > > napisał(a):
> > > > > > > > > >
> > > > > > > > > > > One quick note for clarification. I don't have anything
> > > > against
> > > > > > > builds
> > > > > > > > > > > running on your personal Azure account and this is not
> > > what I
> > > > > > > > > understood
> > > > > > > > > > > under "local environment". For me "local environment"
> > means
> > > > > that
> > > > > > > > > someone
> > > > > > > > > > > runs the test locally on his machine and then says that
> > the
> > > > > > > > > > > tests have passed locally.
> > > > > > > > > > >
> > > > > > > > > > > I do agree that there might be a conflict of interests
> > if a
> > > > PR
> > > > > > > author
> > > > > > > > > > > disables tests. Here I would argue that we don't have
> > > > malignant
> > > > > > > > > > committers
> > > > > > > > > > > which means that every committer will probably first
> > check
> > > > the
> > > > > > > > > respective
> > > > > > > > > > > ticket for how often the test failed. Then I guess the
> > next
> > > > > step
> > > > > > > would
> > > > > > > > > be
> > > > > > > > > > > to discuss on the ticket whether to disable it or not.
> > And
> > > > > > finally,
> > > > > > > > > after
> > > > > > > > > > > reaching a consensus, it will be disabled. If we see
> > > someone
> > > > > > > abusing
> > > > > > > > > this
> > > > > > > > > > > policy, then we can still think about how to guard
> > against
> > > > it.
> > > > > > But,
> > > > > > > > > > > honestly, I have very rarely seen such a case. I am
> also
> > ok
> > > > to
> > > > > > > pull in
> > > > > > > > > > the
> > > > > > > > > > > release manager to make the final call if this resolves
> > > > > concerns.
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Till
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > > > > > > pnowojski@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > +1 for the general idea, however I have concerns
> about
> > a
> > > > > couple
> > > > > > > of
> > > > > > > > > > > details.
> > > > > > > > > > > >
> > > > > > > > > > > > > I would first try to not introduce the exception
> for
> > > > local
> > > > > > > builds.
> > > > > > > > > > > > > It makes it quite hard for others to verify the
> build
> > > and
> > > > > to
> > > > > > > make
> > > > > > > > > > sure
> > > > > > > > > > > > that the right things were executed.
> > > > > > > > > > > >
> > > > > > > > > > > > I would counter Till's proposal to ignore local green
> > > > builds.
> > > > > > If
> > > > > > > > > > > committer
> > > > > > > > > > > > is merging and closing a PR, with official azure
> > failure,
> > > > but
> > > > > > > there
> > > > > > > > > > was a
> > > > > > > > > > > > green build before or in local azure it's IMO enough
> to
> > > > leave
> > > > > > the
> > > > > > > > > > > message:
> > > > > > > > > > > >
> > > > > > > > > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > > > > > > > > Green local build: URL
> > > > > > > > > > > >
> > > > > > > > > > > > This should address Till's concern about
> verification.
> > > > > > > > > > > >
> > > > > > > > > > > > On the other hand I have concerns about disabling
> > tests.*
> > > > It
> > > > > > > > > shouldn't
> > > > > > > > > > be
> > > > > > > > > > > > the PR author/committer that's disabling a test on
> his
> > > own,
> > > > > as
> > > > > > > > > that's a
> > > > > > > > > > > > conflict of interests*. I have however no problems
> with
> > > > > > disabling
> > > > > > > > > test
> > > > > > > > > > > > instabilities that were marked as "blockers" though,
> > that
> > > > > > should
> > > > > > > work
> > > > > > > > > > > > pretty well. But the important thing here is to
> > correctly
> > > > > judge
> > > > > > > > > bumping
> > > > > > > > > > > > priorities of test instabilities based on their
> > frequency
> > > > and
> > > > > > > current
> > > > > > > > > > > > general health of the system. I believe that release
> > > > managers
> > > > > > > should
> > > > > > > > > be
> > > > > > > > > > > > playing a big role here in deciding on the guidelines
> > of
> > > > what
> > > > > > > should
> > > > > > > > > > be a
> > > > > > > > > > > > priority of certain test instabilities.
> > > > > > > > > > > >
> > > > > > > > > > > > What I mean by that is two example scenarios:
> > > > > > > > > > > > 1. if we have a handful of very frequently failing
> > tests
> > > > and
> > > > > a
> > > > > > > > > handful
> > > > > > > > > > of
> > > > > > > > > > > > very rarely failing tests (like one reported failure
> > and
> > > no
> > > > > > > another
> > > > > > > > > > > > occurrence in many months, and let's even say that
> the
> > > > > failure
> > > > > > > looks
> > > > > > > > > > like
> > > > > > > > > > > > infrastructure/network timeout), we should focus on
> the
> > > > > > > frequently
> > > > > > > > > > > failing
> > > > > > > > > > > > ones, and probably we are safe to ignore for the time
> > > being
> > > > > the
> > > > > > > rare
> > > > > > > > > > > issues
> > > > > > > > > > > > - at least until we deal with the most pressing ones.
> > > > > > > > > > > > 2. If we have tons of rarely failing test
> > instabilities,
> > > we
> > > > > > > should
> > > > > > > > > > > probably
> > > > > > > > > > > > start addressing them as blockers.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm using my own conscious and my best judgement when
> > I'm
> > > > > > > > > > > > bumping/decreasing priorities of test instabilities
> > (and
> > > > > bugs),
> > > > > > > but
> > > > > > > > > as
> > > > > > > > > > > > individual committer I don't have the full picture.
> As
> > I
> > > > > wrote
> > > > > > > > > above, I
> > > > > > > > > > > > think release managers are in a much better position
> to
> > > > keep
> > > > > > > > > adjusting
> > > > > > > > > > > > those kind of guidelines.
> > > > > > > > > > > >
> > > > > > > > > > > > Best, Piotrek
> > > > > > > > > > > >
> > > > > > > > > > > > pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com>
> > > > > napisał(a):
> > > > > > > > > > > >
> > > > > > > > > > > > > +1 for Xintong's proposal.
> > > > > > > > > > > > >
> > > > > > > > > > > > > For me, resolving problems directly (fixing the
> > > > > > infrastructure
> > > > > > > > > issue,
> > > > > > > > > > > > > disabling unstable tests and creating blocker JIRAs
> > to
> > > > > track
> > > > > > > the
> > > > > > > > > fix
> > > > > > > > > > > and
> > > > > > > > > > > > > re-enable them asap, etc.) is (in most cases)
> better
> > > than
> > > > > > > working
> > > > > > > > > > > around
> > > > > > > > > > > > > them (verify locally, manually check and judge the
> > > > failure
> > > > > as
> > > > > > > > > > > > "unrelated",
> > > > > > > > > > > > > etc.), and I believe the proposal could help us
> > pushing
> > > > > those
> > > > > > > more
> > > > > > > > > > > "real"
> > > > > > > > > > > > > solutions forward.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best Regards,
> > > > > > > > > > > > > Yu
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <
> > > > > karmagyz@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Creating a blocker issue for the manually
> disabled
> > > > tests
> > > > > > > sounds
> > > > > > > > > > good
> > > > > > > > > > > to
> > > > > > > > > > > > > me.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Minor: I'm still a bit worried about the commits
> > > merged
> > > > > > > before we
> > > > > > > > > > fix
> > > > > > > > > > > > > > the unstable tests can also break those tests.
> > > Instead
> > > > of
> > > > > > > letting
> > > > > > > > > > the
> > > > > > > > > > > > > > assigners keep a look at all potentially related
> > > > commits,
> > > > > > > they
> > > > > > > > > can
> > > > > > > > > > > > > > maintain a branch that is periodically synced
> with
> > > the
> > > > > > master
> > > > > > > > > > branch
> > > > > > > > > > > > > > while enabling the unstable test. So that they
> can
> > > > catch
> > > > > > the
> > > > > > > > > > breaking
> > > > > > > > > > > > > > changes asap.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <
> > > > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I like the idea of creating a blocker issue
> for a
> > > > > > disabled
> > > > > > > > > test.
> > > > > > > > > > > This
> > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > force us to resolve it in a timely manner and
> it
> > > > won't
> > > > > > fall
> > > > > > > > > > through
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > cracks.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <
> > > > > > > > > > > jingsonglee0@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I also have some concerns about unstable
> cases.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think unstable cases can be divided into
> > these
> > > > > types:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Force majeure: For example, network
> timeout,
> > > > sudden
> > > > > > > > > > > environmental
> > > > > > > > > > > > > > > > collapse, they are accidental and can always
> be
> > > > > solved
> > > > > > by
> > > > > > > > > > > > triggering
> > > > > > > > > > > > > > azure
> > > > > > > > > > > > > > > > again. Committers should wait for the next
> > green
> > > > > azure.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Obvious mistakes: For example, some errors
> > > caused
> > > > > by
> > > > > > > > > obvious
> > > > > > > > > > > > > reasons
> > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > be repaired quickly. At this time, do we need
> > to
> > > > > wait,
> > > > > > > or not
> > > > > > > > > > > wait
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > just
> > > > > > > > > > > > > > > > ignore?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Difficult questions: These problems are
> very
> > > > > > difficult
> > > > > > > to
> > > > > > > > > > find.
> > > > > > > > > > > > > There
> > > > > > > > > > > > > > > > will be no solution for a while and a half.
> We
> > > > don't
> > > > > > even
> > > > > > > > > know
> > > > > > > > > > > the
> > > > > > > > > > > > > > reason.
> > > > > > > > > > > > > > > > At this time, we should ignore it. (Maybe
> it's
> > > > judged
> > > > > > by
> > > > > > > the
> > > > > > > > > > > author
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > case. But what about the old case whose
> author
> > > > can't
> > > > > be
> > > > > > > > > found?)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > So, the ignored cases should be the block of
> > the
> > > > next
> > > > > > > release
> > > > > > > > > > > until
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > reason is found or the case is fixed?  We
> need
> > to
> > > > > > ensure
> > > > > > > that
> > > > > > > > > > > > someone
> > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > take care of these cases, because there is no
> > > > > deepening
> > > > > > > of
> > > > > > > > > > failed
> > > > > > > > > > > > > > tests, no
> > > > > > > > > > > > > > > > one may continue to pay attention to these
> > cases.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think this guideline should consider these
> > > > > > situations,
> > > > > > > and
> > > > > > > > > > show
> > > > > > > > > > > > how
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > solve them.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Jingsong
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <
> > > > > > > imjark@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks to Xintong for bringing up this
> topic,
> > > I'm
> > > > > +1
> > > > > > in
> > > > > > > > > > > general.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > However, I think it's still not very clear
> > how
> > > we
> > > > > > > address
> > > > > > > > > the
> > > > > > > > > > > > > > unstable
> > > > > > > > > > > > > > > > > tests.
> > > > > > > > > > > > > > > > > I think this is a very important part of
> this
> > > new
> > > > > > > > > guideline.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > According to the discussion above, if some
> > > tests
> > > > > are
> > > > > > > > > > unstable,
> > > > > > > > > > > we
> > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > manually disable it.
> > > > > > > > > > > > > > > > > But I have some questions in my mind:
> > > > > > > > > > > > > > > > > 1) Is the instability judged by the
> committer
> > > > > > > themselves or
> > > > > > > > > > by
> > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > metrics?
> > > > > > > > > > > > > > > > > 2) Should we log the disable commit in the
> > > > > > > corresponding
> > > > > > > > > > issue
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > > increase
> > > > > > > > > > > > > > > > > the priority?
> > > > > > > > > > > > > > > > > 3) What if nobody looks into this issue and
> > > this
> > > > > > > becomes
> > > > > > > > > some
> > > > > > > > > > > > > > potential
> > > > > > > > > > > > > > > > > bugs released with the new version?
> > > > > > > > > > > > > > > > > 4) If no person is actively working on the
> > > issue,
> > > > > who
> > > > > > > > > should
> > > > > > > > > > > > > > re-enable
> > > > > > > > > > > > > > > > it?
> > > > > > > > > > > > > > > > > Would it block PRs again?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Jark
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song
> <
> > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks all for the feedback.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > @Till @Yangze
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I'm also not convinced by the idea of
> > having
> > > an
> > > > > > > exception
> > > > > > > > > > for
> > > > > > > > > > > > > local
> > > > > > > > > > > > > > > > > builds.
> > > > > > > > > > > > > > > > > > We need to execute the entire build (or
> at
> > > > least
> > > > > > the
> > > > > > > > > > failing
> > > > > > > > > > > > > stage)
> > > > > > > > > > > > > > > > > > locally, to make sure subsequent test
> cases
> > > > > > > prevented by
> > > > > > > > > > the
> > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > > > are all executed. In that case, it's
> > probably
> > > > > > easier
> > > > > > > to
> > > > > > > > > > rerun
> > > > > > > > > > > > the
> > > > > > > > > > > > > > build
> > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > azure than locally.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Concerning disabling unstable test cases
> > that
> > > > > > > regularly
> > > > > > > > > > block
> > > > > > > > > > > > PRs
> > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > > > merging, maybe we can say that such cases
> > can
> > > > > only
> > > > > > be
> > > > > > > > > > > disabled
> > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > someone
> > > > > > > > > > > > > > > > > > is actively looking into it, likely the
> > > person
> > > > > who
> > > > > > > > > disabled
> > > > > > > > > > > the
> > > > > > > > > > > > > > case.
> > > > > > > > > > > > > > > > If
> > > > > > > > > > > > > > > > > > this person is no longer actively working
> > on
> > > > it,
> > > > > > > he/she
> > > > > > > > > > > should
> > > > > > > > > > > > > > enable
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > case again no matter if it is fixed or
> not.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > @Jing
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for the suggestions.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > +1 to provide guidelines on handling test
> > > > > failures.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > +1 on this. Currently, the release
> managers
> > > are
> > > > > > > > > monitoring
> > > > > > > > > > > the
> > > > > > > > > > > > ci
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > cron
> > > > > > > > > > > > > > > > > > build instabilities and reporting them on
> > > JIRA.
> > > > > We
> > > > > > > should
> > > > > > > > > > > also
> > > > > > > > > > > > > > > > encourage
> > > > > > > > > > > > > > > > > > other contributors to do that for PRs.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 2. Set a deadline to find out the root
> > cause
> > > > and
> > > > > > > solve
> > > > > > > > > the
> > > > > > > > > > > > > failure
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > new created JIRA  because we could not
> > > block
> > > > > > other
> > > > > > > > > commit
> > > > > > > > > > > > > merges
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 3. What to do if the JIRA has not made
> > > > > significant
> > > > > > > > > progress
> > > > > > > > > > > > when
> > > > > > > > > > > > > > > > reached
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I'm not sure about these two. It feels a
> > bit
> > > > > > against
> > > > > > > the
> > > > > > > > > > > > > voluntary
> > > > > > > > > > > > > > > > nature
> > > > > > > > > > > > > > > > > > of open source projects.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > IMHO, frequent instabilities are more
> > likely
> > > to
> > > > > be
> > > > > > > > > upgraded
> > > > > > > > > > > to
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > critical
> > > > > > > > > > > > > > > > > > / blocker priority, receive more
> attention
> > > and
> > > > > > > eventually
> > > > > > > > > > get
> > > > > > > > > > > > > > fixed.
> > > > > > > > > > > > > > > > > > Release managers are also responsible for
> > > > looking
> > > > > > for
> > > > > > > > > > > assignees
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > issues. If a case is still not fixed
> > soonish,
> > > > > even
> > > > > > > with
> > > > > > > > > all
> > > > > > > > > > > > these
> > > > > > > > > > > > > > > > > efforts,
> > > > > > > > > > > > > > > > > > I'm not sure how setting a deadline can
> > help
> > > > > this.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 4. If we disable the respective tests
> > > > > temporarily,
> > > > > > we
> > > > > > > > > also
> > > > > > > > > > > > need a
> > > > > > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > > > > > > to ensure the issue would be continued
> to
> > > be
> > > > > > > > > investigated
> > > > > > > > > > > in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > future.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > +1. As mentioned above, we may consider
> > > > disabling
> > > > > > > such
> > > > > > > > > > tests
> > > > > > > > > > > > iff
> > > > > > > > > > > > > > > > someone
> > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > actively working on it.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING
> ZHANG
> > <
> > > > > > > > > > > > beyond1920@gmail.com
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi Xintong,
> > > > > > > > > > > > > > > > > > > +1 to the proposal.
> > > > > > > > > > > > > > > > > > > In order to better comply with the
> rule,
> > it
> > > > is
> > > > > > > > > necessary
> > > > > > > > > > to
> > > > > > > > > > > > > > describe
> > > > > > > > > > > > > > > > > > what's
> > > > > > > > > > > > > > > > > > > best practice if encountering test
> > failure
> > > > > which
> > > > > > > seems
> > > > > > > > > > > > > unrelated
> > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > current commits.
> > > > > > > > > > > > > > > > > > > How to avoid merging PR with test
> > failures
> > > > and
> > > > > > not
> > > > > > > > > > blocking
> > > > > > > > > > > > > code
> > > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > > for a long time?
> > > > > > > > > > > > > > > > > > > I tried to think about the possible
> > steps,
> > > > and
> > > > > > > found
> > > > > > > > > > there
> > > > > > > > > > > > are
> > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > > > detailed problems that need to be
> > discussed
> > > > in
> > > > > a
> > > > > > > step
> > > > > > > > > > > > further:
> > > > > > > > > > > > > > > > > > > 1. Report the test failures in the
> JIRA.
> > > > > > > > > > > > > > > > > > > 2. Set a deadline to find out the root
> > > cause
> > > > > and
> > > > > > > solve
> > > > > > > > > > the
> > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > new created JIRA  because we could not
> > > block
> > > > > > other
> > > > > > > > > commit
> > > > > > > > > > > > > merges
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > >     When is a reasonable deadline here?
> > > > > > > > > > > > > > > > > > > 3. What to do if the JIRA has not made
> > > > > > significant
> > > > > > > > > > progress
> > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > reached
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > > > > > > > >     There are several situations as
> > > follows,
> > > > > > maybe
> > > > > > > > > > > different
> > > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > > > different approaches.
> > > > > > > > > > > > > > > > > > >     1. the JIRA is non-assigned yet
> > > > > > > > > > > > > > > > > > >     2. not found the root cause yet
> > > > > > > > > > > > > > > > > > >     3. not found a good solution, but
> > > already
> > > > > > > found the
> > > > > > > > > > > root
> > > > > > > > > > > > > > cause
> > > > > > > > > > > > > > > > > > >     4. found a solution, but it needs
> > more
> > > > time
> > > > > > to
> > > > > > > be
> > > > > > > > > > done.
> > > > > > > > > > > > > > > > > > > 4. If we disable the respective tests
> > > > > > temporarily,
> > > > > > > we
> > > > > > > > > > also
> > > > > > > > > > > > > need a
> > > > > > > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > > > > > > to ensure the issue would be continued
> to
> > > be
> > > > > > > > > investigated
> > > > > > > > > > > in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > future.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > > > > > > JING ZHANG
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Stephan Ewen <se...@apache.org>
> > > > 于2021年6月23日周三
> > > > > > > > > 下午8:16写道：
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till
> > > > > Rohrmann <
> > > > > > > > > > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I would first try to not introduce
> > the
> > > > > > > exception
> > > > > > > > > for
> > > > > > > > > > > > local
> > > > > > > > > > > > > > > > builds.
> > > > > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > > > > > makes
> > > > > > > > > > > > > > > > > > > > > it quite hard for others to verify
> > the
> > > > > build
> > > > > > > and to
> > > > > > > > > > > make
> > > > > > > > > > > > > sure
> > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > right things were executed. If we
> see
> > > > that
> > > > > > this
> > > > > > > > > > becomes
> > > > > > > > > > > > an
> > > > > > > > > > > > > > issue
> > > > > > > > > > > > > > > > > then
> > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > can revisit this idea.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM
> > Yangze
> > > > Guo
> > > > > <
> > > > > > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > +1 for appending this to
> community
> > > > > > > guidelines for
> > > > > > > > > > > > merging
> > > > > > > > > > > > > > PRs.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > @Till Rohrmann
> > > > > > > > > > > > > > > > > > > > > > I agree that with this approach
> > > > unstable
> > > > > > > tests
> > > > > > > > > will
> > > > > > > > > > > not
> > > > > > > > > > > > > > block
> > > > > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > > > > > > commit merges. However, it might
> be
> > > > hard
> > > > > to
> > > > > > > > > prevent
> > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > commits
> > > > > > > > > > > > > > > > > > > > > > that are related to those tests
> and
> > > > > should
> > > > > > > have
> > > > > > > > > > been
> > > > > > > > > > > > > passed
> > > > > > > > > > > > > > > > them.
> > > > > > > > > > > > > > > > > > > It's
> > > > > > > > > > > > > > > > > > > > > > true that this judgment can be
> made
> > > by
> > > > > the
> > > > > > > > > > > committers,
> > > > > > > > > > > > > but
> > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > > > > ensure the judgment is always
> > precise
> > > > and
> > > > > > so
> > > > > > > that
> > > > > > > > > > we
> > > > > > > > > > > > have
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > > discussion thread.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Regarding the unstable tests, how
> > > about
> > > > > > > adding
> > > > > > > > > > > another
> > > > > > > > > > > > > > > > exception:
> > > > > > > > > > > > > > > > > > > > > > committers verify it in their
> local
> > > > > > > environment
> > > > > > > > > and
> > > > > > > > > > > > > > comment in
> > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > > cases?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM
> > 刘建刚 <
> > > > > > > > > > > > > > liujiangangpeng@gmail.com
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > It is a good principle to run
> all
> > > > tests
> > > > > > > > > > > successfully
> > > > > > > > > > > > > > with any
> > > > > > > > > > > > > > > > > > > change.
> > > > > > > > > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > > > > > > > means a lot for project's
> > stability
> > > > and
> > > > > > > > > > > development.
> > > > > > > > > > > > I
> > > > > > > > > > > > > > am big
> > > > > > > > > > > > > > > > > +1
> > > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > > > proposal.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Best
> > > > > > > > > > > > > > > > > > > > > > > liujiangang
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Till Rohrmann <
> > > trohrmann@apache.org>
> > > > > > > > > > 于2021年6月22日周二
> > > > > > > > > > > > > > 下午6:36写道：
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > One way to address the
> problem
> > of
> > > > > > > regularly
> > > > > > > > > > > failing
> > > > > > > > > > > > > > tests
> > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > block
> > > > > > > > > > > > > > > > > > > > > > > > merging of PRs is to disable
> > the
> > > > > > > respective
> > > > > > > > > > tests
> > > > > > > > > > > > for
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > > > being.
> > > > > > > > > > > > > > > > > > > > > > Of
> > > > > > > > > > > > > > > > > > > > > > > > course, the failing test then
> > > needs
> > > > > to
> > > > > > be
> > > > > > > > > > fixed.
> > > > > > > > > > > > But
> > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > least
> > > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > > > > would not block everyone from
> > > > making
> > > > > > > > > progress.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00
> > PM
> > > > > Arvid
> > > > > > > Heise
> > > > > > > > > <
> > > > > > > > > > > > > > > > > arvid@apache.org
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > I think this is overall a
> > good
> > > > > idea.
> > > > > > > So +1
> > > > > > > > > > from
> > > > > > > > > > > > my
> > > > > > > > > > > > > > side.
> > > > > > > > > > > > > > > > > > > > > > > > > However, I'd like to put a
> > > higher
> > > > > > > priority
> > > > > > > > > on
> > > > > > > > > > > > > > > > > infrastructure
> > > > > > > > > > > > > > > > > > > > then,
> > > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > > > particular docker
> > > image/artifact
> > > > > > > caches.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at
> 11:50
> > > AM
> > > > > Till
> > > > > > > > > > Rohrmann
> > > > > > > > > > > <
> > > > > > > > > > > > > > > > > > > > > trohrmann@apache.org
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for bringing this
> > > topic
> > > > to
> > > > > > our
> > > > > > > > > > > attention
> > > > > > > > > > > > > > > > Xintong.
> > > > > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > > > > > > > > > > > proposal makes a lot of
> > sense
> > > > and
> > > > > > we
> > > > > > > > > should
> > > > > > > > > > > > > follow
> > > > > > > > > > > > > > it.
> > > > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > > > > give us
> > > > > > > > > > > > > > > > > > > > > > > > > > confidence that our
> changes
> > > are
> > > > > > > working
> > > > > > > > > and
> > > > > > > > > > > it
> > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > be a
> > > > > > > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > > > > > > > > incentive
> > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > > quickly fix build
> > > > instabilities.
> > > > > > > Hence,
> > > > > > > > > +1.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at
> > 11:12
> > > > AM
> > > > > > > Xintong
> > > > > > > > > > > Song <
> > > > > > > > > > > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > In the past a couple of
> > > > weeks,
> > > > > > I've
> > > > > > > > > > > observed
> > > > > > > > > > > > > > several
> > > > > > > > > > > > > > > > > > times
> > > > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > > > > > > merged without a green
> > > light
> > > > > from
> > > > > > > the
> > > > > > > > > CI
> > > > > > > > > > > > tests,
> > > > > > > > > > > > > > where
> > > > > > > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > > > > > > considered *unrelated*.
> > > This
> > > > > may
> > > > > > > not
> > > > > > > > > > always
> > > > > > > > > > > > > cause
> > > > > > > > > > > > > > > > > > problems,
> > > > > > > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > > > > > > > increase the chance of
> > > > breaking
> > > > > > our
> > > > > > > > > code
> > > > > > > > > > > > base.
> > > > > > > > > > > > > In
> > > > > > > > > > > > > > > > fact,
> > > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > > > > > > occurred
> > > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > > > me twice in the past
> few
> > > > weeks
> > > > > > > that I
> > > > > > > > > had
> > > > > > > > > > > to
> > > > > > > > > > > > > > revert a
> > > > > > > > > > > > > > > > > > > commit
> > > > > > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > > > > > > > > > > > > > the master branch due
> to
> > > > this.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > I think it would be
> nicer
> > > to
> > > > > > > enforce a
> > > > > > > > > > > > stricter
> > > > > > > > > > > > > > rule,
> > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > > > > > merged without passing
> > CI.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > The problems of merging
> > PRs
> > > > > with
> > > > > > > > > > > "unrelated"
> > > > > > > > > > > > > test
> > > > > > > > > > > > > > > > > > failures
> > > > > > > > > > > > > > > > > > > > are:
> > > > > > > > > > > > > > > > > > > > > > > > > > > - It's not always
> > > > > straightforward
> > > > > > > to
> > > > > > > > > tell
> > > > > > > > > > > > > > whether a
> > > > > > > > > > > > > > > > > test
> > > > > > > > > > > > > > > > > > > > > > failures are
> > > > > > > > > > > > > > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > > > > > > > > > > > > > - It prevents
> subsequent
> > > test
> > > > > > cases
> > > > > > > > > from
> > > > > > > > > > > > being
> > > > > > > > > > > > > > > > > executed,
> > > > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > > > > > > > > > > > > > relating to the PR
> > changes.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > To make things easier
> for
> > > the
> > > > > > > > > committers,
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > following
> > > > > > > > > > > > > > > > > > > > > > exceptions
> > > > > > > > > > > > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > > > > > > > > > > > > > > - The PR has passed CI
> in
> > > the
> > > > > > > > > > contributor's
> > > > > > > > > > > > > > personal
> > > > > > > > > > > > > > > > > > > > workspace.
> > > > > > > > > > > > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > > > > > > > > > > > > > > - The CI tests have
> been
> > > > > > triggered
> > > > > > > > > > multiple
> > > > > > > > > > > > > > times, on
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > > > > > > commit,
> > > > > > > > > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > > > > > each stage has at least
> > > > passed
> > > > > > for
> > > > > > > > > once.
> > > > > > > > > > > > Please
> > > > > > > > > > > > > > also
> > > > > > > > > > > > > > > > > > > comment
> > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > If we all agree on
> this,
> > > I'd
> > > > > > > update the
> > > > > > > > > > > > > community
> > > > > > > > > > > > > > > > > > > guidelines
> > > > > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > > > > > > > > > > PRs wrt. this proposal.
> > [1]
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Please let me know what
> > do
> > > > you
> > > > > > > think.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > Best, Jingsong Lee
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best, Jingsong Lee
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards!
> > > Rui Li
> > >
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by tison <wa...@gmail.com>.

Hi,

There are a number of PRs modifying docs only, but we still require all
tests passed on that.

It is a good proposal we avoid merge PR with "unrelated" failure, but can
we improve the case where the contributor only works for docs?

For example, base on the file change set, run doc tests only.

Best,
tison.


godfrey he <go...@gmail.com> 于2021年6月30日周三 下午2:17写道：

> +1 for the proposal. Thanks Xintong!
>
> Best,
> Godfrey
>
>
>
> Rui Li <li...@gmail.com> 于2021年6月30日周三 上午11:36写道：
>
> > Thanks Xintong. +1 to the proposal.
> >
> > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 <li...@gmail.com> wrote:
> >
> > > +1 for the proposal. Since the test time is long and environment may
> > vary,
> > > unstable tests are really annoying for developers. The solution is
> > welcome.
> > >
> > > Best
> > > liujiangang
> > >
> > > Jingsong Li <ji...@gmail.com> 于2021年6月29日周二 上午10:31写道：
> > >
> > > > +1 Thanks Xintong for the update!
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <tr...@apache.org>
> > > > wrote:
> > > >
> > > > > +1, thanks for updating the guidelines Xintong!
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <ka...@gmail.com>
> > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > Thanks Xintong for drafting this doc.
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <beyond1920@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > Thanks Xintong for giving detailed documentation.
> > > > > > >
> > > > > > > The best practice for handling test failure is very detailed,
> > it's
> > > a
> > > > > good
> > > > > > > guidelines document with clear action steps.
> > > > > > >
> > > > > > > +1 to Xintong's proposal.
> > > > > > >
> > > > > > > Xintong Song <to...@gmail.com> 于2021年6月28日周一 下午4:07写道：
> > > > > > >
> > > > > > > > Thanks all for the discussion.
> > > > > > > >
> > > > > > > > Based on the opinions so far, I've drafted the new guidelines
> > > [1],
> > > > > as a
> > > > > > > > potential replacement of the original wiki page [2].
> > > > > > > >
> > > > > > > > Hopefully this draft has covered the most opinions discussed
> > and
> > > > > > consensus
> > > > > > > > made in this discussion thread.
> > > > > > > >
> > > > > > > > Looking forward to your feedback.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > > > >
> > > > > > > > [2]
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > > > pnowojski@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for the clarification Till. +1 for what you have
> > > written.
> > > > > > > > >
> > > > > > > > > Piotrek
> > > > > > > > >
> > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann <
> trohrmann@apache.org
> > >
> > > > > > > > napisał(a):
> > > > > > > > >
> > > > > > > > > > One quick note for clarification. I don't have anything
> > > against
> > > > > > builds
> > > > > > > > > > running on your personal Azure account and this is not
> > what I
> > > > > > > > understood
> > > > > > > > > > under "local environment". For me "local environment"
> means
> > > > that
> > > > > > > > someone
> > > > > > > > > > runs the test locally on his machine and then says that
> the
> > > > > > > > > > tests have passed locally.
> > > > > > > > > >
> > > > > > > > > > I do agree that there might be a conflict of interests
> if a
> > > PR
> > > > > > author
> > > > > > > > > > disables tests. Here I would argue that we don't have
> > > malignant
> > > > > > > > > committers
> > > > > > > > > > which means that every committer will probably first
> check
> > > the
> > > > > > > > respective
> > > > > > > > > > ticket for how often the test failed. Then I guess the
> next
> > > > step
> > > > > > would
> > > > > > > > be
> > > > > > > > > > to discuss on the ticket whether to disable it or not.
> And
> > > > > finally,
> > > > > > > > after
> > > > > > > > > > reaching a consensus, it will be disabled. If we see
> > someone
> > > > > > abusing
> > > > > > > > this
> > > > > > > > > > policy, then we can still think about how to guard
> against
> > > it.
> > > > > But,
> > > > > > > > > > honestly, I have very rarely seen such a case. I am also
> ok
> > > to
> > > > > > pull in
> > > > > > > > > the
> > > > > > > > > > release manager to make the final call if this resolves
> > > > concerns.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > > > > > pnowojski@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1 for the general idea, however I have concerns about
> a
> > > > couple
> > > > > > of
> > > > > > > > > > details.
> > > > > > > > > > >
> > > > > > > > > > > > I would first try to not introduce the exception for
> > > local
> > > > > > builds.
> > > > > > > > > > > > It makes it quite hard for others to verify the build
> > and
> > > > to
> > > > > > make
> > > > > > > > > sure
> > > > > > > > > > > that the right things were executed.
> > > > > > > > > > >
> > > > > > > > > > > I would counter Till's proposal to ignore local green
> > > builds.
> > > > > If
> > > > > > > > > > committer
> > > > > > > > > > > is merging and closing a PR, with official azure
> failure,
> > > but
> > > > > > there
> > > > > > > > > was a
> > > > > > > > > > > green build before or in local azure it's IMO enough to
> > > leave
> > > > > the
> > > > > > > > > > message:
> > > > > > > > > > >
> > > > > > > > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > > > > > > > Green local build: URL
> > > > > > > > > > >
> > > > > > > > > > > This should address Till's concern about verification.
> > > > > > > > > > >
> > > > > > > > > > > On the other hand I have concerns about disabling
> tests.*
> > > It
> > > > > > > > shouldn't
> > > > > > > > > be
> > > > > > > > > > > the PR author/committer that's disabling a test on his
> > own,
> > > > as
> > > > > > > > that's a
> > > > > > > > > > > conflict of interests*. I have however no problems with
> > > > > disabling
> > > > > > > > test
> > > > > > > > > > > instabilities that were marked as "blockers" though,
> that
> > > > > should
> > > > > > work
> > > > > > > > > > > pretty well. But the important thing here is to
> correctly
> > > > judge
> > > > > > > > bumping
> > > > > > > > > > > priorities of test instabilities based on their
> frequency
> > > and
> > > > > > current
> > > > > > > > > > > general health of the system. I believe that release
> > > managers
> > > > > > should
> > > > > > > > be
> > > > > > > > > > > playing a big role here in deciding on the guidelines
> of
> > > what
> > > > > > should
> > > > > > > > > be a
> > > > > > > > > > > priority of certain test instabilities.
> > > > > > > > > > >
> > > > > > > > > > > What I mean by that is two example scenarios:
> > > > > > > > > > > 1. if we have a handful of very frequently failing
> tests
> > > and
> > > > a
> > > > > > > > handful
> > > > > > > > > of
> > > > > > > > > > > very rarely failing tests (like one reported failure
> and
> > no
> > > > > > another
> > > > > > > > > > > occurrence in many months, and let's even say that the
> > > > failure
> > > > > > looks
> > > > > > > > > like
> > > > > > > > > > > infrastructure/network timeout), we should focus on the
> > > > > > frequently
> > > > > > > > > > failing
> > > > > > > > > > > ones, and probably we are safe to ignore for the time
> > being
> > > > the
> > > > > > rare
> > > > > > > > > > issues
> > > > > > > > > > > - at least until we deal with the most pressing ones.
> > > > > > > > > > > 2. If we have tons of rarely failing test
> instabilities,
> > we
> > > > > > should
> > > > > > > > > > probably
> > > > > > > > > > > start addressing them as blockers.
> > > > > > > > > > >
> > > > > > > > > > > I'm using my own conscious and my best judgement when
> I'm
> > > > > > > > > > > bumping/decreasing priorities of test instabilities
> (and
> > > > bugs),
> > > > > > but
> > > > > > > > as
> > > > > > > > > > > individual committer I don't have the full picture. As
> I
> > > > wrote
> > > > > > > > above, I
> > > > > > > > > > > think release managers are in a much better position to
> > > keep
> > > > > > > > adjusting
> > > > > > > > > > > those kind of guidelines.
> > > > > > > > > > >
> > > > > > > > > > > Best, Piotrek
> > > > > > > > > > >
> > > > > > > > > > > pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com>
> > > > napisał(a):
> > > > > > > > > > >
> > > > > > > > > > > > +1 for Xintong's proposal.
> > > > > > > > > > > >
> > > > > > > > > > > > For me, resolving problems directly (fixing the
> > > > > infrastructure
> > > > > > > > issue,
> > > > > > > > > > > > disabling unstable tests and creating blocker JIRAs
> to
> > > > track
> > > > > > the
> > > > > > > > fix
> > > > > > > > > > and
> > > > > > > > > > > > re-enable them asap, etc.) is (in most cases) better
> > than
> > > > > > working
> > > > > > > > > > around
> > > > > > > > > > > > them (verify locally, manually check and judge the
> > > failure
> > > > as
> > > > > > > > > > > "unrelated",
> > > > > > > > > > > > etc.), and I believe the proposal could help us
> pushing
> > > > those
> > > > > > more
> > > > > > > > > > "real"
> > > > > > > > > > > > solutions forward.
> > > > > > > > > > > >
> > > > > > > > > > > > Best Regards,
> > > > > > > > > > > > Yu
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <
> > > > karmagyz@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Creating a blocker issue for the manually disabled
> > > tests
> > > > > > sounds
> > > > > > > > > good
> > > > > > > > > > to
> > > > > > > > > > > > me.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Minor: I'm still a bit worried about the commits
> > merged
> > > > > > before we
> > > > > > > > > fix
> > > > > > > > > > > > > the unstable tests can also break those tests.
> > Instead
> > > of
> > > > > > letting
> > > > > > > > > the
> > > > > > > > > > > > > assigners keep a look at all potentially related
> > > commits,
> > > > > > they
> > > > > > > > can
> > > > > > > > > > > > > maintain a branch that is periodically synced with
> > the
> > > > > master
> > > > > > > > > branch
> > > > > > > > > > > > > while enabling the unstable test. So that they can
> > > catch
> > > > > the
> > > > > > > > > breaking
> > > > > > > > > > > > > changes asap.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <
> > > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I like the idea of creating a blocker issue for a
> > > > > disabled
> > > > > > > > test.
> > > > > > > > > > This
> > > > > > > > > > > > > will
> > > > > > > > > > > > > > force us to resolve it in a timely manner and it
> > > won't
> > > > > fall
> > > > > > > > > through
> > > > > > > > > > > the
> > > > > > > > > > > > > > cracks.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > Till
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <
> > > > > > > > > > jingsonglee0@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I also have some concerns about unstable cases.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think unstable cases can be divided into
> these
> > > > types:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Force majeure: For example, network timeout,
> > > sudden
> > > > > > > > > > environmental
> > > > > > > > > > > > > > > collapse, they are accidental and can always be
> > > > solved
> > > > > by
> > > > > > > > > > > triggering
> > > > > > > > > > > > > azure
> > > > > > > > > > > > > > > again. Committers should wait for the next
> green
> > > > azure.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Obvious mistakes: For example, some errors
> > caused
> > > > by
> > > > > > > > obvious
> > > > > > > > > > > > reasons
> > > > > > > > > > > > > may
> > > > > > > > > > > > > > > be repaired quickly. At this time, do we need
> to
> > > > wait,
> > > > > > or not
> > > > > > > > > > wait
> > > > > > > > > > > > and
> > > > > > > > > > > > > just
> > > > > > > > > > > > > > > ignore?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Difficult questions: These problems are very
> > > > > difficult
> > > > > > to
> > > > > > > > > find.
> > > > > > > > > > > > There
> > > > > > > > > > > > > > > will be no solution for a while and a half. We
> > > don't
> > > > > even
> > > > > > > > know
> > > > > > > > > > the
> > > > > > > > > > > > > reason.
> > > > > > > > > > > > > > > At this time, we should ignore it. (Maybe it's
> > > judged
> > > > > by
> > > > > > the
> > > > > > > > > > author
> > > > > > > > > > > > of
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > case. But what about the old case whose author
> > > can't
> > > > be
> > > > > > > > found?)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So, the ignored cases should be the block of
> the
> > > next
> > > > > > release
> > > > > > > > > > until
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > reason is found or the case is fixed?  We need
> to
> > > > > ensure
> > > > > > that
> > > > > > > > > > > someone
> > > > > > > > > > > > > will
> > > > > > > > > > > > > > > take care of these cases, because there is no
> > > > deepening
> > > > > > of
> > > > > > > > > failed
> > > > > > > > > > > > > tests, no
> > > > > > > > > > > > > > > one may continue to pay attention to these
> cases.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think this guideline should consider these
> > > > > situations,
> > > > > > and
> > > > > > > > > show
> > > > > > > > > > > how
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > solve them.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Jingsong
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <
> > > > > > imjark@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks to Xintong for bringing up this topic,
> > I'm
> > > > +1
> > > > > in
> > > > > > > > > > general.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > However, I think it's still not very clear
> how
> > we
> > > > > > address
> > > > > > > > the
> > > > > > > > > > > > > unstable
> > > > > > > > > > > > > > > > tests.
> > > > > > > > > > > > > > > > I think this is a very important part of this
> > new
> > > > > > > > guideline.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > According to the discussion above, if some
> > tests
> > > > are
> > > > > > > > > unstable,
> > > > > > > > > > we
> > > > > > > > > > > > can
> > > > > > > > > > > > > > > > manually disable it.
> > > > > > > > > > > > > > > > But I have some questions in my mind:
> > > > > > > > > > > > > > > > 1) Is the instability judged by the committer
> > > > > > themselves or
> > > > > > > > > by
> > > > > > > > > > > some
> > > > > > > > > > > > > > > > metrics?
> > > > > > > > > > > > > > > > 2) Should we log the disable commit in the
> > > > > > corresponding
> > > > > > > > > issue
> > > > > > > > > > > and
> > > > > > > > > > > > > > > increase
> > > > > > > > > > > > > > > > the priority?
> > > > > > > > > > > > > > > > 3) What if nobody looks into this issue and
> > this
> > > > > > becomes
> > > > > > > > some
> > > > > > > > > > > > > potential
> > > > > > > > > > > > > > > > bugs released with the new version?
> > > > > > > > > > > > > > > > 4) If no person is actively working on the
> > issue,
> > > > who
> > > > > > > > should
> > > > > > > > > > > > > re-enable
> > > > > > > > > > > > > > > it?
> > > > > > > > > > > > > > > > Would it block PRs again?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Jark
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <
> > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks all for the feedback.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @Till @Yangze
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I'm also not convinced by the idea of
> having
> > an
> > > > > > exception
> > > > > > > > > for
> > > > > > > > > > > > local
> > > > > > > > > > > > > > > > builds.
> > > > > > > > > > > > > > > > > We need to execute the entire build (or at
> > > least
> > > > > the
> > > > > > > > > failing
> > > > > > > > > > > > stage)
> > > > > > > > > > > > > > > > > locally, to make sure subsequent test cases
> > > > > > prevented by
> > > > > > > > > the
> > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > > are all executed. In that case, it's
> probably
> > > > > easier
> > > > > > to
> > > > > > > > > rerun
> > > > > > > > > > > the
> > > > > > > > > > > > > build
> > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > azure than locally.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Concerning disabling unstable test cases
> that
> > > > > > regularly
> > > > > > > > > block
> > > > > > > > > > > PRs
> > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > > merging, maybe we can say that such cases
> can
> > > > only
> > > > > be
> > > > > > > > > > disabled
> > > > > > > > > > > > when
> > > > > > > > > > > > > > > > someone
> > > > > > > > > > > > > > > > > is actively looking into it, likely the
> > person
> > > > who
> > > > > > > > disabled
> > > > > > > > > > the
> > > > > > > > > > > > > case.
> > > > > > > > > > > > > > > If
> > > > > > > > > > > > > > > > > this person is no longer actively working
> on
> > > it,
> > > > > > he/she
> > > > > > > > > > should
> > > > > > > > > > > > > enable
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > case again no matter if it is fixed or not.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @Jing
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the suggestions.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +1 to provide guidelines on handling test
> > > > failures.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +1 on this. Currently, the release managers
> > are
> > > > > > > > monitoring
> > > > > > > > > > the
> > > > > > > > > > > ci
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > cron
> > > > > > > > > > > > > > > > > build instabilities and reporting them on
> > JIRA.
> > > > We
> > > > > > should
> > > > > > > > > > also
> > > > > > > > > > > > > > > encourage
> > > > > > > > > > > > > > > > > other contributors to do that for PRs.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 2. Set a deadline to find out the root
> cause
> > > and
> > > > > > solve
> > > > > > > > the
> > > > > > > > > > > > failure
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > new created JIRA  because we could not
> > block
> > > > > other
> > > > > > > > commit
> > > > > > > > > > > > merges
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 3. What to do if the JIRA has not made
> > > > significant
> > > > > > > > progress
> > > > > > > > > > > when
> > > > > > > > > > > > > > > reached
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I'm not sure about these two. It feels a
> bit
> > > > > against
> > > > > > the
> > > > > > > > > > > > voluntary
> > > > > > > > > > > > > > > nature
> > > > > > > > > > > > > > > > > of open source projects.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > IMHO, frequent instabilities are more
> likely
> > to
> > > > be
> > > > > > > > upgraded
> > > > > > > > > > to
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > critical
> > > > > > > > > > > > > > > > > / blocker priority, receive more attention
> > and
> > > > > > eventually
> > > > > > > > > get
> > > > > > > > > > > > > fixed.
> > > > > > > > > > > > > > > > > Release managers are also responsible for
> > > looking
> > > > > for
> > > > > > > > > > assignees
> > > > > > > > > > > > for
> > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > issues. If a case is still not fixed
> soonish,
> > > > even
> > > > > > with
> > > > > > > > all
> > > > > > > > > > > these
> > > > > > > > > > > > > > > > efforts,
> > > > > > > > > > > > > > > > > I'm not sure how setting a deadline can
> help
> > > > this.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 4. If we disable the respective tests
> > > > temporarily,
> > > > > we
> > > > > > > > also
> > > > > > > > > > > need a
> > > > > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > > > > > to ensure the issue would be continued to
> > be
> > > > > > > > investigated
> > > > > > > > > > in
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > future.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +1. As mentioned above, we may consider
> > > disabling
> > > > > > such
> > > > > > > > > tests
> > > > > > > > > > > iff
> > > > > > > > > > > > > > > someone
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > actively working on it.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG
> <
> > > > > > > > > > > beyond1920@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Xintong,
> > > > > > > > > > > > > > > > > > +1 to the proposal.
> > > > > > > > > > > > > > > > > > In order to better comply with the rule,
> it
> > > is
> > > > > > > > necessary
> > > > > > > > > to
> > > > > > > > > > > > > describe
> > > > > > > > > > > > > > > > > what's
> > > > > > > > > > > > > > > > > > best practice if encountering test
> failure
> > > > which
> > > > > > seems
> > > > > > > > > > > > unrelated
> > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > current commits.
> > > > > > > > > > > > > > > > > > How to avoid merging PR with test
> failures
> > > and
> > > > > not
> > > > > > > > > blocking
> > > > > > > > > > > > code
> > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > for a long time?
> > > > > > > > > > > > > > > > > > I tried to think about the possible
> steps,
> > > and
> > > > > > found
> > > > > > > > > there
> > > > > > > > > > > are
> > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > > detailed problems that need to be
> discussed
> > > in
> > > > a
> > > > > > step
> > > > > > > > > > > further:
> > > > > > > > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > > > > > > > 2. Set a deadline to find out the root
> > cause
> > > > and
> > > > > > solve
> > > > > > > > > the
> > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > new created JIRA  because we could not
> > block
> > > > > other
> > > > > > > > commit
> > > > > > > > > > > > merges
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > >     When is a reasonable deadline here?
> > > > > > > > > > > > > > > > > > 3. What to do if the JIRA has not made
> > > > > significant
> > > > > > > > > progress
> > > > > > > > > > > > when
> > > > > > > > > > > > > > > > reached
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > > > > > > >     There are several situations as
> > follows,
> > > > > maybe
> > > > > > > > > > different
> > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > > different approaches.
> > > > > > > > > > > > > > > > > >     1. the JIRA is non-assigned yet
> > > > > > > > > > > > > > > > > >     2. not found the root cause yet
> > > > > > > > > > > > > > > > > >     3. not found a good solution, but
> > already
> > > > > > found the
> > > > > > > > > > root
> > > > > > > > > > > > > cause
> > > > > > > > > > > > > > > > > >     4. found a solution, but it needs
> more
> > > time
> > > > > to
> > > > > > be
> > > > > > > > > done.
> > > > > > > > > > > > > > > > > > 4. If we disable the respective tests
> > > > > temporarily,
> > > > > > we
> > > > > > > > > also
> > > > > > > > > > > > need a
> > > > > > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > > > > > to ensure the issue would be continued to
> > be
> > > > > > > > investigated
> > > > > > > > > > in
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > future.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > > > > > JING ZHANG
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Stephan Ewen <se...@apache.org>
> > > 于2021年6月23日周三
> > > > > > > > 下午8:16写道：
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till
> > > > Rohrmann <
> > > > > > > > > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I would first try to not introduce
> the
> > > > > > exception
> > > > > > > > for
> > > > > > > > > > > local
> > > > > > > > > > > > > > > builds.
> > > > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > > > > makes
> > > > > > > > > > > > > > > > > > > > it quite hard for others to verify
> the
> > > > build
> > > > > > and to
> > > > > > > > > > make
> > > > > > > > > > > > sure
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > right things were executed. If we see
> > > that
> > > > > this
> > > > > > > > > becomes
> > > > > > > > > > > an
> > > > > > > > > > > > > issue
> > > > > > > > > > > > > > > > then
> > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > can revisit this idea.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM
> Yangze
> > > Guo
> > > > <
> > > > > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > +1 for appending this to community
> > > > > > guidelines for
> > > > > > > > > > > merging
> > > > > > > > > > > > > PRs.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > @Till Rohrmann
> > > > > > > > > > > > > > > > > > > > > I agree that with this approach
> > > unstable
> > > > > > tests
> > > > > > > > will
> > > > > > > > > > not
> > > > > > > > > > > > > block
> > > > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > > > > > commit merges. However, it might be
> > > hard
> > > > to
> > > > > > > > prevent
> > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > commits
> > > > > > > > > > > > > > > > > > > > > that are related to those tests and
> > > > should
> > > > > > have
> > > > > > > > > been
> > > > > > > > > > > > passed
> > > > > > > > > > > > > > > them.
> > > > > > > > > > > > > > > > > > It's
> > > > > > > > > > > > > > > > > > > > > true that this judgment can be made
> > by
> > > > the
> > > > > > > > > > committers,
> > > > > > > > > > > > but
> > > > > > > > > > > > > no
> > > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > > > ensure the judgment is always
> precise
> > > and
> > > > > so
> > > > > > that
> > > > > > > > > we
> > > > > > > > > > > have
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > discussion thread.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Regarding the unstable tests, how
> > about
> > > > > > adding
> > > > > > > > > > another
> > > > > > > > > > > > > > > exception:
> > > > > > > > > > > > > > > > > > > > > committers verify it in their local
> > > > > > environment
> > > > > > > > and
> > > > > > > > > > > > > comment in
> > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > cases?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM
> 刘建刚 <
> > > > > > > > > > > > > liujiangangpeng@gmail.com
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > It is a good principle to run all
> > > tests
> > > > > > > > > > successfully
> > > > > > > > > > > > > with any
> > > > > > > > > > > > > > > > > > change.
> > > > > > > > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > > > > > > means a lot for project's
> stability
> > > and
> > > > > > > > > > development.
> > > > > > > > > > > I
> > > > > > > > > > > > > am big
> > > > > > > > > > > > > > > > +1
> > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > > proposal.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Best
> > > > > > > > > > > > > > > > > > > > > > liujiangang
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Till Rohrmann <
> > trohrmann@apache.org>
> > > > > > > > > 于2021年6月22日周二
> > > > > > > > > > > > > 下午6:36写道：
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > One way to address the problem
> of
> > > > > > regularly
> > > > > > > > > > failing
> > > > > > > > > > > > > tests
> > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > block
> > > > > > > > > > > > > > > > > > > > > > > merging of PRs is to disable
> the
> > > > > > respective
> > > > > > > > > tests
> > > > > > > > > > > for
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > > being.
> > > > > > > > > > > > > > > > > > > > > Of
> > > > > > > > > > > > > > > > > > > > > > > course, the failing test then
> > needs
> > > > to
> > > > > be
> > > > > > > > > fixed.
> > > > > > > > > > > But
> > > > > > > > > > > > at
> > > > > > > > > > > > > > > least
> > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > > > would not block everyone from
> > > making
> > > > > > > > progress.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00
> PM
> > > > Arvid
> > > > > > Heise
> > > > > > > > <
> > > > > > > > > > > > > > > > arvid@apache.org
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I think this is overall a
> good
> > > > idea.
> > > > > > So +1
> > > > > > > > > from
> > > > > > > > > > > my
> > > > > > > > > > > > > side.
> > > > > > > > > > > > > > > > > > > > > > > > However, I'd like to put a
> > higher
> > > > > > priority
> > > > > > > > on
> > > > > > > > > > > > > > > > infrastructure
> > > > > > > > > > > > > > > > > > > then,
> > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > > particular docker
> > image/artifact
> > > > > > caches.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50
> > AM
> > > > Till
> > > > > > > > > Rohrmann
> > > > > > > > > > <
> > > > > > > > > > > > > > > > > > > > trohrmann@apache.org
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thanks for bringing this
> > topic
> > > to
> > > > > our
> > > > > > > > > > attention
> > > > > > > > > > > > > > > Xintong.
> > > > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > > > > > > > > > > proposal makes a lot of
> sense
> > > and
> > > > > we
> > > > > > > > should
> > > > > > > > > > > > follow
> > > > > > > > > > > > > it.
> > > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > > > give us
> > > > > > > > > > > > > > > > > > > > > > > > > confidence that our changes
> > are
> > > > > > working
> > > > > > > > and
> > > > > > > > > > it
> > > > > > > > > > > > > might
> > > > > > > > > > > > > > > be a
> > > > > > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > > > > > > > incentive
> > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > quickly fix build
> > > instabilities.
> > > > > > Hence,
> > > > > > > > +1.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at
> 11:12
> > > AM
> > > > > > Xintong
> > > > > > > > > > Song <
> > > > > > > > > > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > In the past a couple of
> > > weeks,
> > > > > I've
> > > > > > > > > > observed
> > > > > > > > > > > > > several
> > > > > > > > > > > > > > > > > times
> > > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > > > > > merged without a green
> > light
> > > > from
> > > > > > the
> > > > > > > > CI
> > > > > > > > > > > tests,
> > > > > > > > > > > > > where
> > > > > > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > > > > > considered *unrelated*.
> > This
> > > > may
> > > > > > not
> > > > > > > > > always
> > > > > > > > > > > > cause
> > > > > > > > > > > > > > > > > problems,
> > > > > > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > > > > > > increase the chance of
> > > breaking
> > > > > our
> > > > > > > > code
> > > > > > > > > > > base.
> > > > > > > > > > > > In
> > > > > > > > > > > > > > > fact,
> > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > > > > > occurred
> > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > > me twice in the past few
> > > weeks
> > > > > > that I
> > > > > > > > had
> > > > > > > > > > to
> > > > > > > > > > > > > revert a
> > > > > > > > > > > > > > > > > > commit
> > > > > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > > > > > > > > > > > > the master branch due to
> > > this.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > I think it would be nicer
> > to
> > > > > > enforce a
> > > > > > > > > > > stricter
> > > > > > > > > > > > > rule,
> > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > > > > merged without passing
> CI.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > The problems of merging
> PRs
> > > > with
> > > > > > > > > > "unrelated"
> > > > > > > > > > > > test
> > > > > > > > > > > > > > > > > failures
> > > > > > > > > > > > > > > > > > > are:
> > > > > > > > > > > > > > > > > > > > > > > > > > - It's not always
> > > > straightforward
> > > > > > to
> > > > > > > > tell
> > > > > > > > > > > > > whether a
> > > > > > > > > > > > > > > > test
> > > > > > > > > > > > > > > > > > > > > failures are
> > > > > > > > > > > > > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > > > > > > > > > > > > - It prevents subsequent
> > test
> > > > > cases
> > > > > > > > from
> > > > > > > > > > > being
> > > > > > > > > > > > > > > > executed,
> > > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > > > > > > > > > > > > relating to the PR
> changes.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > To make things easier for
> > the
> > > > > > > > committers,
> > > > > > > > > > the
> > > > > > > > > > > > > > > following
> > > > > > > > > > > > > > > > > > > > > exceptions
> > > > > > > > > > > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > > > > > > > > > > > > > - The PR has passed CI in
> > the
> > > > > > > > > contributor's
> > > > > > > > > > > > > personal
> > > > > > > > > > > > > > > > > > > workspace.
> > > > > > > > > > > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > > > > > > > > > > > > > - The CI tests have been
> > > > > triggered
> > > > > > > > > multiple
> > > > > > > > > > > > > times, on
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > > > > > commit,
> > > > > > > > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > > > > each stage has at least
> > > passed
> > > > > for
> > > > > > > > once.
> > > > > > > > > > > Please
> > > > > > > > > > > > > also
> > > > > > > > > > > > > > > > > > comment
> > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > If we all agree on this,
> > I'd
> > > > > > update the
> > > > > > > > > > > > community
> > > > > > > > > > > > > > > > > > guidelines
> > > > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > > > > > > > > > PRs wrt. this proposal.
> [1]
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Please let me know what
> do
> > > you
> > > > > > think.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Best, Jingsong Lee
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best, Jingsong Lee
> > > >
> > >
> >
> >
> > --
> > Best regards!
> > Rui Li
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by godfrey he <go...@gmail.com>.

+1 for the proposal. Thanks Xintong!

Best,
Godfrey



Rui Li <li...@gmail.com> 于2021年6月30日周三 上午11:36写道：

> Thanks Xintong. +1 to the proposal.
>
> On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 <li...@gmail.com> wrote:
>
> > +1 for the proposal. Since the test time is long and environment may
> vary,
> > unstable tests are really annoying for developers. The solution is
> welcome.
> >
> > Best
> > liujiangang
> >
> > Jingsong Li <ji...@gmail.com> 于2021年6月29日周二 上午10:31写道：
> >
> > > +1 Thanks Xintong for the update!
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <tr...@apache.org>
> > > wrote:
> > >
> > > > +1, thanks for updating the guidelines Xintong!
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <ka...@gmail.com>
> > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Thanks Xintong for drafting this doc.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <be...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > Thanks Xintong for giving detailed documentation.
> > > > > >
> > > > > > The best practice for handling test failure is very detailed,
> it's
> > a
> > > > good
> > > > > > guidelines document with clear action steps.
> > > > > >
> > > > > > +1 to Xintong's proposal.
> > > > > >
> > > > > > Xintong Song <to...@gmail.com> 于2021年6月28日周一 下午4:07写道：
> > > > > >
> > > > > > > Thanks all for the discussion.
> > > > > > >
> > > > > > > Based on the opinions so far, I've drafted the new guidelines
> > [1],
> > > > as a
> > > > > > > potential replacement of the original wiki page [2].
> > > > > > >
> > > > > > > Hopefully this draft has covered the most opinions discussed
> and
> > > > > consensus
> > > > > > > made in this discussion thread.
> > > > > > >
> > > > > > > Looking forward to your feedback.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > > >
> > > > > > > [2]
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > > pnowojski@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the clarification Till. +1 for what you have
> > written.
> > > > > > > >
> > > > > > > > Piotrek
> > > > > > > >
> > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann <trohrmann@apache.org
> >
> > > > > > > napisał(a):
> > > > > > > >
> > > > > > > > > One quick note for clarification. I don't have anything
> > against
> > > > > builds
> > > > > > > > > running on your personal Azure account and this is not
> what I
> > > > > > > understood
> > > > > > > > > under "local environment". For me "local environment" means
> > > that
> > > > > > > someone
> > > > > > > > > runs the test locally on his machine and then says that the
> > > > > > > > > tests have passed locally.
> > > > > > > > >
> > > > > > > > > I do agree that there might be a conflict of interests if a
> > PR
> > > > > author
> > > > > > > > > disables tests. Here I would argue that we don't have
> > malignant
> > > > > > > > committers
> > > > > > > > > which means that every committer will probably first check
> > the
> > > > > > > respective
> > > > > > > > > ticket for how often the test failed. Then I guess the next
> > > step
> > > > > would
> > > > > > > be
> > > > > > > > > to discuss on the ticket whether to disable it or not. And
> > > > finally,
> > > > > > > after
> > > > > > > > > reaching a consensus, it will be disabled. If we see
> someone
> > > > > abusing
> > > > > > > this
> > > > > > > > > policy, then we can still think about how to guard against
> > it.
> > > > But,
> > > > > > > > > honestly, I have very rarely seen such a case. I am also ok
> > to
> > > > > pull in
> > > > > > > > the
> > > > > > > > > release manager to make the final call if this resolves
> > > concerns.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > > > > pnowojski@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1 for the general idea, however I have concerns about a
> > > couple
> > > > > of
> > > > > > > > > details.
> > > > > > > > > >
> > > > > > > > > > > I would first try to not introduce the exception for
> > local
> > > > > builds.
> > > > > > > > > > > It makes it quite hard for others to verify the build
> and
> > > to
> > > > > make
> > > > > > > > sure
> > > > > > > > > > that the right things were executed.
> > > > > > > > > >
> > > > > > > > > > I would counter Till's proposal to ignore local green
> > builds.
> > > > If
> > > > > > > > > committer
> > > > > > > > > > is merging and closing a PR, with official azure failure,
> > but
> > > > > there
> > > > > > > > was a
> > > > > > > > > > green build before or in local azure it's IMO enough to
> > leave
> > > > the
> > > > > > > > > message:
> > > > > > > > > >
> > > > > > > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > > > > > > Green local build: URL
> > > > > > > > > >
> > > > > > > > > > This should address Till's concern about verification.
> > > > > > > > > >
> > > > > > > > > > On the other hand I have concerns about disabling tests.*
> > It
> > > > > > > shouldn't
> > > > > > > > be
> > > > > > > > > > the PR author/committer that's disabling a test on his
> own,
> > > as
> > > > > > > that's a
> > > > > > > > > > conflict of interests*. I have however no problems with
> > > > disabling
> > > > > > > test
> > > > > > > > > > instabilities that were marked as "blockers" though, that
> > > > should
> > > > > work
> > > > > > > > > > pretty well. But the important thing here is to correctly
> > > judge
> > > > > > > bumping
> > > > > > > > > > priorities of test instabilities based on their frequency
> > and
> > > > > current
> > > > > > > > > > general health of the system. I believe that release
> > managers
> > > > > should
> > > > > > > be
> > > > > > > > > > playing a big role here in deciding on the guidelines of
> > what
> > > > > should
> > > > > > > > be a
> > > > > > > > > > priority of certain test instabilities.
> > > > > > > > > >
> > > > > > > > > > What I mean by that is two example scenarios:
> > > > > > > > > > 1. if we have a handful of very frequently failing tests
> > and
> > > a
> > > > > > > handful
> > > > > > > > of
> > > > > > > > > > very rarely failing tests (like one reported failure and
> no
> > > > > another
> > > > > > > > > > occurrence in many months, and let's even say that the
> > > failure
> > > > > looks
> > > > > > > > like
> > > > > > > > > > infrastructure/network timeout), we should focus on the
> > > > > frequently
> > > > > > > > > failing
> > > > > > > > > > ones, and probably we are safe to ignore for the time
> being
> > > the
> > > > > rare
> > > > > > > > > issues
> > > > > > > > > > - at least until we deal with the most pressing ones.
> > > > > > > > > > 2. If we have tons of rarely failing test instabilities,
> we
> > > > > should
> > > > > > > > > probably
> > > > > > > > > > start addressing them as blockers.
> > > > > > > > > >
> > > > > > > > > > I'm using my own conscious and my best judgement when I'm
> > > > > > > > > > bumping/decreasing priorities of test instabilities (and
> > > bugs),
> > > > > but
> > > > > > > as
> > > > > > > > > > individual committer I don't have the full picture. As I
> > > wrote
> > > > > > > above, I
> > > > > > > > > > think release managers are in a much better position to
> > keep
> > > > > > > adjusting
> > > > > > > > > > those kind of guidelines.
> > > > > > > > > >
> > > > > > > > > > Best, Piotrek
> > > > > > > > > >
> > > > > > > > > > pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com>
> > > napisał(a):
> > > > > > > > > >
> > > > > > > > > > > +1 for Xintong's proposal.
> > > > > > > > > > >
> > > > > > > > > > > For me, resolving problems directly (fixing the
> > > > infrastructure
> > > > > > > issue,
> > > > > > > > > > > disabling unstable tests and creating blocker JIRAs to
> > > track
> > > > > the
> > > > > > > fix
> > > > > > > > > and
> > > > > > > > > > > re-enable them asap, etc.) is (in most cases) better
> than
> > > > > working
> > > > > > > > > around
> > > > > > > > > > > them (verify locally, manually check and judge the
> > failure
> > > as
> > > > > > > > > > "unrelated",
> > > > > > > > > > > etc.), and I believe the proposal could help us pushing
> > > those
> > > > > more
> > > > > > > > > "real"
> > > > > > > > > > > solutions forward.
> > > > > > > > > > >
> > > > > > > > > > > Best Regards,
> > > > > > > > > > > Yu
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <
> > > karmagyz@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Creating a blocker issue for the manually disabled
> > tests
> > > > > sounds
> > > > > > > > good
> > > > > > > > > to
> > > > > > > > > > > me.
> > > > > > > > > > > >
> > > > > > > > > > > > Minor: I'm still a bit worried about the commits
> merged
> > > > > before we
> > > > > > > > fix
> > > > > > > > > > > > the unstable tests can also break those tests.
> Instead
> > of
> > > > > letting
> > > > > > > > the
> > > > > > > > > > > > assigners keep a look at all potentially related
> > commits,
> > > > > they
> > > > > > > can
> > > > > > > > > > > > maintain a branch that is periodically synced with
> the
> > > > master
> > > > > > > > branch
> > > > > > > > > > > > while enabling the unstable test. So that they can
> > catch
> > > > the
> > > > > > > > breaking
> > > > > > > > > > > > changes asap.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <
> > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > I like the idea of creating a blocker issue for a
> > > > disabled
> > > > > > > test.
> > > > > > > > > This
> > > > > > > > > > > > will
> > > > > > > > > > > > > force us to resolve it in a timely manner and it
> > won't
> > > > fall
> > > > > > > > through
> > > > > > > > > > the
> > > > > > > > > > > > > cracks.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Till
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <
> > > > > > > > > jingsonglee0@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I also have some concerns about unstable cases.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think unstable cases can be divided into these
> > > types:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Force majeure: For example, network timeout,
> > sudden
> > > > > > > > > environmental
> > > > > > > > > > > > > > collapse, they are accidental and can always be
> > > solved
> > > > by
> > > > > > > > > > triggering
> > > > > > > > > > > > azure
> > > > > > > > > > > > > > again. Committers should wait for the next green
> > > azure.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Obvious mistakes: For example, some errors
> caused
> > > by
> > > > > > > obvious
> > > > > > > > > > > reasons
> > > > > > > > > > > > may
> > > > > > > > > > > > > > be repaired quickly. At this time, do we need to
> > > wait,
> > > > > or not
> > > > > > > > > wait
> > > > > > > > > > > and
> > > > > > > > > > > > just
> > > > > > > > > > > > > > ignore?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Difficult questions: These problems are very
> > > > difficult
> > > > > to
> > > > > > > > find.
> > > > > > > > > > > There
> > > > > > > > > > > > > > will be no solution for a while and a half. We
> > don't
> > > > even
> > > > > > > know
> > > > > > > > > the
> > > > > > > > > > > > reason.
> > > > > > > > > > > > > > At this time, we should ignore it. (Maybe it's
> > judged
> > > > by
> > > > > the
> > > > > > > > > author
> > > > > > > > > > > of
> > > > > > > > > > > > the
> > > > > > > > > > > > > > case. But what about the old case whose author
> > can't
> > > be
> > > > > > > found?)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So, the ignored cases should be the block of the
> > next
> > > > > release
> > > > > > > > > until
> > > > > > > > > > > the
> > > > > > > > > > > > > > reason is found or the case is fixed?  We need to
> > > > ensure
> > > > > that
> > > > > > > > > > someone
> > > > > > > > > > > > will
> > > > > > > > > > > > > > take care of these cases, because there is no
> > > deepening
> > > > > of
> > > > > > > > failed
> > > > > > > > > > > > tests, no
> > > > > > > > > > > > > > one may continue to pay attention to these cases.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think this guideline should consider these
> > > > situations,
> > > > > and
> > > > > > > > show
> > > > > > > > > > how
> > > > > > > > > > > > to
> > > > > > > > > > > > > > solve them.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Jingsong
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <
> > > > > imjark@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks to Xintong for bringing up this topic,
> I'm
> > > +1
> > > > in
> > > > > > > > > general.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > However, I think it's still not very clear how
> we
> > > > > address
> > > > > > > the
> > > > > > > > > > > > unstable
> > > > > > > > > > > > > > > tests.
> > > > > > > > > > > > > > > I think this is a very important part of this
> new
> > > > > > > guideline.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > According to the discussion above, if some
> tests
> > > are
> > > > > > > > unstable,
> > > > > > > > > we
> > > > > > > > > > > can
> > > > > > > > > > > > > > > manually disable it.
> > > > > > > > > > > > > > > But I have some questions in my mind:
> > > > > > > > > > > > > > > 1) Is the instability judged by the committer
> > > > > themselves or
> > > > > > > > by
> > > > > > > > > > some
> > > > > > > > > > > > > > > metrics?
> > > > > > > > > > > > > > > 2) Should we log the disable commit in the
> > > > > corresponding
> > > > > > > > issue
> > > > > > > > > > and
> > > > > > > > > > > > > > increase
> > > > > > > > > > > > > > > the priority?
> > > > > > > > > > > > > > > 3) What if nobody looks into this issue and
> this
> > > > > becomes
> > > > > > > some
> > > > > > > > > > > > potential
> > > > > > > > > > > > > > > bugs released with the new version?
> > > > > > > > > > > > > > > 4) If no person is actively working on the
> issue,
> > > who
> > > > > > > should
> > > > > > > > > > > > re-enable
> > > > > > > > > > > > > > it?
> > > > > > > > > > > > > > > Would it block PRs again?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Jark
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <
> > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks all for the feedback.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @Till @Yangze
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'm also not convinced by the idea of having
> an
> > > > > exception
> > > > > > > > for
> > > > > > > > > > > local
> > > > > > > > > > > > > > > builds.
> > > > > > > > > > > > > > > > We need to execute the entire build (or at
> > least
> > > > the
> > > > > > > > failing
> > > > > > > > > > > stage)
> > > > > > > > > > > > > > > > locally, to make sure subsequent test cases
> > > > > prevented by
> > > > > > > > the
> > > > > > > > > > > > failure
> > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > are all executed. In that case, it's probably
> > > > easier
> > > > > to
> > > > > > > > rerun
> > > > > > > > > > the
> > > > > > > > > > > > build
> > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > azure than locally.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Concerning disabling unstable test cases that
> > > > > regularly
> > > > > > > > block
> > > > > > > > > > PRs
> > > > > > > > > > > > from
> > > > > > > > > > > > > > > > merging, maybe we can say that such cases can
> > > only
> > > > be
> > > > > > > > > disabled
> > > > > > > > > > > when
> > > > > > > > > > > > > > > someone
> > > > > > > > > > > > > > > > is actively looking into it, likely the
> person
> > > who
> > > > > > > disabled
> > > > > > > > > the
> > > > > > > > > > > > case.
> > > > > > > > > > > > > > If
> > > > > > > > > > > > > > > > this person is no longer actively working on
> > it,
> > > > > he/she
> > > > > > > > > should
> > > > > > > > > > > > enable
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > case again no matter if it is fixed or not.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @Jing
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the suggestions.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +1 to provide guidelines on handling test
> > > failures.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +1 on this. Currently, the release managers
> are
> > > > > > > monitoring
> > > > > > > > > the
> > > > > > > > > > ci
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > cron
> > > > > > > > > > > > > > > > build instabilities and reporting them on
> JIRA.
> > > We
> > > > > should
> > > > > > > > > also
> > > > > > > > > > > > > > encourage
> > > > > > > > > > > > > > > > other contributors to do that for PRs.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. Set a deadline to find out the root cause
> > and
> > > > > solve
> > > > > > > the
> > > > > > > > > > > failure
> > > > > > > > > > > > for
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > new created JIRA  because we could not
> block
> > > > other
> > > > > > > commit
> > > > > > > > > > > merges
> > > > > > > > > > > > for
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 3. What to do if the JIRA has not made
> > > significant
> > > > > > > progress
> > > > > > > > > > when
> > > > > > > > > > > > > > reached
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'm not sure about these two. It feels a bit
> > > > against
> > > > > the
> > > > > > > > > > > voluntary
> > > > > > > > > > > > > > nature
> > > > > > > > > > > > > > > > of open source projects.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > IMHO, frequent instabilities are more likely
> to
> > > be
> > > > > > > upgraded
> > > > > > > > > to
> > > > > > > > > > > the
> > > > > > > > > > > > > > > critical
> > > > > > > > > > > > > > > > / blocker priority, receive more attention
> and
> > > > > eventually
> > > > > > > > get
> > > > > > > > > > > > fixed.
> > > > > > > > > > > > > > > > Release managers are also responsible for
> > looking
> > > > for
> > > > > > > > > assignees
> > > > > > > > > > > for
> > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > issues. If a case is still not fixed soonish,
> > > even
> > > > > with
> > > > > > > all
> > > > > > > > > > these
> > > > > > > > > > > > > > > efforts,
> > > > > > > > > > > > > > > > I'm not sure how setting a deadline can help
> > > this.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 4. If we disable the respective tests
> > > temporarily,
> > > > we
> > > > > > > also
> > > > > > > > > > need a
> > > > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > > > > to ensure the issue would be continued to
> be
> > > > > > > investigated
> > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > > > > > future.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +1. As mentioned above, we may consider
> > disabling
> > > > > such
> > > > > > > > tests
> > > > > > > > > > iff
> > > > > > > > > > > > > > someone
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > actively working on it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <
> > > > > > > > > > beyond1920@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Xintong,
> > > > > > > > > > > > > > > > > +1 to the proposal.
> > > > > > > > > > > > > > > > > In order to better comply with the rule, it
> > is
> > > > > > > necessary
> > > > > > > > to
> > > > > > > > > > > > describe
> > > > > > > > > > > > > > > > what's
> > > > > > > > > > > > > > > > > best practice if encountering test failure
> > > which
> > > > > seems
> > > > > > > > > > > unrelated
> > > > > > > > > > > > with
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > current commits.
> > > > > > > > > > > > > > > > > How to avoid merging PR with test failures
> > and
> > > > not
> > > > > > > > blocking
> > > > > > > > > > > code
> > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > for a long time?
> > > > > > > > > > > > > > > > > I tried to think about the possible steps,
> > and
> > > > > found
> > > > > > > > there
> > > > > > > > > > are
> > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > detailed problems that need to be discussed
> > in
> > > a
> > > > > step
> > > > > > > > > > further:
> > > > > > > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > > > > > > 2. Set a deadline to find out the root
> cause
> > > and
> > > > > solve
> > > > > > > > the
> > > > > > > > > > > > failure
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > new created JIRA  because we could not
> block
> > > > other
> > > > > > > commit
> > > > > > > > > > > merges
> > > > > > > > > > > > for
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > >     When is a reasonable deadline here?
> > > > > > > > > > > > > > > > > 3. What to do if the JIRA has not made
> > > > significant
> > > > > > > > progress
> > > > > > > > > > > when
> > > > > > > > > > > > > > > reached
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > > > > > >     There are several situations as
> follows,
> > > > maybe
> > > > > > > > > different
> > > > > > > > > > > > cases
> > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > different approaches.
> > > > > > > > > > > > > > > > >     1. the JIRA is non-assigned yet
> > > > > > > > > > > > > > > > >     2. not found the root cause yet
> > > > > > > > > > > > > > > > >     3. not found a good solution, but
> already
> > > > > found the
> > > > > > > > > root
> > > > > > > > > > > > cause
> > > > > > > > > > > > > > > > >     4. found a solution, but it needs more
> > time
> > > > to
> > > > > be
> > > > > > > > done.
> > > > > > > > > > > > > > > > > 4. If we disable the respective tests
> > > > temporarily,
> > > > > we
> > > > > > > > also
> > > > > > > > > > > need a
> > > > > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > > > > to ensure the issue would be continued to
> be
> > > > > > > investigated
> > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > > > > > future.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > > > > JING ZHANG
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Stephan Ewen <se...@apache.org>
> > 于2021年6月23日周三
> > > > > > > 下午8:16写道：
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till
> > > Rohrmann <
> > > > > > > > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I would first try to not introduce the
> > > > > exception
> > > > > > > for
> > > > > > > > > > local
> > > > > > > > > > > > > > builds.
> > > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > > > makes
> > > > > > > > > > > > > > > > > > > it quite hard for others to verify the
> > > build
> > > > > and to
> > > > > > > > > make
> > > > > > > > > > > sure
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > right things were executed. If we see
> > that
> > > > this
> > > > > > > > becomes
> > > > > > > > > > an
> > > > > > > > > > > > issue
> > > > > > > > > > > > > > > then
> > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > can revisit this idea.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze
> > Guo
> > > <
> > > > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > +1 for appending this to community
> > > > > guidelines for
> > > > > > > > > > merging
> > > > > > > > > > > > PRs.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > @Till Rohrmann
> > > > > > > > > > > > > > > > > > > > I agree that with this approach
> > unstable
> > > > > tests
> > > > > > > will
> > > > > > > > > not
> > > > > > > > > > > > block
> > > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > > > > commit merges. However, it might be
> > hard
> > > to
> > > > > > > prevent
> > > > > > > > > > > merging
> > > > > > > > > > > > > > > commits
> > > > > > > > > > > > > > > > > > > > that are related to those tests and
> > > should
> > > > > have
> > > > > > > > been
> > > > > > > > > > > passed
> > > > > > > > > > > > > > them.
> > > > > > > > > > > > > > > > > It's
> > > > > > > > > > > > > > > > > > > > true that this judgment can be made
> by
> > > the
> > > > > > > > > committers,
> > > > > > > > > > > but
> > > > > > > > > > > > no
> > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > > ensure the judgment is always precise
> > and
> > > > so
> > > > > that
> > > > > > > > we
> > > > > > > > > > have
> > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > discussion thread.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Regarding the unstable tests, how
> about
> > > > > adding
> > > > > > > > > another
> > > > > > > > > > > > > > exception:
> > > > > > > > > > > > > > > > > > > > committers verify it in their local
> > > > > environment
> > > > > > > and
> > > > > > > > > > > > comment in
> > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > cases?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <
> > > > > > > > > > > > liujiangangpeng@gmail.com
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > It is a good principle to run all
> > tests
> > > > > > > > > successfully
> > > > > > > > > > > > with any
> > > > > > > > > > > > > > > > > change.
> > > > > > > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > > > > > means a lot for project's stability
> > and
> > > > > > > > > development.
> > > > > > > > > > I
> > > > > > > > > > > > am big
> > > > > > > > > > > > > > > +1
> > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > proposal.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Best
> > > > > > > > > > > > > > > > > > > > > liujiangang
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Till Rohrmann <
> trohrmann@apache.org>
> > > > > > > > 于2021年6月22日周二
> > > > > > > > > > > > 下午6:36写道：
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > One way to address the problem of
> > > > > regularly
> > > > > > > > > failing
> > > > > > > > > > > > tests
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > block
> > > > > > > > > > > > > > > > > > > > > > merging of PRs is to disable the
> > > > > respective
> > > > > > > > tests
> > > > > > > > > > for
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > being.
> > > > > > > > > > > > > > > > > > > > Of
> > > > > > > > > > > > > > > > > > > > > > course, the failing test then
> needs
> > > to
> > > > be
> > > > > > > > fixed.
> > > > > > > > > > But
> > > > > > > > > > > at
> > > > > > > > > > > > > > least
> > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > > would not block everyone from
> > making
> > > > > > > progress.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM
> > > Arvid
> > > > > Heise
> > > > > > > <
> > > > > > > > > > > > > > > arvid@apache.org
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I think this is overall a good
> > > idea.
> > > > > So +1
> > > > > > > > from
> > > > > > > > > > my
> > > > > > > > > > > > side.
> > > > > > > > > > > > > > > > > > > > > > > However, I'd like to put a
> higher
> > > > > priority
> > > > > > > on
> > > > > > > > > > > > > > > infrastructure
> > > > > > > > > > > > > > > > > > then,
> > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > particular docker
> image/artifact
> > > > > caches.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50
> AM
> > > Till
> > > > > > > > Rohrmann
> > > > > > > > > <
> > > > > > > > > > > > > > > > > > > trohrmann@apache.org
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thanks for bringing this
> topic
> > to
> > > > our
> > > > > > > > > attention
> > > > > > > > > > > > > > Xintong.
> > > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > > > > > > > > > proposal makes a lot of sense
> > and
> > > > we
> > > > > > > should
> > > > > > > > > > > follow
> > > > > > > > > > > > it.
> > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > > give us
> > > > > > > > > > > > > > > > > > > > > > > > confidence that our changes
> are
> > > > > working
> > > > > > > and
> > > > > > > > > it
> > > > > > > > > > > > might
> > > > > > > > > > > > > > be a
> > > > > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > > > > > > incentive
> > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > quickly fix build
> > instabilities.
> > > > > Hence,
> > > > > > > +1.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12
> > AM
> > > > > Xintong
> > > > > > > > > Song <
> > > > > > > > > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > In the past a couple of
> > weeks,
> > > > I've
> > > > > > > > > observed
> > > > > > > > > > > > several
> > > > > > > > > > > > > > > > times
> > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > > > > merged without a green
> light
> > > from
> > > > > the
> > > > > > > CI
> > > > > > > > > > tests,
> > > > > > > > > > > > where
> > > > > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > > > > considered *unrelated*.
> This
> > > may
> > > > > not
> > > > > > > > always
> > > > > > > > > > > cause
> > > > > > > > > > > > > > > > problems,
> > > > > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > > > > > increase the chance of
> > breaking
> > > > our
> > > > > > > code
> > > > > > > > > > base.
> > > > > > > > > > > In
> > > > > > > > > > > > > > fact,
> > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > > > > occurred
> > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > me twice in the past few
> > weeks
> > > > > that I
> > > > > > > had
> > > > > > > > > to
> > > > > > > > > > > > revert a
> > > > > > > > > > > > > > > > > commit
> > > > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > > > > > > > > > > > the master branch due to
> > this.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > I think it would be nicer
> to
> > > > > enforce a
> > > > > > > > > > stricter
> > > > > > > > > > > > rule,
> > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > The problems of merging PRs
> > > with
> > > > > > > > > "unrelated"
> > > > > > > > > > > test
> > > > > > > > > > > > > > > > failures
> > > > > > > > > > > > > > > > > > are:
> > > > > > > > > > > > > > > > > > > > > > > > > - It's not always
> > > straightforward
> > > > > to
> > > > > > > tell
> > > > > > > > > > > > whether a
> > > > > > > > > > > > > > > test
> > > > > > > > > > > > > > > > > > > > failures are
> > > > > > > > > > > > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > > > > > > > > > > > - It prevents subsequent
> test
> > > > cases
> > > > > > > from
> > > > > > > > > > being
> > > > > > > > > > > > > > > executed,
> > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > To make things easier for
> the
> > > > > > > committers,
> > > > > > > > > the
> > > > > > > > > > > > > > following
> > > > > > > > > > > > > > > > > > > > exceptions
> > > > > > > > > > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > > > > > > > > > > > > - The PR has passed CI in
> the
> > > > > > > > contributor's
> > > > > > > > > > > > personal
> > > > > > > > > > > > > > > > > > workspace.
> > > > > > > > > > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > > > > > > > > > > > > - The CI tests have been
> > > > triggered
> > > > > > > > multiple
> > > > > > > > > > > > times, on
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > > > > commit,
> > > > > > > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > > > each stage has at least
> > passed
> > > > for
> > > > > > > once.
> > > > > > > > > > Please
> > > > > > > > > > > > also
> > > > > > > > > > > > > > > > > comment
> > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > If we all agree on this,
> I'd
> > > > > update the
> > > > > > > > > > > community
> > > > > > > > > > > > > > > > > guidelines
> > > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Please let me know what do
> > you
> > > > > think.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Best, Jingsong Lee
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best, Jingsong Lee
> > >
> >
>
>
> --
> Best regards!
> Rui Li
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Rui Li <li...@gmail.com>.

Thanks Xintong. +1 to the proposal.

On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 <li...@gmail.com> wrote:

> +1 for the proposal. Since the test time is long and environment may vary,
> unstable tests are really annoying for developers. The solution is welcome.
>
> Best
> liujiangang
>
> Jingsong Li <ji...@gmail.com> 于2021年6月29日周二 上午10:31写道：
>
> > +1 Thanks Xintong for the update!
> >
> > Best,
> > Jingsong
> >
> > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <tr...@apache.org>
> > wrote:
> >
> > > +1, thanks for updating the guidelines Xintong!
> > >
> > > Cheers,
> > > Till
> > >
> > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <ka...@gmail.com>
> wrote:
> > >
> > > > +1
> > > >
> > > > Thanks Xintong for drafting this doc.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <be...@gmail.com>
> > wrote:
> > > > >
> > > > > Thanks Xintong for giving detailed documentation.
> > > > >
> > > > > The best practice for handling test failure is very detailed, it's
> a
> > > good
> > > > > guidelines document with clear action steps.
> > > > >
> > > > > +1 to Xintong's proposal.
> > > > >
> > > > > Xintong Song <to...@gmail.com> 于2021年6月28日周一 下午4:07写道：
> > > > >
> > > > > > Thanks all for the discussion.
> > > > > >
> > > > > > Based on the opinions so far, I've drafted the new guidelines
> [1],
> > > as a
> > > > > > potential replacement of the original wiki page [2].
> > > > > >
> > > > > > Hopefully this draft has covered the most opinions discussed and
> > > > consensus
> > > > > > made in this discussion thread.
> > > > > >
> > > > > > Looking forward to your feedback.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > >
> > > > > > [2]
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > pnowojski@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for the clarification Till. +1 for what you have
> written.
> > > > > > >
> > > > > > > Piotrek
> > > > > > >
> > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann <tr...@apache.org>
> > > > > > napisał(a):
> > > > > > >
> > > > > > > > One quick note for clarification. I don't have anything
> against
> > > > builds
> > > > > > > > running on your personal Azure account and this is not what I
> > > > > > understood
> > > > > > > > under "local environment". For me "local environment" means
> > that
> > > > > > someone
> > > > > > > > runs the test locally on his machine and then says that the
> > > > > > > > tests have passed locally.
> > > > > > > >
> > > > > > > > I do agree that there might be a conflict of interests if a
> PR
> > > > author
> > > > > > > > disables tests. Here I would argue that we don't have
> malignant
> > > > > > > committers
> > > > > > > > which means that every committer will probably first check
> the
> > > > > > respective
> > > > > > > > ticket for how often the test failed. Then I guess the next
> > step
> > > > would
> > > > > > be
> > > > > > > > to discuss on the ticket whether to disable it or not. And
> > > finally,
> > > > > > after
> > > > > > > > reaching a consensus, it will be disabled. If we see someone
> > > > abusing
> > > > > > this
> > > > > > > > policy, then we can still think about how to guard against
> it.
> > > But,
> > > > > > > > honestly, I have very rarely seen such a case. I am also ok
> to
> > > > pull in
> > > > > > > the
> > > > > > > > release manager to make the final call if this resolves
> > concerns.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > >
> > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > > > pnowojski@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 for the general idea, however I have concerns about a
> > couple
> > > > of
> > > > > > > > details.
> > > > > > > > >
> > > > > > > > > > I would first try to not introduce the exception for
> local
> > > > builds.
> > > > > > > > > > It makes it quite hard for others to verify the build and
> > to
> > > > make
> > > > > > > sure
> > > > > > > > > that the right things were executed.
> > > > > > > > >
> > > > > > > > > I would counter Till's proposal to ignore local green
> builds.
> > > If
> > > > > > > > committer
> > > > > > > > > is merging and closing a PR, with official azure failure,
> but
> > > > there
> > > > > > > was a
> > > > > > > > > green build before or in local azure it's IMO enough to
> leave
> > > the
> > > > > > > > message:
> > > > > > > > >
> > > > > > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > > > > > Green local build: URL
> > > > > > > > >
> > > > > > > > > This should address Till's concern about verification.
> > > > > > > > >
> > > > > > > > > On the other hand I have concerns about disabling tests.*
> It
> > > > > > shouldn't
> > > > > > > be
> > > > > > > > > the PR author/committer that's disabling a test on his own,
> > as
> > > > > > that's a
> > > > > > > > > conflict of interests*. I have however no problems with
> > > disabling
> > > > > > test
> > > > > > > > > instabilities that were marked as "blockers" though, that
> > > should
> > > > work
> > > > > > > > > pretty well. But the important thing here is to correctly
> > judge
> > > > > > bumping
> > > > > > > > > priorities of test instabilities based on their frequency
> and
> > > > current
> > > > > > > > > general health of the system. I believe that release
> managers
> > > > should
> > > > > > be
> > > > > > > > > playing a big role here in deciding on the guidelines of
> what
> > > > should
> > > > > > > be a
> > > > > > > > > priority of certain test instabilities.
> > > > > > > > >
> > > > > > > > > What I mean by that is two example scenarios:
> > > > > > > > > 1. if we have a handful of very frequently failing tests
> and
> > a
> > > > > > handful
> > > > > > > of
> > > > > > > > > very rarely failing tests (like one reported failure and no
> > > > another
> > > > > > > > > occurrence in many months, and let's even say that the
> > failure
> > > > looks
> > > > > > > like
> > > > > > > > > infrastructure/network timeout), we should focus on the
> > > > frequently
> > > > > > > > failing
> > > > > > > > > ones, and probably we are safe to ignore for the time being
> > the
> > > > rare
> > > > > > > > issues
> > > > > > > > > - at least until we deal with the most pressing ones.
> > > > > > > > > 2. If we have tons of rarely failing test instabilities, we
> > > > should
> > > > > > > > probably
> > > > > > > > > start addressing them as blockers.
> > > > > > > > >
> > > > > > > > > I'm using my own conscious and my best judgement when I'm
> > > > > > > > > bumping/decreasing priorities of test instabilities (and
> > bugs),
> > > > but
> > > > > > as
> > > > > > > > > individual committer I don't have the full picture. As I
> > wrote
> > > > > > above, I
> > > > > > > > > think release managers are in a much better position to
> keep
> > > > > > adjusting
> > > > > > > > > those kind of guidelines.
> > > > > > > > >
> > > > > > > > > Best, Piotrek
> > > > > > > > >
> > > > > > > > > pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com>
> > napisał(a):
> > > > > > > > >
> > > > > > > > > > +1 for Xintong's proposal.
> > > > > > > > > >
> > > > > > > > > > For me, resolving problems directly (fixing the
> > > infrastructure
> > > > > > issue,
> > > > > > > > > > disabling unstable tests and creating blocker JIRAs to
> > track
> > > > the
> > > > > > fix
> > > > > > > > and
> > > > > > > > > > re-enable them asap, etc.) is (in most cases) better than
> > > > working
> > > > > > > > around
> > > > > > > > > > them (verify locally, manually check and judge the
> failure
> > as
> > > > > > > > > "unrelated",
> > > > > > > > > > etc.), and I believe the proposal could help us pushing
> > those
> > > > more
> > > > > > > > "real"
> > > > > > > > > > solutions forward.
> > > > > > > > > >
> > > > > > > > > > Best Regards,
> > > > > > > > > > Yu
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <
> > karmagyz@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Creating a blocker issue for the manually disabled
> tests
> > > > sounds
> > > > > > > good
> > > > > > > > to
> > > > > > > > > > me.
> > > > > > > > > > >
> > > > > > > > > > > Minor: I'm still a bit worried about the commits merged
> > > > before we
> > > > > > > fix
> > > > > > > > > > > the unstable tests can also break those tests. Instead
> of
> > > > letting
> > > > > > > the
> > > > > > > > > > > assigners keep a look at all potentially related
> commits,
> > > > they
> > > > > > can
> > > > > > > > > > > maintain a branch that is periodically synced with the
> > > master
> > > > > > > branch
> > > > > > > > > > > while enabling the unstable test. So that they can
> catch
> > > the
> > > > > > > breaking
> > > > > > > > > > > changes asap.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Yangze Guo
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <
> > > > > > > trohrmann@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > I like the idea of creating a blocker issue for a
> > > disabled
> > > > > > test.
> > > > > > > > This
> > > > > > > > > > > will
> > > > > > > > > > > > force us to resolve it in a timely manner and it
> won't
> > > fall
> > > > > > > through
> > > > > > > > > the
> > > > > > > > > > > > cracks.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Till
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <
> > > > > > > > jingsonglee0@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > >
> > > > > > > > > > > > > I also have some concerns about unstable cases.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think unstable cases can be divided into these
> > types:
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Force majeure: For example, network timeout,
> sudden
> > > > > > > > environmental
> > > > > > > > > > > > > collapse, they are accidental and can always be
> > solved
> > > by
> > > > > > > > > triggering
> > > > > > > > > > > azure
> > > > > > > > > > > > > again. Committers should wait for the next green
> > azure.
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Obvious mistakes: For example, some errors caused
> > by
> > > > > > obvious
> > > > > > > > > > reasons
> > > > > > > > > > > may
> > > > > > > > > > > > > be repaired quickly. At this time, do we need to
> > wait,
> > > > or not
> > > > > > > > wait
> > > > > > > > > > and
> > > > > > > > > > > just
> > > > > > > > > > > > > ignore?
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Difficult questions: These problems are very
> > > difficult
> > > > to
> > > > > > > find.
> > > > > > > > > > There
> > > > > > > > > > > > > will be no solution for a while and a half. We
> don't
> > > even
> > > > > > know
> > > > > > > > the
> > > > > > > > > > > reason.
> > > > > > > > > > > > > At this time, we should ignore it. (Maybe it's
> judged
> > > by
> > > > the
> > > > > > > > author
> > > > > > > > > > of
> > > > > > > > > > > the
> > > > > > > > > > > > > case. But what about the old case whose author
> can't
> > be
> > > > > > found?)
> > > > > > > > > > > > >
> > > > > > > > > > > > > So, the ignored cases should be the block of the
> next
> > > > release
> > > > > > > > until
> > > > > > > > > > the
> > > > > > > > > > > > > reason is found or the case is fixed?  We need to
> > > ensure
> > > > that
> > > > > > > > > someone
> > > > > > > > > > > will
> > > > > > > > > > > > > take care of these cases, because there is no
> > deepening
> > > > of
> > > > > > > failed
> > > > > > > > > > > tests, no
> > > > > > > > > > > > > one may continue to pay attention to these cases.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think this guideline should consider these
> > > situations,
> > > > and
> > > > > > > show
> > > > > > > > > how
> > > > > > > > > > > to
> > > > > > > > > > > > > solve them.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Jingsong
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <
> > > > imjark@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks to Xintong for bringing up this topic, I'm
> > +1
> > > in
> > > > > > > > general.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > However, I think it's still not very clear how we
> > > > address
> > > > > > the
> > > > > > > > > > > unstable
> > > > > > > > > > > > > > tests.
> > > > > > > > > > > > > > I think this is a very important part of this new
> > > > > > guideline.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > According to the discussion above, if some tests
> > are
> > > > > > > unstable,
> > > > > > > > we
> > > > > > > > > > can
> > > > > > > > > > > > > > manually disable it.
> > > > > > > > > > > > > > But I have some questions in my mind:
> > > > > > > > > > > > > > 1) Is the instability judged by the committer
> > > > themselves or
> > > > > > > by
> > > > > > > > > some
> > > > > > > > > > > > > > metrics?
> > > > > > > > > > > > > > 2) Should we log the disable commit in the
> > > > corresponding
> > > > > > > issue
> > > > > > > > > and
> > > > > > > > > > > > > increase
> > > > > > > > > > > > > > the priority?
> > > > > > > > > > > > > > 3) What if nobody looks into this issue and this
> > > > becomes
> > > > > > some
> > > > > > > > > > > potential
> > > > > > > > > > > > > > bugs released with the new version?
> > > > > > > > > > > > > > 4) If no person is actively working on the issue,
> > who
> > > > > > should
> > > > > > > > > > > re-enable
> > > > > > > > > > > > > it?
> > > > > > > > > > > > > > Would it block PRs again?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Jark
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <
> > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks all for the feedback.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Till @Yangze
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm also not convinced by the idea of having an
> > > > exception
> > > > > > > for
> > > > > > > > > > local
> > > > > > > > > > > > > > builds.
> > > > > > > > > > > > > > > We need to execute the entire build (or at
> least
> > > the
> > > > > > > failing
> > > > > > > > > > stage)
> > > > > > > > > > > > > > > locally, to make sure subsequent test cases
> > > > prevented by
> > > > > > > the
> > > > > > > > > > > failure
> > > > > > > > > > > > > one
> > > > > > > > > > > > > > > are all executed. In that case, it's probably
> > > easier
> > > > to
> > > > > > > rerun
> > > > > > > > > the
> > > > > > > > > > > build
> > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > azure than locally.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Concerning disabling unstable test cases that
> > > > regularly
> > > > > > > block
> > > > > > > > > PRs
> > > > > > > > > > > from
> > > > > > > > > > > > > > > merging, maybe we can say that such cases can
> > only
> > > be
> > > > > > > > disabled
> > > > > > > > > > when
> > > > > > > > > > > > > > someone
> > > > > > > > > > > > > > > is actively looking into it, likely the person
> > who
> > > > > > disabled
> > > > > > > > the
> > > > > > > > > > > case.
> > > > > > > > > > > > > If
> > > > > > > > > > > > > > > this person is no longer actively working on
> it,
> > > > he/she
> > > > > > > > should
> > > > > > > > > > > enable
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > case again no matter if it is fixed or not.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Jing
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the suggestions.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +1 to provide guidelines on handling test
> > failures.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +1 on this. Currently, the release managers are
> > > > > > monitoring
> > > > > > > > the
> > > > > > > > > ci
> > > > > > > > > > > and
> > > > > > > > > > > > > > cron
> > > > > > > > > > > > > > > build instabilities and reporting them on JIRA.
> > We
> > > > should
> > > > > > > > also
> > > > > > > > > > > > > encourage
> > > > > > > > > > > > > > > other contributors to do that for PRs.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. Set a deadline to find out the root cause
> and
> > > > solve
> > > > > > the
> > > > > > > > > > failure
> > > > > > > > > > > for
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > new created JIRA  because we could not block
> > > other
> > > > > > commit
> > > > > > > > > > merges
> > > > > > > > > > > for
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 3. What to do if the JIRA has not made
> > significant
> > > > > > progress
> > > > > > > > > when
> > > > > > > > > > > > > reached
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm not sure about these two. It feels a bit
> > > against
> > > > the
> > > > > > > > > > voluntary
> > > > > > > > > > > > > nature
> > > > > > > > > > > > > > > of open source projects.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > IMHO, frequent instabilities are more likely to
> > be
> > > > > > upgraded
> > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > > > > > critical
> > > > > > > > > > > > > > > / blocker priority, receive more attention and
> > > > eventually
> > > > > > > get
> > > > > > > > > > > fixed.
> > > > > > > > > > > > > > > Release managers are also responsible for
> looking
> > > for
> > > > > > > > assignees
> > > > > > > > > > for
> > > > > > > > > > > > > such
> > > > > > > > > > > > > > > issues. If a case is still not fixed soonish,
> > even
> > > > with
> > > > > > all
> > > > > > > > > these
> > > > > > > > > > > > > > efforts,
> > > > > > > > > > > > > > > I'm not sure how setting a deadline can help
> > this.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 4. If we disable the respective tests
> > temporarily,
> > > we
> > > > > > also
> > > > > > > > > need a
> > > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > > > to ensure the issue would be continued to be
> > > > > > investigated
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > > > > future.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +1. As mentioned above, we may consider
> disabling
> > > > such
> > > > > > > tests
> > > > > > > > > iff
> > > > > > > > > > > > > someone
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > actively working on it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <
> > > > > > > > > beyond1920@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Xintong,
> > > > > > > > > > > > > > > > +1 to the proposal.
> > > > > > > > > > > > > > > > In order to better comply with the rule, it
> is
> > > > > > necessary
> > > > > > > to
> > > > > > > > > > > describe
> > > > > > > > > > > > > > > what's
> > > > > > > > > > > > > > > > best practice if encountering test failure
> > which
> > > > seems
> > > > > > > > > > unrelated
> > > > > > > > > > > with
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > current commits.
> > > > > > > > > > > > > > > > How to avoid merging PR with test failures
> and
> > > not
> > > > > > > blocking
> > > > > > > > > > code
> > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > for a long time?
> > > > > > > > > > > > > > > > I tried to think about the possible steps,
> and
> > > > found
> > > > > > > there
> > > > > > > > > are
> > > > > > > > > > > some
> > > > > > > > > > > > > > > > detailed problems that need to be discussed
> in
> > a
> > > > step
> > > > > > > > > further:
> > > > > > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > > > > > 2. Set a deadline to find out the root cause
> > and
> > > > solve
> > > > > > > the
> > > > > > > > > > > failure
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > new created JIRA  because we could not block
> > > other
> > > > > > commit
> > > > > > > > > > merges
> > > > > > > > > > > for
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > >     When is a reasonable deadline here?
> > > > > > > > > > > > > > > > 3. What to do if the JIRA has not made
> > > significant
> > > > > > > progress
> > > > > > > > > > when
> > > > > > > > > > > > > > reached
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > > > > >     There are several situations as follows,
> > > maybe
> > > > > > > > different
> > > > > > > > > > > cases
> > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > different approaches.
> > > > > > > > > > > > > > > >     1. the JIRA is non-assigned yet
> > > > > > > > > > > > > > > >     2. not found the root cause yet
> > > > > > > > > > > > > > > >     3. not found a good solution, but already
> > > > found the
> > > > > > > > root
> > > > > > > > > > > cause
> > > > > > > > > > > > > > > >     4. found a solution, but it needs more
> time
> > > to
> > > > be
> > > > > > > done.
> > > > > > > > > > > > > > > > 4. If we disable the respective tests
> > > temporarily,
> > > > we
> > > > > > > also
> > > > > > > > > > need a
> > > > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > > > to ensure the issue would be continued to be
> > > > > > investigated
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > > > > future.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > > > JING ZHANG
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Stephan Ewen <se...@apache.org>
> 于2021年6月23日周三
> > > > > > 下午8:16写道：
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till
> > Rohrmann <
> > > > > > > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I would first try to not introduce the
> > > > exception
> > > > > > for
> > > > > > > > > local
> > > > > > > > > > > > > builds.
> > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > > makes
> > > > > > > > > > > > > > > > > > it quite hard for others to verify the
> > build
> > > > and to
> > > > > > > > make
> > > > > > > > > > sure
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > right things were executed. If we see
> that
> > > this
> > > > > > > becomes
> > > > > > > > > an
> > > > > > > > > > > issue
> > > > > > > > > > > > > > then
> > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > can revisit this idea.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze
> Guo
> > <
> > > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > +1 for appending this to community
> > > > guidelines for
> > > > > > > > > merging
> > > > > > > > > > > PRs.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > @Till Rohrmann
> > > > > > > > > > > > > > > > > > > I agree that with this approach
> unstable
> > > > tests
> > > > > > will
> > > > > > > > not
> > > > > > > > > > > block
> > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > > > commit merges. However, it might be
> hard
> > to
> > > > > > prevent
> > > > > > > > > > merging
> > > > > > > > > > > > > > commits
> > > > > > > > > > > > > > > > > > > that are related to those tests and
> > should
> > > > have
> > > > > > > been
> > > > > > > > > > passed
> > > > > > > > > > > > > them.
> > > > > > > > > > > > > > > > It's
> > > > > > > > > > > > > > > > > > > true that this judgment can be made by
> > the
> > > > > > > > committers,
> > > > > > > > > > but
> > > > > > > > > > > no
> > > > > > > > > > > > > one
> > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > ensure the judgment is always precise
> and
> > > so
> > > > that
> > > > > > > we
> > > > > > > > > have
> > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > discussion thread.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Regarding the unstable tests, how about
> > > > adding
> > > > > > > > another
> > > > > > > > > > > > > exception:
> > > > > > > > > > > > > > > > > > > committers verify it in their local
> > > > environment
> > > > > > and
> > > > > > > > > > > comment in
> > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > cases?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <
> > > > > > > > > > > liujiangangpeng@gmail.com
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > It is a good principle to run all
> tests
> > > > > > > > successfully
> > > > > > > > > > > with any
> > > > > > > > > > > > > > > > change.
> > > > > > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > > > > means a lot for project's stability
> and
> > > > > > > > development.
> > > > > > > > > I
> > > > > > > > > > > am big
> > > > > > > > > > > > > > +1
> > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > proposal.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Best
> > > > > > > > > > > > > > > > > > > > liujiangang
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Till Rohrmann <tr...@apache.org>
> > > > > > > 于2021年6月22日周二
> > > > > > > > > > > 下午6:36写道：
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > One way to address the problem of
> > > > regularly
> > > > > > > > failing
> > > > > > > > > > > tests
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > block
> > > > > > > > > > > > > > > > > > > > > merging of PRs is to disable the
> > > > respective
> > > > > > > tests
> > > > > > > > > for
> > > > > > > > > > > the
> > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > being.
> > > > > > > > > > > > > > > > > > > Of
> > > > > > > > > > > > > > > > > > > > > course, the failing test then needs
> > to
> > > be
> > > > > > > fixed.
> > > > > > > > > But
> > > > > > > > > > at
> > > > > > > > > > > > > least
> > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > would not block everyone from
> making
> > > > > > progress.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM
> > Arvid
> > > > Heise
> > > > > > <
> > > > > > > > > > > > > > arvid@apache.org
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I think this is overall a good
> > idea.
> > > > So +1
> > > > > > > from
> > > > > > > > > my
> > > > > > > > > > > side.
> > > > > > > > > > > > > > > > > > > > > > However, I'd like to put a higher
> > > > priority
> > > > > > on
> > > > > > > > > > > > > > infrastructure
> > > > > > > > > > > > > > > > > then,
> > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > particular docker image/artifact
> > > > caches.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM
> > Till
> > > > > > > Rohrmann
> > > > > > > > <
> > > > > > > > > > > > > > > > > > trohrmann@apache.org
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks for bringing this topic
> to
> > > our
> > > > > > > > attention
> > > > > > > > > > > > > Xintong.
> > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > > > > > > > > proposal makes a lot of sense
> and
> > > we
> > > > > > should
> > > > > > > > > > follow
> > > > > > > > > > > it.
> > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > give us
> > > > > > > > > > > > > > > > > > > > > > > confidence that our changes are
> > > > working
> > > > > > and
> > > > > > > > it
> > > > > > > > > > > might
> > > > > > > > > > > > > be a
> > > > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > > > > > incentive
> > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > quickly fix build
> instabilities.
> > > > Hence,
> > > > > > +1.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12
> AM
> > > > Xintong
> > > > > > > > Song <
> > > > > > > > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > In the past a couple of
> weeks,
> > > I've
> > > > > > > > observed
> > > > > > > > > > > several
> > > > > > > > > > > > > > > times
> > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > > > merged without a green light
> > from
> > > > the
> > > > > > CI
> > > > > > > > > tests,
> > > > > > > > > > > where
> > > > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > > > considered *unrelated*. This
> > may
> > > > not
> > > > > > > always
> > > > > > > > > > cause
> > > > > > > > > > > > > > > problems,
> > > > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > > > > increase the chance of
> breaking
> > > our
> > > > > > code
> > > > > > > > > base.
> > > > > > > > > > In
> > > > > > > > > > > > > fact,
> > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > > > occurred
> > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > me twice in the past few
> weeks
> > > > that I
> > > > > > had
> > > > > > > > to
> > > > > > > > > > > revert a
> > > > > > > > > > > > > > > > commit
> > > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > > > > > > > > > > the master branch due to
> this.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I think it would be nicer to
> > > > enforce a
> > > > > > > > > stricter
> > > > > > > > > > > rule,
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > The problems of merging PRs
> > with
> > > > > > > > "unrelated"
> > > > > > > > > > test
> > > > > > > > > > > > > > > failures
> > > > > > > > > > > > > > > > > are:
> > > > > > > > > > > > > > > > > > > > > > > > - It's not always
> > straightforward
> > > > to
> > > > > > tell
> > > > > > > > > > > whether a
> > > > > > > > > > > > > > test
> > > > > > > > > > > > > > > > > > > failures are
> > > > > > > > > > > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > > > > > > > > > > - It prevents subsequent test
> > > cases
> > > > > > from
> > > > > > > > > being
> > > > > > > > > > > > > > executed,
> > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > To make things easier for the
> > > > > > committers,
> > > > > > > > the
> > > > > > > > > > > > > following
> > > > > > > > > > > > > > > > > > > exceptions
> > > > > > > > > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > > > > > > > > > > > - The PR has passed CI in the
> > > > > > > contributor's
> > > > > > > > > > > personal
> > > > > > > > > > > > > > > > > workspace.
> > > > > > > > > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > > > > > > > > > > > - The CI tests have been
> > > triggered
> > > > > > > multiple
> > > > > > > > > > > times, on
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > > > commit,
> > > > > > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > > each stage has at least
> passed
> > > for
> > > > > > once.
> > > > > > > > > Please
> > > > > > > > > > > also
> > > > > > > > > > > > > > > > comment
> > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > If we all agree on this, I'd
> > > > update the
> > > > > > > > > > community
> > > > > > > > > > > > > > > > guidelines
> > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Please let me know what do
> you
> > > > think.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Best, Jingsong Lee
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
> >
> > --
> > Best, Jingsong Lee
> >
>


-- 
Best regards!
Rui Li

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by 刘建刚 <li...@gmail.com>.

+1 for the proposal. Since the test time is long and environment may vary,
unstable tests are really annoying for developers. The solution is welcome.

Best
liujiangang

Jingsong Li <ji...@gmail.com> 于2021年6月29日周二 上午10:31写道：

> +1 Thanks Xintong for the update!
>
> Best,
> Jingsong
>
> On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <tr...@apache.org>
> wrote:
>
> > +1, thanks for updating the guidelines Xintong!
> >
> > Cheers,
> > Till
> >
> > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> > > +1
> > >
> > > Thanks Xintong for drafting this doc.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <be...@gmail.com>
> wrote:
> > > >
> > > > Thanks Xintong for giving detailed documentation.
> > > >
> > > > The best practice for handling test failure is very detailed, it's a
> > good
> > > > guidelines document with clear action steps.
> > > >
> > > > +1 to Xintong's proposal.
> > > >
> > > > Xintong Song <to...@gmail.com> 于2021年6月28日周一 下午4:07写道：
> > > >
> > > > > Thanks all for the discussion.
> > > > >
> > > > > Based on the opinions so far, I've drafted the new guidelines [1],
> > as a
> > > > > potential replacement of the original wiki page [2].
> > > > >
> > > > > Hopefully this draft has covered the most opinions discussed and
> > > consensus
> > > > > made in this discussion thread.
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > >
> > > > > [2]
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > pnowojski@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Thanks for the clarification Till. +1 for what you have written.
> > > > > >
> > > > > > Piotrek
> > > > > >
> > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann <tr...@apache.org>
> > > > > napisał(a):
> > > > > >
> > > > > > > One quick note for clarification. I don't have anything against
> > > builds
> > > > > > > running on your personal Azure account and this is not what I
> > > > > understood
> > > > > > > under "local environment". For me "local environment" means
> that
> > > > > someone
> > > > > > > runs the test locally on his machine and then says that the
> > > > > > > tests have passed locally.
> > > > > > >
> > > > > > > I do agree that there might be a conflict of interests if a PR
> > > author
> > > > > > > disables tests. Here I would argue that we don't have malignant
> > > > > > committers
> > > > > > > which means that every committer will probably first check the
> > > > > respective
> > > > > > > ticket for how often the test failed. Then I guess the next
> step
> > > would
> > > > > be
> > > > > > > to discuss on the ticket whether to disable it or not. And
> > finally,
> > > > > after
> > > > > > > reaching a consensus, it will be disabled. If we see someone
> > > abusing
> > > > > this
> > > > > > > policy, then we can still think about how to guard against it.
> > But,
> > > > > > > honestly, I have very rarely seen such a case. I am also ok to
> > > pull in
> > > > > > the
> > > > > > > release manager to make the final call if this resolves
> concerns.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > > pnowojski@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1 for the general idea, however I have concerns about a
> couple
> > > of
> > > > > > > details.
> > > > > > > >
> > > > > > > > > I would first try to not introduce the exception for local
> > > builds.
> > > > > > > > > It makes it quite hard for others to verify the build and
> to
> > > make
> > > > > > sure
> > > > > > > > that the right things were executed.
> > > > > > > >
> > > > > > > > I would counter Till's proposal to ignore local green builds.
> > If
> > > > > > > committer
> > > > > > > > is merging and closing a PR, with official azure failure, but
> > > there
> > > > > > was a
> > > > > > > > green build before or in local azure it's IMO enough to leave
> > the
> > > > > > > message:
> > > > > > > >
> > > > > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > > > > Green local build: URL
> > > > > > > >
> > > > > > > > This should address Till's concern about verification.
> > > > > > > >
> > > > > > > > On the other hand I have concerns about disabling tests.* It
> > > > > shouldn't
> > > > > > be
> > > > > > > > the PR author/committer that's disabling a test on his own,
> as
> > > > > that's a
> > > > > > > > conflict of interests*. I have however no problems with
> > disabling
> > > > > test
> > > > > > > > instabilities that were marked as "blockers" though, that
> > should
> > > work
> > > > > > > > pretty well. But the important thing here is to correctly
> judge
> > > > > bumping
> > > > > > > > priorities of test instabilities based on their frequency and
> > > current
> > > > > > > > general health of the system. I believe that release managers
> > > should
> > > > > be
> > > > > > > > playing a big role here in deciding on the guidelines of what
> > > should
> > > > > > be a
> > > > > > > > priority of certain test instabilities.
> > > > > > > >
> > > > > > > > What I mean by that is two example scenarios:
> > > > > > > > 1. if we have a handful of very frequently failing tests and
> a
> > > > > handful
> > > > > > of
> > > > > > > > very rarely failing tests (like one reported failure and no
> > > another
> > > > > > > > occurrence in many months, and let's even say that the
> failure
> > > looks
> > > > > > like
> > > > > > > > infrastructure/network timeout), we should focus on the
> > > frequently
> > > > > > > failing
> > > > > > > > ones, and probably we are safe to ignore for the time being
> the
> > > rare
> > > > > > > issues
> > > > > > > > - at least until we deal with the most pressing ones.
> > > > > > > > 2. If we have tons of rarely failing test instabilities, we
> > > should
> > > > > > > probably
> > > > > > > > start addressing them as blockers.
> > > > > > > >
> > > > > > > > I'm using my own conscious and my best judgement when I'm
> > > > > > > > bumping/decreasing priorities of test instabilities (and
> bugs),
> > > but
> > > > > as
> > > > > > > > individual committer I don't have the full picture. As I
> wrote
> > > > > above, I
> > > > > > > > think release managers are in a much better position to keep
> > > > > adjusting
> > > > > > > > those kind of guidelines.
> > > > > > > >
> > > > > > > > Best, Piotrek
> > > > > > > >
> > > > > > > > pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com>
> napisał(a):
> > > > > > > >
> > > > > > > > > +1 for Xintong's proposal.
> > > > > > > > >
> > > > > > > > > For me, resolving problems directly (fixing the
> > infrastructure
> > > > > issue,
> > > > > > > > > disabling unstable tests and creating blocker JIRAs to
> track
> > > the
> > > > > fix
> > > > > > > and
> > > > > > > > > re-enable them asap, etc.) is (in most cases) better than
> > > working
> > > > > > > around
> > > > > > > > > them (verify locally, manually check and judge the failure
> as
> > > > > > > > "unrelated",
> > > > > > > > > etc.), and I believe the proposal could help us pushing
> those
> > > more
> > > > > > > "real"
> > > > > > > > > solutions forward.
> > > > > > > > >
> > > > > > > > > Best Regards,
> > > > > > > > > Yu
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <
> karmagyz@gmail.com
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Creating a blocker issue for the manually disabled tests
> > > sounds
> > > > > > good
> > > > > > > to
> > > > > > > > > me.
> > > > > > > > > >
> > > > > > > > > > Minor: I'm still a bit worried about the commits merged
> > > before we
> > > > > > fix
> > > > > > > > > > the unstable tests can also break those tests. Instead of
> > > letting
> > > > > > the
> > > > > > > > > > assigners keep a look at all potentially related commits,
> > > they
> > > > > can
> > > > > > > > > > maintain a branch that is periodically synced with the
> > master
> > > > > > branch
> > > > > > > > > > while enabling the unstable test. So that they can catch
> > the
> > > > > > breaking
> > > > > > > > > > changes asap.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <
> > > > > > trohrmann@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > I like the idea of creating a blocker issue for a
> > disabled
> > > > > test.
> > > > > > > This
> > > > > > > > > > will
> > > > > > > > > > > force us to resolve it in a timely manner and it won't
> > fall
> > > > > > through
> > > > > > > > the
> > > > > > > > > > > cracks.
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Till
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <
> > > > > > > jingsonglee0@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > >
> > > > > > > > > > > > I also have some concerns about unstable cases.
> > > > > > > > > > > >
> > > > > > > > > > > > I think unstable cases can be divided into these
> types:
> > > > > > > > > > > >
> > > > > > > > > > > > - Force majeure: For example, network timeout, sudden
> > > > > > > environmental
> > > > > > > > > > > > collapse, they are accidental and can always be
> solved
> > by
> > > > > > > > triggering
> > > > > > > > > > azure
> > > > > > > > > > > > again. Committers should wait for the next green
> azure.
> > > > > > > > > > > >
> > > > > > > > > > > > - Obvious mistakes: For example, some errors caused
> by
> > > > > obvious
> > > > > > > > > reasons
> > > > > > > > > > may
> > > > > > > > > > > > be repaired quickly. At this time, do we need to
> wait,
> > > or not
> > > > > > > wait
> > > > > > > > > and
> > > > > > > > > > just
> > > > > > > > > > > > ignore?
> > > > > > > > > > > >
> > > > > > > > > > > > - Difficult questions: These problems are very
> > difficult
> > > to
> > > > > > find.
> > > > > > > > > There
> > > > > > > > > > > > will be no solution for a while and a half. We don't
> > even
> > > > > know
> > > > > > > the
> > > > > > > > > > reason.
> > > > > > > > > > > > At this time, we should ignore it. (Maybe it's judged
> > by
> > > the
> > > > > > > author
> > > > > > > > > of
> > > > > > > > > > the
> > > > > > > > > > > > case. But what about the old case whose author can't
> be
> > > > > found?)
> > > > > > > > > > > >
> > > > > > > > > > > > So, the ignored cases should be the block of the next
> > > release
> > > > > > > until
> > > > > > > > > the
> > > > > > > > > > > > reason is found or the case is fixed?  We need to
> > ensure
> > > that
> > > > > > > > someone
> > > > > > > > > > will
> > > > > > > > > > > > take care of these cases, because there is no
> deepening
> > > of
> > > > > > failed
> > > > > > > > > > tests, no
> > > > > > > > > > > > one may continue to pay attention to these cases.
> > > > > > > > > > > >
> > > > > > > > > > > > I think this guideline should consider these
> > situations,
> > > and
> > > > > > show
> > > > > > > > how
> > > > > > > > > > to
> > > > > > > > > > > > solve them.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Jingsong
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <
> > > imjark@gmail.com>
> > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks to Xintong for bringing up this topic, I'm
> +1
> > in
> > > > > > > general.
> > > > > > > > > > > > >
> > > > > > > > > > > > > However, I think it's still not very clear how we
> > > address
> > > > > the
> > > > > > > > > > unstable
> > > > > > > > > > > > > tests.
> > > > > > > > > > > > > I think this is a very important part of this new
> > > > > guideline.
> > > > > > > > > > > > >
> > > > > > > > > > > > > According to the discussion above, if some tests
> are
> > > > > > unstable,
> > > > > > > we
> > > > > > > > > can
> > > > > > > > > > > > > manually disable it.
> > > > > > > > > > > > > But I have some questions in my mind:
> > > > > > > > > > > > > 1) Is the instability judged by the committer
> > > themselves or
> > > > > > by
> > > > > > > > some
> > > > > > > > > > > > > metrics?
> > > > > > > > > > > > > 2) Should we log the disable commit in the
> > > corresponding
> > > > > > issue
> > > > > > > > and
> > > > > > > > > > > > increase
> > > > > > > > > > > > > the priority?
> > > > > > > > > > > > > 3) What if nobody looks into this issue and this
> > > becomes
> > > > > some
> > > > > > > > > > potential
> > > > > > > > > > > > > bugs released with the new version?
> > > > > > > > > > > > > 4) If no person is actively working on the issue,
> who
> > > > > should
> > > > > > > > > > re-enable
> > > > > > > > > > > > it?
> > > > > > > > > > > > > Would it block PRs again?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Jark
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <
> > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks all for the feedback.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @Till @Yangze
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm also not convinced by the idea of having an
> > > exception
> > > > > > for
> > > > > > > > > local
> > > > > > > > > > > > > builds.
> > > > > > > > > > > > > > We need to execute the entire build (or at least
> > the
> > > > > > failing
> > > > > > > > > stage)
> > > > > > > > > > > > > > locally, to make sure subsequent test cases
> > > prevented by
> > > > > > the
> > > > > > > > > > failure
> > > > > > > > > > > > one
> > > > > > > > > > > > > > are all executed. In that case, it's probably
> > easier
> > > to
> > > > > > rerun
> > > > > > > > the
> > > > > > > > > > build
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > azure than locally.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Concerning disabling unstable test cases that
> > > regularly
> > > > > > block
> > > > > > > > PRs
> > > > > > > > > > from
> > > > > > > > > > > > > > merging, maybe we can say that such cases can
> only
> > be
> > > > > > > disabled
> > > > > > > > > when
> > > > > > > > > > > > > someone
> > > > > > > > > > > > > > is actively looking into it, likely the person
> who
> > > > > disabled
> > > > > > > the
> > > > > > > > > > case.
> > > > > > > > > > > > If
> > > > > > > > > > > > > > this person is no longer actively working on it,
> > > he/she
> > > > > > > should
> > > > > > > > > > enable
> > > > > > > > > > > > the
> > > > > > > > > > > > > > case again no matter if it is fixed or not.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @Jing
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the suggestions.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 to provide guidelines on handling test
> failures.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 on this. Currently, the release managers are
> > > > > monitoring
> > > > > > > the
> > > > > > > > ci
> > > > > > > > > > and
> > > > > > > > > > > > > cron
> > > > > > > > > > > > > > build instabilities and reporting them on JIRA.
> We
> > > should
> > > > > > > also
> > > > > > > > > > > > encourage
> > > > > > > > > > > > > > other contributors to do that for PRs.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2. Set a deadline to find out the root cause and
> > > solve
> > > > > the
> > > > > > > > > failure
> > > > > > > > > > for
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > new created JIRA  because we could not block
> > other
> > > > > commit
> > > > > > > > > merges
> > > > > > > > > > for
> > > > > > > > > > > > a
> > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > 3. What to do if the JIRA has not made
> significant
> > > > > progress
> > > > > > > > when
> > > > > > > > > > > > reached
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm not sure about these two. It feels a bit
> > against
> > > the
> > > > > > > > > voluntary
> > > > > > > > > > > > nature
> > > > > > > > > > > > > > of open source projects.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > IMHO, frequent instabilities are more likely to
> be
> > > > > upgraded
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > > > > > critical
> > > > > > > > > > > > > > / blocker priority, receive more attention and
> > > eventually
> > > > > > get
> > > > > > > > > > fixed.
> > > > > > > > > > > > > > Release managers are also responsible for looking
> > for
> > > > > > > assignees
> > > > > > > > > for
> > > > > > > > > > > > such
> > > > > > > > > > > > > > issues. If a case is still not fixed soonish,
> even
> > > with
> > > > > all
> > > > > > > > these
> > > > > > > > > > > > > efforts,
> > > > > > > > > > > > > > I'm not sure how setting a deadline can help
> this.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 4. If we disable the respective tests
> temporarily,
> > we
> > > > > also
> > > > > > > > need a
> > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > > to ensure the issue would be continued to be
> > > > > investigated
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > > future.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +1. As mentioned above, we may consider disabling
> > > such
> > > > > > tests
> > > > > > > > iff
> > > > > > > > > > > > someone
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > actively working on it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <
> > > > > > > > beyond1920@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Xintong,
> > > > > > > > > > > > > > > +1 to the proposal.
> > > > > > > > > > > > > > > In order to better comply with the rule, it is
> > > > > necessary
> > > > > > to
> > > > > > > > > > describe
> > > > > > > > > > > > > > what's
> > > > > > > > > > > > > > > best practice if encountering test failure
> which
> > > seems
> > > > > > > > > unrelated
> > > > > > > > > > with
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > current commits.
> > > > > > > > > > > > > > > How to avoid merging PR with test failures and
> > not
> > > > > > blocking
> > > > > > > > > code
> > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > for a long time?
> > > > > > > > > > > > > > > I tried to think about the possible steps, and
> > > found
> > > > > > there
> > > > > > > > are
> > > > > > > > > > some
> > > > > > > > > > > > > > > detailed problems that need to be discussed in
> a
> > > step
> > > > > > > > further:
> > > > > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > > > > 2. Set a deadline to find out the root cause
> and
> > > solve
> > > > > > the
> > > > > > > > > > failure
> > > > > > > > > > > > for
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > new created JIRA  because we could not block
> > other
> > > > > commit
> > > > > > > > > merges
> > > > > > > > > > for
> > > > > > > > > > > > a
> > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > >     When is a reasonable deadline here?
> > > > > > > > > > > > > > > 3. What to do if the JIRA has not made
> > significant
> > > > > > progress
> > > > > > > > > when
> > > > > > > > > > > > > reached
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > > > >     There are several situations as follows,
> > maybe
> > > > > > > different
> > > > > > > > > > cases
> > > > > > > > > > > > need
> > > > > > > > > > > > > > > different approaches.
> > > > > > > > > > > > > > >     1. the JIRA is non-assigned yet
> > > > > > > > > > > > > > >     2. not found the root cause yet
> > > > > > > > > > > > > > >     3. not found a good solution, but already
> > > found the
> > > > > > > root
> > > > > > > > > > cause
> > > > > > > > > > > > > > >     4. found a solution, but it needs more time
> > to
> > > be
> > > > > > done.
> > > > > > > > > > > > > > > 4. If we disable the respective tests
> > temporarily,
> > > we
> > > > > > also
> > > > > > > > > need a
> > > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > > to ensure the issue would be continued to be
> > > > > investigated
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > > future.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > > JING ZHANG
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Stephan Ewen <se...@apache.org> 于2021年6月23日周三
> > > > > 下午8:16写道：
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till
> Rohrmann <
> > > > > > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I would first try to not introduce the
> > > exception
> > > > > for
> > > > > > > > local
> > > > > > > > > > > > builds.
> > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > makes
> > > > > > > > > > > > > > > > > it quite hard for others to verify the
> build
> > > and to
> > > > > > > make
> > > > > > > > > sure
> > > > > > > > > > > > that
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > right things were executed. If we see that
> > this
> > > > > > becomes
> > > > > > > > an
> > > > > > > > > > issue
> > > > > > > > > > > > > then
> > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > can revisit this idea.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo
> <
> > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > +1 for appending this to community
> > > guidelines for
> > > > > > > > merging
> > > > > > > > > > PRs.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > @Till Rohrmann
> > > > > > > > > > > > > > > > > > I agree that with this approach unstable
> > > tests
> > > > > will
> > > > > > > not
> > > > > > > > > > block
> > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > > commit merges. However, it might be hard
> to
> > > > > prevent
> > > > > > > > > merging
> > > > > > > > > > > > > commits
> > > > > > > > > > > > > > > > > > that are related to those tests and
> should
> > > have
> > > > > > been
> > > > > > > > > passed
> > > > > > > > > > > > them.
> > > > > > > > > > > > > > > It's
> > > > > > > > > > > > > > > > > > true that this judgment can be made by
> the
> > > > > > > committers,
> > > > > > > > > but
> > > > > > > > > > no
> > > > > > > > > > > > one
> > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > ensure the judgment is always precise and
> > so
> > > that
> > > > > > we
> > > > > > > > have
> > > > > > > > > > this
> > > > > > > > > > > > > > > > > > discussion thread.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Regarding the unstable tests, how about
> > > adding
> > > > > > > another
> > > > > > > > > > > > exception:
> > > > > > > > > > > > > > > > > > committers verify it in their local
> > > environment
> > > > > and
> > > > > > > > > > comment in
> > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > cases?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <
> > > > > > > > > > liujiangangpeng@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > It is a good principle to run all tests
> > > > > > > successfully
> > > > > > > > > > with any
> > > > > > > > > > > > > > > change.
> > > > > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > > > means a lot for project's stability and
> > > > > > > development.
> > > > > > > > I
> > > > > > > > > > am big
> > > > > > > > > > > > > +1
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > proposal.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Best
> > > > > > > > > > > > > > > > > > > liujiangang
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Till Rohrmann <tr...@apache.org>
> > > > > > 于2021年6月22日周二
> > > > > > > > > > 下午6:36写道：
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > One way to address the problem of
> > > regularly
> > > > > > > failing
> > > > > > > > > > tests
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > block
> > > > > > > > > > > > > > > > > > > > merging of PRs is to disable the
> > > respective
> > > > > > tests
> > > > > > > > for
> > > > > > > > > > the
> > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > being.
> > > > > > > > > > > > > > > > > > Of
> > > > > > > > > > > > > > > > > > > > course, the failing test then needs
> to
> > be
> > > > > > fixed.
> > > > > > > > But
> > > > > > > > > at
> > > > > > > > > > > > least
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > would not block everyone from making
> > > > > progress.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM
> Arvid
> > > Heise
> > > > > <
> > > > > > > > > > > > > arvid@apache.org
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I think this is overall a good
> idea.
> > > So +1
> > > > > > from
> > > > > > > > my
> > > > > > > > > > side.
> > > > > > > > > > > > > > > > > > > > > However, I'd like to put a higher
> > > priority
> > > > > on
> > > > > > > > > > > > > infrastructure
> > > > > > > > > > > > > > > > then,
> > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > particular docker image/artifact
> > > caches.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM
> Till
> > > > > > Rohrmann
> > > > > > > <
> > > > > > > > > > > > > > > > > trohrmann@apache.org
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks for bringing this topic to
> > our
> > > > > > > attention
> > > > > > > > > > > > Xintong.
> > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > > > > > > > proposal makes a lot of sense and
> > we
> > > > > should
> > > > > > > > > follow
> > > > > > > > > > it.
> > > > > > > > > > > > It
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > give us
> > > > > > > > > > > > > > > > > > > > > > confidence that our changes are
> > > working
> > > > > and
> > > > > > > it
> > > > > > > > > > might
> > > > > > > > > > > > be a
> > > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > > > > incentive
> > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > quickly fix build instabilities.
> > > Hence,
> > > > > +1.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM
> > > Xintong
> > > > > > > Song <
> > > > > > > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > In the past a couple of weeks,
> > I've
> > > > > > > observed
> > > > > > > > > > several
> > > > > > > > > > > > > > times
> > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > > merged without a green light
> from
> > > the
> > > > > CI
> > > > > > > > tests,
> > > > > > > > > > where
> > > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > > considered *unrelated*. This
> may
> > > not
> > > > > > always
> > > > > > > > > cause
> > > > > > > > > > > > > > problems,
> > > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > > > increase the chance of breaking
> > our
> > > > > code
> > > > > > > > base.
> > > > > > > > > In
> > > > > > > > > > > > fact,
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > > occurred
> > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > me twice in the past few weeks
> > > that I
> > > > > had
> > > > > > > to
> > > > > > > > > > revert a
> > > > > > > > > > > > > > > commit
> > > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > > > > > > > > > the master branch due to this.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I think it would be nicer to
> > > enforce a
> > > > > > > > stricter
> > > > > > > > > > rule,
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > The problems of merging PRs
> with
> > > > > > > "unrelated"
> > > > > > > > > test
> > > > > > > > > > > > > > failures
> > > > > > > > > > > > > > > > are:
> > > > > > > > > > > > > > > > > > > > > > > - It's not always
> straightforward
> > > to
> > > > > tell
> > > > > > > > > > whether a
> > > > > > > > > > > > > test
> > > > > > > > > > > > > > > > > > failures are
> > > > > > > > > > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > > > > > > > > > - It prevents subsequent test
> > cases
> > > > > from
> > > > > > > > being
> > > > > > > > > > > > > executed,
> > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > To make things easier for the
> > > > > committers,
> > > > > > > the
> > > > > > > > > > > > following
> > > > > > > > > > > > > > > > > > exceptions
> > > > > > > > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > > > > > > > > > > - The PR has passed CI in the
> > > > > > contributor's
> > > > > > > > > > personal
> > > > > > > > > > > > > > > > workspace.
> > > > > > > > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > > > > > > > > > > - The CI tests have been
> > triggered
> > > > > > multiple
> > > > > > > > > > times, on
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > > commit,
> > > > > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > each stage has at least passed
> > for
> > > > > once.
> > > > > > > > Please
> > > > > > > > > > also
> > > > > > > > > > > > > > > comment
> > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > If we all agree on this, I'd
> > > update the
> > > > > > > > > community
> > > > > > > > > > > > > > > guidelines
> > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Please let me know what do you
> > > think.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Best, Jingsong Lee
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>
>
> --
> Best, Jingsong Lee
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Jingsong Li <ji...@gmail.com>.

+1 Thanks Xintong for the update!

Best,
Jingsong

On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <tr...@apache.org> wrote:

> +1, thanks for updating the guidelines Xintong!
>
> Cheers,
> Till
>
> On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <ka...@gmail.com> wrote:
>
> > +1
> >
> > Thanks Xintong for drafting this doc.
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <be...@gmail.com> wrote:
> > >
> > > Thanks Xintong for giving detailed documentation.
> > >
> > > The best practice for handling test failure is very detailed, it's a
> good
> > > guidelines document with clear action steps.
> > >
> > > +1 to Xintong's proposal.
> > >
> > > Xintong Song <to...@gmail.com> 于2021年6月28日周一 下午4:07写道：
> > >
> > > > Thanks all for the discussion.
> > > >
> > > > Based on the opinions so far, I've drafted the new guidelines [1],
> as a
> > > > potential replacement of the original wiki page [2].
> > > >
> > > > Hopefully this draft has covered the most opinions discussed and
> > consensus
> > > > made in this discussion thread.
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > >
> > > > [2]
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > >
> > > >
> > > >
> > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> pnowojski@apache.org>
> > > > wrote:
> > > >
> > > > > Thanks for the clarification Till. +1 for what you have written.
> > > > >
> > > > > Piotrek
> > > > >
> > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann <tr...@apache.org>
> > > > napisał(a):
> > > > >
> > > > > > One quick note for clarification. I don't have anything against
> > builds
> > > > > > running on your personal Azure account and this is not what I
> > > > understood
> > > > > > under "local environment". For me "local environment" means that
> > > > someone
> > > > > > runs the test locally on his machine and then says that the
> > > > > > tests have passed locally.
> > > > > >
> > > > > > I do agree that there might be a conflict of interests if a PR
> > author
> > > > > > disables tests. Here I would argue that we don't have malignant
> > > > > committers
> > > > > > which means that every committer will probably first check the
> > > > respective
> > > > > > ticket for how often the test failed. Then I guess the next step
> > would
> > > > be
> > > > > > to discuss on the ticket whether to disable it or not. And
> finally,
> > > > after
> > > > > > reaching a consensus, it will be disabled. If we see someone
> > abusing
> > > > this
> > > > > > policy, then we can still think about how to guard against it.
> But,
> > > > > > honestly, I have very rarely seen such a case. I am also ok to
> > pull in
> > > > > the
> > > > > > release manager to make the final call if this resolves concerns.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > pnowojski@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > +1 for the general idea, however I have concerns about a couple
> > of
> > > > > > details.
> > > > > > >
> > > > > > > > I would first try to not introduce the exception for local
> > builds.
> > > > > > > > It makes it quite hard for others to verify the build and to
> > make
> > > > > sure
> > > > > > > that the right things were executed.
> > > > > > >
> > > > > > > I would counter Till's proposal to ignore local green builds.
> If
> > > > > > committer
> > > > > > > is merging and closing a PR, with official azure failure, but
> > there
> > > > > was a
> > > > > > > green build before or in local azure it's IMO enough to leave
> the
> > > > > > message:
> > > > > > >
> > > > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > > > Green local build: URL
> > > > > > >
> > > > > > > This should address Till's concern about verification.
> > > > > > >
> > > > > > > On the other hand I have concerns about disabling tests.* It
> > > > shouldn't
> > > > > be
> > > > > > > the PR author/committer that's disabling a test on his own, as
> > > > that's a
> > > > > > > conflict of interests*. I have however no problems with
> disabling
> > > > test
> > > > > > > instabilities that were marked as "blockers" though, that
> should
> > work
> > > > > > > pretty well. But the important thing here is to correctly judge
> > > > bumping
> > > > > > > priorities of test instabilities based on their frequency and
> > current
> > > > > > > general health of the system. I believe that release managers
> > should
> > > > be
> > > > > > > playing a big role here in deciding on the guidelines of what
> > should
> > > > > be a
> > > > > > > priority of certain test instabilities.
> > > > > > >
> > > > > > > What I mean by that is two example scenarios:
> > > > > > > 1. if we have a handful of very frequently failing tests and a
> > > > handful
> > > > > of
> > > > > > > very rarely failing tests (like one reported failure and no
> > another
> > > > > > > occurrence in many months, and let's even say that the failure
> > looks
> > > > > like
> > > > > > > infrastructure/network timeout), we should focus on the
> > frequently
> > > > > > failing
> > > > > > > ones, and probably we are safe to ignore for the time being the
> > rare
> > > > > > issues
> > > > > > > - at least until we deal with the most pressing ones.
> > > > > > > 2. If we have tons of rarely failing test instabilities, we
> > should
> > > > > > probably
> > > > > > > start addressing them as blockers.
> > > > > > >
> > > > > > > I'm using my own conscious and my best judgement when I'm
> > > > > > > bumping/decreasing priorities of test instabilities (and bugs),
> > but
> > > > as
> > > > > > > individual committer I don't have the full picture. As I wrote
> > > > above, I
> > > > > > > think release managers are in a much better position to keep
> > > > adjusting
> > > > > > > those kind of guidelines.
> > > > > > >
> > > > > > > Best, Piotrek
> > > > > > >
> > > > > > > pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com> napisał(a):
> > > > > > >
> > > > > > > > +1 for Xintong's proposal.
> > > > > > > >
> > > > > > > > For me, resolving problems directly (fixing the
> infrastructure
> > > > issue,
> > > > > > > > disabling unstable tests and creating blocker JIRAs to track
> > the
> > > > fix
> > > > > > and
> > > > > > > > re-enable them asap, etc.) is (in most cases) better than
> > working
> > > > > > around
> > > > > > > > them (verify locally, manually check and judge the failure as
> > > > > > > "unrelated",
> > > > > > > > etc.), and I believe the proposal could help us pushing those
> > more
> > > > > > "real"
> > > > > > > > solutions forward.
> > > > > > > >
> > > > > > > > Best Regards,
> > > > > > > > Yu
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <karmagyz@gmail.com
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > > Creating a blocker issue for the manually disabled tests
> > sounds
> > > > > good
> > > > > > to
> > > > > > > > me.
> > > > > > > > >
> > > > > > > > > Minor: I'm still a bit worried about the commits merged
> > before we
> > > > > fix
> > > > > > > > > the unstable tests can also break those tests. Instead of
> > letting
> > > > > the
> > > > > > > > > assigners keep a look at all potentially related commits,
> > they
> > > > can
> > > > > > > > > maintain a branch that is periodically synced with the
> master
> > > > > branch
> > > > > > > > > while enabling the unstable test. So that they can catch
> the
> > > > > breaking
> > > > > > > > > changes asap.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yangze Guo
> > > > > > > > >
> > > > > > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <
> > > > > trohrmann@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > I like the idea of creating a blocker issue for a
> disabled
> > > > test.
> > > > > > This
> > > > > > > > > will
> > > > > > > > > > force us to resolve it in a timely manner and it won't
> fall
> > > > > through
> > > > > > > the
> > > > > > > > > > cracks.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <
> > > > > > jingsonglee0@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > >
> > > > > > > > > > > I also have some concerns about unstable cases.
> > > > > > > > > > >
> > > > > > > > > > > I think unstable cases can be divided into these types:
> > > > > > > > > > >
> > > > > > > > > > > - Force majeure: For example, network timeout, sudden
> > > > > > environmental
> > > > > > > > > > > collapse, they are accidental and can always be solved
> by
> > > > > > > triggering
> > > > > > > > > azure
> > > > > > > > > > > again. Committers should wait for the next green azure.
> > > > > > > > > > >
> > > > > > > > > > > - Obvious mistakes: For example, some errors caused by
> > > > obvious
> > > > > > > > reasons
> > > > > > > > > may
> > > > > > > > > > > be repaired quickly. At this time, do we need to wait,
> > or not
> > > > > > wait
> > > > > > > > and
> > > > > > > > > just
> > > > > > > > > > > ignore?
> > > > > > > > > > >
> > > > > > > > > > > - Difficult questions: These problems are very
> difficult
> > to
> > > > > find.
> > > > > > > > There
> > > > > > > > > > > will be no solution for a while and a half. We don't
> even
> > > > know
> > > > > > the
> > > > > > > > > reason.
> > > > > > > > > > > At this time, we should ignore it. (Maybe it's judged
> by
> > the
> > > > > > author
> > > > > > > > of
> > > > > > > > > the
> > > > > > > > > > > case. But what about the old case whose author can't be
> > > > found?)
> > > > > > > > > > >
> > > > > > > > > > > So, the ignored cases should be the block of the next
> > release
> > > > > > until
> > > > > > > > the
> > > > > > > > > > > reason is found or the case is fixed?  We need to
> ensure
> > that
> > > > > > > someone
> > > > > > > > > will
> > > > > > > > > > > take care of these cases, because there is no deepening
> > of
> > > > > failed
> > > > > > > > > tests, no
> > > > > > > > > > > one may continue to pay attention to these cases.
> > > > > > > > > > >
> > > > > > > > > > > I think this guideline should consider these
> situations,
> > and
> > > > > show
> > > > > > > how
> > > > > > > > > to
> > > > > > > > > > > solve them.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Jingsong
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <
> > imjark@gmail.com>
> > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks to Xintong for bringing up this topic, I'm +1
> in
> > > > > > general.
> > > > > > > > > > > >
> > > > > > > > > > > > However, I think it's still not very clear how we
> > address
> > > > the
> > > > > > > > > unstable
> > > > > > > > > > > > tests.
> > > > > > > > > > > > I think this is a very important part of this new
> > > > guideline.
> > > > > > > > > > > >
> > > > > > > > > > > > According to the discussion above, if some tests are
> > > > > unstable,
> > > > > > we
> > > > > > > > can
> > > > > > > > > > > > manually disable it.
> > > > > > > > > > > > But I have some questions in my mind:
> > > > > > > > > > > > 1) Is the instability judged by the committer
> > themselves or
> > > > > by
> > > > > > > some
> > > > > > > > > > > > metrics?
> > > > > > > > > > > > 2) Should we log the disable commit in the
> > corresponding
> > > > > issue
> > > > > > > and
> > > > > > > > > > > increase
> > > > > > > > > > > > the priority?
> > > > > > > > > > > > 3) What if nobody looks into this issue and this
> > becomes
> > > > some
> > > > > > > > > potential
> > > > > > > > > > > > bugs released with the new version?
> > > > > > > > > > > > 4) If no person is actively working on the issue, who
> > > > should
> > > > > > > > > re-enable
> > > > > > > > > > > it?
> > > > > > > > > > > > Would it block PRs again?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Jark
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <
> > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks all for the feedback.
> > > > > > > > > > > > >
> > > > > > > > > > > > > @Till @Yangze
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm also not convinced by the idea of having an
> > exception
> > > > > for
> > > > > > > > local
> > > > > > > > > > > > builds.
> > > > > > > > > > > > > We need to execute the entire build (or at least
> the
> > > > > failing
> > > > > > > > stage)
> > > > > > > > > > > > > locally, to make sure subsequent test cases
> > prevented by
> > > > > the
> > > > > > > > > failure
> > > > > > > > > > > one
> > > > > > > > > > > > > are all executed. In that case, it's probably
> easier
> > to
> > > > > rerun
> > > > > > > the
> > > > > > > > > build
> > > > > > > > > > > > on
> > > > > > > > > > > > > azure than locally.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Concerning disabling unstable test cases that
> > regularly
> > > > > block
> > > > > > > PRs
> > > > > > > > > from
> > > > > > > > > > > > > merging, maybe we can say that such cases can only
> be
> > > > > > disabled
> > > > > > > > when
> > > > > > > > > > > > someone
> > > > > > > > > > > > > is actively looking into it, likely the person who
> > > > disabled
> > > > > > the
> > > > > > > > > case.
> > > > > > > > > > > If
> > > > > > > > > > > > > this person is no longer actively working on it,
> > he/she
> > > > > > should
> > > > > > > > > enable
> > > > > > > > > > > the
> > > > > > > > > > > > > case again no matter if it is fixed or not.
> > > > > > > > > > > > >
> > > > > > > > > > > > > @Jing
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the suggestions.
> > > > > > > > > > > > >
> > > > > > > > > > > > > +1 to provide guidelines on handling test failures.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > +1 on this. Currently, the release managers are
> > > > monitoring
> > > > > > the
> > > > > > > ci
> > > > > > > > > and
> > > > > > > > > > > > cron
> > > > > > > > > > > > > build instabilities and reporting them on JIRA. We
> > should
> > > > > > also
> > > > > > > > > > > encourage
> > > > > > > > > > > > > other contributors to do that for PRs.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. Set a deadline to find out the root cause and
> > solve
> > > > the
> > > > > > > > failure
> > > > > > > > > for
> > > > > > > > > > > > the
> > > > > > > > > > > > > > new created JIRA  because we could not block
> other
> > > > commit
> > > > > > > > merges
> > > > > > > > > for
> > > > > > > > > > > a
> > > > > > > > > > > > > long
> > > > > > > > > > > > > > time
> > > > > > > > > > > > > >
> > > > > > > > > > > > > 3. What to do if the JIRA has not made significant
> > > > progress
> > > > > > > when
> > > > > > > > > > > reached
> > > > > > > > > > > > to
> > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm not sure about these two. It feels a bit
> against
> > the
> > > > > > > > voluntary
> > > > > > > > > > > nature
> > > > > > > > > > > > > of open source projects.
> > > > > > > > > > > > >
> > > > > > > > > > > > > IMHO, frequent instabilities are more likely to be
> > > > upgraded
> > > > > > to
> > > > > > > > the
> > > > > > > > > > > > critical
> > > > > > > > > > > > > / blocker priority, receive more attention and
> > eventually
> > > > > get
> > > > > > > > > fixed.
> > > > > > > > > > > > > Release managers are also responsible for looking
> for
> > > > > > assignees
> > > > > > > > for
> > > > > > > > > > > such
> > > > > > > > > > > > > issues. If a case is still not fixed soonish, even
> > with
> > > > all
> > > > > > > these
> > > > > > > > > > > > efforts,
> > > > > > > > > > > > > I'm not sure how setting a deadline can help this.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 4. If we disable the respective tests temporarily,
> we
> > > > also
> > > > > > > need a
> > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > to ensure the issue would be continued to be
> > > > investigated
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > > future.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > +1. As mentioned above, we may consider disabling
> > such
> > > > > tests
> > > > > > > iff
> > > > > > > > > > > someone
> > > > > > > > > > > > is
> > > > > > > > > > > > > actively working on it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <
> > > > > > > beyond1920@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Xintong,
> > > > > > > > > > > > > > +1 to the proposal.
> > > > > > > > > > > > > > In order to better comply with the rule, it is
> > > > necessary
> > > > > to
> > > > > > > > > describe
> > > > > > > > > > > > > what's
> > > > > > > > > > > > > > best practice if encountering test failure which
> > seems
> > > > > > > > unrelated
> > > > > > > > > with
> > > > > > > > > > > > the
> > > > > > > > > > > > > > current commits.
> > > > > > > > > > > > > > How to avoid merging PR with test failures and
> not
> > > > > blocking
> > > > > > > > code
> > > > > > > > > > > > merging
> > > > > > > > > > > > > > for a long time?
> > > > > > > > > > > > > > I tried to think about the possible steps, and
> > found
> > > > > there
> > > > > > > are
> > > > > > > > > some
> > > > > > > > > > > > > > detailed problems that need to be discussed in a
> > step
> > > > > > > further:
> > > > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > > > 2. Set a deadline to find out the root cause and
> > solve
> > > > > the
> > > > > > > > > failure
> > > > > > > > > > > for
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > new created JIRA  because we could not block
> other
> > > > commit
> > > > > > > > merges
> > > > > > > > > for
> > > > > > > > > > > a
> > > > > > > > > > > > > long
> > > > > > > > > > > > > > time
> > > > > > > > > > > > > >     When is a reasonable deadline here?
> > > > > > > > > > > > > > 3. What to do if the JIRA has not made
> significant
> > > > > progress
> > > > > > > > when
> > > > > > > > > > > > reached
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > > >     There are several situations as follows,
> maybe
> > > > > > different
> > > > > > > > > cases
> > > > > > > > > > > need
> > > > > > > > > > > > > > different approaches.
> > > > > > > > > > > > > >     1. the JIRA is non-assigned yet
> > > > > > > > > > > > > >     2. not found the root cause yet
> > > > > > > > > > > > > >     3. not found a good solution, but already
> > found the
> > > > > > root
> > > > > > > > > cause
> > > > > > > > > > > > > >     4. found a solution, but it needs more time
> to
> > be
> > > > > done.
> > > > > > > > > > > > > > 4. If we disable the respective tests
> temporarily,
> > we
> > > > > also
> > > > > > > > need a
> > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > to ensure the issue would be continued to be
> > > > investigated
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > > future.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > JING ZHANG
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Stephan Ewen <se...@apache.org> 于2021年6月23日周三
> > > > 下午8:16写道：
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <
> > > > > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I would first try to not introduce the
> > exception
> > > > for
> > > > > > > local
> > > > > > > > > > > builds.
> > > > > > > > > > > > It
> > > > > > > > > > > > > > > makes
> > > > > > > > > > > > > > > > it quite hard for others to verify the build
> > and to
> > > > > > make
> > > > > > > > sure
> > > > > > > > > > > that
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > right things were executed. If we see that
> this
> > > > > becomes
> > > > > > > an
> > > > > > > > > issue
> > > > > > > > > > > > then
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > can revisit this idea.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <
> > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +1 for appending this to community
> > guidelines for
> > > > > > > merging
> > > > > > > > > PRs.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @Till Rohrmann
> > > > > > > > > > > > > > > > > I agree that with this approach unstable
> > tests
> > > > will
> > > > > > not
> > > > > > > > > block
> > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > commit merges. However, it might be hard to
> > > > prevent
> > > > > > > > merging
> > > > > > > > > > > > commits
> > > > > > > > > > > > > > > > > that are related to those tests and should
> > have
> > > > > been
> > > > > > > > passed
> > > > > > > > > > > them.
> > > > > > > > > > > > > > It's
> > > > > > > > > > > > > > > > > true that this judgment can be made by the
> > > > > > committers,
> > > > > > > > but
> > > > > > > > > no
> > > > > > > > > > > one
> > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > ensure the judgment is always precise and
> so
> > that
> > > > > we
> > > > > > > have
> > > > > > > > > this
> > > > > > > > > > > > > > > > > discussion thread.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Regarding the unstable tests, how about
> > adding
> > > > > > another
> > > > > > > > > > > exception:
> > > > > > > > > > > > > > > > > committers verify it in their local
> > environment
> > > > and
> > > > > > > > > comment in
> > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > cases?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <
> > > > > > > > > liujiangangpeng@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > It is a good principle to run all tests
> > > > > > successfully
> > > > > > > > > with any
> > > > > > > > > > > > > > change.
> > > > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > > means a lot for project's stability and
> > > > > > development.
> > > > > > > I
> > > > > > > > > am big
> > > > > > > > > > > > +1
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > proposal.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best
> > > > > > > > > > > > > > > > > > liujiangang
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Till Rohrmann <tr...@apache.org>
> > > > > 于2021年6月22日周二
> > > > > > > > > 下午6:36写道：
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > One way to address the problem of
> > regularly
> > > > > > failing
> > > > > > > > > tests
> > > > > > > > > > > > that
> > > > > > > > > > > > > > > block
> > > > > > > > > > > > > > > > > > > merging of PRs is to disable the
> > respective
> > > > > tests
> > > > > > > for
> > > > > > > > > the
> > > > > > > > > > > > time
> > > > > > > > > > > > > > > being.
> > > > > > > > > > > > > > > > > Of
> > > > > > > > > > > > > > > > > > > course, the failing test then needs to
> be
> > > > > fixed.
> > > > > > > But
> > > > > > > > at
> > > > > > > > > > > least
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > would not block everyone from making
> > > > progress.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid
> > Heise
> > > > <
> > > > > > > > > > > > arvid@apache.org
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I think this is overall a good idea.
> > So +1
> > > > > from
> > > > > > > my
> > > > > > > > > side.
> > > > > > > > > > > > > > > > > > > > However, I'd like to put a higher
> > priority
> > > > on
> > > > > > > > > > > > infrastructure
> > > > > > > > > > > > > > > then,
> > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > particular docker image/artifact
> > caches.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till
> > > > > Rohrmann
> > > > > > <
> > > > > > > > > > > > > > > > trohrmann@apache.org
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks for bringing this topic to
> our
> > > > > > attention
> > > > > > > > > > > Xintong.
> > > > > > > > > > > > I
> > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > > > > > > proposal makes a lot of sense and
> we
> > > > should
> > > > > > > > follow
> > > > > > > > > it.
> > > > > > > > > > > It
> > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > give us
> > > > > > > > > > > > > > > > > > > > > confidence that our changes are
> > working
> > > > and
> > > > > > it
> > > > > > > > > might
> > > > > > > > > > > be a
> > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > > > incentive
> > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > quickly fix build instabilities.
> > Hence,
> > > > +1.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM
> > Xintong
> > > > > > Song <
> > > > > > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > In the past a couple of weeks,
> I've
> > > > > > observed
> > > > > > > > > several
> > > > > > > > > > > > > times
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > merged without a green light from
> > the
> > > > CI
> > > > > > > tests,
> > > > > > > > > where
> > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > > considered *unrelated*. This may
> > not
> > > > > always
> > > > > > > > cause
> > > > > > > > > > > > > problems,
> > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > > increase the chance of breaking
> our
> > > > code
> > > > > > > base.
> > > > > > > > In
> > > > > > > > > > > fact,
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > occurred
> > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > me twice in the past few weeks
> > that I
> > > > had
> > > > > > to
> > > > > > > > > revert a
> > > > > > > > > > > > > > commit
> > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > > > > > > > > the master branch due to this.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I think it would be nicer to
> > enforce a
> > > > > > > stricter
> > > > > > > > > rule,
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > The problems of merging PRs with
> > > > > > "unrelated"
> > > > > > > > test
> > > > > > > > > > > > > failures
> > > > > > > > > > > > > > > are:
> > > > > > > > > > > > > > > > > > > > > > - It's not always straightforward
> > to
> > > > tell
> > > > > > > > > whether a
> > > > > > > > > > > > test
> > > > > > > > > > > > > > > > > failures are
> > > > > > > > > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > > > > > > > > - It prevents subsequent test
> cases
> > > > from
> > > > > > > being
> > > > > > > > > > > > executed,
> > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > To make things easier for the
> > > > committers,
> > > > > > the
> > > > > > > > > > > following
> > > > > > > > > > > > > > > > > exceptions
> > > > > > > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > > > > > > > > > - The PR has passed CI in the
> > > > > contributor's
> > > > > > > > > personal
> > > > > > > > > > > > > > > workspace.
> > > > > > > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > > > > > > > > > - The CI tests have been
> triggered
> > > > > multiple
> > > > > > > > > times, on
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > commit,
> > > > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > each stage has at least passed
> for
> > > > once.
> > > > > > > Please
> > > > > > > > > also
> > > > > > > > > > > > > > comment
> > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > If we all agree on this, I'd
> > update the
> > > > > > > > community
> > > > > > > > > > > > > > guidelines
> > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Please let me know what do you
> > think.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best, Jingsong Lee
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>


-- 
Best, Jingsong Lee

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Till Rohrmann <tr...@apache.org>.

+1, thanks for updating the guidelines Xintong!

Cheers,
Till

On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <ka...@gmail.com> wrote:

> +1
>
> Thanks Xintong for drafting this doc.
>
> Best,
> Yangze Guo
>
> On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <be...@gmail.com> wrote:
> >
> > Thanks Xintong for giving detailed documentation.
> >
> > The best practice for handling test failure is very detailed, it's a good
> > guidelines document with clear action steps.
> >
> > +1 to Xintong's proposal.
> >
> > Xintong Song <to...@gmail.com> 于2021年6月28日周一 下午4:07写道：
> >
> > > Thanks all for the discussion.
> > >
> > > Based on the opinions so far, I've drafted the new guidelines [1], as a
> > > potential replacement of the original wiki page [2].
> > >
> > > Hopefully this draft has covered the most opinions discussed and
> consensus
> > > made in this discussion thread.
> > >
> > > Looking forward to your feedback.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > > [1]
> > >
> > >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > >
> > > [2]
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > >
> > >
> > >
> > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <pn...@apache.org>
> > > wrote:
> > >
> > > > Thanks for the clarification Till. +1 for what you have written.
> > > >
> > > > Piotrek
> > > >
> > > > pt., 25 cze 2021 o 16:00 Till Rohrmann <tr...@apache.org>
> > > napisał(a):
> > > >
> > > > > One quick note for clarification. I don't have anything against
> builds
> > > > > running on your personal Azure account and this is not what I
> > > understood
> > > > > under "local environment". For me "local environment" means that
> > > someone
> > > > > runs the test locally on his machine and then says that the
> > > > > tests have passed locally.
> > > > >
> > > > > I do agree that there might be a conflict of interests if a PR
> author
> > > > > disables tests. Here I would argue that we don't have malignant
> > > > committers
> > > > > which means that every committer will probably first check the
> > > respective
> > > > > ticket for how often the test failed. Then I guess the next step
> would
> > > be
> > > > > to discuss on the ticket whether to disable it or not. And finally,
> > > after
> > > > > reaching a consensus, it will be disabled. If we see someone
> abusing
> > > this
> > > > > policy, then we can still think about how to guard against it. But,
> > > > > honestly, I have very rarely seen such a case. I am also ok to
> pull in
> > > > the
> > > > > release manager to make the final call if this resolves concerns.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> pnowojski@apache.org>
> > > > > wrote:
> > > > >
> > > > > > +1 for the general idea, however I have concerns about a couple
> of
> > > > > details.
> > > > > >
> > > > > > > I would first try to not introduce the exception for local
> builds.
> > > > > > > It makes it quite hard for others to verify the build and to
> make
> > > > sure
> > > > > > that the right things were executed.
> > > > > >
> > > > > > I would counter Till's proposal to ignore local green builds. If
> > > > > committer
> > > > > > is merging and closing a PR, with official azure failure, but
> there
> > > > was a
> > > > > > green build before or in local azure it's IMO enough to leave the
> > > > > message:
> > > > > >
> > > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > > Green local build: URL
> > > > > >
> > > > > > This should address Till's concern about verification.
> > > > > >
> > > > > > On the other hand I have concerns about disabling tests.* It
> > > shouldn't
> > > > be
> > > > > > the PR author/committer that's disabling a test on his own, as
> > > that's a
> > > > > > conflict of interests*. I have however no problems with disabling
> > > test
> > > > > > instabilities that were marked as "blockers" though, that should
> work
> > > > > > pretty well. But the important thing here is to correctly judge
> > > bumping
> > > > > > priorities of test instabilities based on their frequency and
> current
> > > > > > general health of the system. I believe that release managers
> should
> > > be
> > > > > > playing a big role here in deciding on the guidelines of what
> should
> > > > be a
> > > > > > priority of certain test instabilities.
> > > > > >
> > > > > > What I mean by that is two example scenarios:
> > > > > > 1. if we have a handful of very frequently failing tests and a
> > > handful
> > > > of
> > > > > > very rarely failing tests (like one reported failure and no
> another
> > > > > > occurrence in many months, and let's even say that the failure
> looks
> > > > like
> > > > > > infrastructure/network timeout), we should focus on the
> frequently
> > > > > failing
> > > > > > ones, and probably we are safe to ignore for the time being the
> rare
> > > > > issues
> > > > > > - at least until we deal with the most pressing ones.
> > > > > > 2. If we have tons of rarely failing test instabilities, we
> should
> > > > > probably
> > > > > > start addressing them as blockers.
> > > > > >
> > > > > > I'm using my own conscious and my best judgement when I'm
> > > > > > bumping/decreasing priorities of test instabilities (and bugs),
> but
> > > as
> > > > > > individual committer I don't have the full picture. As I wrote
> > > above, I
> > > > > > think release managers are in a much better position to keep
> > > adjusting
> > > > > > those kind of guidelines.
> > > > > >
> > > > > > Best, Piotrek
> > > > > >
> > > > > > pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com> napisał(a):
> > > > > >
> > > > > > > +1 for Xintong's proposal.
> > > > > > >
> > > > > > > For me, resolving problems directly (fixing the infrastructure
> > > issue,
> > > > > > > disabling unstable tests and creating blocker JIRAs to track
> the
> > > fix
> > > > > and
> > > > > > > re-enable them asap, etc.) is (in most cases) better than
> working
> > > > > around
> > > > > > > them (verify locally, manually check and judge the failure as
> > > > > > "unrelated",
> > > > > > > etc.), and I believe the proposal could help us pushing those
> more
> > > > > "real"
> > > > > > > solutions forward.
> > > > > > >
> > > > > > > Best Regards,
> > > > > > > Yu
> > > > > > >
> > > > > > >
> > > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <ka...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > Creating a blocker issue for the manually disabled tests
> sounds
> > > > good
> > > > > to
> > > > > > > me.
> > > > > > > >
> > > > > > > > Minor: I'm still a bit worried about the commits merged
> before we
> > > > fix
> > > > > > > > the unstable tests can also break those tests. Instead of
> letting
> > > > the
> > > > > > > > assigners keep a look at all potentially related commits,
> they
> > > can
> > > > > > > > maintain a branch that is periodically synced with the master
> > > > branch
> > > > > > > > while enabling the unstable test. So that they can catch the
> > > > breaking
> > > > > > > > changes asap.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <
> > > > trohrmann@apache.org>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > I like the idea of creating a blocker issue for a disabled
> > > test.
> > > > > This
> > > > > > > > will
> > > > > > > > > force us to resolve it in a timely manner and it won't fall
> > > > through
> > > > > > the
> > > > > > > > > cracks.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <
> > > > > jingsonglee0@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > >
> > > > > > > > > > I also have some concerns about unstable cases.
> > > > > > > > > >
> > > > > > > > > > I think unstable cases can be divided into these types:
> > > > > > > > > >
> > > > > > > > > > - Force majeure: For example, network timeout, sudden
> > > > > environmental
> > > > > > > > > > collapse, they are accidental and can always be solved by
> > > > > > triggering
> > > > > > > > azure
> > > > > > > > > > again. Committers should wait for the next green azure.
> > > > > > > > > >
> > > > > > > > > > - Obvious mistakes: For example, some errors caused by
> > > obvious
> > > > > > > reasons
> > > > > > > > may
> > > > > > > > > > be repaired quickly. At this time, do we need to wait,
> or not
> > > > > wait
> > > > > > > and
> > > > > > > > just
> > > > > > > > > > ignore?
> > > > > > > > > >
> > > > > > > > > > - Difficult questions: These problems are very difficult
> to
> > > > find.
> > > > > > > There
> > > > > > > > > > will be no solution for a while and a half. We don't even
> > > know
> > > > > the
> > > > > > > > reason.
> > > > > > > > > > At this time, we should ignore it. (Maybe it's judged by
> the
> > > > > author
> > > > > > > of
> > > > > > > > the
> > > > > > > > > > case. But what about the old case whose author can't be
> > > found?)
> > > > > > > > > >
> > > > > > > > > > So, the ignored cases should be the block of the next
> release
> > > > > until
> > > > > > > the
> > > > > > > > > > reason is found or the case is fixed?  We need to ensure
> that
> > > > > > someone
> > > > > > > > will
> > > > > > > > > > take care of these cases, because there is no deepening
> of
> > > > failed
> > > > > > > > tests, no
> > > > > > > > > > one may continue to pay attention to these cases.
> > > > > > > > > >
> > > > > > > > > > I think this guideline should consider these situations,
> and
> > > > show
> > > > > > how
> > > > > > > > to
> > > > > > > > > > solve them.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Jingsong
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <
> imjark@gmail.com>
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks to Xintong for bringing up this topic, I'm +1 in
> > > > > general.
> > > > > > > > > > >
> > > > > > > > > > > However, I think it's still not very clear how we
> address
> > > the
> > > > > > > > unstable
> > > > > > > > > > > tests.
> > > > > > > > > > > I think this is a very important part of this new
> > > guideline.
> > > > > > > > > > >
> > > > > > > > > > > According to the discussion above, if some tests are
> > > > unstable,
> > > > > we
> > > > > > > can
> > > > > > > > > > > manually disable it.
> > > > > > > > > > > But I have some questions in my mind:
> > > > > > > > > > > 1) Is the instability judged by the committer
> themselves or
> > > > by
> > > > > > some
> > > > > > > > > > > metrics?
> > > > > > > > > > > 2) Should we log the disable commit in the
> corresponding
> > > > issue
> > > > > > and
> > > > > > > > > > increase
> > > > > > > > > > > the priority?
> > > > > > > > > > > 3) What if nobody looks into this issue and this
> becomes
> > > some
> > > > > > > > potential
> > > > > > > > > > > bugs released with the new version?
> > > > > > > > > > > 4) If no person is actively working on the issue, who
> > > should
> > > > > > > > re-enable
> > > > > > > > > > it?
> > > > > > > > > > > Would it block PRs again?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Jark
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <
> > > > > > tonysong820@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks all for the feedback.
> > > > > > > > > > > >
> > > > > > > > > > > > @Till @Yangze
> > > > > > > > > > > >
> > > > > > > > > > > > I'm also not convinced by the idea of having an
> exception
> > > > for
> > > > > > > local
> > > > > > > > > > > builds.
> > > > > > > > > > > > We need to execute the entire build (or at least the
> > > > failing
> > > > > > > stage)
> > > > > > > > > > > > locally, to make sure subsequent test cases
> prevented by
> > > > the
> > > > > > > > failure
> > > > > > > > > > one
> > > > > > > > > > > > are all executed. In that case, it's probably easier
> to
> > > > rerun
> > > > > > the
> > > > > > > > build
> > > > > > > > > > > on
> > > > > > > > > > > > azure than locally.
> > > > > > > > > > > >
> > > > > > > > > > > > Concerning disabling unstable test cases that
> regularly
> > > > block
> > > > > > PRs
> > > > > > > > from
> > > > > > > > > > > > merging, maybe we can say that such cases can only be
> > > > > disabled
> > > > > > > when
> > > > > > > > > > > someone
> > > > > > > > > > > > is actively looking into it, likely the person who
> > > disabled
> > > > > the
> > > > > > > > case.
> > > > > > > > > > If
> > > > > > > > > > > > this person is no longer actively working on it,
> he/she
> > > > > should
> > > > > > > > enable
> > > > > > > > > > the
> > > > > > > > > > > > case again no matter if it is fixed or not.
> > > > > > > > > > > >
> > > > > > > > > > > > @Jing
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the suggestions.
> > > > > > > > > > > >
> > > > > > > > > > > > +1 to provide guidelines on handling test failures.
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > +1 on this. Currently, the release managers are
> > > monitoring
> > > > > the
> > > > > > ci
> > > > > > > > and
> > > > > > > > > > > cron
> > > > > > > > > > > > build instabilities and reporting them on JIRA. We
> should
> > > > > also
> > > > > > > > > > encourage
> > > > > > > > > > > > other contributors to do that for PRs.
> > > > > > > > > > > >
> > > > > > > > > > > > 2. Set a deadline to find out the root cause and
> solve
> > > the
> > > > > > > failure
> > > > > > > > for
> > > > > > > > > > > the
> > > > > > > > > > > > > new created JIRA  because we could not block other
> > > commit
> > > > > > > merges
> > > > > > > > for
> > > > > > > > > > a
> > > > > > > > > > > > long
> > > > > > > > > > > > > time
> > > > > > > > > > > > >
> > > > > > > > > > > > 3. What to do if the JIRA has not made significant
> > > progress
> > > > > > when
> > > > > > > > > > reached
> > > > > > > > > > > to
> > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I'm not sure about these two. It feels a bit against
> the
> > > > > > > voluntary
> > > > > > > > > > nature
> > > > > > > > > > > > of open source projects.
> > > > > > > > > > > >
> > > > > > > > > > > > IMHO, frequent instabilities are more likely to be
> > > upgraded
> > > > > to
> > > > > > > the
> > > > > > > > > > > critical
> > > > > > > > > > > > / blocker priority, receive more attention and
> eventually
> > > > get
> > > > > > > > fixed.
> > > > > > > > > > > > Release managers are also responsible for looking for
> > > > > assignees
> > > > > > > for
> > > > > > > > > > such
> > > > > > > > > > > > issues. If a case is still not fixed soonish, even
> with
> > > all
> > > > > > these
> > > > > > > > > > > efforts,
> > > > > > > > > > > > I'm not sure how setting a deadline can help this.
> > > > > > > > > > > >
> > > > > > > > > > > > 4. If we disable the respective tests temporarily, we
> > > also
> > > > > > need a
> > > > > > > > > > > mechanism
> > > > > > > > > > > > > to ensure the issue would be continued to be
> > > investigated
> > > > > in
> > > > > > > the
> > > > > > > > > > > future.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > +1. As mentioned above, we may consider disabling
> such
> > > > tests
> > > > > > iff
> > > > > > > > > > someone
> > > > > > > > > > > is
> > > > > > > > > > > > actively working on it.
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you~
> > > > > > > > > > > >
> > > > > > > > > > > > Xintong Song
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <
> > > > > > beyond1920@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Xintong,
> > > > > > > > > > > > > +1 to the proposal.
> > > > > > > > > > > > > In order to better comply with the rule, it is
> > > necessary
> > > > to
> > > > > > > > describe
> > > > > > > > > > > > what's
> > > > > > > > > > > > > best practice if encountering test failure which
> seems
> > > > > > > unrelated
> > > > > > > > with
> > > > > > > > > > > the
> > > > > > > > > > > > > current commits.
> > > > > > > > > > > > > How to avoid merging PR with test failures and not
> > > > blocking
> > > > > > > code
> > > > > > > > > > > merging
> > > > > > > > > > > > > for a long time?
> > > > > > > > > > > > > I tried to think about the possible steps, and
> found
> > > > there
> > > > > > are
> > > > > > > > some
> > > > > > > > > > > > > detailed problems that need to be discussed in a
> step
> > > > > > further:
> > > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > > 2. Set a deadline to find out the root cause and
> solve
> > > > the
> > > > > > > > failure
> > > > > > > > > > for
> > > > > > > > > > > > the
> > > > > > > > > > > > > new created JIRA  because we could not block other
> > > commit
> > > > > > > merges
> > > > > > > > for
> > > > > > > > > > a
> > > > > > > > > > > > long
> > > > > > > > > > > > > time
> > > > > > > > > > > > >     When is a reasonable deadline here?
> > > > > > > > > > > > > 3. What to do if the JIRA has not made significant
> > > > progress
> > > > > > > when
> > > > > > > > > > > reached
> > > > > > > > > > > > to
> > > > > > > > > > > > > the deadline time?
> > > > > > > > > > > > >     There are several situations as follows, maybe
> > > > > different
> > > > > > > > cases
> > > > > > > > > > need
> > > > > > > > > > > > > different approaches.
> > > > > > > > > > > > >     1. the JIRA is non-assigned yet
> > > > > > > > > > > > >     2. not found the root cause yet
> > > > > > > > > > > > >     3. not found a good solution, but already
> found the
> > > > > root
> > > > > > > > cause
> > > > > > > > > > > > >     4. found a solution, but it needs more time to
> be
> > > > done.
> > > > > > > > > > > > > 4. If we disable the respective tests temporarily,
> we
> > > > also
> > > > > > > need a
> > > > > > > > > > > > mechanism
> > > > > > > > > > > > > to ensure the issue would be continued to be
> > > investigated
> > > > > in
> > > > > > > the
> > > > > > > > > > > future.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > JING ZHANG
> > > > > > > > > > > > >
> > > > > > > > > > > > > Stephan Ewen <se...@apache.org> 于2021年6月23日周三
> > > 下午8:16写道：
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <
> > > > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I would first try to not introduce the
> exception
> > > for
> > > > > > local
> > > > > > > > > > builds.
> > > > > > > > > > > It
> > > > > > > > > > > > > > makes
> > > > > > > > > > > > > > > it quite hard for others to verify the build
> and to
> > > > > make
> > > > > > > sure
> > > > > > > > > > that
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > right things were executed. If we see that this
> > > > becomes
> > > > > > an
> > > > > > > > issue
> > > > > > > > > > > then
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > can revisit this idea.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <
> > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +1 for appending this to community
> guidelines for
> > > > > > merging
> > > > > > > > PRs.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @Till Rohrmann
> > > > > > > > > > > > > > > > I agree that with this approach unstable
> tests
> > > will
> > > > > not
> > > > > > > > block
> > > > > > > > > > > other
> > > > > > > > > > > > > > > > commit merges. However, it might be hard to
> > > prevent
> > > > > > > merging
> > > > > > > > > > > commits
> > > > > > > > > > > > > > > > that are related to those tests and should
> have
> > > > been
> > > > > > > passed
> > > > > > > > > > them.
> > > > > > > > > > > > > It's
> > > > > > > > > > > > > > > > true that this judgment can be made by the
> > > > > committers,
> > > > > > > but
> > > > > > > > no
> > > > > > > > > > one
> > > > > > > > > > > > can
> > > > > > > > > > > > > > > > ensure the judgment is always precise and so
> that
> > > > we
> > > > > > have
> > > > > > > > this
> > > > > > > > > > > > > > > > discussion thread.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Regarding the unstable tests, how about
> adding
> > > > > another
> > > > > > > > > > exception:
> > > > > > > > > > > > > > > > committers verify it in their local
> environment
> > > and
> > > > > > > > comment in
> > > > > > > > > > > such
> > > > > > > > > > > > > > > > cases?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <
> > > > > > > > liujiangangpeng@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > It is a good principle to run all tests
> > > > > successfully
> > > > > > > > with any
> > > > > > > > > > > > > change.
> > > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > means a lot for project's stability and
> > > > > development.
> > > > > > I
> > > > > > > > am big
> > > > > > > > > > > +1
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > proposal.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best
> > > > > > > > > > > > > > > > > liujiangang
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Till Rohrmann <tr...@apache.org>
> > > > 于2021年6月22日周二
> > > > > > > > 下午6:36写道：
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > One way to address the problem of
> regularly
> > > > > failing
> > > > > > > > tests
> > > > > > > > > > > that
> > > > > > > > > > > > > > block
> > > > > > > > > > > > > > > > > > merging of PRs is to disable the
> respective
> > > > tests
> > > > > > for
> > > > > > > > the
> > > > > > > > > > > time
> > > > > > > > > > > > > > being.
> > > > > > > > > > > > > > > > Of
> > > > > > > > > > > > > > > > > > course, the failing test then needs to be
> > > > fixed.
> > > > > > But
> > > > > > > at
> > > > > > > > > > least
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > would not block everyone from making
> > > progress.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid
> Heise
> > > <
> > > > > > > > > > > arvid@apache.org
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I think this is overall a good idea.
> So +1
> > > > from
> > > > > > my
> > > > > > > > side.
> > > > > > > > > > > > > > > > > > > However, I'd like to put a higher
> priority
> > > on
> > > > > > > > > > > infrastructure
> > > > > > > > > > > > > > then,
> > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > particular docker image/artifact
> caches.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till
> > > > Rohrmann
> > > > > <
> > > > > > > > > > > > > > > trohrmann@apache.org
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for bringing this topic to our
> > > > > attention
> > > > > > > > > > Xintong.
> > > > > > > > > > > I
> > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > > > > > proposal makes a lot of sense and we
> > > should
> > > > > > > follow
> > > > > > > > it.
> > > > > > > > > > It
> > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > give us
> > > > > > > > > > > > > > > > > > > > confidence that our changes are
> working
> > > and
> > > > > it
> > > > > > > > might
> > > > > > > > > > be a
> > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > > incentive
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > quickly fix build instabilities.
> Hence,
> > > +1.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM
> Xintong
> > > > > Song <
> > > > > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > In the past a couple of weeks, I've
> > > > > observed
> > > > > > > > several
> > > > > > > > > > > > times
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > merged without a green light from
> the
> > > CI
> > > > > > tests,
> > > > > > > > where
> > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > > considered *unrelated*. This may
> not
> > > > always
> > > > > > > cause
> > > > > > > > > > > > problems,
> > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > increase the chance of breaking our
> > > code
> > > > > > base.
> > > > > > > In
> > > > > > > > > > fact,
> > > > > > > > > > > > it
> > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > occurred
> > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > me twice in the past few weeks
> that I
> > > had
> > > > > to
> > > > > > > > revert a
> > > > > > > > > > > > > commit
> > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > > > > > > > the master branch due to this.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I think it would be nicer to
> enforce a
> > > > > > stricter
> > > > > > > > rule,
> > > > > > > > > > > > that
> > > > > > > > > > > > > no
> > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > The problems of merging PRs with
> > > > > "unrelated"
> > > > > > > test
> > > > > > > > > > > > failures
> > > > > > > > > > > > > > are:
> > > > > > > > > > > > > > > > > > > > > - It's not always straightforward
> to
> > > tell
> > > > > > > > whether a
> > > > > > > > > > > test
> > > > > > > > > > > > > > > > failures are
> > > > > > > > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > > > > > > > - It prevents subsequent test cases
> > > from
> > > > > > being
> > > > > > > > > > > executed,
> > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > To make things easier for the
> > > committers,
> > > > > the
> > > > > > > > > > following
> > > > > > > > > > > > > > > > exceptions
> > > > > > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > > > > > > > > - The PR has passed CI in the
> > > > contributor's
> > > > > > > > personal
> > > > > > > > > > > > > > workspace.
> > > > > > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > > > > > > > > - The CI tests have been triggered
> > > > multiple
> > > > > > > > times, on
> > > > > > > > > > > the
> > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > commit,
> > > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > each stage has at least passed for
> > > once.
> > > > > > Please
> > > > > > > > also
> > > > > > > > > > > > > comment
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > If we all agree on this, I'd
> update the
> > > > > > > community
> > > > > > > > > > > > > guidelines
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Please let me know what do you
> think.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best, Jingsong Lee
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Yangze Guo <ka...@gmail.com>.

+1

Thanks Xintong for drafting this doc.

Best,
Yangze Guo

On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <be...@gmail.com> wrote:
>
> Thanks Xintong for giving detailed documentation.
>
> The best practice for handling test failure is very detailed, it's a good
> guidelines document with clear action steps.
>
> +1 to Xintong's proposal.
>
> Xintong Song <to...@gmail.com> 于2021年6月28日周一 下午4:07写道：
>
> > Thanks all for the discussion.
> >
> > Based on the opinions so far, I've drafted the new guidelines [1], as a
> > potential replacement of the original wiki page [2].
> >
> > Hopefully this draft has covered the most opinions discussed and consensus
> > made in this discussion thread.
> >
> > Looking forward to your feedback.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1]
> >
> > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> >
> > [2]
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> >
> >
> >
> > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <pn...@apache.org>
> > wrote:
> >
> > > Thanks for the clarification Till. +1 for what you have written.
> > >
> > > Piotrek
> > >
> > > pt., 25 cze 2021 o 16:00 Till Rohrmann <tr...@apache.org>
> > napisał(a):
> > >
> > > > One quick note for clarification. I don't have anything against builds
> > > > running on your personal Azure account and this is not what I
> > understood
> > > > under "local environment". For me "local environment" means that
> > someone
> > > > runs the test locally on his machine and then says that the
> > > > tests have passed locally.
> > > >
> > > > I do agree that there might be a conflict of interests if a PR author
> > > > disables tests. Here I would argue that we don't have malignant
> > > committers
> > > > which means that every committer will probably first check the
> > respective
> > > > ticket for how often the test failed. Then I guess the next step would
> > be
> > > > to discuss on the ticket whether to disable it or not. And finally,
> > after
> > > > reaching a consensus, it will be disabled. If we see someone abusing
> > this
> > > > policy, then we can still think about how to guard against it. But,
> > > > honestly, I have very rarely seen such a case. I am also ok to pull in
> > > the
> > > > release manager to make the final call if this resolves concerns.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <pn...@apache.org>
> > > > wrote:
> > > >
> > > > > +1 for the general idea, however I have concerns about a couple of
> > > > details.
> > > > >
> > > > > > I would first try to not introduce the exception for local builds.
> > > > > > It makes it quite hard for others to verify the build and to make
> > > sure
> > > > > that the right things were executed.
> > > > >
> > > > > I would counter Till's proposal to ignore local green builds. If
> > > > committer
> > > > > is merging and closing a PR, with official azure failure, but there
> > > was a
> > > > > green build before or in local azure it's IMO enough to leave the
> > > > message:
> > > > >
> > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > Green local build: URL
> > > > >
> > > > > This should address Till's concern about verification.
> > > > >
> > > > > On the other hand I have concerns about disabling tests.* It
> > shouldn't
> > > be
> > > > > the PR author/committer that's disabling a test on his own, as
> > that's a
> > > > > conflict of interests*. I have however no problems with disabling
> > test
> > > > > instabilities that were marked as "blockers" though, that should work
> > > > > pretty well. But the important thing here is to correctly judge
> > bumping
> > > > > priorities of test instabilities based on their frequency and current
> > > > > general health of the system. I believe that release managers should
> > be
> > > > > playing a big role here in deciding on the guidelines of what should
> > > be a
> > > > > priority of certain test instabilities.
> > > > >
> > > > > What I mean by that is two example scenarios:
> > > > > 1. if we have a handful of very frequently failing tests and a
> > handful
> > > of
> > > > > very rarely failing tests (like one reported failure and no another
> > > > > occurrence in many months, and let's even say that the failure looks
> > > like
> > > > > infrastructure/network timeout), we should focus on the frequently
> > > > failing
> > > > > ones, and probably we are safe to ignore for the time being the rare
> > > > issues
> > > > > - at least until we deal with the most pressing ones.
> > > > > 2. If we have tons of rarely failing test instabilities, we should
> > > > probably
> > > > > start addressing them as blockers.
> > > > >
> > > > > I'm using my own conscious and my best judgement when I'm
> > > > > bumping/decreasing priorities of test instabilities (and bugs), but
> > as
> > > > > individual committer I don't have the full picture. As I wrote
> > above, I
> > > > > think release managers are in a much better position to keep
> > adjusting
> > > > > those kind of guidelines.
> > > > >
> > > > > Best, Piotrek
> > > > >
> > > > > pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com> napisał(a):
> > > > >
> > > > > > +1 for Xintong's proposal.
> > > > > >
> > > > > > For me, resolving problems directly (fixing the infrastructure
> > issue,
> > > > > > disabling unstable tests and creating blocker JIRAs to track the
> > fix
> > > > and
> > > > > > re-enable them asap, etc.) is (in most cases) better than working
> > > > around
> > > > > > them (verify locally, manually check and judge the failure as
> > > > > "unrelated",
> > > > > > etc.), and I believe the proposal could help us pushing those more
> > > > "real"
> > > > > > solutions forward.
> > > > > >
> > > > > > Best Regards,
> > > > > > Yu
> > > > > >
> > > > > >
> > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <ka...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > Creating a blocker issue for the manually disabled tests sounds
> > > good
> > > > to
> > > > > > me.
> > > > > > >
> > > > > > > Minor: I'm still a bit worried about the commits merged before we
> > > fix
> > > > > > > the unstable tests can also break those tests. Instead of letting
> > > the
> > > > > > > assigners keep a look at all potentially related commits, they
> > can
> > > > > > > maintain a branch that is periodically synced with the master
> > > branch
> > > > > > > while enabling the unstable test. So that they can catch the
> > > breaking
> > > > > > > changes asap.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <
> > > trohrmann@apache.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > I like the idea of creating a blocker issue for a disabled
> > test.
> > > > This
> > > > > > > will
> > > > > > > > force us to resolve it in a timely manner and it won't fall
> > > through
> > > > > the
> > > > > > > > cracks.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > >
> > > > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <
> > > > jingsonglee0@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 to Xintong's proposal
> > > > > > > > >
> > > > > > > > > I also have some concerns about unstable cases.
> > > > > > > > >
> > > > > > > > > I think unstable cases can be divided into these types:
> > > > > > > > >
> > > > > > > > > - Force majeure: For example, network timeout, sudden
> > > > environmental
> > > > > > > > > collapse, they are accidental and can always be solved by
> > > > > triggering
> > > > > > > azure
> > > > > > > > > again. Committers should wait for the next green azure.
> > > > > > > > >
> > > > > > > > > - Obvious mistakes: For example, some errors caused by
> > obvious
> > > > > > reasons
> > > > > > > may
> > > > > > > > > be repaired quickly. At this time, do we need to wait, or not
> > > > wait
> > > > > > and
> > > > > > > just
> > > > > > > > > ignore?
> > > > > > > > >
> > > > > > > > > - Difficult questions: These problems are very difficult to
> > > find.
> > > > > > There
> > > > > > > > > will be no solution for a while and a half. We don't even
> > know
> > > > the
> > > > > > > reason.
> > > > > > > > > At this time, we should ignore it. (Maybe it's judged by the
> > > > author
> > > > > > of
> > > > > > > the
> > > > > > > > > case. But what about the old case whose author can't be
> > found?)
> > > > > > > > >
> > > > > > > > > So, the ignored cases should be the block of the next release
> > > > until
> > > > > > the
> > > > > > > > > reason is found or the case is fixed?  We need to ensure that
> > > > > someone
> > > > > > > will
> > > > > > > > > take care of these cases, because there is no deepening of
> > > failed
> > > > > > > tests, no
> > > > > > > > > one may continue to pay attention to these cases.
> > > > > > > > >
> > > > > > > > > I think this guideline should consider these situations, and
> > > show
> > > > > how
> > > > > > > to
> > > > > > > > > solve them.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Jingsong
> > > > > > > > >
> > > > > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <im...@gmail.com>
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks to Xintong for bringing up this topic, I'm +1 in
> > > > general.
> > > > > > > > > >
> > > > > > > > > > However, I think it's still not very clear how we address
> > the
> > > > > > > unstable
> > > > > > > > > > tests.
> > > > > > > > > > I think this is a very important part of this new
> > guideline.
> > > > > > > > > >
> > > > > > > > > > According to the discussion above, if some tests are
> > > unstable,
> > > > we
> > > > > > can
> > > > > > > > > > manually disable it.
> > > > > > > > > > But I have some questions in my mind:
> > > > > > > > > > 1) Is the instability judged by the committer themselves or
> > > by
> > > > > some
> > > > > > > > > > metrics?
> > > > > > > > > > 2) Should we log the disable commit in the corresponding
> > > issue
> > > > > and
> > > > > > > > > increase
> > > > > > > > > > the priority?
> > > > > > > > > > 3) What if nobody looks into this issue and this becomes
> > some
> > > > > > > potential
> > > > > > > > > > bugs released with the new version?
> > > > > > > > > > 4) If no person is actively working on the issue, who
> > should
> > > > > > > re-enable
> > > > > > > > > it?
> > > > > > > > > > Would it block PRs again?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Jark
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <
> > > > > tonysong820@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks all for the feedback.
> > > > > > > > > > >
> > > > > > > > > > > @Till @Yangze
> > > > > > > > > > >
> > > > > > > > > > > I'm also not convinced by the idea of having an exception
> > > for
> > > > > > local
> > > > > > > > > > builds.
> > > > > > > > > > > We need to execute the entire build (or at least the
> > > failing
> > > > > > stage)
> > > > > > > > > > > locally, to make sure subsequent test cases prevented by
> > > the
> > > > > > > failure
> > > > > > > > > one
> > > > > > > > > > > are all executed. In that case, it's probably easier to
> > > rerun
> > > > > the
> > > > > > > build
> > > > > > > > > > on
> > > > > > > > > > > azure than locally.
> > > > > > > > > > >
> > > > > > > > > > > Concerning disabling unstable test cases that regularly
> > > block
> > > > > PRs
> > > > > > > from
> > > > > > > > > > > merging, maybe we can say that such cases can only be
> > > > disabled
> > > > > > when
> > > > > > > > > > someone
> > > > > > > > > > > is actively looking into it, likely the person who
> > disabled
> > > > the
> > > > > > > case.
> > > > > > > > > If
> > > > > > > > > > > this person is no longer actively working on it, he/she
> > > > should
> > > > > > > enable
> > > > > > > > > the
> > > > > > > > > > > case again no matter if it is fixed or not.
> > > > > > > > > > >
> > > > > > > > > > > @Jing
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the suggestions.
> > > > > > > > > > >
> > > > > > > > > > > +1 to provide guidelines on handling test failures.
> > > > > > > > > > >
> > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > +1 on this. Currently, the release managers are
> > monitoring
> > > > the
> > > > > ci
> > > > > > > and
> > > > > > > > > > cron
> > > > > > > > > > > build instabilities and reporting them on JIRA. We should
> > > > also
> > > > > > > > > encourage
> > > > > > > > > > > other contributors to do that for PRs.
> > > > > > > > > > >
> > > > > > > > > > > 2. Set a deadline to find out the root cause and solve
> > the
> > > > > > failure
> > > > > > > for
> > > > > > > > > > the
> > > > > > > > > > > > new created JIRA  because we could not block other
> > commit
> > > > > > merges
> > > > > > > for
> > > > > > > > > a
> > > > > > > > > > > long
> > > > > > > > > > > > time
> > > > > > > > > > > >
> > > > > > > > > > > 3. What to do if the JIRA has not made significant
> > progress
> > > > > when
> > > > > > > > > reached
> > > > > > > > > > to
> > > > > > > > > > > > the deadline time?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I'm not sure about these two. It feels a bit against the
> > > > > > voluntary
> > > > > > > > > nature
> > > > > > > > > > > of open source projects.
> > > > > > > > > > >
> > > > > > > > > > > IMHO, frequent instabilities are more likely to be
> > upgraded
> > > > to
> > > > > > the
> > > > > > > > > > critical
> > > > > > > > > > > / blocker priority, receive more attention and eventually
> > > get
> > > > > > > fixed.
> > > > > > > > > > > Release managers are also responsible for looking for
> > > > assignees
> > > > > > for
> > > > > > > > > such
> > > > > > > > > > > issues. If a case is still not fixed soonish, even with
> > all
> > > > > these
> > > > > > > > > > efforts,
> > > > > > > > > > > I'm not sure how setting a deadline can help this.
> > > > > > > > > > >
> > > > > > > > > > > 4. If we disable the respective tests temporarily, we
> > also
> > > > > need a
> > > > > > > > > > mechanism
> > > > > > > > > > > > to ensure the issue would be continued to be
> > investigated
> > > > in
> > > > > > the
> > > > > > > > > > future.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > +1. As mentioned above, we may consider disabling such
> > > tests
> > > > > iff
> > > > > > > > > someone
> > > > > > > > > > is
> > > > > > > > > > > actively working on it.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <
> > > > > beyond1920@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Xintong,
> > > > > > > > > > > > +1 to the proposal.
> > > > > > > > > > > > In order to better comply with the rule, it is
> > necessary
> > > to
> > > > > > > describe
> > > > > > > > > > > what's
> > > > > > > > > > > > best practice if encountering test failure which seems
> > > > > > unrelated
> > > > > > > with
> > > > > > > > > > the
> > > > > > > > > > > > current commits.
> > > > > > > > > > > > How to avoid merging PR with test failures and not
> > > blocking
> > > > > > code
> > > > > > > > > > merging
> > > > > > > > > > > > for a long time?
> > > > > > > > > > > > I tried to think about the possible steps, and found
> > > there
> > > > > are
> > > > > > > some
> > > > > > > > > > > > detailed problems that need to be discussed in a step
> > > > > further:
> > > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > > 2. Set a deadline to find out the root cause and solve
> > > the
> > > > > > > failure
> > > > > > > > > for
> > > > > > > > > > > the
> > > > > > > > > > > > new created JIRA  because we could not block other
> > commit
> > > > > > merges
> > > > > > > for
> > > > > > > > > a
> > > > > > > > > > > long
> > > > > > > > > > > > time
> > > > > > > > > > > >     When is a reasonable deadline here?
> > > > > > > > > > > > 3. What to do if the JIRA has not made significant
> > > progress
> > > > > > when
> > > > > > > > > > reached
> > > > > > > > > > > to
> > > > > > > > > > > > the deadline time?
> > > > > > > > > > > >     There are several situations as follows, maybe
> > > > different
> > > > > > > cases
> > > > > > > > > need
> > > > > > > > > > > > different approaches.
> > > > > > > > > > > >     1. the JIRA is non-assigned yet
> > > > > > > > > > > >     2. not found the root cause yet
> > > > > > > > > > > >     3. not found a good solution, but already found the
> > > > root
> > > > > > > cause
> > > > > > > > > > > >     4. found a solution, but it needs more time to be
> > > done.
> > > > > > > > > > > > 4. If we disable the respective tests temporarily, we
> > > also
> > > > > > need a
> > > > > > > > > > > mechanism
> > > > > > > > > > > > to ensure the issue would be continued to be
> > investigated
> > > > in
> > > > > > the
> > > > > > > > > > future.
> > > > > > > > > > > >
> > > > > > > > > > > > Best regards,
> > > > > > > > > > > > JING ZHANG
> > > > > > > > > > > >
> > > > > > > > > > > > Stephan Ewen <se...@apache.org> 于2021年6月23日周三
> > 下午8:16写道：
> > > > > > > > > > > >
> > > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <
> > > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I would first try to not introduce the exception
> > for
> > > > > local
> > > > > > > > > builds.
> > > > > > > > > > It
> > > > > > > > > > > > > makes
> > > > > > > > > > > > > > it quite hard for others to verify the build and to
> > > > make
> > > > > > sure
> > > > > > > > > that
> > > > > > > > > > > the
> > > > > > > > > > > > > > right things were executed. If we see that this
> > > becomes
> > > > > an
> > > > > > > issue
> > > > > > > > > > then
> > > > > > > > > > > > we
> > > > > > > > > > > > > > can revisit this idea.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > Till
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <
> > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +1 for appending this to community guidelines for
> > > > > merging
> > > > > > > PRs.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Till Rohrmann
> > > > > > > > > > > > > > > I agree that with this approach unstable tests
> > will
> > > > not
> > > > > > > block
> > > > > > > > > > other
> > > > > > > > > > > > > > > commit merges. However, it might be hard to
> > prevent
> > > > > > merging
> > > > > > > > > > commits
> > > > > > > > > > > > > > > that are related to those tests and should have
> > > been
> > > > > > passed
> > > > > > > > > them.
> > > > > > > > > > > > It's
> > > > > > > > > > > > > > > true that this judgment can be made by the
> > > > committers,
> > > > > > but
> > > > > > > no
> > > > > > > > > one
> > > > > > > > > > > can
> > > > > > > > > > > > > > > ensure the judgment is always precise and so that
> > > we
> > > > > have
> > > > > > > this
> > > > > > > > > > > > > > > discussion thread.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Regarding the unstable tests, how about adding
> > > > another
> > > > > > > > > exception:
> > > > > > > > > > > > > > > committers verify it in their local environment
> > and
> > > > > > > comment in
> > > > > > > > > > such
> > > > > > > > > > > > > > > cases?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <
> > > > > > > liujiangangpeng@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It is a good principle to run all tests
> > > > successfully
> > > > > > > with any
> > > > > > > > > > > > change.
> > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > means a lot for project's stability and
> > > > development.
> > > > > I
> > > > > > > am big
> > > > > > > > > > +1
> > > > > > > > > > > > for
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > proposal.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best
> > > > > > > > > > > > > > > > liujiangang
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Till Rohrmann <tr...@apache.org>
> > > 于2021年6月22日周二
> > > > > > > 下午6:36写道：
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > One way to address the problem of regularly
> > > > failing
> > > > > > > tests
> > > > > > > > > > that
> > > > > > > > > > > > > block
> > > > > > > > > > > > > > > > > merging of PRs is to disable the respective
> > > tests
> > > > > for
> > > > > > > the
> > > > > > > > > > time
> > > > > > > > > > > > > being.
> > > > > > > > > > > > > > > Of
> > > > > > > > > > > > > > > > > course, the failing test then needs to be
> > > fixed.
> > > > > But
> > > > > > at
> > > > > > > > > least
> > > > > > > > > > > > that
> > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > would not block everyone from making
> > progress.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise
> > <
> > > > > > > > > > arvid@apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I think this is overall a good idea. So +1
> > > from
> > > > > my
> > > > > > > side.
> > > > > > > > > > > > > > > > > > However, I'd like to put a higher priority
> > on
> > > > > > > > > > infrastructure
> > > > > > > > > > > > > then,
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > particular docker image/artifact caches.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till
> > > Rohrmann
> > > > <
> > > > > > > > > > > > > > trohrmann@apache.org
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for bringing this topic to our
> > > > attention
> > > > > > > > > Xintong.
> > > > > > > > > > I
> > > > > > > > > > > > > think
> > > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > > > > proposal makes a lot of sense and we
> > should
> > > > > > follow
> > > > > > > it.
> > > > > > > > > It
> > > > > > > > > > > > will
> > > > > > > > > > > > > > > give us
> > > > > > > > > > > > > > > > > > > confidence that our changes are working
> > and
> > > > it
> > > > > > > might
> > > > > > > > > be a
> > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > incentive
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > quickly fix build instabilities. Hence,
> > +1.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong
> > > > Song <
> > > > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > In the past a couple of weeks, I've
> > > > observed
> > > > > > > several
> > > > > > > > > > > times
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > merged without a green light from the
> > CI
> > > > > tests,
> > > > > > > where
> > > > > > > > > > > > failure
> > > > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > > considered *unrelated*. This may not
> > > always
> > > > > > cause
> > > > > > > > > > > problems,
> > > > > > > > > > > > > but
> > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > increase the chance of breaking our
> > code
> > > > > base.
> > > > > > In
> > > > > > > > > fact,
> > > > > > > > > > > it
> > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > occurred
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > me twice in the past few weeks that I
> > had
> > > > to
> > > > > > > revert a
> > > > > > > > > > > > commit
> > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > > > > > > the master branch due to this.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I think it would be nicer to enforce a
> > > > > stricter
> > > > > > > rule,
> > > > > > > > > > > that
> > > > > > > > > > > > no
> > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > The problems of merging PRs with
> > > > "unrelated"
> > > > > > test
> > > > > > > > > > > failures
> > > > > > > > > > > > > are:
> > > > > > > > > > > > > > > > > > > > - It's not always straightforward to
> > tell
> > > > > > > whether a
> > > > > > > > > > test
> > > > > > > > > > > > > > > failures are
> > > > > > > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > > > > > > - It prevents subsequent test cases
> > from
> > > > > being
> > > > > > > > > > executed,
> > > > > > > > > > > > > which
> > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > To make things easier for the
> > committers,
> > > > the
> > > > > > > > > following
> > > > > > > > > > > > > > > exceptions
> > > > > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > > > > > > > - The PR has passed CI in the
> > > contributor's
> > > > > > > personal
> > > > > > > > > > > > > workspace.
> > > > > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > > > > > > > - The CI tests have been triggered
> > > multiple
> > > > > > > times, on
> > > > > > > > > > the
> > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > commit,
> > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > each stage has at least passed for
> > once.
> > > > > Please
> > > > > > > also
> > > > > > > > > > > > comment
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > If we all agree on this, I'd update the
> > > > > > community
> > > > > > > > > > > > guidelines
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Please let me know what do you think.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best, Jingsong Lee
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by JING ZHANG <be...@gmail.com>.

Thanks Xintong for giving detailed documentation.

The best practice for handling test failure is very detailed, it's a good
guidelines document with clear action steps.

+1 to Xintong's proposal.

Xintong Song <to...@gmail.com> 于2021年6月28日周一 下午4:07写道：

> Thanks all for the discussion.
>
> Based on the opinions so far, I've drafted the new guidelines [1], as a
> potential replacement of the original wiki page [2].
>
> Hopefully this draft has covered the most opinions discussed and consensus
> made in this discussion thread.
>
> Looking forward to your feedback.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
>
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
>
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
>
>
>
> On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <pn...@apache.org>
> wrote:
>
> > Thanks for the clarification Till. +1 for what you have written.
> >
> > Piotrek
> >
> > pt., 25 cze 2021 o 16:00 Till Rohrmann <tr...@apache.org>
> napisał(a):
> >
> > > One quick note for clarification. I don't have anything against builds
> > > running on your personal Azure account and this is not what I
> understood
> > > under "local environment". For me "local environment" means that
> someone
> > > runs the test locally on his machine and then says that the
> > > tests have passed locally.
> > >
> > > I do agree that there might be a conflict of interests if a PR author
> > > disables tests. Here I would argue that we don't have malignant
> > committers
> > > which means that every committer will probably first check the
> respective
> > > ticket for how often the test failed. Then I guess the next step would
> be
> > > to discuss on the ticket whether to disable it or not. And finally,
> after
> > > reaching a consensus, it will be disabled. If we see someone abusing
> this
> > > policy, then we can still think about how to guard against it. But,
> > > honestly, I have very rarely seen such a case. I am also ok to pull in
> > the
> > > release manager to make the final call if this resolves concerns.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <pn...@apache.org>
> > > wrote:
> > >
> > > > +1 for the general idea, however I have concerns about a couple of
> > > details.
> > > >
> > > > > I would first try to not introduce the exception for local builds.
> > > > > It makes it quite hard for others to verify the build and to make
> > sure
> > > > that the right things were executed.
> > > >
> > > > I would counter Till's proposal to ignore local green builds. If
> > > committer
> > > > is merging and closing a PR, with official azure failure, but there
> > was a
> > > > green build before or in local azure it's IMO enough to leave the
> > > message:
> > > >
> > > > > Latest build failure is a known issue: FLINK-12345
> > > > > Green local build: URL
> > > >
> > > > This should address Till's concern about verification.
> > > >
> > > > On the other hand I have concerns about disabling tests.* It
> shouldn't
> > be
> > > > the PR author/committer that's disabling a test on his own, as
> that's a
> > > > conflict of interests*. I have however no problems with disabling
> test
> > > > instabilities that were marked as "blockers" though, that should work
> > > > pretty well. But the important thing here is to correctly judge
> bumping
> > > > priorities of test instabilities based on their frequency and current
> > > > general health of the system. I believe that release managers should
> be
> > > > playing a big role here in deciding on the guidelines of what should
> > be a
> > > > priority of certain test instabilities.
> > > >
> > > > What I mean by that is two example scenarios:
> > > > 1. if we have a handful of very frequently failing tests and a
> handful
> > of
> > > > very rarely failing tests (like one reported failure and no another
> > > > occurrence in many months, and let's even say that the failure looks
> > like
> > > > infrastructure/network timeout), we should focus on the frequently
> > > failing
> > > > ones, and probably we are safe to ignore for the time being the rare
> > > issues
> > > > - at least until we deal with the most pressing ones.
> > > > 2. If we have tons of rarely failing test instabilities, we should
> > > probably
> > > > start addressing them as blockers.
> > > >
> > > > I'm using my own conscious and my best judgement when I'm
> > > > bumping/decreasing priorities of test instabilities (and bugs), but
> as
> > > > individual committer I don't have the full picture. As I wrote
> above, I
> > > > think release managers are in a much better position to keep
> adjusting
> > > > those kind of guidelines.
> > > >
> > > > Best, Piotrek
> > > >
> > > > pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com> napisał(a):
> > > >
> > > > > +1 for Xintong's proposal.
> > > > >
> > > > > For me, resolving problems directly (fixing the infrastructure
> issue,
> > > > > disabling unstable tests and creating blocker JIRAs to track the
> fix
> > > and
> > > > > re-enable them asap, etc.) is (in most cases) better than working
> > > around
> > > > > them (verify locally, manually check and judge the failure as
> > > > "unrelated",
> > > > > etc.), and I believe the proposal could help us pushing those more
> > > "real"
> > > > > solutions forward.
> > > > >
> > > > > Best Regards,
> > > > > Yu
> > > > >
> > > > >
> > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <ka...@gmail.com>
> wrote:
> > > > >
> > > > > > Creating a blocker issue for the manually disabled tests sounds
> > good
> > > to
> > > > > me.
> > > > > >
> > > > > > Minor: I'm still a bit worried about the commits merged before we
> > fix
> > > > > > the unstable tests can also break those tests. Instead of letting
> > the
> > > > > > assigners keep a look at all potentially related commits, they
> can
> > > > > > maintain a branch that is periodically synced with the master
> > branch
> > > > > > while enabling the unstable test. So that they can catch the
> > breaking
> > > > > > changes asap.
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <
> > trohrmann@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > I like the idea of creating a blocker issue for a disabled
> test.
> > > This
> > > > > > will
> > > > > > > force us to resolve it in a timely manner and it won't fall
> > through
> > > > the
> > > > > > > cracks.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <
> > > jingsonglee0@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > +1 to Xintong's proposal
> > > > > > > >
> > > > > > > > I also have some concerns about unstable cases.
> > > > > > > >
> > > > > > > > I think unstable cases can be divided into these types:
> > > > > > > >
> > > > > > > > - Force majeure: For example, network timeout, sudden
> > > environmental
> > > > > > > > collapse, they are accidental and can always be solved by
> > > > triggering
> > > > > > azure
> > > > > > > > again. Committers should wait for the next green azure.
> > > > > > > >
> > > > > > > > - Obvious mistakes: For example, some errors caused by
> obvious
> > > > > reasons
> > > > > > may
> > > > > > > > be repaired quickly. At this time, do we need to wait, or not
> > > wait
> > > > > and
> > > > > > just
> > > > > > > > ignore?
> > > > > > > >
> > > > > > > > - Difficult questions: These problems are very difficult to
> > find.
> > > > > There
> > > > > > > > will be no solution for a while and a half. We don't even
> know
> > > the
> > > > > > reason.
> > > > > > > > At this time, we should ignore it. (Maybe it's judged by the
> > > author
> > > > > of
> > > > > > the
> > > > > > > > case. But what about the old case whose author can't be
> found?)
> > > > > > > >
> > > > > > > > So, the ignored cases should be the block of the next release
> > > until
> > > > > the
> > > > > > > > reason is found or the case is fixed?  We need to ensure that
> > > > someone
> > > > > > will
> > > > > > > > take care of these cases, because there is no deepening of
> > failed
> > > > > > tests, no
> > > > > > > > one may continue to pay attention to these cases.
> > > > > > > >
> > > > > > > > I think this guideline should consider these situations, and
> > show
> > > > how
> > > > > > to
> > > > > > > > solve them.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Jingsong
> > > > > > > >
> > > > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <im...@gmail.com>
> > > wrote:
> > > > > > > >
> > > > > > > > > Thanks to Xintong for bringing up this topic, I'm +1 in
> > > general.
> > > > > > > > >
> > > > > > > > > However, I think it's still not very clear how we address
> the
> > > > > > unstable
> > > > > > > > > tests.
> > > > > > > > > I think this is a very important part of this new
> guideline.
> > > > > > > > >
> > > > > > > > > According to the discussion above, if some tests are
> > unstable,
> > > we
> > > > > can
> > > > > > > > > manually disable it.
> > > > > > > > > But I have some questions in my mind:
> > > > > > > > > 1) Is the instability judged by the committer themselves or
> > by
> > > > some
> > > > > > > > > metrics?
> > > > > > > > > 2) Should we log the disable commit in the corresponding
> > issue
> > > > and
> > > > > > > > increase
> > > > > > > > > the priority?
> > > > > > > > > 3) What if nobody looks into this issue and this becomes
> some
> > > > > > potential
> > > > > > > > > bugs released with the new version?
> > > > > > > > > 4) If no person is actively working on the issue, who
> should
> > > > > > re-enable
> > > > > > > > it?
> > > > > > > > > Would it block PRs again?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Jark
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <
> > > > tonysong820@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks all for the feedback.
> > > > > > > > > >
> > > > > > > > > > @Till @Yangze
> > > > > > > > > >
> > > > > > > > > > I'm also not convinced by the idea of having an exception
> > for
> > > > > local
> > > > > > > > > builds.
> > > > > > > > > > We need to execute the entire build (or at least the
> > failing
> > > > > stage)
> > > > > > > > > > locally, to make sure subsequent test cases prevented by
> > the
> > > > > > failure
> > > > > > > > one
> > > > > > > > > > are all executed. In that case, it's probably easier to
> > rerun
> > > > the
> > > > > > build
> > > > > > > > > on
> > > > > > > > > > azure than locally.
> > > > > > > > > >
> > > > > > > > > > Concerning disabling unstable test cases that regularly
> > block
> > > > PRs
> > > > > > from
> > > > > > > > > > merging, maybe we can say that such cases can only be
> > > disabled
> > > > > when
> > > > > > > > > someone
> > > > > > > > > > is actively looking into it, likely the person who
> disabled
> > > the
> > > > > > case.
> > > > > > > > If
> > > > > > > > > > this person is no longer actively working on it, he/she
> > > should
> > > > > > enable
> > > > > > > > the
> > > > > > > > > > case again no matter if it is fixed or not.
> > > > > > > > > >
> > > > > > > > > > @Jing
> > > > > > > > > >
> > > > > > > > > > Thanks for the suggestions.
> > > > > > > > > >
> > > > > > > > > > +1 to provide guidelines on handling test failures.
> > > > > > > > > >
> > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > +1 on this. Currently, the release managers are
> monitoring
> > > the
> > > > ci
> > > > > > and
> > > > > > > > > cron
> > > > > > > > > > build instabilities and reporting them on JIRA. We should
> > > also
> > > > > > > > encourage
> > > > > > > > > > other contributors to do that for PRs.
> > > > > > > > > >
> > > > > > > > > > 2. Set a deadline to find out the root cause and solve
> the
> > > > > failure
> > > > > > for
> > > > > > > > > the
> > > > > > > > > > > new created JIRA  because we could not block other
> commit
> > > > > merges
> > > > > > for
> > > > > > > > a
> > > > > > > > > > long
> > > > > > > > > > > time
> > > > > > > > > > >
> > > > > > > > > > 3. What to do if the JIRA has not made significant
> progress
> > > > when
> > > > > > > > reached
> > > > > > > > > to
> > > > > > > > > > > the deadline time?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I'm not sure about these two. It feels a bit against the
> > > > > voluntary
> > > > > > > > nature
> > > > > > > > > > of open source projects.
> > > > > > > > > >
> > > > > > > > > > IMHO, frequent instabilities are more likely to be
> upgraded
> > > to
> > > > > the
> > > > > > > > > critical
> > > > > > > > > > / blocker priority, receive more attention and eventually
> > get
> > > > > > fixed.
> > > > > > > > > > Release managers are also responsible for looking for
> > > assignees
> > > > > for
> > > > > > > > such
> > > > > > > > > > issues. If a case is still not fixed soonish, even with
> all
> > > > these
> > > > > > > > > efforts,
> > > > > > > > > > I'm not sure how setting a deadline can help this.
> > > > > > > > > >
> > > > > > > > > > 4. If we disable the respective tests temporarily, we
> also
> > > > need a
> > > > > > > > > mechanism
> > > > > > > > > > > to ensure the issue would be continued to be
> investigated
> > > in
> > > > > the
> > > > > > > > > future.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > +1. As mentioned above, we may consider disabling such
> > tests
> > > > iff
> > > > > > > > someone
> > > > > > > > > is
> > > > > > > > > > actively working on it.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <
> > > > beyond1920@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Xintong,
> > > > > > > > > > > +1 to the proposal.
> > > > > > > > > > > In order to better comply with the rule, it is
> necessary
> > to
> > > > > > describe
> > > > > > > > > > what's
> > > > > > > > > > > best practice if encountering test failure which seems
> > > > > unrelated
> > > > > > with
> > > > > > > > > the
> > > > > > > > > > > current commits.
> > > > > > > > > > > How to avoid merging PR with test failures and not
> > blocking
> > > > > code
> > > > > > > > > merging
> > > > > > > > > > > for a long time?
> > > > > > > > > > > I tried to think about the possible steps, and found
> > there
> > > > are
> > > > > > some
> > > > > > > > > > > detailed problems that need to be discussed in a step
> > > > further:
> > > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > > 2. Set a deadline to find out the root cause and solve
> > the
> > > > > > failure
> > > > > > > > for
> > > > > > > > > > the
> > > > > > > > > > > new created JIRA  because we could not block other
> commit
> > > > > merges
> > > > > > for
> > > > > > > > a
> > > > > > > > > > long
> > > > > > > > > > > time
> > > > > > > > > > >     When is a reasonable deadline here?
> > > > > > > > > > > 3. What to do if the JIRA has not made significant
> > progress
> > > > > when
> > > > > > > > > reached
> > > > > > > > > > to
> > > > > > > > > > > the deadline time?
> > > > > > > > > > >     There are several situations as follows, maybe
> > > different
> > > > > > cases
> > > > > > > > need
> > > > > > > > > > > different approaches.
> > > > > > > > > > >     1. the JIRA is non-assigned yet
> > > > > > > > > > >     2. not found the root cause yet
> > > > > > > > > > >     3. not found a good solution, but already found the
> > > root
> > > > > > cause
> > > > > > > > > > >     4. found a solution, but it needs more time to be
> > done.
> > > > > > > > > > > 4. If we disable the respective tests temporarily, we
> > also
> > > > > need a
> > > > > > > > > > mechanism
> > > > > > > > > > > to ensure the issue would be continued to be
> investigated
> > > in
> > > > > the
> > > > > > > > > future.
> > > > > > > > > > >
> > > > > > > > > > > Best regards,
> > > > > > > > > > > JING ZHANG
> > > > > > > > > > >
> > > > > > > > > > > Stephan Ewen <se...@apache.org> 于2021年6月23日周三
> 下午8:16写道：
> > > > > > > > > > >
> > > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <
> > > > > > > > trohrmann@apache.org>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I would first try to not introduce the exception
> for
> > > > local
> > > > > > > > builds.
> > > > > > > > > It
> > > > > > > > > > > > makes
> > > > > > > > > > > > > it quite hard for others to verify the build and to
> > > make
> > > > > sure
> > > > > > > > that
> > > > > > > > > > the
> > > > > > > > > > > > > right things were executed. If we see that this
> > becomes
> > > > an
> > > > > > issue
> > > > > > > > > then
> > > > > > > > > > > we
> > > > > > > > > > > > > can revisit this idea.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Till
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <
> > > > > > karmagyz@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 for appending this to community guidelines for
> > > > merging
> > > > > > PRs.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @Till Rohrmann
> > > > > > > > > > > > > > I agree that with this approach unstable tests
> will
> > > not
> > > > > > block
> > > > > > > > > other
> > > > > > > > > > > > > > commit merges. However, it might be hard to
> prevent
> > > > > merging
> > > > > > > > > commits
> > > > > > > > > > > > > > that are related to those tests and should have
> > been
> > > > > passed
> > > > > > > > them.
> > > > > > > > > > > It's
> > > > > > > > > > > > > > true that this judgment can be made by the
> > > committers,
> > > > > but
> > > > > > no
> > > > > > > > one
> > > > > > > > > > can
> > > > > > > > > > > > > > ensure the judgment is always precise and so that
> > we
> > > > have
> > > > > > this
> > > > > > > > > > > > > > discussion thread.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regarding the unstable tests, how about adding
> > > another
> > > > > > > > exception:
> > > > > > > > > > > > > > committers verify it in their local environment
> and
> > > > > > comment in
> > > > > > > > > such
> > > > > > > > > > > > > > cases?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <
> > > > > > liujiangangpeng@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It is a good principle to run all tests
> > > successfully
> > > > > > with any
> > > > > > > > > > > change.
> > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > means a lot for project's stability and
> > > development.
> > > > I
> > > > > > am big
> > > > > > > > > +1
> > > > > > > > > > > for
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > > proposal.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best
> > > > > > > > > > > > > > > liujiangang
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Till Rohrmann <tr...@apache.org>
> > 于2021年6月22日周二
> > > > > > 下午6:36写道：
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > One way to address the problem of regularly
> > > failing
> > > > > > tests
> > > > > > > > > that
> > > > > > > > > > > > block
> > > > > > > > > > > > > > > > merging of PRs is to disable the respective
> > tests
> > > > for
> > > > > > the
> > > > > > > > > time
> > > > > > > > > > > > being.
> > > > > > > > > > > > > > Of
> > > > > > > > > > > > > > > > course, the failing test then needs to be
> > fixed.
> > > > But
> > > > > at
> > > > > > > > least
> > > > > > > > > > > that
> > > > > > > > > > > > > way
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > would not block everyone from making
> progress.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise
> <
> > > > > > > > > arvid@apache.org
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I think this is overall a good idea. So +1
> > from
> > > > my
> > > > > > side.
> > > > > > > > > > > > > > > > > However, I'd like to put a higher priority
> on
> > > > > > > > > infrastructure
> > > > > > > > > > > > then,
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > particular docker image/artifact caches.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till
> > Rohrmann
> > > <
> > > > > > > > > > > > > trohrmann@apache.org
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for bringing this topic to our
> > > attention
> > > > > > > > Xintong.
> > > > > > > > > I
> > > > > > > > > > > > think
> > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > > > proposal makes a lot of sense and we
> should
> > > > > follow
> > > > > > it.
> > > > > > > > It
> > > > > > > > > > > will
> > > > > > > > > > > > > > give us
> > > > > > > > > > > > > > > > > > confidence that our changes are working
> and
> > > it
> > > > > > might
> > > > > > > > be a
> > > > > > > > > > > good
> > > > > > > > > > > > > > > > incentive
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > quickly fix build instabilities. Hence,
> +1.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong
> > > Song <
> > > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > In the past a couple of weeks, I've
> > > observed
> > > > > > several
> > > > > > > > > > times
> > > > > > > > > > > > that
> > > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > merged without a green light from the
> CI
> > > > tests,
> > > > > > where
> > > > > > > > > > > failure
> > > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > considered *unrelated*. This may not
> > always
> > > > > cause
> > > > > > > > > > problems,
> > > > > > > > > > > > but
> > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > increase the chance of breaking our
> code
> > > > base.
> > > > > In
> > > > > > > > fact,
> > > > > > > > > > it
> > > > > > > > > > > > has
> > > > > > > > > > > > > > > > occurred
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > me twice in the past few weeks that I
> had
> > > to
> > > > > > revert a
> > > > > > > > > > > commit
> > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > > > > > the master branch due to this.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I think it would be nicer to enforce a
> > > > stricter
> > > > > > rule,
> > > > > > > > > > that
> > > > > > > > > > > no
> > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The problems of merging PRs with
> > > "unrelated"
> > > > > test
> > > > > > > > > > failures
> > > > > > > > > > > > are:
> > > > > > > > > > > > > > > > > > > - It's not always straightforward to
> tell
> > > > > > whether a
> > > > > > > > > test
> > > > > > > > > > > > > > failures are
> > > > > > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > > > > > - It prevents subsequent test cases
> from
> > > > being
> > > > > > > > > executed,
> > > > > > > > > > > > which
> > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > To make things easier for the
> committers,
> > > the
> > > > > > > > following
> > > > > > > > > > > > > > exceptions
> > > > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > > > > > > - The PR has passed CI in the
> > contributor's
> > > > > > personal
> > > > > > > > > > > > workspace.
> > > > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > > > > > > - The CI tests have been triggered
> > multiple
> > > > > > times, on
> > > > > > > > > the
> > > > > > > > > > > > same
> > > > > > > > > > > > > > > > commit,
> > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > each stage has at least passed for
> once.
> > > > Please
> > > > > > also
> > > > > > > > > > > comment
> > > > > > > > > > > > in
> > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > If we all agree on this, I'd update the
> > > > > community
> > > > > > > > > > > guidelines
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Please let me know what do you think.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best, Jingsong Lee
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Xintong Song <to...@gmail.com>.

Thanks all for the discussion.

Based on the opinions so far, I've drafted the new guidelines [1], as a
potential replacement of the original wiki page [2].

Hopefully this draft has covered the most opinions discussed and consensus
made in this discussion thread.

Looking forward to your feedback.

Thank you~

Xintong Song


[1]
https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing

[2] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests



On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <pn...@apache.org>
wrote:

> Thanks for the clarification Till. +1 for what you have written.
>
> Piotrek
>
> pt., 25 cze 2021 o 16:00 Till Rohrmann <tr...@apache.org> napisał(a):
>
> > One quick note for clarification. I don't have anything against builds
> > running on your personal Azure account and this is not what I understood
> > under "local environment". For me "local environment" means that someone
> > runs the test locally on his machine and then says that the
> > tests have passed locally.
> >
> > I do agree that there might be a conflict of interests if a PR author
> > disables tests. Here I would argue that we don't have malignant
> committers
> > which means that every committer will probably first check the respective
> > ticket for how often the test failed. Then I guess the next step would be
> > to discuss on the ticket whether to disable it or not. And finally, after
> > reaching a consensus, it will be disabled. If we see someone abusing this
> > policy, then we can still think about how to guard against it. But,
> > honestly, I have very rarely seen such a case. I am also ok to pull in
> the
> > release manager to make the final call if this resolves concerns.
> >
> > Cheers,
> > Till
> >
> > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <pn...@apache.org>
> > wrote:
> >
> > > +1 for the general idea, however I have concerns about a couple of
> > details.
> > >
> > > > I would first try to not introduce the exception for local builds.
> > > > It makes it quite hard for others to verify the build and to make
> sure
> > > that the right things were executed.
> > >
> > > I would counter Till's proposal to ignore local green builds. If
> > committer
> > > is merging and closing a PR, with official azure failure, but there
> was a
> > > green build before or in local azure it's IMO enough to leave the
> > message:
> > >
> > > > Latest build failure is a known issue: FLINK-12345
> > > > Green local build: URL
> > >
> > > This should address Till's concern about verification.
> > >
> > > On the other hand I have concerns about disabling tests.* It shouldn't
> be
> > > the PR author/committer that's disabling a test on his own, as that's a
> > > conflict of interests*. I have however no problems with disabling test
> > > instabilities that were marked as "blockers" though, that should work
> > > pretty well. But the important thing here is to correctly judge bumping
> > > priorities of test instabilities based on their frequency and current
> > > general health of the system. I believe that release managers should be
> > > playing a big role here in deciding on the guidelines of what should
> be a
> > > priority of certain test instabilities.
> > >
> > > What I mean by that is two example scenarios:
> > > 1. if we have a handful of very frequently failing tests and a handful
> of
> > > very rarely failing tests (like one reported failure and no another
> > > occurrence in many months, and let's even say that the failure looks
> like
> > > infrastructure/network timeout), we should focus on the frequently
> > failing
> > > ones, and probably we are safe to ignore for the time being the rare
> > issues
> > > - at least until we deal with the most pressing ones.
> > > 2. If we have tons of rarely failing test instabilities, we should
> > probably
> > > start addressing them as blockers.
> > >
> > > I'm using my own conscious and my best judgement when I'm
> > > bumping/decreasing priorities of test instabilities (and bugs), but as
> > > individual committer I don't have the full picture. As I wrote above, I
> > > think release managers are in a much better position to keep adjusting
> > > those kind of guidelines.
> > >
> > > Best, Piotrek
> > >
> > > pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com> napisał(a):
> > >
> > > > +1 for Xintong's proposal.
> > > >
> > > > For me, resolving problems directly (fixing the infrastructure issue,
> > > > disabling unstable tests and creating blocker JIRAs to track the fix
> > and
> > > > re-enable them asap, etc.) is (in most cases) better than working
> > around
> > > > them (verify locally, manually check and judge the failure as
> > > "unrelated",
> > > > etc.), and I believe the proposal could help us pushing those more
> > "real"
> > > > solutions forward.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <ka...@gmail.com> wrote:
> > > >
> > > > > Creating a blocker issue for the manually disabled tests sounds
> good
> > to
> > > > me.
> > > > >
> > > > > Minor: I'm still a bit worried about the commits merged before we
> fix
> > > > > the unstable tests can also break those tests. Instead of letting
> the
> > > > > assigners keep a look at all potentially related commits, they can
> > > > > maintain a branch that is periodically synced with the master
> branch
> > > > > while enabling the unstable test. So that they can catch the
> breaking
> > > > > changes asap.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <
> trohrmann@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > I like the idea of creating a blocker issue for a disabled test.
> > This
> > > > > will
> > > > > > force us to resolve it in a timely manner and it won't fall
> through
> > > the
> > > > > > cracks.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <
> > jingsonglee0@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > +1 to Xintong's proposal
> > > > > > >
> > > > > > > I also have some concerns about unstable cases.
> > > > > > >
> > > > > > > I think unstable cases can be divided into these types:
> > > > > > >
> > > > > > > - Force majeure: For example, network timeout, sudden
> > environmental
> > > > > > > collapse, they are accidental and can always be solved by
> > > triggering
> > > > > azure
> > > > > > > again. Committers should wait for the next green azure.
> > > > > > >
> > > > > > > - Obvious mistakes: For example, some errors caused by obvious
> > > > reasons
> > > > > may
> > > > > > > be repaired quickly. At this time, do we need to wait, or not
> > wait
> > > > and
> > > > > just
> > > > > > > ignore?
> > > > > > >
> > > > > > > - Difficult questions: These problems are very difficult to
> find.
> > > > There
> > > > > > > will be no solution for a while and a half. We don't even know
> > the
> > > > > reason.
> > > > > > > At this time, we should ignore it. (Maybe it's judged by the
> > author
> > > > of
> > > > > the
> > > > > > > case. But what about the old case whose author can't be found?)
> > > > > > >
> > > > > > > So, the ignored cases should be the block of the next release
> > until
> > > > the
> > > > > > > reason is found or the case is fixed?  We need to ensure that
> > > someone
> > > > > will
> > > > > > > take care of these cases, because there is no deepening of
> failed
> > > > > tests, no
> > > > > > > one may continue to pay attention to these cases.
> > > > > > >
> > > > > > > I think this guideline should consider these situations, and
> show
> > > how
> > > > > to
> > > > > > > solve them.
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong
> > > > > > >
> > > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <im...@gmail.com>
> > wrote:
> > > > > > >
> > > > > > > > Thanks to Xintong for bringing up this topic, I'm +1 in
> > general.
> > > > > > > >
> > > > > > > > However, I think it's still not very clear how we address the
> > > > > unstable
> > > > > > > > tests.
> > > > > > > > I think this is a very important part of this new guideline.
> > > > > > > >
> > > > > > > > According to the discussion above, if some tests are
> unstable,
> > we
> > > > can
> > > > > > > > manually disable it.
> > > > > > > > But I have some questions in my mind:
> > > > > > > > 1) Is the instability judged by the committer themselves or
> by
> > > some
> > > > > > > > metrics?
> > > > > > > > 2) Should we log the disable commit in the corresponding
> issue
> > > and
> > > > > > > increase
> > > > > > > > the priority?
> > > > > > > > 3) What if nobody looks into this issue and this becomes some
> > > > > potential
> > > > > > > > bugs released with the new version?
> > > > > > > > 4) If no person is actively working on the issue, who should
> > > > > re-enable
> > > > > > > it?
> > > > > > > > Would it block PRs again?
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Jark
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <
> > > tonysong820@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks all for the feedback.
> > > > > > > > >
> > > > > > > > > @Till @Yangze
> > > > > > > > >
> > > > > > > > > I'm also not convinced by the idea of having an exception
> for
> > > > local
> > > > > > > > builds.
> > > > > > > > > We need to execute the entire build (or at least the
> failing
> > > > stage)
> > > > > > > > > locally, to make sure subsequent test cases prevented by
> the
> > > > > failure
> > > > > > > one
> > > > > > > > > are all executed. In that case, it's probably easier to
> rerun
> > > the
> > > > > build
> > > > > > > > on
> > > > > > > > > azure than locally.
> > > > > > > > >
> > > > > > > > > Concerning disabling unstable test cases that regularly
> block
> > > PRs
> > > > > from
> > > > > > > > > merging, maybe we can say that such cases can only be
> > disabled
> > > > when
> > > > > > > > someone
> > > > > > > > > is actively looking into it, likely the person who disabled
> > the
> > > > > case.
> > > > > > > If
> > > > > > > > > this person is no longer actively working on it, he/she
> > should
> > > > > enable
> > > > > > > the
> > > > > > > > > case again no matter if it is fixed or not.
> > > > > > > > >
> > > > > > > > > @Jing
> > > > > > > > >
> > > > > > > > > Thanks for the suggestions.
> > > > > > > > >
> > > > > > > > > +1 to provide guidelines on handling test failures.
> > > > > > > > >
> > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > +1 on this. Currently, the release managers are monitoring
> > the
> > > ci
> > > > > and
> > > > > > > > cron
> > > > > > > > > build instabilities and reporting them on JIRA. We should
> > also
> > > > > > > encourage
> > > > > > > > > other contributors to do that for PRs.
> > > > > > > > >
> > > > > > > > > 2. Set a deadline to find out the root cause and solve the
> > > > failure
> > > > > for
> > > > > > > > the
> > > > > > > > > > new created JIRA  because we could not block other commit
> > > > merges
> > > > > for
> > > > > > > a
> > > > > > > > > long
> > > > > > > > > > time
> > > > > > > > > >
> > > > > > > > > 3. What to do if the JIRA has not made significant progress
> > > when
> > > > > > > reached
> > > > > > > > to
> > > > > > > > > > the deadline time?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I'm not sure about these two. It feels a bit against the
> > > > voluntary
> > > > > > > nature
> > > > > > > > > of open source projects.
> > > > > > > > >
> > > > > > > > > IMHO, frequent instabilities are more likely to be upgraded
> > to
> > > > the
> > > > > > > > critical
> > > > > > > > > / blocker priority, receive more attention and eventually
> get
> > > > > fixed.
> > > > > > > > > Release managers are also responsible for looking for
> > assignees
> > > > for
> > > > > > > such
> > > > > > > > > issues. If a case is still not fixed soonish, even with all
> > > these
> > > > > > > > efforts,
> > > > > > > > > I'm not sure how setting a deadline can help this.
> > > > > > > > >
> > > > > > > > > 4. If we disable the respective tests temporarily, we also
> > > need a
> > > > > > > > mechanism
> > > > > > > > > > to ensure the issue would be continued to be investigated
> > in
> > > > the
> > > > > > > > future.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > +1. As mentioned above, we may consider disabling such
> tests
> > > iff
> > > > > > > someone
> > > > > > > > is
> > > > > > > > > actively working on it.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <
> > > beyond1920@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Xintong,
> > > > > > > > > > +1 to the proposal.
> > > > > > > > > > In order to better comply with the rule, it is necessary
> to
> > > > > describe
> > > > > > > > > what's
> > > > > > > > > > best practice if encountering test failure which seems
> > > > unrelated
> > > > > with
> > > > > > > > the
> > > > > > > > > > current commits.
> > > > > > > > > > How to avoid merging PR with test failures and not
> blocking
> > > > code
> > > > > > > > merging
> > > > > > > > > > for a long time?
> > > > > > > > > > I tried to think about the possible steps, and found
> there
> > > are
> > > > > some
> > > > > > > > > > detailed problems that need to be discussed in a step
> > > further:
> > > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > > 2. Set a deadline to find out the root cause and solve
> the
> > > > > failure
> > > > > > > for
> > > > > > > > > the
> > > > > > > > > > new created JIRA  because we could not block other commit
> > > > merges
> > > > > for
> > > > > > > a
> > > > > > > > > long
> > > > > > > > > > time
> > > > > > > > > >     When is a reasonable deadline here?
> > > > > > > > > > 3. What to do if the JIRA has not made significant
> progress
> > > > when
> > > > > > > > reached
> > > > > > > > > to
> > > > > > > > > > the deadline time?
> > > > > > > > > >     There are several situations as follows, maybe
> > different
> > > > > cases
> > > > > > > need
> > > > > > > > > > different approaches.
> > > > > > > > > >     1. the JIRA is non-assigned yet
> > > > > > > > > >     2. not found the root cause yet
> > > > > > > > > >     3. not found a good solution, but already found the
> > root
> > > > > cause
> > > > > > > > > >     4. found a solution, but it needs more time to be
> done.
> > > > > > > > > > 4. If we disable the respective tests temporarily, we
> also
> > > > need a
> > > > > > > > > mechanism
> > > > > > > > > > to ensure the issue would be continued to be investigated
> > in
> > > > the
> > > > > > > > future.
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > JING ZHANG
> > > > > > > > > >
> > > > > > > > > > Stephan Ewen <se...@apache.org> 于2021年6月23日周三 下午8:16写道：
> > > > > > > > > >
> > > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <
> > > > > > > trohrmann@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I would first try to not introduce the exception for
> > > local
> > > > > > > builds.
> > > > > > > > It
> > > > > > > > > > > makes
> > > > > > > > > > > > it quite hard for others to verify the build and to
> > make
> > > > sure
> > > > > > > that
> > > > > > > > > the
> > > > > > > > > > > > right things were executed. If we see that this
> becomes
> > > an
> > > > > issue
> > > > > > > > then
> > > > > > > > > > we
> > > > > > > > > > > > can revisit this idea.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Till
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <
> > > > > karmagyz@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > +1 for appending this to community guidelines for
> > > merging
> > > > > PRs.
> > > > > > > > > > > > >
> > > > > > > > > > > > > @Till Rohrmann
> > > > > > > > > > > > > I agree that with this approach unstable tests will
> > not
> > > > > block
> > > > > > > > other
> > > > > > > > > > > > > commit merges. However, it might be hard to prevent
> > > > merging
> > > > > > > > commits
> > > > > > > > > > > > > that are related to those tests and should have
> been
> > > > passed
> > > > > > > them.
> > > > > > > > > > It's
> > > > > > > > > > > > > true that this judgment can be made by the
> > committers,
> > > > but
> > > > > no
> > > > > > > one
> > > > > > > > > can
> > > > > > > > > > > > > ensure the judgment is always precise and so that
> we
> > > have
> > > > > this
> > > > > > > > > > > > > discussion thread.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regarding the unstable tests, how about adding
> > another
> > > > > > > exception:
> > > > > > > > > > > > > committers verify it in their local environment and
> > > > > comment in
> > > > > > > > such
> > > > > > > > > > > > > cases?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <
> > > > > liujiangangpeng@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It is a good principle to run all tests
> > successfully
> > > > > with any
> > > > > > > > > > change.
> > > > > > > > > > > > > This
> > > > > > > > > > > > > > means a lot for project's stability and
> > development.
> > > I
> > > > > am big
> > > > > > > > +1
> > > > > > > > > > for
> > > > > > > > > > > > this
> > > > > > > > > > > > > > proposal.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best
> > > > > > > > > > > > > > liujiangang
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Till Rohrmann <tr...@apache.org>
> 于2021年6月22日周二
> > > > > 下午6:36写道：
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > One way to address the problem of regularly
> > failing
> > > > > tests
> > > > > > > > that
> > > > > > > > > > > block
> > > > > > > > > > > > > > > merging of PRs is to disable the respective
> tests
> > > for
> > > > > the
> > > > > > > > time
> > > > > > > > > > > being.
> > > > > > > > > > > > > Of
> > > > > > > > > > > > > > > course, the failing test then needs to be
> fixed.
> > > But
> > > > at
> > > > > > > least
> > > > > > > > > > that
> > > > > > > > > > > > way
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > would not block everyone from making progress.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <
> > > > > > > > arvid@apache.org
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think this is overall a good idea. So +1
> from
> > > my
> > > > > side.
> > > > > > > > > > > > > > > > However, I'd like to put a higher priority on
> > > > > > > > infrastructure
> > > > > > > > > > > then,
> > > > > > > > > > > > in
> > > > > > > > > > > > > > > > particular docker image/artifact caches.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till
> Rohrmann
> > <
> > > > > > > > > > > > trohrmann@apache.org
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for bringing this topic to our
> > attention
> > > > > > > Xintong.
> > > > > > > > I
> > > > > > > > > > > think
> > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > > proposal makes a lot of sense and we should
> > > > follow
> > > > > it.
> > > > > > > It
> > > > > > > > > > will
> > > > > > > > > > > > > give us
> > > > > > > > > > > > > > > > > confidence that our changes are working and
> > it
> > > > > might
> > > > > > > be a
> > > > > > > > > > good
> > > > > > > > > > > > > > > incentive
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > quickly fix build instabilities. Hence, +1.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong
> > Song <
> > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > In the past a couple of weeks, I've
> > observed
> > > > > several
> > > > > > > > > times
> > > > > > > > > > > that
> > > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > merged without a green light from the CI
> > > tests,
> > > > > where
> > > > > > > > > > failure
> > > > > > > > > > > > > cases
> > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > considered *unrelated*. This may not
> always
> > > > cause
> > > > > > > > > problems,
> > > > > > > > > > > but
> > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > increase the chance of breaking our code
> > > base.
> > > > In
> > > > > > > fact,
> > > > > > > > > it
> > > > > > > > > > > has
> > > > > > > > > > > > > > > occurred
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > me twice in the past few weeks that I had
> > to
> > > > > revert a
> > > > > > > > > > commit
> > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > > > > the master branch due to this.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I think it would be nicer to enforce a
> > > stricter
> > > > > rule,
> > > > > > > > > that
> > > > > > > > > > no
> > > > > > > > > > > > PRs
> > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The problems of merging PRs with
> > "unrelated"
> > > > test
> > > > > > > > > failures
> > > > > > > > > > > are:
> > > > > > > > > > > > > > > > > > - It's not always straightforward to tell
> > > > > whether a
> > > > > > > > test
> > > > > > > > > > > > > failures are
> > > > > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > > > > - It prevents subsequent test cases from
> > > being
> > > > > > > > executed,
> > > > > > > > > > > which
> > > > > > > > > > > > > may
> > > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > To make things easier for the committers,
> > the
> > > > > > > following
> > > > > > > > > > > > > exceptions
> > > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > > > > > - The PR has passed CI in the
> contributor's
> > > > > personal
> > > > > > > > > > > workspace.
> > > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > > > > > - The CI tests have been triggered
> multiple
> > > > > times, on
> > > > > > > > the
> > > > > > > > > > > same
> > > > > > > > > > > > > > > commit,
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > each stage has at least passed for once.
> > > Please
> > > > > also
> > > > > > > > > > comment
> > > > > > > > > > > in
> > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > If we all agree on this, I'd update the
> > > > community
> > > > > > > > > > guidelines
> > > > > > > > > > > > for
> > > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Please let me know what do you think.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best, Jingsong Lee
> > > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Piotr Nowojski <pn...@apache.org>.

Thanks for the clarification Till. +1 for what you have written.

Piotrek

pt., 25 cze 2021 o 16:00 Till Rohrmann <tr...@apache.org> napisał(a):

> One quick note for clarification. I don't have anything against builds
> running on your personal Azure account and this is not what I understood
> under "local environment". For me "local environment" means that someone
> runs the test locally on his machine and then says that the
> tests have passed locally.
>
> I do agree that there might be a conflict of interests if a PR author
> disables tests. Here I would argue that we don't have malignant committers
> which means that every committer will probably first check the respective
> ticket for how often the test failed. Then I guess the next step would be
> to discuss on the ticket whether to disable it or not. And finally, after
> reaching a consensus, it will be disabled. If we see someone abusing this
> policy, then we can still think about how to guard against it. But,
> honestly, I have very rarely seen such a case. I am also ok to pull in the
> release manager to make the final call if this resolves concerns.
>
> Cheers,
> Till
>
> On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <pn...@apache.org>
> wrote:
>
> > +1 for the general idea, however I have concerns about a couple of
> details.
> >
> > > I would first try to not introduce the exception for local builds.
> > > It makes it quite hard for others to verify the build and to make sure
> > that the right things were executed.
> >
> > I would counter Till's proposal to ignore local green builds. If
> committer
> > is merging and closing a PR, with official azure failure, but there was a
> > green build before or in local azure it's IMO enough to leave the
> message:
> >
> > > Latest build failure is a known issue: FLINK-12345
> > > Green local build: URL
> >
> > This should address Till's concern about verification.
> >
> > On the other hand I have concerns about disabling tests.* It shouldn't be
> > the PR author/committer that's disabling a test on his own, as that's a
> > conflict of interests*. I have however no problems with disabling test
> > instabilities that were marked as "blockers" though, that should work
> > pretty well. But the important thing here is to correctly judge bumping
> > priorities of test instabilities based on their frequency and current
> > general health of the system. I believe that release managers should be
> > playing a big role here in deciding on the guidelines of what should be a
> > priority of certain test instabilities.
> >
> > What I mean by that is two example scenarios:
> > 1. if we have a handful of very frequently failing tests and a handful of
> > very rarely failing tests (like one reported failure and no another
> > occurrence in many months, and let's even say that the failure looks like
> > infrastructure/network timeout), we should focus on the frequently
> failing
> > ones, and probably we are safe to ignore for the time being the rare
> issues
> > - at least until we deal with the most pressing ones.
> > 2. If we have tons of rarely failing test instabilities, we should
> probably
> > start addressing them as blockers.
> >
> > I'm using my own conscious and my best judgement when I'm
> > bumping/decreasing priorities of test instabilities (and bugs), but as
> > individual committer I don't have the full picture. As I wrote above, I
> > think release managers are in a much better position to keep adjusting
> > those kind of guidelines.
> >
> > Best, Piotrek
> >
> > pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com> napisał(a):
> >
> > > +1 for Xintong's proposal.
> > >
> > > For me, resolving problems directly (fixing the infrastructure issue,
> > > disabling unstable tests and creating blocker JIRAs to track the fix
> and
> > > re-enable them asap, etc.) is (in most cases) better than working
> around
> > > them (verify locally, manually check and judge the failure as
> > "unrelated",
> > > etc.), and I believe the proposal could help us pushing those more
> "real"
> > > solutions forward.
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <ka...@gmail.com> wrote:
> > >
> > > > Creating a blocker issue for the manually disabled tests sounds good
> to
> > > me.
> > > >
> > > > Minor: I'm still a bit worried about the commits merged before we fix
> > > > the unstable tests can also break those tests. Instead of letting the
> > > > assigners keep a look at all potentially related commits, they can
> > > > maintain a branch that is periodically synced with the master branch
> > > > while enabling the unstable test. So that they can catch the breaking
> > > > changes asap.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <tr...@apache.org>
> > > > wrote:
> > > > >
> > > > > I like the idea of creating a blocker issue for a disabled test.
> This
> > > > will
> > > > > force us to resolve it in a timely manner and it won't fall through
> > the
> > > > > cracks.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <
> jingsonglee0@gmail.com>
> > > > wrote:
> > > > >
> > > > > > +1 to Xintong's proposal
> > > > > >
> > > > > > I also have some concerns about unstable cases.
> > > > > >
> > > > > > I think unstable cases can be divided into these types:
> > > > > >
> > > > > > - Force majeure: For example, network timeout, sudden
> environmental
> > > > > > collapse, they are accidental and can always be solved by
> > triggering
> > > > azure
> > > > > > again. Committers should wait for the next green azure.
> > > > > >
> > > > > > - Obvious mistakes: For example, some errors caused by obvious
> > > reasons
> > > > may
> > > > > > be repaired quickly. At this time, do we need to wait, or not
> wait
> > > and
> > > > just
> > > > > > ignore?
> > > > > >
> > > > > > - Difficult questions: These problems are very difficult to find.
> > > There
> > > > > > will be no solution for a while and a half. We don't even know
> the
> > > > reason.
> > > > > > At this time, we should ignore it. (Maybe it's judged by the
> author
> > > of
> > > > the
> > > > > > case. But what about the old case whose author can't be found?)
> > > > > >
> > > > > > So, the ignored cases should be the block of the next release
> until
> > > the
> > > > > > reason is found or the case is fixed?  We need to ensure that
> > someone
> > > > will
> > > > > > take care of these cases, because there is no deepening of failed
> > > > tests, no
> > > > > > one may continue to pay attention to these cases.
> > > > > >
> > > > > > I think this guideline should consider these situations, and show
> > how
> > > > to
> > > > > > solve them.
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <im...@gmail.com>
> wrote:
> > > > > >
> > > > > > > Thanks to Xintong for bringing up this topic, I'm +1 in
> general.
> > > > > > >
> > > > > > > However, I think it's still not very clear how we address the
> > > > unstable
> > > > > > > tests.
> > > > > > > I think this is a very important part of this new guideline.
> > > > > > >
> > > > > > > According to the discussion above, if some tests are unstable,
> we
> > > can
> > > > > > > manually disable it.
> > > > > > > But I have some questions in my mind:
> > > > > > > 1) Is the instability judged by the committer themselves or by
> > some
> > > > > > > metrics?
> > > > > > > 2) Should we log the disable commit in the corresponding issue
> > and
> > > > > > increase
> > > > > > > the priority?
> > > > > > > 3) What if nobody looks into this issue and this becomes some
> > > > potential
> > > > > > > bugs released with the new version?
> > > > > > > 4) If no person is actively working on the issue, who should
> > > > re-enable
> > > > > > it?
> > > > > > > Would it block PRs again?
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Jark
> > > > > > >
> > > > > > >
> > > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <
> > tonysong820@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks all for the feedback.
> > > > > > > >
> > > > > > > > @Till @Yangze
> > > > > > > >
> > > > > > > > I'm also not convinced by the idea of having an exception for
> > > local
> > > > > > > builds.
> > > > > > > > We need to execute the entire build (or at least the failing
> > > stage)
> > > > > > > > locally, to make sure subsequent test cases prevented by the
> > > > failure
> > > > > > one
> > > > > > > > are all executed. In that case, it's probably easier to rerun
> > the
> > > > build
> > > > > > > on
> > > > > > > > azure than locally.
> > > > > > > >
> > > > > > > > Concerning disabling unstable test cases that regularly block
> > PRs
> > > > from
> > > > > > > > merging, maybe we can say that such cases can only be
> disabled
> > > when
> > > > > > > someone
> > > > > > > > is actively looking into it, likely the person who disabled
> the
> > > > case.
> > > > > > If
> > > > > > > > this person is no longer actively working on it, he/she
> should
> > > > enable
> > > > > > the
> > > > > > > > case again no matter if it is fixed or not.
> > > > > > > >
> > > > > > > > @Jing
> > > > > > > >
> > > > > > > > Thanks for the suggestions.
> > > > > > > >
> > > > > > > > +1 to provide guidelines on handling test failures.
> > > > > > > >
> > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > >
> > > > > > > >
> > > > > > > > +1 on this. Currently, the release managers are monitoring
> the
> > ci
> > > > and
> > > > > > > cron
> > > > > > > > build instabilities and reporting them on JIRA. We should
> also
> > > > > > encourage
> > > > > > > > other contributors to do that for PRs.
> > > > > > > >
> > > > > > > > 2. Set a deadline to find out the root cause and solve the
> > > failure
> > > > for
> > > > > > > the
> > > > > > > > > new created JIRA  because we could not block other commit
> > > merges
> > > > for
> > > > > > a
> > > > > > > > long
> > > > > > > > > time
> > > > > > > > >
> > > > > > > > 3. What to do if the JIRA has not made significant progress
> > when
> > > > > > reached
> > > > > > > to
> > > > > > > > > the deadline time?
> > > > > > > >
> > > > > > > >
> > > > > > > > I'm not sure about these two. It feels a bit against the
> > > voluntary
> > > > > > nature
> > > > > > > > of open source projects.
> > > > > > > >
> > > > > > > > IMHO, frequent instabilities are more likely to be upgraded
> to
> > > the
> > > > > > > critical
> > > > > > > > / blocker priority, receive more attention and eventually get
> > > > fixed.
> > > > > > > > Release managers are also responsible for looking for
> assignees
> > > for
> > > > > > such
> > > > > > > > issues. If a case is still not fixed soonish, even with all
> > these
> > > > > > > efforts,
> > > > > > > > I'm not sure how setting a deadline can help this.
> > > > > > > >
> > > > > > > > 4. If we disable the respective tests temporarily, we also
> > need a
> > > > > > > mechanism
> > > > > > > > > to ensure the issue would be continued to be investigated
> in
> > > the
> > > > > > > future.
> > > > > > > > >
> > > > > > > >
> > > > > > > > +1. As mentioned above, we may consider disabling such tests
> > iff
> > > > > > someone
> > > > > > > is
> > > > > > > > actively working on it.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <
> > beyond1920@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Xintong,
> > > > > > > > > +1 to the proposal.
> > > > > > > > > In order to better comply with the rule, it is necessary to
> > > > describe
> > > > > > > > what's
> > > > > > > > > best practice if encountering test failure which seems
> > > unrelated
> > > > with
> > > > > > > the
> > > > > > > > > current commits.
> > > > > > > > > How to avoid merging PR with test failures and not blocking
> > > code
> > > > > > > merging
> > > > > > > > > for a long time?
> > > > > > > > > I tried to think about the possible steps, and found there
> > are
> > > > some
> > > > > > > > > detailed problems that need to be discussed in a step
> > further:
> > > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > > 2. Set a deadline to find out the root cause and solve the
> > > > failure
> > > > > > for
> > > > > > > > the
> > > > > > > > > new created JIRA  because we could not block other commit
> > > merges
> > > > for
> > > > > > a
> > > > > > > > long
> > > > > > > > > time
> > > > > > > > >     When is a reasonable deadline here?
> > > > > > > > > 3. What to do if the JIRA has not made significant progress
> > > when
> > > > > > > reached
> > > > > > > > to
> > > > > > > > > the deadline time?
> > > > > > > > >     There are several situations as follows, maybe
> different
> > > > cases
> > > > > > need
> > > > > > > > > different approaches.
> > > > > > > > >     1. the JIRA is non-assigned yet
> > > > > > > > >     2. not found the root cause yet
> > > > > > > > >     3. not found a good solution, but already found the
> root
> > > > cause
> > > > > > > > >     4. found a solution, but it needs more time to be done.
> > > > > > > > > 4. If we disable the respective tests temporarily, we also
> > > need a
> > > > > > > > mechanism
> > > > > > > > > to ensure the issue would be continued to be investigated
> in
> > > the
> > > > > > > future.
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > JING ZHANG
> > > > > > > > >
> > > > > > > > > Stephan Ewen <se...@apache.org> 于2021年6月23日周三 下午8:16写道：
> > > > > > > > >
> > > > > > > > > > +1 to Xintong's proposal
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <
> > > > > > trohrmann@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > I would first try to not introduce the exception for
> > local
> > > > > > builds.
> > > > > > > It
> > > > > > > > > > makes
> > > > > > > > > > > it quite hard for others to verify the build and to
> make
> > > sure
> > > > > > that
> > > > > > > > the
> > > > > > > > > > > right things were executed. If we see that this becomes
> > an
> > > > issue
> > > > > > > then
> > > > > > > > > we
> > > > > > > > > > > can revisit this idea.
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Till
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <
> > > > karmagyz@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > +1 for appending this to community guidelines for
> > merging
> > > > PRs.
> > > > > > > > > > > >
> > > > > > > > > > > > @Till Rohrmann
> > > > > > > > > > > > I agree that with this approach unstable tests will
> not
> > > > block
> > > > > > > other
> > > > > > > > > > > > commit merges. However, it might be hard to prevent
> > > merging
> > > > > > > commits
> > > > > > > > > > > > that are related to those tests and should have been
> > > passed
> > > > > > them.
> > > > > > > > > It's
> > > > > > > > > > > > true that this judgment can be made by the
> committers,
> > > but
> > > > no
> > > > > > one
> > > > > > > > can
> > > > > > > > > > > > ensure the judgment is always precise and so that we
> > have
> > > > this
> > > > > > > > > > > > discussion thread.
> > > > > > > > > > > >
> > > > > > > > > > > > Regarding the unstable tests, how about adding
> another
> > > > > > exception:
> > > > > > > > > > > > committers verify it in their local environment and
> > > > comment in
> > > > > > > such
> > > > > > > > > > > > cases?
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <
> > > > liujiangangpeng@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > It is a good principle to run all tests
> successfully
> > > > with any
> > > > > > > > > change.
> > > > > > > > > > > > This
> > > > > > > > > > > > > means a lot for project's stability and
> development.
> > I
> > > > am big
> > > > > > > +1
> > > > > > > > > for
> > > > > > > > > > > this
> > > > > > > > > > > > > proposal.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best
> > > > > > > > > > > > > liujiangang
> > > > > > > > > > > > >
> > > > > > > > > > > > > Till Rohrmann <tr...@apache.org> 于2021年6月22日周二
> > > > 下午6:36写道：
> > > > > > > > > > > > >
> > > > > > > > > > > > > > One way to address the problem of regularly
> failing
> > > > tests
> > > > > > > that
> > > > > > > > > > block
> > > > > > > > > > > > > > merging of PRs is to disable the respective tests
> > for
> > > > the
> > > > > > > time
> > > > > > > > > > being.
> > > > > > > > > > > > Of
> > > > > > > > > > > > > > course, the failing test then needs to be fixed.
> > But
> > > at
> > > > > > least
> > > > > > > > > that
> > > > > > > > > > > way
> > > > > > > > > > > > we
> > > > > > > > > > > > > > would not block everyone from making progress.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > Till
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <
> > > > > > > arvid@apache.org
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think this is overall a good idea. So +1 from
> > my
> > > > side.
> > > > > > > > > > > > > > > However, I'd like to put a higher priority on
> > > > > > > infrastructure
> > > > > > > > > > then,
> > > > > > > > > > > in
> > > > > > > > > > > > > > > particular docker image/artifact caches.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann
> <
> > > > > > > > > > > trohrmann@apache.org
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for bringing this topic to our
> attention
> > > > > > Xintong.
> > > > > > > I
> > > > > > > > > > think
> > > > > > > > > > > > your
> > > > > > > > > > > > > > > > proposal makes a lot of sense and we should
> > > follow
> > > > it.
> > > > > > It
> > > > > > > > > will
> > > > > > > > > > > > give us
> > > > > > > > > > > > > > > > confidence that our changes are working and
> it
> > > > might
> > > > > > be a
> > > > > > > > > good
> > > > > > > > > > > > > > incentive
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > quickly fix build instabilities. Hence, +1.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong
> Song <
> > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > In the past a couple of weeks, I've
> observed
> > > > several
> > > > > > > > times
> > > > > > > > > > that
> > > > > > > > > > > > PRs
> > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > merged without a green light from the CI
> > tests,
> > > > where
> > > > > > > > > failure
> > > > > > > > > > > > cases
> > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > considered *unrelated*. This may not always
> > > cause
> > > > > > > > problems,
> > > > > > > > > > but
> > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > increase the chance of breaking our code
> > base.
> > > In
> > > > > > fact,
> > > > > > > > it
> > > > > > > > > > has
> > > > > > > > > > > > > > occurred
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > me twice in the past few weeks that I had
> to
> > > > revert a
> > > > > > > > > commit
> > > > > > > > > > > > which
> > > > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > > > the master branch due to this.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I think it would be nicer to enforce a
> > stricter
> > > > rule,
> > > > > > > > that
> > > > > > > > > no
> > > > > > > > > > > PRs
> > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The problems of merging PRs with
> "unrelated"
> > > test
> > > > > > > > failures
> > > > > > > > > > are:
> > > > > > > > > > > > > > > > > - It's not always straightforward to tell
> > > > whether a
> > > > > > > test
> > > > > > > > > > > > failures are
> > > > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > > > - It prevents subsequent test cases from
> > being
> > > > > > > executed,
> > > > > > > > > > which
> > > > > > > > > > > > may
> > > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > To make things easier for the committers,
> the
> > > > > > following
> > > > > > > > > > > > exceptions
> > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > > > > - The PR has passed CI in the contributor's
> > > > personal
> > > > > > > > > > workspace.
> > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > > > > - The CI tests have been triggered multiple
> > > > times, on
> > > > > > > the
> > > > > > > > > > same
> > > > > > > > > > > > > > commit,
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > each stage has at least passed for once.
> > Please
> > > > also
> > > > > > > > > comment
> > > > > > > > > > in
> > > > > > > > > > > > such
> > > > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > If we all agree on this, I'd update the
> > > community
> > > > > > > > > guidelines
> > > > > > > > > > > for
> > > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Please let me know what do you think.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best, Jingsong Lee
> > > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Till Rohrmann <tr...@apache.org>.

One quick note for clarification. I don't have anything against builds
running on your personal Azure account and this is not what I understood
under "local environment". For me "local environment" means that someone
runs the test locally on his machine and then says that the
tests have passed locally.

I do agree that there might be a conflict of interests if a PR author
disables tests. Here I would argue that we don't have malignant committers
which means that every committer will probably first check the respective
ticket for how often the test failed. Then I guess the next step would be
to discuss on the ticket whether to disable it or not. And finally, after
reaching a consensus, it will be disabled. If we see someone abusing this
policy, then we can still think about how to guard against it. But,
honestly, I have very rarely seen such a case. I am also ok to pull in the
release manager to make the final call if this resolves concerns.

Cheers,
Till

On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <pn...@apache.org> wrote:

> +1 for the general idea, however I have concerns about a couple of details.
>
> > I would first try to not introduce the exception for local builds.
> > It makes it quite hard for others to verify the build and to make sure
> that the right things were executed.
>
> I would counter Till's proposal to ignore local green builds. If committer
> is merging and closing a PR, with official azure failure, but there was a
> green build before or in local azure it's IMO enough to leave the message:
>
> > Latest build failure is a known issue: FLINK-12345
> > Green local build: URL
>
> This should address Till's concern about verification.
>
> On the other hand I have concerns about disabling tests.* It shouldn't be
> the PR author/committer that's disabling a test on his own, as that's a
> conflict of interests*. I have however no problems with disabling test
> instabilities that were marked as "blockers" though, that should work
> pretty well. But the important thing here is to correctly judge bumping
> priorities of test instabilities based on their frequency and current
> general health of the system. I believe that release managers should be
> playing a big role here in deciding on the guidelines of what should be a
> priority of certain test instabilities.
>
> What I mean by that is two example scenarios:
> 1. if we have a handful of very frequently failing tests and a handful of
> very rarely failing tests (like one reported failure and no another
> occurrence in many months, and let's even say that the failure looks like
> infrastructure/network timeout), we should focus on the frequently failing
> ones, and probably we are safe to ignore for the time being the rare issues
> - at least until we deal with the most pressing ones.
> 2. If we have tons of rarely failing test instabilities, we should probably
> start addressing them as blockers.
>
> I'm using my own conscious and my best judgement when I'm
> bumping/decreasing priorities of test instabilities (and bugs), but as
> individual committer I don't have the full picture. As I wrote above, I
> think release managers are in a much better position to keep adjusting
> those kind of guidelines.
>
> Best, Piotrek
>
> pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com> napisał(a):
>
> > +1 for Xintong's proposal.
> >
> > For me, resolving problems directly (fixing the infrastructure issue,
> > disabling unstable tests and creating blocker JIRAs to track the fix and
> > re-enable them asap, etc.) is (in most cases) better than working around
> > them (verify locally, manually check and judge the failure as
> "unrelated",
> > etc.), and I believe the proposal could help us pushing those more "real"
> > solutions forward.
> >
> > Best Regards,
> > Yu
> >
> >
> > On Fri, 25 Jun 2021 at 10:58, Yangze Guo <ka...@gmail.com> wrote:
> >
> > > Creating a blocker issue for the manually disabled tests sounds good to
> > me.
> > >
> > > Minor: I'm still a bit worried about the commits merged before we fix
> > > the unstable tests can also break those tests. Instead of letting the
> > > assigners keep a look at all potentially related commits, they can
> > > maintain a branch that is periodically synced with the master branch
> > > while enabling the unstable test. So that they can catch the breaking
> > > changes asap.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <tr...@apache.org>
> > > wrote:
> > > >
> > > > I like the idea of creating a blocker issue for a disabled test. This
> > > will
> > > > force us to resolve it in a timely manner and it won't fall through
> the
> > > > cracks.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <ji...@gmail.com>
> > > wrote:
> > > >
> > > > > +1 to Xintong's proposal
> > > > >
> > > > > I also have some concerns about unstable cases.
> > > > >
> > > > > I think unstable cases can be divided into these types:
> > > > >
> > > > > - Force majeure: For example, network timeout, sudden environmental
> > > > > collapse, they are accidental and can always be solved by
> triggering
> > > azure
> > > > > again. Committers should wait for the next green azure.
> > > > >
> > > > > - Obvious mistakes: For example, some errors caused by obvious
> > reasons
> > > may
> > > > > be repaired quickly. At this time, do we need to wait, or not wait
> > and
> > > just
> > > > > ignore?
> > > > >
> > > > > - Difficult questions: These problems are very difficult to find.
> > There
> > > > > will be no solution for a while and a half. We don't even know the
> > > reason.
> > > > > At this time, we should ignore it. (Maybe it's judged by the author
> > of
> > > the
> > > > > case. But what about the old case whose author can't be found?)
> > > > >
> > > > > So, the ignored cases should be the block of the next release until
> > the
> > > > > reason is found or the case is fixed?  We need to ensure that
> someone
> > > will
> > > > > take care of these cases, because there is no deepening of failed
> > > tests, no
> > > > > one may continue to pay attention to these cases.
> > > > >
> > > > > I think this guideline should consider these situations, and show
> how
> > > to
> > > > > solve them.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <im...@gmail.com> wrote:
> > > > >
> > > > > > Thanks to Xintong for bringing up this topic, I'm +1 in general.
> > > > > >
> > > > > > However, I think it's still not very clear how we address the
> > > unstable
> > > > > > tests.
> > > > > > I think this is a very important part of this new guideline.
> > > > > >
> > > > > > According to the discussion above, if some tests are unstable, we
> > can
> > > > > > manually disable it.
> > > > > > But I have some questions in my mind:
> > > > > > 1) Is the instability judged by the committer themselves or by
> some
> > > > > > metrics?
> > > > > > 2) Should we log the disable commit in the corresponding issue
> and
> > > > > increase
> > > > > > the priority?
> > > > > > 3) What if nobody looks into this issue and this becomes some
> > > potential
> > > > > > bugs released with the new version?
> > > > > > 4) If no person is actively working on the issue, who should
> > > re-enable
> > > > > it?
> > > > > > Would it block PRs again?
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > >
> > > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <
> tonysong820@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Thanks all for the feedback.
> > > > > > >
> > > > > > > @Till @Yangze
> > > > > > >
> > > > > > > I'm also not convinced by the idea of having an exception for
> > local
> > > > > > builds.
> > > > > > > We need to execute the entire build (or at least the failing
> > stage)
> > > > > > > locally, to make sure subsequent test cases prevented by the
> > > failure
> > > > > one
> > > > > > > are all executed. In that case, it's probably easier to rerun
> the
> > > build
> > > > > > on
> > > > > > > azure than locally.
> > > > > > >
> > > > > > > Concerning disabling unstable test cases that regularly block
> PRs
> > > from
> > > > > > > merging, maybe we can say that such cases can only be disabled
> > when
> > > > > > someone
> > > > > > > is actively looking into it, likely the person who disabled the
> > > case.
> > > > > If
> > > > > > > this person is no longer actively working on it, he/she should
> > > enable
> > > > > the
> > > > > > > case again no matter if it is fixed or not.
> > > > > > >
> > > > > > > @Jing
> > > > > > >
> > > > > > > Thanks for the suggestions.
> > > > > > >
> > > > > > > +1 to provide guidelines on handling test failures.
> > > > > > >
> > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > >
> > > > > > >
> > > > > > > +1 on this. Currently, the release managers are monitoring the
> ci
> > > and
> > > > > > cron
> > > > > > > build instabilities and reporting them on JIRA. We should also
> > > > > encourage
> > > > > > > other contributors to do that for PRs.
> > > > > > >
> > > > > > > 2. Set a deadline to find out the root cause and solve the
> > failure
> > > for
> > > > > > the
> > > > > > > > new created JIRA  because we could not block other commit
> > merges
> > > for
> > > > > a
> > > > > > > long
> > > > > > > > time
> > > > > > > >
> > > > > > > 3. What to do if the JIRA has not made significant progress
> when
> > > > > reached
> > > > > > to
> > > > > > > > the deadline time?
> > > > > > >
> > > > > > >
> > > > > > > I'm not sure about these two. It feels a bit against the
> > voluntary
> > > > > nature
> > > > > > > of open source projects.
> > > > > > >
> > > > > > > IMHO, frequent instabilities are more likely to be upgraded to
> > the
> > > > > > critical
> > > > > > > / blocker priority, receive more attention and eventually get
> > > fixed.
> > > > > > > Release managers are also responsible for looking for assignees
> > for
> > > > > such
> > > > > > > issues. If a case is still not fixed soonish, even with all
> these
> > > > > > efforts,
> > > > > > > I'm not sure how setting a deadline can help this.
> > > > > > >
> > > > > > > 4. If we disable the respective tests temporarily, we also
> need a
> > > > > > mechanism
> > > > > > > > to ensure the issue would be continued to be investigated in
> > the
> > > > > > future.
> > > > > > > >
> > > > > > >
> > > > > > > +1. As mentioned above, we may consider disabling such tests
> iff
> > > > > someone
> > > > > > is
> > > > > > > actively working on it.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <
> beyond1920@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Xintong,
> > > > > > > > +1 to the proposal.
> > > > > > > > In order to better comply with the rule, it is necessary to
> > > describe
> > > > > > > what's
> > > > > > > > best practice if encountering test failure which seems
> > unrelated
> > > with
> > > > > > the
> > > > > > > > current commits.
> > > > > > > > How to avoid merging PR with test failures and not blocking
> > code
> > > > > > merging
> > > > > > > > for a long time?
> > > > > > > > I tried to think about the possible steps, and found there
> are
> > > some
> > > > > > > > detailed problems that need to be discussed in a step
> further:
> > > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > > 2. Set a deadline to find out the root cause and solve the
> > > failure
> > > > > for
> > > > > > > the
> > > > > > > > new created JIRA  because we could not block other commit
> > merges
> > > for
> > > > > a
> > > > > > > long
> > > > > > > > time
> > > > > > > >     When is a reasonable deadline here?
> > > > > > > > 3. What to do if the JIRA has not made significant progress
> > when
> > > > > > reached
> > > > > > > to
> > > > > > > > the deadline time?
> > > > > > > >     There are several situations as follows, maybe different
> > > cases
> > > > > need
> > > > > > > > different approaches.
> > > > > > > >     1. the JIRA is non-assigned yet
> > > > > > > >     2. not found the root cause yet
> > > > > > > >     3. not found a good solution, but already found the root
> > > cause
> > > > > > > >     4. found a solution, but it needs more time to be done.
> > > > > > > > 4. If we disable the respective tests temporarily, we also
> > need a
> > > > > > > mechanism
> > > > > > > > to ensure the issue would be continued to be investigated in
> > the
> > > > > > future.
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > JING ZHANG
> > > > > > > >
> > > > > > > > Stephan Ewen <se...@apache.org> 于2021年6月23日周三 下午8:16写道：
> > > > > > > >
> > > > > > > > > +1 to Xintong's proposal
> > > > > > > > >
> > > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <
> > > > > trohrmann@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I would first try to not introduce the exception for
> local
> > > > > builds.
> > > > > > It
> > > > > > > > > makes
> > > > > > > > > > it quite hard for others to verify the build and to make
> > sure
> > > > > that
> > > > > > > the
> > > > > > > > > > right things were executed. If we see that this becomes
> an
> > > issue
> > > > > > then
> > > > > > > > we
> > > > > > > > > > can revisit this idea.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <
> > > karmagyz@gmail.com>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1 for appending this to community guidelines for
> merging
> > > PRs.
> > > > > > > > > > >
> > > > > > > > > > > @Till Rohrmann
> > > > > > > > > > > I agree that with this approach unstable tests will not
> > > block
> > > > > > other
> > > > > > > > > > > commit merges. However, it might be hard to prevent
> > merging
> > > > > > commits
> > > > > > > > > > > that are related to those tests and should have been
> > passed
> > > > > them.
> > > > > > > > It's
> > > > > > > > > > > true that this judgment can be made by the committers,
> > but
> > > no
> > > > > one
> > > > > > > can
> > > > > > > > > > > ensure the judgment is always precise and so that we
> have
> > > this
> > > > > > > > > > > discussion thread.
> > > > > > > > > > >
> > > > > > > > > > > Regarding the unstable tests, how about adding another
> > > > > exception:
> > > > > > > > > > > committers verify it in their local environment and
> > > comment in
> > > > > > such
> > > > > > > > > > > cases?
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Yangze Guo
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <
> > > liujiangangpeng@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > It is a good principle to run all tests successfully
> > > with any
> > > > > > > > change.
> > > > > > > > > > > This
> > > > > > > > > > > > means a lot for project's stability and development.
> I
> > > am big
> > > > > > +1
> > > > > > > > for
> > > > > > > > > > this
> > > > > > > > > > > > proposal.
> > > > > > > > > > > >
> > > > > > > > > > > > Best
> > > > > > > > > > > > liujiangang
> > > > > > > > > > > >
> > > > > > > > > > > > Till Rohrmann <tr...@apache.org> 于2021年6月22日周二
> > > 下午6:36写道：
> > > > > > > > > > > >
> > > > > > > > > > > > > One way to address the problem of regularly failing
> > > tests
> > > > > > that
> > > > > > > > > block
> > > > > > > > > > > > > merging of PRs is to disable the respective tests
> for
> > > the
> > > > > > time
> > > > > > > > > being.
> > > > > > > > > > > Of
> > > > > > > > > > > > > course, the failing test then needs to be fixed.
> But
> > at
> > > > > least
> > > > > > > > that
> > > > > > > > > > way
> > > > > > > > > > > we
> > > > > > > > > > > > > would not block everyone from making progress.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Till
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <
> > > > > > arvid@apache.org
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I think this is overall a good idea. So +1 from
> my
> > > side.
> > > > > > > > > > > > > > However, I'd like to put a higher priority on
> > > > > > infrastructure
> > > > > > > > > then,
> > > > > > > > > > in
> > > > > > > > > > > > > > particular docker image/artifact caches.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <
> > > > > > > > > > trohrmann@apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for bringing this topic to our attention
> > > > > Xintong.
> > > > > > I
> > > > > > > > > think
> > > > > > > > > > > your
> > > > > > > > > > > > > > > proposal makes a lot of sense and we should
> > follow
> > > it.
> > > > > It
> > > > > > > > will
> > > > > > > > > > > give us
> > > > > > > > > > > > > > > confidence that our changes are working and it
> > > might
> > > > > be a
> > > > > > > > good
> > > > > > > > > > > > > incentive
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > quickly fix build instabilities. Hence, +1.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Till
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In the past a couple of weeks, I've observed
> > > several
> > > > > > > times
> > > > > > > > > that
> > > > > > > > > > > PRs
> > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > merged without a green light from the CI
> tests,
> > > where
> > > > > > > > failure
> > > > > > > > > > > cases
> > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > considered *unrelated*. This may not always
> > cause
> > > > > > > problems,
> > > > > > > > > but
> > > > > > > > > > > would
> > > > > > > > > > > > > > > > increase the chance of breaking our code
> base.
> > In
> > > > > fact,
> > > > > > > it
> > > > > > > > > has
> > > > > > > > > > > > > occurred
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > me twice in the past few weeks that I had to
> > > revert a
> > > > > > > > commit
> > > > > > > > > > > which
> > > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > > the master branch due to this.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think it would be nicer to enforce a
> stricter
> > > rule,
> > > > > > > that
> > > > > > > > no
> > > > > > > > > > PRs
> > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The problems of merging PRs with "unrelated"
> > test
> > > > > > > failures
> > > > > > > > > are:
> > > > > > > > > > > > > > > > - It's not always straightforward to tell
> > > whether a
> > > > > > test
> > > > > > > > > > > failures are
> > > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > > - It prevents subsequent test cases from
> being
> > > > > > executed,
> > > > > > > > > which
> > > > > > > > > > > may
> > > > > > > > > > > > > fail
> > > > > > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > To make things easier for the committers, the
> > > > > following
> > > > > > > > > > > exceptions
> > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > > > - The PR has passed CI in the contributor's
> > > personal
> > > > > > > > > workspace.
> > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > > > - The CI tests have been triggered multiple
> > > times, on
> > > > > > the
> > > > > > > > > same
> > > > > > > > > > > > > commit,
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > each stage has at least passed for once.
> Please
> > > also
> > > > > > > > comment
> > > > > > > > > in
> > > > > > > > > > > such
> > > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If we all agree on this, I'd update the
> > community
> > > > > > > > guidelines
> > > > > > > > > > for
> > > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Please let me know what do you think.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best, Jingsong Lee
> > > > >
> > >
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Piotr Nowojski <pn...@apache.org>.

+1 for the general idea, however I have concerns about a couple of details.

> I would first try to not introduce the exception for local builds.
> It makes it quite hard for others to verify the build and to make sure
that the right things were executed.

I would counter Till's proposal to ignore local green builds. If committer
is merging and closing a PR, with official azure failure, but there was a
green build before or in local azure it's IMO enough to leave the message:

> Latest build failure is a known issue: FLINK-12345
> Green local build: URL

This should address Till's concern about verification.

On the other hand I have concerns about disabling tests.* It shouldn't be
the PR author/committer that's disabling a test on his own, as that's a
conflict of interests*. I have however no problems with disabling test
instabilities that were marked as "blockers" though, that should work
pretty well. But the important thing here is to correctly judge bumping
priorities of test instabilities based on their frequency and current
general health of the system. I believe that release managers should be
playing a big role here in deciding on the guidelines of what should be a
priority of certain test instabilities.

What I mean by that is two example scenarios:
1. if we have a handful of very frequently failing tests and a handful of
very rarely failing tests (like one reported failure and no another
occurrence in many months, and let's even say that the failure looks like
infrastructure/network timeout), we should focus on the frequently failing
ones, and probably we are safe to ignore for the time being the rare issues
- at least until we deal with the most pressing ones.
2. If we have tons of rarely failing test instabilities, we should probably
start addressing them as blockers.

I'm using my own conscious and my best judgement when I'm
bumping/decreasing priorities of test instabilities (and bugs), but as
individual committer I don't have the full picture. As I wrote above, I
think release managers are in a much better position to keep adjusting
those kind of guidelines.

Best, Piotrek

pt., 25 cze 2021 o 08:10 Yu Li <ca...@gmail.com> napisał(a):

> +1 for Xintong's proposal.
>
> For me, resolving problems directly (fixing the infrastructure issue,
> disabling unstable tests and creating blocker JIRAs to track the fix and
> re-enable them asap, etc.) is (in most cases) better than working around
> them (verify locally, manually check and judge the failure as "unrelated",
> etc.), and I believe the proposal could help us pushing those more "real"
> solutions forward.
>
> Best Regards,
> Yu
>
>
> On Fri, 25 Jun 2021 at 10:58, Yangze Guo <ka...@gmail.com> wrote:
>
> > Creating a blocker issue for the manually disabled tests sounds good to
> me.
> >
> > Minor: I'm still a bit worried about the commits merged before we fix
> > the unstable tests can also break those tests. Instead of letting the
> > assigners keep a look at all potentially related commits, they can
> > maintain a branch that is periodically synced with the master branch
> > while enabling the unstable test. So that they can catch the breaking
> > changes asap.
> >
> > Best,
> > Yangze Guo
> >
> > On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <tr...@apache.org>
> > wrote:
> > >
> > > I like the idea of creating a blocker issue for a disabled test. This
> > will
> > > force us to resolve it in a timely manner and it won't fall through the
> > > cracks.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <ji...@gmail.com>
> > wrote:
> > >
> > > > +1 to Xintong's proposal
> > > >
> > > > I also have some concerns about unstable cases.
> > > >
> > > > I think unstable cases can be divided into these types:
> > > >
> > > > - Force majeure: For example, network timeout, sudden environmental
> > > > collapse, they are accidental and can always be solved by triggering
> > azure
> > > > again. Committers should wait for the next green azure.
> > > >
> > > > - Obvious mistakes: For example, some errors caused by obvious
> reasons
> > may
> > > > be repaired quickly. At this time, do we need to wait, or not wait
> and
> > just
> > > > ignore?
> > > >
> > > > - Difficult questions: These problems are very difficult to find.
> There
> > > > will be no solution for a while and a half. We don't even know the
> > reason.
> > > > At this time, we should ignore it. (Maybe it's judged by the author
> of
> > the
> > > > case. But what about the old case whose author can't be found?)
> > > >
> > > > So, the ignored cases should be the block of the next release until
> the
> > > > reason is found or the case is fixed?  We need to ensure that someone
> > will
> > > > take care of these cases, because there is no deepening of failed
> > tests, no
> > > > one may continue to pay attention to these cases.
> > > >
> > > > I think this guideline should consider these situations, and show how
> > to
> > > > solve them.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <im...@gmail.com> wrote:
> > > >
> > > > > Thanks to Xintong for bringing up this topic, I'm +1 in general.
> > > > >
> > > > > However, I think it's still not very clear how we address the
> > unstable
> > > > > tests.
> > > > > I think this is a very important part of this new guideline.
> > > > >
> > > > > According to the discussion above, if some tests are unstable, we
> can
> > > > > manually disable it.
> > > > > But I have some questions in my mind:
> > > > > 1) Is the instability judged by the committer themselves or by some
> > > > > metrics?
> > > > > 2) Should we log the disable commit in the corresponding issue and
> > > > increase
> > > > > the priority?
> > > > > 3) What if nobody looks into this issue and this becomes some
> > potential
> > > > > bugs released with the new version?
> > > > > 4) If no person is actively working on the issue, who should
> > re-enable
> > > > it?
> > > > > Would it block PRs again?
> > > > >
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > >
> > > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <to...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Thanks all for the feedback.
> > > > > >
> > > > > > @Till @Yangze
> > > > > >
> > > > > > I'm also not convinced by the idea of having an exception for
> local
> > > > > builds.
> > > > > > We need to execute the entire build (or at least the failing
> stage)
> > > > > > locally, to make sure subsequent test cases prevented by the
> > failure
> > > > one
> > > > > > are all executed. In that case, it's probably easier to rerun the
> > build
> > > > > on
> > > > > > azure than locally.
> > > > > >
> > > > > > Concerning disabling unstable test cases that regularly block PRs
> > from
> > > > > > merging, maybe we can say that such cases can only be disabled
> when
> > > > > someone
> > > > > > is actively looking into it, likely the person who disabled the
> > case.
> > > > If
> > > > > > this person is no longer actively working on it, he/she should
> > enable
> > > > the
> > > > > > case again no matter if it is fixed or not.
> > > > > >
> > > > > > @Jing
> > > > > >
> > > > > > Thanks for the suggestions.
> > > > > >
> > > > > > +1 to provide guidelines on handling test failures.
> > > > > >
> > > > > > 1. Report the test failures in the JIRA.
> > > > > > >
> > > > > >
> > > > > > +1 on this. Currently, the release managers are monitoring the ci
> > and
> > > > > cron
> > > > > > build instabilities and reporting them on JIRA. We should also
> > > > encourage
> > > > > > other contributors to do that for PRs.
> > > > > >
> > > > > > 2. Set a deadline to find out the root cause and solve the
> failure
> > for
> > > > > the
> > > > > > > new created JIRA  because we could not block other commit
> merges
> > for
> > > > a
> > > > > > long
> > > > > > > time
> > > > > > >
> > > > > > 3. What to do if the JIRA has not made significant progress when
> > > > reached
> > > > > to
> > > > > > > the deadline time?
> > > > > >
> > > > > >
> > > > > > I'm not sure about these two. It feels a bit against the
> voluntary
> > > > nature
> > > > > > of open source projects.
> > > > > >
> > > > > > IMHO, frequent instabilities are more likely to be upgraded to
> the
> > > > > critical
> > > > > > / blocker priority, receive more attention and eventually get
> > fixed.
> > > > > > Release managers are also responsible for looking for assignees
> for
> > > > such
> > > > > > issues. If a case is still not fixed soonish, even with all these
> > > > > efforts,
> > > > > > I'm not sure how setting a deadline can help this.
> > > > > >
> > > > > > 4. If we disable the respective tests temporarily, we also need a
> > > > > mechanism
> > > > > > > to ensure the issue would be continued to be investigated in
> the
> > > > > future.
> > > > > > >
> > > > > >
> > > > > > +1. As mentioned above, we may consider disabling such tests iff
> > > > someone
> > > > > is
> > > > > > actively working on it.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <beyond1920@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Hi Xintong,
> > > > > > > +1 to the proposal.
> > > > > > > In order to better comply with the rule, it is necessary to
> > describe
> > > > > > what's
> > > > > > > best practice if encountering test failure which seems
> unrelated
> > with
> > > > > the
> > > > > > > current commits.
> > > > > > > How to avoid merging PR with test failures and not blocking
> code
> > > > > merging
> > > > > > > for a long time?
> > > > > > > I tried to think about the possible steps, and found there are
> > some
> > > > > > > detailed problems that need to be discussed in a step further:
> > > > > > > 1. Report the test failures in the JIRA.
> > > > > > > 2. Set a deadline to find out the root cause and solve the
> > failure
> > > > for
> > > > > > the
> > > > > > > new created JIRA  because we could not block other commit
> merges
> > for
> > > > a
> > > > > > long
> > > > > > > time
> > > > > > >     When is a reasonable deadline here?
> > > > > > > 3. What to do if the JIRA has not made significant progress
> when
> > > > > reached
> > > > > > to
> > > > > > > the deadline time?
> > > > > > >     There are several situations as follows, maybe different
> > cases
> > > > need
> > > > > > > different approaches.
> > > > > > >     1. the JIRA is non-assigned yet
> > > > > > >     2. not found the root cause yet
> > > > > > >     3. not found a good solution, but already found the root
> > cause
> > > > > > >     4. found a solution, but it needs more time to be done.
> > > > > > > 4. If we disable the respective tests temporarily, we also
> need a
> > > > > > mechanism
> > > > > > > to ensure the issue would be continued to be investigated in
> the
> > > > > future.
> > > > > > >
> > > > > > > Best regards,
> > > > > > > JING ZHANG
> > > > > > >
> > > > > > > Stephan Ewen <se...@apache.org> 于2021年6月23日周三 下午8:16写道：
> > > > > > >
> > > > > > > > +1 to Xintong's proposal
> > > > > > > >
> > > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <
> > > > trohrmann@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I would first try to not introduce the exception for local
> > > > builds.
> > > > > It
> > > > > > > > makes
> > > > > > > > > it quite hard for others to verify the build and to make
> sure
> > > > that
> > > > > > the
> > > > > > > > > right things were executed. If we see that this becomes an
> > issue
> > > > > then
> > > > > > > we
> > > > > > > > > can revisit this idea.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <
> > karmagyz@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1 for appending this to community guidelines for merging
> > PRs.
> > > > > > > > > >
> > > > > > > > > > @Till Rohrmann
> > > > > > > > > > I agree that with this approach unstable tests will not
> > block
> > > > > other
> > > > > > > > > > commit merges. However, it might be hard to prevent
> merging
> > > > > commits
> > > > > > > > > > that are related to those tests and should have been
> passed
> > > > them.
> > > > > > > It's
> > > > > > > > > > true that this judgment can be made by the committers,
> but
> > no
> > > > one
> > > > > > can
> > > > > > > > > > ensure the judgment is always precise and so that we have
> > this
> > > > > > > > > > discussion thread.
> > > > > > > > > >
> > > > > > > > > > Regarding the unstable tests, how about adding another
> > > > exception:
> > > > > > > > > > committers verify it in their local environment and
> > comment in
> > > > > such
> > > > > > > > > > cases?
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo
> > > > > > > > > >
> > > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <
> > liujiangangpeng@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > It is a good principle to run all tests successfully
> > with any
> > > > > > > change.
> > > > > > > > > > This
> > > > > > > > > > > means a lot for project's stability and development. I
> > am big
> > > > > +1
> > > > > > > for
> > > > > > > > > this
> > > > > > > > > > > proposal.
> > > > > > > > > > >
> > > > > > > > > > > Best
> > > > > > > > > > > liujiangang
> > > > > > > > > > >
> > > > > > > > > > > Till Rohrmann <tr...@apache.org> 于2021年6月22日周二
> > 下午6:36写道：
> > > > > > > > > > >
> > > > > > > > > > > > One way to address the problem of regularly failing
> > tests
> > > > > that
> > > > > > > > block
> > > > > > > > > > > > merging of PRs is to disable the respective tests for
> > the
> > > > > time
> > > > > > > > being.
> > > > > > > > > > Of
> > > > > > > > > > > > course, the failing test then needs to be fixed. But
> at
> > > > least
> > > > > > > that
> > > > > > > > > way
> > > > > > > > > > we
> > > > > > > > > > > > would not block everyone from making progress.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Till
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <
> > > > > arvid@apache.org
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I think this is overall a good idea. So +1 from my
> > side.
> > > > > > > > > > > > > However, I'd like to put a higher priority on
> > > > > infrastructure
> > > > > > > > then,
> > > > > > > > > in
> > > > > > > > > > > > > particular docker image/artifact caches.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <
> > > > > > > > > trohrmann@apache.org
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for bringing this topic to our attention
> > > > Xintong.
> > > > > I
> > > > > > > > think
> > > > > > > > > > your
> > > > > > > > > > > > > > proposal makes a lot of sense and we should
> follow
> > it.
> > > > It
> > > > > > > will
> > > > > > > > > > give us
> > > > > > > > > > > > > > confidence that our changes are working and it
> > might
> > > > be a
> > > > > > > good
> > > > > > > > > > > > incentive
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > quickly fix build instabilities. Hence, +1.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > Till
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In the past a couple of weeks, I've observed
> > several
> > > > > > times
> > > > > > > > that
> > > > > > > > > > PRs
> > > > > > > > > > > > are
> > > > > > > > > > > > > > > merged without a green light from the CI tests,
> > where
> > > > > > > failure
> > > > > > > > > > cases
> > > > > > > > > > > > are
> > > > > > > > > > > > > > > considered *unrelated*. This may not always
> cause
> > > > > > problems,
> > > > > > > > but
> > > > > > > > > > would
> > > > > > > > > > > > > > > increase the chance of breaking our code base.
> In
> > > > fact,
> > > > > > it
> > > > > > > > has
> > > > > > > > > > > > occurred
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > me twice in the past few weeks that I had to
> > revert a
> > > > > > > commit
> > > > > > > > > > which
> > > > > > > > > > > > > breaks
> > > > > > > > > > > > > > > the master branch due to this.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think it would be nicer to enforce a stricter
> > rule,
> > > > > > that
> > > > > > > no
> > > > > > > > > PRs
> > > > > > > > > > > > > should
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The problems of merging PRs with "unrelated"
> test
> > > > > > failures
> > > > > > > > are:
> > > > > > > > > > > > > > > - It's not always straightforward to tell
> > whether a
> > > > > test
> > > > > > > > > > failures are
> > > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > > - It prevents subsequent test cases from being
> > > > > executed,
> > > > > > > > which
> > > > > > > > > > may
> > > > > > > > > > > > fail
> > > > > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > To make things easier for the committers, the
> > > > following
> > > > > > > > > > exceptions
> > > > > > > > > > > > > might
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > > - The PR has passed CI in the contributor's
> > personal
> > > > > > > > workspace.
> > > > > > > > > > > > Please
> > > > > > > > > > > > > > post
> > > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > > - The CI tests have been triggered multiple
> > times, on
> > > > > the
> > > > > > > > same
> > > > > > > > > > > > commit,
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > each stage has at least passed for once. Please
> > also
> > > > > > > comment
> > > > > > > > in
> > > > > > > > > > such
> > > > > > > > > > > > > > cases.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If we all agree on this, I'd update the
> community
> > > > > > > guidelines
> > > > > > > > > for
> > > > > > > > > > > > > merging
> > > > > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Please let me know what do you think.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best, Jingsong Lee
> > > >
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Yu Li <ca...@gmail.com>.

+1 for Xintong's proposal.

For me, resolving problems directly (fixing the infrastructure issue,
disabling unstable tests and creating blocker JIRAs to track the fix and
re-enable them asap, etc.) is (in most cases) better than working around
them (verify locally, manually check and judge the failure as "unrelated",
etc.), and I believe the proposal could help us pushing those more "real"
solutions forward.

Best Regards,
Yu


On Fri, 25 Jun 2021 at 10:58, Yangze Guo <ka...@gmail.com> wrote:

> Creating a blocker issue for the manually disabled tests sounds good to me.
>
> Minor: I'm still a bit worried about the commits merged before we fix
> the unstable tests can also break those tests. Instead of letting the
> assigners keep a look at all potentially related commits, they can
> maintain a branch that is periodically synced with the master branch
> while enabling the unstable test. So that they can catch the breaking
> changes asap.
>
> Best,
> Yangze Guo
>
> On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <tr...@apache.org>
> wrote:
> >
> > I like the idea of creating a blocker issue for a disabled test. This
> will
> > force us to resolve it in a timely manner and it won't fall through the
> > cracks.
> >
> > Cheers,
> > Till
> >
> > On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <ji...@gmail.com>
> wrote:
> >
> > > +1 to Xintong's proposal
> > >
> > > I also have some concerns about unstable cases.
> > >
> > > I think unstable cases can be divided into these types:
> > >
> > > - Force majeure: For example, network timeout, sudden environmental
> > > collapse, they are accidental and can always be solved by triggering
> azure
> > > again. Committers should wait for the next green azure.
> > >
> > > - Obvious mistakes: For example, some errors caused by obvious reasons
> may
> > > be repaired quickly. At this time, do we need to wait, or not wait and
> just
> > > ignore?
> > >
> > > - Difficult questions: These problems are very difficult to find. There
> > > will be no solution for a while and a half. We don't even know the
> reason.
> > > At this time, we should ignore it. (Maybe it's judged by the author of
> the
> > > case. But what about the old case whose author can't be found?)
> > >
> > > So, the ignored cases should be the block of the next release until the
> > > reason is found or the case is fixed?  We need to ensure that someone
> will
> > > take care of these cases, because there is no deepening of failed
> tests, no
> > > one may continue to pay attention to these cases.
> > >
> > > I think this guideline should consider these situations, and show how
> to
> > > solve them.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <im...@gmail.com> wrote:
> > >
> > > > Thanks to Xintong for bringing up this topic, I'm +1 in general.
> > > >
> > > > However, I think it's still not very clear how we address the
> unstable
> > > > tests.
> > > > I think this is a very important part of this new guideline.
> > > >
> > > > According to the discussion above, if some tests are unstable, we can
> > > > manually disable it.
> > > > But I have some questions in my mind:
> > > > 1) Is the instability judged by the committer themselves or by some
> > > > metrics?
> > > > 2) Should we log the disable commit in the corresponding issue and
> > > increase
> > > > the priority?
> > > > 3) What if nobody looks into this issue and this becomes some
> potential
> > > > bugs released with the new version?
> > > > 4) If no person is actively working on the issue, who should
> re-enable
> > > it?
> > > > Would it block PRs again?
> > > >
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <to...@gmail.com>
> > > wrote:
> > > >
> > > > > Thanks all for the feedback.
> > > > >
> > > > > @Till @Yangze
> > > > >
> > > > > I'm also not convinced by the idea of having an exception for local
> > > > builds.
> > > > > We need to execute the entire build (or at least the failing stage)
> > > > > locally, to make sure subsequent test cases prevented by the
> failure
> > > one
> > > > > are all executed. In that case, it's probably easier to rerun the
> build
> > > > on
> > > > > azure than locally.
> > > > >
> > > > > Concerning disabling unstable test cases that regularly block PRs
> from
> > > > > merging, maybe we can say that such cases can only be disabled when
> > > > someone
> > > > > is actively looking into it, likely the person who disabled the
> case.
> > > If
> > > > > this person is no longer actively working on it, he/she should
> enable
> > > the
> > > > > case again no matter if it is fixed or not.
> > > > >
> > > > > @Jing
> > > > >
> > > > > Thanks for the suggestions.
> > > > >
> > > > > +1 to provide guidelines on handling test failures.
> > > > >
> > > > > 1. Report the test failures in the JIRA.
> > > > > >
> > > > >
> > > > > +1 on this. Currently, the release managers are monitoring the ci
> and
> > > > cron
> > > > > build instabilities and reporting them on JIRA. We should also
> > > encourage
> > > > > other contributors to do that for PRs.
> > > > >
> > > > > 2. Set a deadline to find out the root cause and solve the failure
> for
> > > > the
> > > > > > new created JIRA  because we could not block other commit merges
> for
> > > a
> > > > > long
> > > > > > time
> > > > > >
> > > > > 3. What to do if the JIRA has not made significant progress when
> > > reached
> > > > to
> > > > > > the deadline time?
> > > > >
> > > > >
> > > > > I'm not sure about these two. It feels a bit against the voluntary
> > > nature
> > > > > of open source projects.
> > > > >
> > > > > IMHO, frequent instabilities are more likely to be upgraded to the
> > > > critical
> > > > > / blocker priority, receive more attention and eventually get
> fixed.
> > > > > Release managers are also responsible for looking for assignees for
> > > such
> > > > > issues. If a case is still not fixed soonish, even with all these
> > > > efforts,
> > > > > I'm not sure how setting a deadline can help this.
> > > > >
> > > > > 4. If we disable the respective tests temporarily, we also need a
> > > > mechanism
> > > > > > to ensure the issue would be continued to be investigated in the
> > > > future.
> > > > > >
> > > > >
> > > > > +1. As mentioned above, we may consider disabling such tests iff
> > > someone
> > > > is
> > > > > actively working on it.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <be...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi Xintong,
> > > > > > +1 to the proposal.
> > > > > > In order to better comply with the rule, it is necessary to
> describe
> > > > > what's
> > > > > > best practice if encountering test failure which seems unrelated
> with
> > > > the
> > > > > > current commits.
> > > > > > How to avoid merging PR with test failures and not blocking code
> > > > merging
> > > > > > for a long time?
> > > > > > I tried to think about the possible steps, and found there are
> some
> > > > > > detailed problems that need to be discussed in a step further:
> > > > > > 1. Report the test failures in the JIRA.
> > > > > > 2. Set a deadline to find out the root cause and solve the
> failure
> > > for
> > > > > the
> > > > > > new created JIRA  because we could not block other commit merges
> for
> > > a
> > > > > long
> > > > > > time
> > > > > >     When is a reasonable deadline here?
> > > > > > 3. What to do if the JIRA has not made significant progress when
> > > > reached
> > > > > to
> > > > > > the deadline time?
> > > > > >     There are several situations as follows, maybe different
> cases
> > > need
> > > > > > different approaches.
> > > > > >     1. the JIRA is non-assigned yet
> > > > > >     2. not found the root cause yet
> > > > > >     3. not found a good solution, but already found the root
> cause
> > > > > >     4. found a solution, but it needs more time to be done.
> > > > > > 4. If we disable the respective tests temporarily, we also need a
> > > > > mechanism
> > > > > > to ensure the issue would be continued to be investigated in the
> > > > future.
> > > > > >
> > > > > > Best regards,
> > > > > > JING ZHANG
> > > > > >
> > > > > > Stephan Ewen <se...@apache.org> 于2021年6月23日周三 下午8:16写道：
> > > > > >
> > > > > > > +1 to Xintong's proposal
> > > > > > >
> > > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <
> > > trohrmann@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I would first try to not introduce the exception for local
> > > builds.
> > > > It
> > > > > > > makes
> > > > > > > > it quite hard for others to verify the build and to make sure
> > > that
> > > > > the
> > > > > > > > right things were executed. If we see that this becomes an
> issue
> > > > then
> > > > > > we
> > > > > > > > can revisit this idea.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > >
> > > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <
> karmagyz@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 for appending this to community guidelines for merging
> PRs.
> > > > > > > > >
> > > > > > > > > @Till Rohrmann
> > > > > > > > > I agree that with this approach unstable tests will not
> block
> > > > other
> > > > > > > > > commit merges. However, it might be hard to prevent merging
> > > > commits
> > > > > > > > > that are related to those tests and should have been passed
> > > them.
> > > > > > It's
> > > > > > > > > true that this judgment can be made by the committers, but
> no
> > > one
> > > > > can
> > > > > > > > > ensure the judgment is always precise and so that we have
> this
> > > > > > > > > discussion thread.
> > > > > > > > >
> > > > > > > > > Regarding the unstable tests, how about adding another
> > > exception:
> > > > > > > > > committers verify it in their local environment and
> comment in
> > > > such
> > > > > > > > > cases?
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yangze Guo
> > > > > > > > >
> > > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <
> liujiangangpeng@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > It is a good principle to run all tests successfully
> with any
> > > > > > change.
> > > > > > > > > This
> > > > > > > > > > means a lot for project's stability and development. I
> am big
> > > > +1
> > > > > > for
> > > > > > > > this
> > > > > > > > > > proposal.
> > > > > > > > > >
> > > > > > > > > > Best
> > > > > > > > > > liujiangang
> > > > > > > > > >
> > > > > > > > > > Till Rohrmann <tr...@apache.org> 于2021年6月22日周二
> 下午6:36写道：
> > > > > > > > > >
> > > > > > > > > > > One way to address the problem of regularly failing
> tests
> > > > that
> > > > > > > block
> > > > > > > > > > > merging of PRs is to disable the respective tests for
> the
> > > > time
> > > > > > > being.
> > > > > > > > > Of
> > > > > > > > > > > course, the failing test then needs to be fixed. But at
> > > least
> > > > > > that
> > > > > > > > way
> > > > > > > > > we
> > > > > > > > > > > would not block everyone from making progress.
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Till
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <
> > > > arvid@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I think this is overall a good idea. So +1 from my
> side.
> > > > > > > > > > > > However, I'd like to put a higher priority on
> > > > infrastructure
> > > > > > > then,
> > > > > > > > in
> > > > > > > > > > > > particular docker image/artifact caches.
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <
> > > > > > > > trohrmann@apache.org
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for bringing this topic to our attention
> > > Xintong.
> > > > I
> > > > > > > think
> > > > > > > > > your
> > > > > > > > > > > > > proposal makes a lot of sense and we should follow
> it.
> > > It
> > > > > > will
> > > > > > > > > give us
> > > > > > > > > > > > > confidence that our changes are working and it
> might
> > > be a
> > > > > > good
> > > > > > > > > > > incentive
> > > > > > > > > > > > to
> > > > > > > > > > > > > quickly fix build instabilities. Hence, +1.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Till
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In the past a couple of weeks, I've observed
> several
> > > > > times
> > > > > > > that
> > > > > > > > > PRs
> > > > > > > > > > > are
> > > > > > > > > > > > > > merged without a green light from the CI tests,
> where
> > > > > > failure
> > > > > > > > > cases
> > > > > > > > > > > are
> > > > > > > > > > > > > > considered *unrelated*. This may not always cause
> > > > > problems,
> > > > > > > but
> > > > > > > > > would
> > > > > > > > > > > > > > increase the chance of breaking our code base. In
> > > fact,
> > > > > it
> > > > > > > has
> > > > > > > > > > > occurred
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > me twice in the past few weeks that I had to
> revert a
> > > > > > commit
> > > > > > > > > which
> > > > > > > > > > > > breaks
> > > > > > > > > > > > > > the master branch due to this.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think it would be nicer to enforce a stricter
> rule,
> > > > > that
> > > > > > no
> > > > > > > > PRs
> > > > > > > > > > > > should
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The problems of merging PRs with "unrelated" test
> > > > > failures
> > > > > > > are:
> > > > > > > > > > > > > > - It's not always straightforward to tell
> whether a
> > > > test
> > > > > > > > > failures are
> > > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > > - It prevents subsequent test cases from being
> > > > executed,
> > > > > > > which
> > > > > > > > > may
> > > > > > > > > > > fail
> > > > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > To make things easier for the committers, the
> > > following
> > > > > > > > > exceptions
> > > > > > > > > > > > might
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > > - The PR has passed CI in the contributor's
> personal
> > > > > > > workspace.
> > > > > > > > > > > Please
> > > > > > > > > > > > > post
> > > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > > - The CI tests have been triggered multiple
> times, on
> > > > the
> > > > > > > same
> > > > > > > > > > > commit,
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > each stage has at least passed for once. Please
> also
> > > > > > comment
> > > > > > > in
> > > > > > > > > such
> > > > > > > > > > > > > cases.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If we all agree on this, I'd update the community
> > > > > > guidelines
> > > > > > > > for
> > > > > > > > > > > > merging
> > > > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Please let me know what do you think.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best, Jingsong Lee
> > >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Yangze Guo <ka...@gmail.com>.

Creating a blocker issue for the manually disabled tests sounds good to me.

Minor: I'm still a bit worried about the commits merged before we fix
the unstable tests can also break those tests. Instead of letting the
assigners keep a look at all potentially related commits, they can
maintain a branch that is periodically synced with the master branch
while enabling the unstable test. So that they can catch the breaking
changes asap.

Best,
Yangze Guo

On Thu, Jun 24, 2021 at 9:52 PM Till Rohrmann <tr...@apache.org> wrote:
>
> I like the idea of creating a blocker issue for a disabled test. This will
> force us to resolve it in a timely manner and it won't fall through the
> cracks.
>
> Cheers,
> Till
>
> On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <ji...@gmail.com> wrote:
>
> > +1 to Xintong's proposal
> >
> > I also have some concerns about unstable cases.
> >
> > I think unstable cases can be divided into these types:
> >
> > - Force majeure: For example, network timeout, sudden environmental
> > collapse, they are accidental and can always be solved by triggering azure
> > again. Committers should wait for the next green azure.
> >
> > - Obvious mistakes: For example, some errors caused by obvious reasons may
> > be repaired quickly. At this time, do we need to wait, or not wait and just
> > ignore?
> >
> > - Difficult questions: These problems are very difficult to find. There
> > will be no solution for a while and a half. We don't even know the reason.
> > At this time, we should ignore it. (Maybe it's judged by the author of the
> > case. But what about the old case whose author can't be found?)
> >
> > So, the ignored cases should be the block of the next release until the
> > reason is found or the case is fixed?  We need to ensure that someone will
> > take care of these cases, because there is no deepening of failed tests, no
> > one may continue to pay attention to these cases.
> >
> > I think this guideline should consider these situations, and show how to
> > solve them.
> >
> > Best,
> > Jingsong
> >
> > On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <im...@gmail.com> wrote:
> >
> > > Thanks to Xintong for bringing up this topic, I'm +1 in general.
> > >
> > > However, I think it's still not very clear how we address the unstable
> > > tests.
> > > I think this is a very important part of this new guideline.
> > >
> > > According to the discussion above, if some tests are unstable, we can
> > > manually disable it.
> > > But I have some questions in my mind:
> > > 1) Is the instability judged by the committer themselves or by some
> > > metrics?
> > > 2) Should we log the disable commit in the corresponding issue and
> > increase
> > > the priority?
> > > 3) What if nobody looks into this issue and this becomes some potential
> > > bugs released with the new version?
> > > 4) If no person is actively working on the issue, who should re-enable
> > it?
> > > Would it block PRs again?
> > >
> > >
> > > Best,
> > > Jark
> > >
> > >
> > > On Thu, 24 Jun 2021 at 10:04, Xintong Song <to...@gmail.com>
> > wrote:
> > >
> > > > Thanks all for the feedback.
> > > >
> > > > @Till @Yangze
> > > >
> > > > I'm also not convinced by the idea of having an exception for local
> > > builds.
> > > > We need to execute the entire build (or at least the failing stage)
> > > > locally, to make sure subsequent test cases prevented by the failure
> > one
> > > > are all executed. In that case, it's probably easier to rerun the build
> > > on
> > > > azure than locally.
> > > >
> > > > Concerning disabling unstable test cases that regularly block PRs from
> > > > merging, maybe we can say that such cases can only be disabled when
> > > someone
> > > > is actively looking into it, likely the person who disabled the case.
> > If
> > > > this person is no longer actively working on it, he/she should enable
> > the
> > > > case again no matter if it is fixed or not.
> > > >
> > > > @Jing
> > > >
> > > > Thanks for the suggestions.
> > > >
> > > > +1 to provide guidelines on handling test failures.
> > > >
> > > > 1. Report the test failures in the JIRA.
> > > > >
> > > >
> > > > +1 on this. Currently, the release managers are monitoring the ci and
> > > cron
> > > > build instabilities and reporting them on JIRA. We should also
> > encourage
> > > > other contributors to do that for PRs.
> > > >
> > > > 2. Set a deadline to find out the root cause and solve the failure for
> > > the
> > > > > new created JIRA  because we could not block other commit merges for
> > a
> > > > long
> > > > > time
> > > > >
> > > > 3. What to do if the JIRA has not made significant progress when
> > reached
> > > to
> > > > > the deadline time?
> > > >
> > > >
> > > > I'm not sure about these two. It feels a bit against the voluntary
> > nature
> > > > of open source projects.
> > > >
> > > > IMHO, frequent instabilities are more likely to be upgraded to the
> > > critical
> > > > / blocker priority, receive more attention and eventually get fixed.
> > > > Release managers are also responsible for looking for assignees for
> > such
> > > > issues. If a case is still not fixed soonish, even with all these
> > > efforts,
> > > > I'm not sure how setting a deadline can help this.
> > > >
> > > > 4. If we disable the respective tests temporarily, we also need a
> > > mechanism
> > > > > to ensure the issue would be continued to be investigated in the
> > > future.
> > > > >
> > > >
> > > > +1. As mentioned above, we may consider disabling such tests iff
> > someone
> > > is
> > > > actively working on it.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <be...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Xintong,
> > > > > +1 to the proposal.
> > > > > In order to better comply with the rule, it is necessary to describe
> > > > what's
> > > > > best practice if encountering test failure which seems unrelated with
> > > the
> > > > > current commits.
> > > > > How to avoid merging PR with test failures and not blocking code
> > > merging
> > > > > for a long time?
> > > > > I tried to think about the possible steps, and found there are some
> > > > > detailed problems that need to be discussed in a step further:
> > > > > 1. Report the test failures in the JIRA.
> > > > > 2. Set a deadline to find out the root cause and solve the failure
> > for
> > > > the
> > > > > new created JIRA  because we could not block other commit merges for
> > a
> > > > long
> > > > > time
> > > > >     When is a reasonable deadline here?
> > > > > 3. What to do if the JIRA has not made significant progress when
> > > reached
> > > > to
> > > > > the deadline time?
> > > > >     There are several situations as follows, maybe different cases
> > need
> > > > > different approaches.
> > > > >     1. the JIRA is non-assigned yet
> > > > >     2. not found the root cause yet
> > > > >     3. not found a good solution, but already found the root cause
> > > > >     4. found a solution, but it needs more time to be done.
> > > > > 4. If we disable the respective tests temporarily, we also need a
> > > > mechanism
> > > > > to ensure the issue would be continued to be investigated in the
> > > future.
> > > > >
> > > > > Best regards,
> > > > > JING ZHANG
> > > > >
> > > > > Stephan Ewen <se...@apache.org> 于2021年6月23日周三 下午8:16写道：
> > > > >
> > > > > > +1 to Xintong's proposal
> > > > > >
> > > > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <
> > trohrmann@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > I would first try to not introduce the exception for local
> > builds.
> > > It
> > > > > > makes
> > > > > > > it quite hard for others to verify the build and to make sure
> > that
> > > > the
> > > > > > > right things were executed. If we see that this becomes an issue
> > > then
> > > > > we
> > > > > > > can revisit this idea.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <ka...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > > +1 for appending this to community guidelines for merging PRs.
> > > > > > > >
> > > > > > > > @Till Rohrmann
> > > > > > > > I agree that with this approach unstable tests will not block
> > > other
> > > > > > > > commit merges. However, it might be hard to prevent merging
> > > commits
> > > > > > > > that are related to those tests and should have been passed
> > them.
> > > > > It's
> > > > > > > > true that this judgment can be made by the committers, but no
> > one
> > > > can
> > > > > > > > ensure the judgment is always precise and so that we have this
> > > > > > > > discussion thread.
> > > > > > > >
> > > > > > > > Regarding the unstable tests, how about adding another
> > exception:
> > > > > > > > committers verify it in their local environment and comment in
> > > such
> > > > > > > > cases?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <liujiangangpeng@gmail.com
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > It is a good principle to run all tests successfully with any
> > > > > change.
> > > > > > > > This
> > > > > > > > > means a lot for project's stability and development. I am big
> > > +1
> > > > > for
> > > > > > > this
> > > > > > > > > proposal.
> > > > > > > > >
> > > > > > > > > Best
> > > > > > > > > liujiangang
> > > > > > > > >
> > > > > > > > > Till Rohrmann <tr...@apache.org> 于2021年6月22日周二 下午6:36写道：
> > > > > > > > >
> > > > > > > > > > One way to address the problem of regularly failing tests
> > > that
> > > > > > block
> > > > > > > > > > merging of PRs is to disable the respective tests for the
> > > time
> > > > > > being.
> > > > > > > > Of
> > > > > > > > > > course, the failing test then needs to be fixed. But at
> > least
> > > > > that
> > > > > > > way
> > > > > > > > we
> > > > > > > > > > would not block everyone from making progress.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <
> > > arvid@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > I think this is overall a good idea. So +1 from my side.
> > > > > > > > > > > However, I'd like to put a higher priority on
> > > infrastructure
> > > > > > then,
> > > > > > > in
> > > > > > > > > > > particular docker image/artifact caches.
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <
> > > > > > > trohrmann@apache.org
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for bringing this topic to our attention
> > Xintong.
> > > I
> > > > > > think
> > > > > > > > your
> > > > > > > > > > > > proposal makes a lot of sense and we should follow it.
> > It
> > > > > will
> > > > > > > > give us
> > > > > > > > > > > > confidence that our changes are working and it might
> > be a
> > > > > good
> > > > > > > > > > incentive
> > > > > > > > > > > to
> > > > > > > > > > > > quickly fix build instabilities. Hence, +1.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Till
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > >
> > > > > > > > > > > > > In the past a couple of weeks, I've observed several
> > > > times
> > > > > > that
> > > > > > > > PRs
> > > > > > > > > > are
> > > > > > > > > > > > > merged without a green light from the CI tests, where
> > > > > failure
> > > > > > > > cases
> > > > > > > > > > are
> > > > > > > > > > > > > considered *unrelated*. This may not always cause
> > > > problems,
> > > > > > but
> > > > > > > > would
> > > > > > > > > > > > > increase the chance of breaking our code base. In
> > fact,
> > > > it
> > > > > > has
> > > > > > > > > > occurred
> > > > > > > > > > > > to
> > > > > > > > > > > > > me twice in the past few weeks that I had to revert a
> > > > > commit
> > > > > > > > which
> > > > > > > > > > > breaks
> > > > > > > > > > > > > the master branch due to this.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think it would be nicer to enforce a stricter rule,
> > > > that
> > > > > no
> > > > > > > PRs
> > > > > > > > > > > should
> > > > > > > > > > > > be
> > > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The problems of merging PRs with "unrelated" test
> > > > failures
> > > > > > are:
> > > > > > > > > > > > > - It's not always straightforward to tell whether a
> > > test
> > > > > > > > failures are
> > > > > > > > > > > > > related or not.
> > > > > > > > > > > > > - It prevents subsequent test cases from being
> > > executed,
> > > > > > which
> > > > > > > > may
> > > > > > > > > > fail
> > > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > > >
> > > > > > > > > > > > > To make things easier for the committers, the
> > following
> > > > > > > > exceptions
> > > > > > > > > > > might
> > > > > > > > > > > > be
> > > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > > - The PR has passed CI in the contributor's personal
> > > > > > workspace.
> > > > > > > > > > Please
> > > > > > > > > > > > post
> > > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > > - The CI tests have been triggered multiple times, on
> > > the
> > > > > > same
> > > > > > > > > > commit,
> > > > > > > > > > > > and
> > > > > > > > > > > > > each stage has at least passed for once. Please also
> > > > > comment
> > > > > > in
> > > > > > > > such
> > > > > > > > > > > > cases.
> > > > > > > > > > > > >
> > > > > > > > > > > > > If we all agree on this, I'd update the community
> > > > > guidelines
> > > > > > > for
> > > > > > > > > > > merging
> > > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please let me know what do you think.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best, Jingsong Lee
> >

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Till Rohrmann <tr...@apache.org>.

I like the idea of creating a blocker issue for a disabled test. This will
force us to resolve it in a timely manner and it won't fall through the
cracks.

Cheers,
Till

On Thu, Jun 24, 2021 at 8:06 AM Jingsong Li <ji...@gmail.com> wrote:

> +1 to Xintong's proposal
>
> I also have some concerns about unstable cases.
>
> I think unstable cases can be divided into these types:
>
> - Force majeure: For example, network timeout, sudden environmental
> collapse, they are accidental and can always be solved by triggering azure
> again. Committers should wait for the next green azure.
>
> - Obvious mistakes: For example, some errors caused by obvious reasons may
> be repaired quickly. At this time, do we need to wait, or not wait and just
> ignore?
>
> - Difficult questions: These problems are very difficult to find. There
> will be no solution for a while and a half. We don't even know the reason.
> At this time, we should ignore it. (Maybe it's judged by the author of the
> case. But what about the old case whose author can't be found?)
>
> So, the ignored cases should be the block of the next release until the
> reason is found or the case is fixed?  We need to ensure that someone will
> take care of these cases, because there is no deepening of failed tests, no
> one may continue to pay attention to these cases.
>
> I think this guideline should consider these situations, and show how to
> solve them.
>
> Best,
> Jingsong
>
> On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <im...@gmail.com> wrote:
>
> > Thanks to Xintong for bringing up this topic, I'm +1 in general.
> >
> > However, I think it's still not very clear how we address the unstable
> > tests.
> > I think this is a very important part of this new guideline.
> >
> > According to the discussion above, if some tests are unstable, we can
> > manually disable it.
> > But I have some questions in my mind:
> > 1) Is the instability judged by the committer themselves or by some
> > metrics?
> > 2) Should we log the disable commit in the corresponding issue and
> increase
> > the priority?
> > 3) What if nobody looks into this issue and this becomes some potential
> > bugs released with the new version?
> > 4) If no person is actively working on the issue, who should re-enable
> it?
> > Would it block PRs again?
> >
> >
> > Best,
> > Jark
> >
> >
> > On Thu, 24 Jun 2021 at 10:04, Xintong Song <to...@gmail.com>
> wrote:
> >
> > > Thanks all for the feedback.
> > >
> > > @Till @Yangze
> > >
> > > I'm also not convinced by the idea of having an exception for local
> > builds.
> > > We need to execute the entire build (or at least the failing stage)
> > > locally, to make sure subsequent test cases prevented by the failure
> one
> > > are all executed. In that case, it's probably easier to rerun the build
> > on
> > > azure than locally.
> > >
> > > Concerning disabling unstable test cases that regularly block PRs from
> > > merging, maybe we can say that such cases can only be disabled when
> > someone
> > > is actively looking into it, likely the person who disabled the case.
> If
> > > this person is no longer actively working on it, he/she should enable
> the
> > > case again no matter if it is fixed or not.
> > >
> > > @Jing
> > >
> > > Thanks for the suggestions.
> > >
> > > +1 to provide guidelines on handling test failures.
> > >
> > > 1. Report the test failures in the JIRA.
> > > >
> > >
> > > +1 on this. Currently, the release managers are monitoring the ci and
> > cron
> > > build instabilities and reporting them on JIRA. We should also
> encourage
> > > other contributors to do that for PRs.
> > >
> > > 2. Set a deadline to find out the root cause and solve the failure for
> > the
> > > > new created JIRA  because we could not block other commit merges for
> a
> > > long
> > > > time
> > > >
> > > 3. What to do if the JIRA has not made significant progress when
> reached
> > to
> > > > the deadline time?
> > >
> > >
> > > I'm not sure about these two. It feels a bit against the voluntary
> nature
> > > of open source projects.
> > >
> > > IMHO, frequent instabilities are more likely to be upgraded to the
> > critical
> > > / blocker priority, receive more attention and eventually get fixed.
> > > Release managers are also responsible for looking for assignees for
> such
> > > issues. If a case is still not fixed soonish, even with all these
> > efforts,
> > > I'm not sure how setting a deadline can help this.
> > >
> > > 4. If we disable the respective tests temporarily, we also need a
> > mechanism
> > > > to ensure the issue would be continued to be investigated in the
> > future.
> > > >
> > >
> > > +1. As mentioned above, we may consider disabling such tests iff
> someone
> > is
> > > actively working on it.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <be...@gmail.com>
> wrote:
> > >
> > > > Hi Xintong,
> > > > +1 to the proposal.
> > > > In order to better comply with the rule, it is necessary to describe
> > > what's
> > > > best practice if encountering test failure which seems unrelated with
> > the
> > > > current commits.
> > > > How to avoid merging PR with test failures and not blocking code
> > merging
> > > > for a long time?
> > > > I tried to think about the possible steps, and found there are some
> > > > detailed problems that need to be discussed in a step further:
> > > > 1. Report the test failures in the JIRA.
> > > > 2. Set a deadline to find out the root cause and solve the failure
> for
> > > the
> > > > new created JIRA  because we could not block other commit merges for
> a
> > > long
> > > > time
> > > >     When is a reasonable deadline here?
> > > > 3. What to do if the JIRA has not made significant progress when
> > reached
> > > to
> > > > the deadline time?
> > > >     There are several situations as follows, maybe different cases
> need
> > > > different approaches.
> > > >     1. the JIRA is non-assigned yet
> > > >     2. not found the root cause yet
> > > >     3. not found a good solution, but already found the root cause
> > > >     4. found a solution, but it needs more time to be done.
> > > > 4. If we disable the respective tests temporarily, we also need a
> > > mechanism
> > > > to ensure the issue would be continued to be investigated in the
> > future.
> > > >
> > > > Best regards,
> > > > JING ZHANG
> > > >
> > > > Stephan Ewen <se...@apache.org> 于2021年6月23日周三 下午8:16写道：
> > > >
> > > > > +1 to Xintong's proposal
> > > > >
> > > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <
> trohrmann@apache.org>
> > > > > wrote:
> > > > >
> > > > > > I would first try to not introduce the exception for local
> builds.
> > It
> > > > > makes
> > > > > > it quite hard for others to verify the build and to make sure
> that
> > > the
> > > > > > right things were executed. If we see that this becomes an issue
> > then
> > > > we
> > > > > > can revisit this idea.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <ka...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > +1 for appending this to community guidelines for merging PRs.
> > > > > > >
> > > > > > > @Till Rohrmann
> > > > > > > I agree that with this approach unstable tests will not block
> > other
> > > > > > > commit merges. However, it might be hard to prevent merging
> > commits
> > > > > > > that are related to those tests and should have been passed
> them.
> > > > It's
> > > > > > > true that this judgment can be made by the committers, but no
> one
> > > can
> > > > > > > ensure the judgment is always precise and so that we have this
> > > > > > > discussion thread.
> > > > > > >
> > > > > > > Regarding the unstable tests, how about adding another
> exception:
> > > > > > > committers verify it in their local environment and comment in
> > such
> > > > > > > cases?
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <liujiangangpeng@gmail.com
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > It is a good principle to run all tests successfully with any
> > > > change.
> > > > > > > This
> > > > > > > > means a lot for project's stability and development. I am big
> > +1
> > > > for
> > > > > > this
> > > > > > > > proposal.
> > > > > > > >
> > > > > > > > Best
> > > > > > > > liujiangang
> > > > > > > >
> > > > > > > > Till Rohrmann <tr...@apache.org> 于2021年6月22日周二 下午6:36写道：
> > > > > > > >
> > > > > > > > > One way to address the problem of regularly failing tests
> > that
> > > > > block
> > > > > > > > > merging of PRs is to disable the respective tests for the
> > time
> > > > > being.
> > > > > > > Of
> > > > > > > > > course, the failing test then needs to be fixed. But at
> least
> > > > that
> > > > > > way
> > > > > > > we
> > > > > > > > > would not block everyone from making progress.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <
> > arvid@apache.org
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I think this is overall a good idea. So +1 from my side.
> > > > > > > > > > However, I'd like to put a higher priority on
> > infrastructure
> > > > > then,
> > > > > > in
> > > > > > > > > > particular docker image/artifact caches.
> > > > > > > > > >
> > > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <
> > > > > > trohrmann@apache.org
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for bringing this topic to our attention
> Xintong.
> > I
> > > > > think
> > > > > > > your
> > > > > > > > > > > proposal makes a lot of sense and we should follow it.
> It
> > > > will
> > > > > > > give us
> > > > > > > > > > > confidence that our changes are working and it might
> be a
> > > > good
> > > > > > > > > incentive
> > > > > > > > > > to
> > > > > > > > > > > quickly fix build instabilities. Hence, +1.
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Till
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > In the past a couple of weeks, I've observed several
> > > times
> > > > > that
> > > > > > > PRs
> > > > > > > > > are
> > > > > > > > > > > > merged without a green light from the CI tests, where
> > > > failure
> > > > > > > cases
> > > > > > > > > are
> > > > > > > > > > > > considered *unrelated*. This may not always cause
> > > problems,
> > > > > but
> > > > > > > would
> > > > > > > > > > > > increase the chance of breaking our code base. In
> fact,
> > > it
> > > > > has
> > > > > > > > > occurred
> > > > > > > > > > > to
> > > > > > > > > > > > me twice in the past few weeks that I had to revert a
> > > > commit
> > > > > > > which
> > > > > > > > > > breaks
> > > > > > > > > > > > the master branch due to this.
> > > > > > > > > > > >
> > > > > > > > > > > > I think it would be nicer to enforce a stricter rule,
> > > that
> > > > no
> > > > > > PRs
> > > > > > > > > > should
> > > > > > > > > > > be
> > > > > > > > > > > > merged without passing CI.
> > > > > > > > > > > >
> > > > > > > > > > > > The problems of merging PRs with "unrelated" test
> > > failures
> > > > > are:
> > > > > > > > > > > > - It's not always straightforward to tell whether a
> > test
> > > > > > > failures are
> > > > > > > > > > > > related or not.
> > > > > > > > > > > > - It prevents subsequent test cases from being
> > executed,
> > > > > which
> > > > > > > may
> > > > > > > > > fail
> > > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > > >
> > > > > > > > > > > > To make things easier for the committers, the
> following
> > > > > > > exceptions
> > > > > > > > > > might
> > > > > > > > > > > be
> > > > > > > > > > > > considered acceptable.
> > > > > > > > > > > > - The PR has passed CI in the contributor's personal
> > > > > workspace.
> > > > > > > > > Please
> > > > > > > > > > > post
> > > > > > > > > > > > the link in such cases.
> > > > > > > > > > > > - The CI tests have been triggered multiple times, on
> > the
> > > > > same
> > > > > > > > > commit,
> > > > > > > > > > > and
> > > > > > > > > > > > each stage has at least passed for once. Please also
> > > > comment
> > > > > in
> > > > > > > such
> > > > > > > > > > > cases.
> > > > > > > > > > > >
> > > > > > > > > > > > If we all agree on this, I'd update the community
> > > > guidelines
> > > > > > for
> > > > > > > > > > merging
> > > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > > >
> > > > > > > > > > > > Please let me know what do you think.
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you~
> > > > > > > > > > > >
> > > > > > > > > > > > Xintong Song
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> Best, Jingsong Lee
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Jingsong Li <ji...@gmail.com>.

+1 to Xintong's proposal

I also have some concerns about unstable cases.

I think unstable cases can be divided into these types:

- Force majeure: For example, network timeout, sudden environmental
collapse, they are accidental and can always be solved by triggering azure
again. Committers should wait for the next green azure.

- Obvious mistakes: For example, some errors caused by obvious reasons may
be repaired quickly. At this time, do we need to wait, or not wait and just
ignore?

- Difficult questions: These problems are very difficult to find. There
will be no solution for a while and a half. We don't even know the reason.
At this time, we should ignore it. (Maybe it's judged by the author of the
case. But what about the old case whose author can't be found?)

So, the ignored cases should be the block of the next release until the
reason is found or the case is fixed?  We need to ensure that someone will
take care of these cases, because there is no deepening of failed tests, no
one may continue to pay attention to these cases.

I think this guideline should consider these situations, and show how to
solve them.

Best,
Jingsong

On Thu, Jun 24, 2021 at 10:57 AM Jark Wu <im...@gmail.com> wrote:

> Thanks to Xintong for bringing up this topic, I'm +1 in general.
>
> However, I think it's still not very clear how we address the unstable
> tests.
> I think this is a very important part of this new guideline.
>
> According to the discussion above, if some tests are unstable, we can
> manually disable it.
> But I have some questions in my mind:
> 1) Is the instability judged by the committer themselves or by some
> metrics?
> 2) Should we log the disable commit in the corresponding issue and increase
> the priority?
> 3) What if nobody looks into this issue and this becomes some potential
> bugs released with the new version?
> 4) If no person is actively working on the issue, who should re-enable it?
> Would it block PRs again?
>
>
> Best,
> Jark
>
>
> On Thu, 24 Jun 2021 at 10:04, Xintong Song <to...@gmail.com> wrote:
>
> > Thanks all for the feedback.
> >
> > @Till @Yangze
> >
> > I'm also not convinced by the idea of having an exception for local
> builds.
> > We need to execute the entire build (or at least the failing stage)
> > locally, to make sure subsequent test cases prevented by the failure one
> > are all executed. In that case, it's probably easier to rerun the build
> on
> > azure than locally.
> >
> > Concerning disabling unstable test cases that regularly block PRs from
> > merging, maybe we can say that such cases can only be disabled when
> someone
> > is actively looking into it, likely the person who disabled the case. If
> > this person is no longer actively working on it, he/she should enable the
> > case again no matter if it is fixed or not.
> >
> > @Jing
> >
> > Thanks for the suggestions.
> >
> > +1 to provide guidelines on handling test failures.
> >
> > 1. Report the test failures in the JIRA.
> > >
> >
> > +1 on this. Currently, the release managers are monitoring the ci and
> cron
> > build instabilities and reporting them on JIRA. We should also encourage
> > other contributors to do that for PRs.
> >
> > 2. Set a deadline to find out the root cause and solve the failure for
> the
> > > new created JIRA  because we could not block other commit merges for a
> > long
> > > time
> > >
> > 3. What to do if the JIRA has not made significant progress when reached
> to
> > > the deadline time?
> >
> >
> > I'm not sure about these two. It feels a bit against the voluntary nature
> > of open source projects.
> >
> > IMHO, frequent instabilities are more likely to be upgraded to the
> critical
> > / blocker priority, receive more attention and eventually get fixed.
> > Release managers are also responsible for looking for assignees for such
> > issues. If a case is still not fixed soonish, even with all these
> efforts,
> > I'm not sure how setting a deadline can help this.
> >
> > 4. If we disable the respective tests temporarily, we also need a
> mechanism
> > > to ensure the issue would be continued to be investigated in the
> future.
> > >
> >
> > +1. As mentioned above, we may consider disabling such tests iff someone
> is
> > actively working on it.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <be...@gmail.com> wrote:
> >
> > > Hi Xintong,
> > > +1 to the proposal.
> > > In order to better comply with the rule, it is necessary to describe
> > what's
> > > best practice if encountering test failure which seems unrelated with
> the
> > > current commits.
> > > How to avoid merging PR with test failures and not blocking code
> merging
> > > for a long time?
> > > I tried to think about the possible steps, and found there are some
> > > detailed problems that need to be discussed in a step further:
> > > 1. Report the test failures in the JIRA.
> > > 2. Set a deadline to find out the root cause and solve the failure for
> > the
> > > new created JIRA  because we could not block other commit merges for a
> > long
> > > time
> > >     When is a reasonable deadline here?
> > > 3. What to do if the JIRA has not made significant progress when
> reached
> > to
> > > the deadline time?
> > >     There are several situations as follows, maybe different cases need
> > > different approaches.
> > >     1. the JIRA is non-assigned yet
> > >     2. not found the root cause yet
> > >     3. not found a good solution, but already found the root cause
> > >     4. found a solution, but it needs more time to be done.
> > > 4. If we disable the respective tests temporarily, we also need a
> > mechanism
> > > to ensure the issue would be continued to be investigated in the
> future.
> > >
> > > Best regards,
> > > JING ZHANG
> > >
> > > Stephan Ewen <se...@apache.org> 于2021年6月23日周三 下午8:16写道：
> > >
> > > > +1 to Xintong's proposal
> > > >
> > > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <tr...@apache.org>
> > > > wrote:
> > > >
> > > > > I would first try to not introduce the exception for local builds.
> It
> > > > makes
> > > > > it quite hard for others to verify the build and to make sure that
> > the
> > > > > right things were executed. If we see that this becomes an issue
> then
> > > we
> > > > > can revisit this idea.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <ka...@gmail.com>
> > wrote:
> > > > >
> > > > > > +1 for appending this to community guidelines for merging PRs.
> > > > > >
> > > > > > @Till Rohrmann
> > > > > > I agree that with this approach unstable tests will not block
> other
> > > > > > commit merges. However, it might be hard to prevent merging
> commits
> > > > > > that are related to those tests and should have been passed them.
> > > It's
> > > > > > true that this judgment can be made by the committers, but no one
> > can
> > > > > > ensure the judgment is always precise and so that we have this
> > > > > > discussion thread.
> > > > > >
> > > > > > Regarding the unstable tests, how about adding another exception:
> > > > > > committers verify it in their local environment and comment in
> such
> > > > > > cases?
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <li...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > It is a good principle to run all tests successfully with any
> > > change.
> > > > > > This
> > > > > > > means a lot for project's stability and development. I am big
> +1
> > > for
> > > > > this
> > > > > > > proposal.
> > > > > > >
> > > > > > > Best
> > > > > > > liujiangang
> > > > > > >
> > > > > > > Till Rohrmann <tr...@apache.org> 于2021年6月22日周二 下午6:36写道：
> > > > > > >
> > > > > > > > One way to address the problem of regularly failing tests
> that
> > > > block
> > > > > > > > merging of PRs is to disable the respective tests for the
> time
> > > > being.
> > > > > > Of
> > > > > > > > course, the failing test then needs to be fixed. But at least
> > > that
> > > > > way
> > > > > > we
> > > > > > > > would not block everyone from making progress.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > >
> > > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <
> arvid@apache.org
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > I think this is overall a good idea. So +1 from my side.
> > > > > > > > > However, I'd like to put a higher priority on
> infrastructure
> > > > then,
> > > > > in
> > > > > > > > > particular docker image/artifact caches.
> > > > > > > > >
> > > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <
> > > > > trohrmann@apache.org
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for bringing this topic to our attention Xintong.
> I
> > > > think
> > > > > > your
> > > > > > > > > > proposal makes a lot of sense and we should follow it. It
> > > will
> > > > > > give us
> > > > > > > > > > confidence that our changes are working and it might be a
> > > good
> > > > > > > > incentive
> > > > > > > > > to
> > > > > > > > > > quickly fix build instabilities. Hence, +1.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> > > > > > tonysong820@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi everyone,
> > > > > > > > > > >
> > > > > > > > > > > In the past a couple of weeks, I've observed several
> > times
> > > > that
> > > > > > PRs
> > > > > > > > are
> > > > > > > > > > > merged without a green light from the CI tests, where
> > > failure
> > > > > > cases
> > > > > > > > are
> > > > > > > > > > > considered *unrelated*. This may not always cause
> > problems,
> > > > but
> > > > > > would
> > > > > > > > > > > increase the chance of breaking our code base. In fact,
> > it
> > > > has
> > > > > > > > occurred
> > > > > > > > > > to
> > > > > > > > > > > me twice in the past few weeks that I had to revert a
> > > commit
> > > > > > which
> > > > > > > > > breaks
> > > > > > > > > > > the master branch due to this.
> > > > > > > > > > >
> > > > > > > > > > > I think it would be nicer to enforce a stricter rule,
> > that
> > > no
> > > > > PRs
> > > > > > > > > should
> > > > > > > > > > be
> > > > > > > > > > > merged without passing CI.
> > > > > > > > > > >
> > > > > > > > > > > The problems of merging PRs with "unrelated" test
> > failures
> > > > are:
> > > > > > > > > > > - It's not always straightforward to tell whether a
> test
> > > > > > failures are
> > > > > > > > > > > related or not.
> > > > > > > > > > > - It prevents subsequent test cases from being
> executed,
> > > > which
> > > > > > may
> > > > > > > > fail
> > > > > > > > > > > relating to the PR changes.
> > > > > > > > > > >
> > > > > > > > > > > To make things easier for the committers, the following
> > > > > > exceptions
> > > > > > > > > might
> > > > > > > > > > be
> > > > > > > > > > > considered acceptable.
> > > > > > > > > > > - The PR has passed CI in the contributor's personal
> > > > workspace.
> > > > > > > > Please
> > > > > > > > > > post
> > > > > > > > > > > the link in such cases.
> > > > > > > > > > > - The CI tests have been triggered multiple times, on
> the
> > > > same
> > > > > > > > commit,
> > > > > > > > > > and
> > > > > > > > > > > each stage has at least passed for once. Please also
> > > comment
> > > > in
> > > > > > such
> > > > > > > > > > cases.
> > > > > > > > > > >
> > > > > > > > > > > If we all agree on this, I'd update the community
> > > guidelines
> > > > > for
> > > > > > > > > merging
> > > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > > >
> > > > > > > > > > > Please let me know what do you think.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > >
> > > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 
Best, Jingsong Lee

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Jark Wu <im...@gmail.com>.

Thanks to Xintong for bringing up this topic, I'm +1 in general.

However, I think it's still not very clear how we address the unstable
tests.
I think this is a very important part of this new guideline.

According to the discussion above, if some tests are unstable, we can
manually disable it.
But I have some questions in my mind:
1) Is the instability judged by the committer themselves or by some
metrics?
2) Should we log the disable commit in the corresponding issue and increase
the priority?
3) What if nobody looks into this issue and this becomes some potential
bugs released with the new version?
4) If no person is actively working on the issue, who should re-enable it?
Would it block PRs again?


Best,
Jark


On Thu, 24 Jun 2021 at 10:04, Xintong Song <to...@gmail.com> wrote:

> Thanks all for the feedback.
>
> @Till @Yangze
>
> I'm also not convinced by the idea of having an exception for local builds.
> We need to execute the entire build (or at least the failing stage)
> locally, to make sure subsequent test cases prevented by the failure one
> are all executed. In that case, it's probably easier to rerun the build on
> azure than locally.
>
> Concerning disabling unstable test cases that regularly block PRs from
> merging, maybe we can say that such cases can only be disabled when someone
> is actively looking into it, likely the person who disabled the case. If
> this person is no longer actively working on it, he/she should enable the
> case again no matter if it is fixed or not.
>
> @Jing
>
> Thanks for the suggestions.
>
> +1 to provide guidelines on handling test failures.
>
> 1. Report the test failures in the JIRA.
> >
>
> +1 on this. Currently, the release managers are monitoring the ci and cron
> build instabilities and reporting them on JIRA. We should also encourage
> other contributors to do that for PRs.
>
> 2. Set a deadline to find out the root cause and solve the failure for the
> > new created JIRA  because we could not block other commit merges for a
> long
> > time
> >
> 3. What to do if the JIRA has not made significant progress when reached to
> > the deadline time?
>
>
> I'm not sure about these two. It feels a bit against the voluntary nature
> of open source projects.
>
> IMHO, frequent instabilities are more likely to be upgraded to the critical
> / blocker priority, receive more attention and eventually get fixed.
> Release managers are also responsible for looking for assignees for such
> issues. If a case is still not fixed soonish, even with all these efforts,
> I'm not sure how setting a deadline can help this.
>
> 4. If we disable the respective tests temporarily, we also need a mechanism
> > to ensure the issue would be continued to be investigated in the future.
> >
>
> +1. As mentioned above, we may consider disabling such tests iff someone is
> actively working on it.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <be...@gmail.com> wrote:
>
> > Hi Xintong,
> > +1 to the proposal.
> > In order to better comply with the rule, it is necessary to describe
> what's
> > best practice if encountering test failure which seems unrelated with the
> > current commits.
> > How to avoid merging PR with test failures and not blocking code merging
> > for a long time?
> > I tried to think about the possible steps, and found there are some
> > detailed problems that need to be discussed in a step further:
> > 1. Report the test failures in the JIRA.
> > 2. Set a deadline to find out the root cause and solve the failure for
> the
> > new created JIRA  because we could not block other commit merges for a
> long
> > time
> >     When is a reasonable deadline here?
> > 3. What to do if the JIRA has not made significant progress when reached
> to
> > the deadline time?
> >     There are several situations as follows, maybe different cases need
> > different approaches.
> >     1. the JIRA is non-assigned yet
> >     2. not found the root cause yet
> >     3. not found a good solution, but already found the root cause
> >     4. found a solution, but it needs more time to be done.
> > 4. If we disable the respective tests temporarily, we also need a
> mechanism
> > to ensure the issue would be continued to be investigated in the future.
> >
> > Best regards,
> > JING ZHANG
> >
> > Stephan Ewen <se...@apache.org> 于2021年6月23日周三 下午8:16写道：
> >
> > > +1 to Xintong's proposal
> > >
> > > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <tr...@apache.org>
> > > wrote:
> > >
> > > > I would first try to not introduce the exception for local builds. It
> > > makes
> > > > it quite hard for others to verify the build and to make sure that
> the
> > > > right things were executed. If we see that this becomes an issue then
> > we
> > > > can revisit this idea.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <ka...@gmail.com>
> wrote:
> > > >
> > > > > +1 for appending this to community guidelines for merging PRs.
> > > > >
> > > > > @Till Rohrmann
> > > > > I agree that with this approach unstable tests will not block other
> > > > > commit merges. However, it might be hard to prevent merging commits
> > > > > that are related to those tests and should have been passed them.
> > It's
> > > > > true that this judgment can be made by the committers, but no one
> can
> > > > > ensure the judgment is always precise and so that we have this
> > > > > discussion thread.
> > > > >
> > > > > Regarding the unstable tests, how about adding another exception:
> > > > > committers verify it in their local environment and comment in such
> > > > > cases?
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <li...@gmail.com>
> > wrote:
> > > > > >
> > > > > > It is a good principle to run all tests successfully with any
> > change.
> > > > > This
> > > > > > means a lot for project's stability and development. I am big +1
> > for
> > > > this
> > > > > > proposal.
> > > > > >
> > > > > > Best
> > > > > > liujiangang
> > > > > >
> > > > > > Till Rohrmann <tr...@apache.org> 于2021年6月22日周二 下午6:36写道：
> > > > > >
> > > > > > > One way to address the problem of regularly failing tests that
> > > block
> > > > > > > merging of PRs is to disable the respective tests for the time
> > > being.
> > > > > Of
> > > > > > > course, the failing test then needs to be fixed. But at least
> > that
> > > > way
> > > > > we
> > > > > > > would not block everyone from making progress.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <arvid@apache.org
> >
> > > > wrote:
> > > > > > >
> > > > > > > > I think this is overall a good idea. So +1 from my side.
> > > > > > > > However, I'd like to put a higher priority on infrastructure
> > > then,
> > > > in
> > > > > > > > particular docker image/artifact caches.
> > > > > > > >
> > > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <
> > > > trohrmann@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for bringing this topic to our attention Xintong. I
> > > think
> > > > > your
> > > > > > > > > proposal makes a lot of sense and we should follow it. It
> > will
> > > > > give us
> > > > > > > > > confidence that our changes are working and it might be a
> > good
> > > > > > > incentive
> > > > > > > > to
> > > > > > > > > quickly fix build instabilities. Hence, +1.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> > > > > tonysong820@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > In the past a couple of weeks, I've observed several
> times
> > > that
> > > > > PRs
> > > > > > > are
> > > > > > > > > > merged without a green light from the CI tests, where
> > failure
> > > > > cases
> > > > > > > are
> > > > > > > > > > considered *unrelated*. This may not always cause
> problems,
> > > but
> > > > > would
> > > > > > > > > > increase the chance of breaking our code base. In fact,
> it
> > > has
> > > > > > > occurred
> > > > > > > > > to
> > > > > > > > > > me twice in the past few weeks that I had to revert a
> > commit
> > > > > which
> > > > > > > > breaks
> > > > > > > > > > the master branch due to this.
> > > > > > > > > >
> > > > > > > > > > I think it would be nicer to enforce a stricter rule,
> that
> > no
> > > > PRs
> > > > > > > > should
> > > > > > > > > be
> > > > > > > > > > merged without passing CI.
> > > > > > > > > >
> > > > > > > > > > The problems of merging PRs with "unrelated" test
> failures
> > > are:
> > > > > > > > > > - It's not always straightforward to tell whether a test
> > > > > failures are
> > > > > > > > > > related or not.
> > > > > > > > > > - It prevents subsequent test cases from being executed,
> > > which
> > > > > may
> > > > > > > fail
> > > > > > > > > > relating to the PR changes.
> > > > > > > > > >
> > > > > > > > > > To make things easier for the committers, the following
> > > > > exceptions
> > > > > > > > might
> > > > > > > > > be
> > > > > > > > > > considered acceptable.
> > > > > > > > > > - The PR has passed CI in the contributor's personal
> > > workspace.
> > > > > > > Please
> > > > > > > > > post
> > > > > > > > > > the link in such cases.
> > > > > > > > > > - The CI tests have been triggered multiple times, on the
> > > same
> > > > > > > commit,
> > > > > > > > > and
> > > > > > > > > > each stage has at least passed for once. Please also
> > comment
> > > in
> > > > > such
> > > > > > > > > cases.
> > > > > > > > > >
> > > > > > > > > > If we all agree on this, I'd update the community
> > guidelines
> > > > for
> > > > > > > > merging
> > > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > > >
> > > > > > > > > > Please let me know what do you think.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Xintong Song <to...@gmail.com>.

Thanks all for the feedback.

@Till @Yangze

I'm also not convinced by the idea of having an exception for local builds.
We need to execute the entire build (or at least the failing stage)
locally, to make sure subsequent test cases prevented by the failure one
are all executed. In that case, it's probably easier to rerun the build on
azure than locally.

Concerning disabling unstable test cases that regularly block PRs from
merging, maybe we can say that such cases can only be disabled when someone
is actively looking into it, likely the person who disabled the case. If
this person is no longer actively working on it, he/she should enable the
case again no matter if it is fixed or not.

@Jing

Thanks for the suggestions.

+1 to provide guidelines on handling test failures.

1. Report the test failures in the JIRA.
>

+1 on this. Currently, the release managers are monitoring the ci and cron
build instabilities and reporting them on JIRA. We should also encourage
other contributors to do that for PRs.

2. Set a deadline to find out the root cause and solve the failure for the
> new created JIRA  because we could not block other commit merges for a long
> time
>
3. What to do if the JIRA has not made significant progress when reached to
> the deadline time?


I'm not sure about these two. It feels a bit against the voluntary nature
of open source projects.

IMHO, frequent instabilities are more likely to be upgraded to the critical
/ blocker priority, receive more attention and eventually get fixed.
Release managers are also responsible for looking for assignees for such
issues. If a case is still not fixed soonish, even with all these efforts,
I'm not sure how setting a deadline can help this.

4. If we disable the respective tests temporarily, we also need a mechanism
> to ensure the issue would be continued to be investigated in the future.
>

+1. As mentioned above, we may consider disabling such tests iff someone is
actively working on it.

Thank you~

Xintong Song



On Wed, Jun 23, 2021 at 9:56 PM JING ZHANG <be...@gmail.com> wrote:

> Hi Xintong,
> +1 to the proposal.
> In order to better comply with the rule, it is necessary to describe what's
> best practice if encountering test failure which seems unrelated with the
> current commits.
> How to avoid merging PR with test failures and not blocking code merging
> for a long time?
> I tried to think about the possible steps, and found there are some
> detailed problems that need to be discussed in a step further:
> 1. Report the test failures in the JIRA.
> 2. Set a deadline to find out the root cause and solve the failure for the
> new created JIRA  because we could not block other commit merges for a long
> time
>     When is a reasonable deadline here?
> 3. What to do if the JIRA has not made significant progress when reached to
> the deadline time?
>     There are several situations as follows, maybe different cases need
> different approaches.
>     1. the JIRA is non-assigned yet
>     2. not found the root cause yet
>     3. not found a good solution, but already found the root cause
>     4. found a solution, but it needs more time to be done.
> 4. If we disable the respective tests temporarily, we also need a mechanism
> to ensure the issue would be continued to be investigated in the future.
>
> Best regards,
> JING ZHANG
>
> Stephan Ewen <se...@apache.org> 于2021年6月23日周三 下午8:16写道：
>
> > +1 to Xintong's proposal
> >
> > On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <tr...@apache.org>
> > wrote:
> >
> > > I would first try to not introduce the exception for local builds. It
> > makes
> > > it quite hard for others to verify the build and to make sure that the
> > > right things were executed. If we see that this becomes an issue then
> we
> > > can revisit this idea.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <ka...@gmail.com> wrote:
> > >
> > > > +1 for appending this to community guidelines for merging PRs.
> > > >
> > > > @Till Rohrmann
> > > > I agree that with this approach unstable tests will not block other
> > > > commit merges. However, it might be hard to prevent merging commits
> > > > that are related to those tests and should have been passed them.
> It's
> > > > true that this judgment can be made by the committers, but no one can
> > > > ensure the judgment is always precise and so that we have this
> > > > discussion thread.
> > > >
> > > > Regarding the unstable tests, how about adding another exception:
> > > > committers verify it in their local environment and comment in such
> > > > cases?
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <li...@gmail.com>
> wrote:
> > > > >
> > > > > It is a good principle to run all tests successfully with any
> change.
> > > > This
> > > > > means a lot for project's stability and development. I am big +1
> for
> > > this
> > > > > proposal.
> > > > >
> > > > > Best
> > > > > liujiangang
> > > > >
> > > > > Till Rohrmann <tr...@apache.org> 于2021年6月22日周二 下午6:36写道：
> > > > >
> > > > > > One way to address the problem of regularly failing tests that
> > block
> > > > > > merging of PRs is to disable the respective tests for the time
> > being.
> > > > Of
> > > > > > course, the failing test then needs to be fixed. But at least
> that
> > > way
> > > > we
> > > > > > would not block everyone from making progress.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <ar...@apache.org>
> > > wrote:
> > > > > >
> > > > > > > I think this is overall a good idea. So +1 from my side.
> > > > > > > However, I'd like to put a higher priority on infrastructure
> > then,
> > > in
> > > > > > > particular docker image/artifact caches.
> > > > > > >
> > > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <
> > > trohrmann@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for bringing this topic to our attention Xintong. I
> > think
> > > > your
> > > > > > > > proposal makes a lot of sense and we should follow it. It
> will
> > > > give us
> > > > > > > > confidence that our changes are working and it might be a
> good
> > > > > > incentive
> > > > > > > to
> > > > > > > > quickly fix build instabilities. Hence, +1.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > >
> > > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> > > > tonysong820@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > In the past a couple of weeks, I've observed several times
> > that
> > > > PRs
> > > > > > are
> > > > > > > > > merged without a green light from the CI tests, where
> failure
> > > > cases
> > > > > > are
> > > > > > > > > considered *unrelated*. This may not always cause problems,
> > but
> > > > would
> > > > > > > > > increase the chance of breaking our code base. In fact, it
> > has
> > > > > > occurred
> > > > > > > > to
> > > > > > > > > me twice in the past few weeks that I had to revert a
> commit
> > > > which
> > > > > > > breaks
> > > > > > > > > the master branch due to this.
> > > > > > > > >
> > > > > > > > > I think it would be nicer to enforce a stricter rule, that
> no
> > > PRs
> > > > > > > should
> > > > > > > > be
> > > > > > > > > merged without passing CI.
> > > > > > > > >
> > > > > > > > > The problems of merging PRs with "unrelated" test failures
> > are:
> > > > > > > > > - It's not always straightforward to tell whether a test
> > > > failures are
> > > > > > > > > related or not.
> > > > > > > > > - It prevents subsequent test cases from being executed,
> > which
> > > > may
> > > > > > fail
> > > > > > > > > relating to the PR changes.
> > > > > > > > >
> > > > > > > > > To make things easier for the committers, the following
> > > > exceptions
> > > > > > > might
> > > > > > > > be
> > > > > > > > > considered acceptable.
> > > > > > > > > - The PR has passed CI in the contributor's personal
> > workspace.
> > > > > > Please
> > > > > > > > post
> > > > > > > > > the link in such cases.
> > > > > > > > > - The CI tests have been triggered multiple times, on the
> > same
> > > > > > commit,
> > > > > > > > and
> > > > > > > > > each stage has at least passed for once. Please also
> comment
> > in
> > > > such
> > > > > > > > cases.
> > > > > > > > >
> > > > > > > > > If we all agree on this, I'd update the community
> guidelines
> > > for
> > > > > > > merging
> > > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > > >
> > > > > > > > > Please let me know what do you think.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by JING ZHANG <be...@gmail.com>.

Hi Xintong,
+1 to the proposal.
In order to better comply with the rule, it is necessary to describe what's
best practice if encountering test failure which seems unrelated with the
current commits.
How to avoid merging PR with test failures and not blocking code merging
for a long time?
I tried to think about the possible steps, and found there are some
detailed problems that need to be discussed in a step further:
1. Report the test failures in the JIRA.
2. Set a deadline to find out the root cause and solve the failure for the
new created JIRA  because we could not block other commit merges for a long
time
    When is a reasonable deadline here?
3. What to do if the JIRA has not made significant progress when reached to
the deadline time?
    There are several situations as follows, maybe different cases need
different approaches.
    1. the JIRA is non-assigned yet
    2. not found the root cause yet
    3. not found a good solution, but already found the root cause
    4. found a solution, but it needs more time to be done.
4. If we disable the respective tests temporarily, we also need a mechanism
to ensure the issue would be continued to be investigated in the future.

Best regards,
JING ZHANG

Stephan Ewen <se...@apache.org> 于2021年6月23日周三 下午8:16写道：

> +1 to Xintong's proposal
>
> On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <tr...@apache.org>
> wrote:
>
> > I would first try to not introduce the exception for local builds. It
> makes
> > it quite hard for others to verify the build and to make sure that the
> > right things were executed. If we see that this becomes an issue then we
> > can revisit this idea.
> >
> > Cheers,
> > Till
> >
> > On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> > > +1 for appending this to community guidelines for merging PRs.
> > >
> > > @Till Rohrmann
> > > I agree that with this approach unstable tests will not block other
> > > commit merges. However, it might be hard to prevent merging commits
> > > that are related to those tests and should have been passed them. It's
> > > true that this judgment can be made by the committers, but no one can
> > > ensure the judgment is always precise and so that we have this
> > > discussion thread.
> > >
> > > Regarding the unstable tests, how about adding another exception:
> > > committers verify it in their local environment and comment in such
> > > cases?
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <li...@gmail.com> wrote:
> > > >
> > > > It is a good principle to run all tests successfully with any change.
> > > This
> > > > means a lot for project's stability and development. I am big +1 for
> > this
> > > > proposal.
> > > >
> > > > Best
> > > > liujiangang
> > > >
> > > > Till Rohrmann <tr...@apache.org> 于2021年6月22日周二 下午6:36写道：
> > > >
> > > > > One way to address the problem of regularly failing tests that
> block
> > > > > merging of PRs is to disable the respective tests for the time
> being.
> > > Of
> > > > > course, the failing test then needs to be fixed. But at least that
> > way
> > > we
> > > > > would not block everyone from making progress.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <ar...@apache.org>
> > wrote:
> > > > >
> > > > > > I think this is overall a good idea. So +1 from my side.
> > > > > > However, I'd like to put a higher priority on infrastructure
> then,
> > in
> > > > > > particular docker image/artifact caches.
> > > > > >
> > > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <
> > trohrmann@apache.org
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for bringing this topic to our attention Xintong. I
> think
> > > your
> > > > > > > proposal makes a lot of sense and we should follow it. It will
> > > give us
> > > > > > > confidence that our changes are working and it might be a good
> > > > > incentive
> > > > > > to
> > > > > > > quickly fix build instabilities. Hence, +1.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> > > tonysong820@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > In the past a couple of weeks, I've observed several times
> that
> > > PRs
> > > > > are
> > > > > > > > merged without a green light from the CI tests, where failure
> > > cases
> > > > > are
> > > > > > > > considered *unrelated*. This may not always cause problems,
> but
> > > would
> > > > > > > > increase the chance of breaking our code base. In fact, it
> has
> > > > > occurred
> > > > > > > to
> > > > > > > > me twice in the past few weeks that I had to revert a commit
> > > which
> > > > > > breaks
> > > > > > > > the master branch due to this.
> > > > > > > >
> > > > > > > > I think it would be nicer to enforce a stricter rule, that no
> > PRs
> > > > > > should
> > > > > > > be
> > > > > > > > merged without passing CI.
> > > > > > > >
> > > > > > > > The problems of merging PRs with "unrelated" test failures
> are:
> > > > > > > > - It's not always straightforward to tell whether a test
> > > failures are
> > > > > > > > related or not.
> > > > > > > > - It prevents subsequent test cases from being executed,
> which
> > > may
> > > > > fail
> > > > > > > > relating to the PR changes.
> > > > > > > >
> > > > > > > > To make things easier for the committers, the following
> > > exceptions
> > > > > > might
> > > > > > > be
> > > > > > > > considered acceptable.
> > > > > > > > - The PR has passed CI in the contributor's personal
> workspace.
> > > > > Please
> > > > > > > post
> > > > > > > > the link in such cases.
> > > > > > > > - The CI tests have been triggered multiple times, on the
> same
> > > > > commit,
> > > > > > > and
> > > > > > > > each stage has at least passed for once. Please also comment
> in
> > > such
> > > > > > > cases.
> > > > > > > >
> > > > > > > > If we all agree on this, I'd update the community guidelines
> > for
> > > > > > merging
> > > > > > > > PRs wrt. this proposal. [1]
> > > > > > > >
> > > > > > > > Please let me know what do you think.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Stephan Ewen <se...@apache.org>.

+1 to Xintong's proposal

On Wed, Jun 23, 2021 at 1:53 PM Till Rohrmann <tr...@apache.org> wrote:

> I would first try to not introduce the exception for local builds. It makes
> it quite hard for others to verify the build and to make sure that the
> right things were executed. If we see that this becomes an issue then we
> can revisit this idea.
>
> Cheers,
> Till
>
> On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <ka...@gmail.com> wrote:
>
> > +1 for appending this to community guidelines for merging PRs.
> >
> > @Till Rohrmann
> > I agree that with this approach unstable tests will not block other
> > commit merges. However, it might be hard to prevent merging commits
> > that are related to those tests and should have been passed them. It's
> > true that this judgment can be made by the committers, but no one can
> > ensure the judgment is always precise and so that we have this
> > discussion thread.
> >
> > Regarding the unstable tests, how about adding another exception:
> > committers verify it in their local environment and comment in such
> > cases?
> >
> > Best,
> > Yangze Guo
> >
> > On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <li...@gmail.com> wrote:
> > >
> > > It is a good principle to run all tests successfully with any change.
> > This
> > > means a lot for project's stability and development. I am big +1 for
> this
> > > proposal.
> > >
> > > Best
> > > liujiangang
> > >
> > > Till Rohrmann <tr...@apache.org> 于2021年6月22日周二 下午6:36写道：
> > >
> > > > One way to address the problem of regularly failing tests that block
> > > > merging of PRs is to disable the respective tests for the time being.
> > Of
> > > > course, the failing test then needs to be fixed. But at least that
> way
> > we
> > > > would not block everyone from making progress.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <ar...@apache.org>
> wrote:
> > > >
> > > > > I think this is overall a good idea. So +1 from my side.
> > > > > However, I'd like to put a higher priority on infrastructure then,
> in
> > > > > particular docker image/artifact caches.
> > > > >
> > > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <
> trohrmann@apache.org
> > >
> > > > > wrote:
> > > > >
> > > > > > Thanks for bringing this topic to our attention Xintong. I think
> > your
> > > > > > proposal makes a lot of sense and we should follow it. It will
> > give us
> > > > > > confidence that our changes are working and it might be a good
> > > > incentive
> > > > > to
> > > > > > quickly fix build instabilities. Hence, +1.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> > tonysong820@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > In the past a couple of weeks, I've observed several times that
> > PRs
> > > > are
> > > > > > > merged without a green light from the CI tests, where failure
> > cases
> > > > are
> > > > > > > considered *unrelated*. This may not always cause problems, but
> > would
> > > > > > > increase the chance of breaking our code base. In fact, it has
> > > > occurred
> > > > > > to
> > > > > > > me twice in the past few weeks that I had to revert a commit
> > which
> > > > > breaks
> > > > > > > the master branch due to this.
> > > > > > >
> > > > > > > I think it would be nicer to enforce a stricter rule, that no
> PRs
> > > > > should
> > > > > > be
> > > > > > > merged without passing CI.
> > > > > > >
> > > > > > > The problems of merging PRs with "unrelated" test failures are:
> > > > > > > - It's not always straightforward to tell whether a test
> > failures are
> > > > > > > related or not.
> > > > > > > - It prevents subsequent test cases from being executed, which
> > may
> > > > fail
> > > > > > > relating to the PR changes.
> > > > > > >
> > > > > > > To make things easier for the committers, the following
> > exceptions
> > > > > might
> > > > > > be
> > > > > > > considered acceptable.
> > > > > > > - The PR has passed CI in the contributor's personal workspace.
> > > > Please
> > > > > > post
> > > > > > > the link in such cases.
> > > > > > > - The CI tests have been triggered multiple times, on the same
> > > > commit,
> > > > > > and
> > > > > > > each stage has at least passed for once. Please also comment in
> > such
> > > > > > cases.
> > > > > > >
> > > > > > > If we all agree on this, I'd update the community guidelines
> for
> > > > > merging
> > > > > > > PRs wrt. this proposal. [1]
> > > > > > >
> > > > > > > Please let me know what do you think.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Till Rohrmann <tr...@apache.org>.

I would first try to not introduce the exception for local builds. It makes
it quite hard for others to verify the build and to make sure that the
right things were executed. If we see that this becomes an issue then we
can revisit this idea.

Cheers,
Till

On Wed, Jun 23, 2021 at 4:19 AM Yangze Guo <ka...@gmail.com> wrote:

> +1 for appending this to community guidelines for merging PRs.
>
> @Till Rohrmann
> I agree that with this approach unstable tests will not block other
> commit merges. However, it might be hard to prevent merging commits
> that are related to those tests and should have been passed them. It's
> true that this judgment can be made by the committers, but no one can
> ensure the judgment is always precise and so that we have this
> discussion thread.
>
> Regarding the unstable tests, how about adding another exception:
> committers verify it in their local environment and comment in such
> cases?
>
> Best,
> Yangze Guo
>
> On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <li...@gmail.com> wrote:
> >
> > It is a good principle to run all tests successfully with any change.
> This
> > means a lot for project's stability and development. I am big +1 for this
> > proposal.
> >
> > Best
> > liujiangang
> >
> > Till Rohrmann <tr...@apache.org> 于2021年6月22日周二 下午6:36写道：
> >
> > > One way to address the problem of regularly failing tests that block
> > > merging of PRs is to disable the respective tests for the time being.
> Of
> > > course, the failing test then needs to be fixed. But at least that way
> we
> > > would not block everyone from making progress.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <ar...@apache.org> wrote:
> > >
> > > > I think this is overall a good idea. So +1 from my side.
> > > > However, I'd like to put a higher priority on infrastructure then, in
> > > > particular docker image/artifact caches.
> > > >
> > > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <trohrmann@apache.org
> >
> > > > wrote:
> > > >
> > > > > Thanks for bringing this topic to our attention Xintong. I think
> your
> > > > > proposal makes a lot of sense and we should follow it. It will
> give us
> > > > > confidence that our changes are working and it might be a good
> > > incentive
> > > > to
> > > > > quickly fix build instabilities. Hence, +1.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <
> tonysong820@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > In the past a couple of weeks, I've observed several times that
> PRs
> > > are
> > > > > > merged without a green light from the CI tests, where failure
> cases
> > > are
> > > > > > considered *unrelated*. This may not always cause problems, but
> would
> > > > > > increase the chance of breaking our code base. In fact, it has
> > > occurred
> > > > > to
> > > > > > me twice in the past few weeks that I had to revert a commit
> which
> > > > breaks
> > > > > > the master branch due to this.
> > > > > >
> > > > > > I think it would be nicer to enforce a stricter rule, that no PRs
> > > > should
> > > > > be
> > > > > > merged without passing CI.
> > > > > >
> > > > > > The problems of merging PRs with "unrelated" test failures are:
> > > > > > - It's not always straightforward to tell whether a test
> failures are
> > > > > > related or not.
> > > > > > - It prevents subsequent test cases from being executed, which
> may
> > > fail
> > > > > > relating to the PR changes.
> > > > > >
> > > > > > To make things easier for the committers, the following
> exceptions
> > > > might
> > > > > be
> > > > > > considered acceptable.
> > > > > > - The PR has passed CI in the contributor's personal workspace.
> > > Please
> > > > > post
> > > > > > the link in such cases.
> > > > > > - The CI tests have been triggered multiple times, on the same
> > > commit,
> > > > > and
> > > > > > each stage has at least passed for once. Please also comment in
> such
> > > > > cases.
> > > > > >
> > > > > > If we all agree on this, I'd update the community guidelines for
> > > > merging
> > > > > > PRs wrt. this proposal. [1]
> > > > > >
> > > > > > Please let me know what do you think.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > > [1]
> > > > > >
> > > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > >
> > > > >
> > > >
> > >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Yangze Guo <ka...@gmail.com>.

+1 for appending this to community guidelines for merging PRs.

@Till Rohrmann
I agree that with this approach unstable tests will not block other
commit merges. However, it might be hard to prevent merging commits
that are related to those tests and should have been passed them. It's
true that this judgment can be made by the committers, but no one can
ensure the judgment is always precise and so that we have this
discussion thread.

Regarding the unstable tests, how about adding another exception:
committers verify it in their local environment and comment in such
cases?

Best,
Yangze Guo

On Tue, Jun 22, 2021 at 8:23 PM 刘建刚 <li...@gmail.com> wrote:
>
> It is a good principle to run all tests successfully with any change.  This
> means a lot for project's stability and development. I am big +1 for this
> proposal.
>
> Best
> liujiangang
>
> Till Rohrmann <tr...@apache.org> 于2021年6月22日周二 下午6:36写道：
>
> > One way to address the problem of regularly failing tests that block
> > merging of PRs is to disable the respective tests for the time being. Of
> > course, the failing test then needs to be fixed. But at least that way we
> > would not block everyone from making progress.
> >
> > Cheers,
> > Till
> >
> > On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <ar...@apache.org> wrote:
> >
> > > I think this is overall a good idea. So +1 from my side.
> > > However, I'd like to put a higher priority on infrastructure then, in
> > > particular docker image/artifact caches.
> > >
> > > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <tr...@apache.org>
> > > wrote:
> > >
> > > > Thanks for bringing this topic to our attention Xintong. I think your
> > > > proposal makes a lot of sense and we should follow it. It will give us
> > > > confidence that our changes are working and it might be a good
> > incentive
> > > to
> > > > quickly fix build instabilities. Hence, +1.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <to...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > In the past a couple of weeks, I've observed several times that PRs
> > are
> > > > > merged without a green light from the CI tests, where failure cases
> > are
> > > > > considered *unrelated*. This may not always cause problems, but would
> > > > > increase the chance of breaking our code base. In fact, it has
> > occurred
> > > > to
> > > > > me twice in the past few weeks that I had to revert a commit which
> > > breaks
> > > > > the master branch due to this.
> > > > >
> > > > > I think it would be nicer to enforce a stricter rule, that no PRs
> > > should
> > > > be
> > > > > merged without passing CI.
> > > > >
> > > > > The problems of merging PRs with "unrelated" test failures are:
> > > > > - It's not always straightforward to tell whether a test failures are
> > > > > related or not.
> > > > > - It prevents subsequent test cases from being executed, which may
> > fail
> > > > > relating to the PR changes.
> > > > >
> > > > > To make things easier for the committers, the following exceptions
> > > might
> > > > be
> > > > > considered acceptable.
> > > > > - The PR has passed CI in the contributor's personal workspace.
> > Please
> > > > post
> > > > > the link in such cases.
> > > > > - The CI tests have been triggered multiple times, on the same
> > commit,
> > > > and
> > > > > each stage has at least passed for once. Please also comment in such
> > > > cases.
> > > > >
> > > > > If we all agree on this, I'd update the community guidelines for
> > > merging
> > > > > PRs wrt. this proposal. [1]
> > > > >
> > > > > Please let me know what do you think.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > >
> > > >
> > >
> >

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by 刘建刚 <li...@gmail.com>.

It is a good principle to run all tests successfully with any change.  This
means a lot for project's stability and development. I am big +1 for this
proposal.

Best
liujiangang

Till Rohrmann <tr...@apache.org> 于2021年6月22日周二 下午6:36写道：

> One way to address the problem of regularly failing tests that block
> merging of PRs is to disable the respective tests for the time being. Of
> course, the failing test then needs to be fixed. But at least that way we
> would not block everyone from making progress.
>
> Cheers,
> Till
>
> On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <ar...@apache.org> wrote:
>
> > I think this is overall a good idea. So +1 from my side.
> > However, I'd like to put a higher priority on infrastructure then, in
> > particular docker image/artifact caches.
> >
> > On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <tr...@apache.org>
> > wrote:
> >
> > > Thanks for bringing this topic to our attention Xintong. I think your
> > > proposal makes a lot of sense and we should follow it. It will give us
> > > confidence that our changes are working and it might be a good
> incentive
> > to
> > > quickly fix build instabilities. Hence, +1.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <to...@gmail.com>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > In the past a couple of weeks, I've observed several times that PRs
> are
> > > > merged without a green light from the CI tests, where failure cases
> are
> > > > considered *unrelated*. This may not always cause problems, but would
> > > > increase the chance of breaking our code base. In fact, it has
> occurred
> > > to
> > > > me twice in the past few weeks that I had to revert a commit which
> > breaks
> > > > the master branch due to this.
> > > >
> > > > I think it would be nicer to enforce a stricter rule, that no PRs
> > should
> > > be
> > > > merged without passing CI.
> > > >
> > > > The problems of merging PRs with "unrelated" test failures are:
> > > > - It's not always straightforward to tell whether a test failures are
> > > > related or not.
> > > > - It prevents subsequent test cases from being executed, which may
> fail
> > > > relating to the PR changes.
> > > >
> > > > To make things easier for the committers, the following exceptions
> > might
> > > be
> > > > considered acceptable.
> > > > - The PR has passed CI in the contributor's personal workspace.
> Please
> > > post
> > > > the link in such cases.
> > > > - The CI tests have been triggered multiple times, on the same
> commit,
> > > and
> > > > each stage has at least passed for once. Please also comment in such
> > > cases.
> > > >
> > > > If we all agree on this, I'd update the community guidelines for
> > merging
> > > > PRs wrt. this proposal. [1]
> > > >
> > > > Please let me know what do you think.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > > [1]
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > >
> > >
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Till Rohrmann <tr...@apache.org>.

One way to address the problem of regularly failing tests that block
merging of PRs is to disable the respective tests for the time being. Of
course, the failing test then needs to be fixed. But at least that way we
would not block everyone from making progress.

Cheers,
Till

On Tue, Jun 22, 2021 at 12:00 PM Arvid Heise <ar...@apache.org> wrote:

> I think this is overall a good idea. So +1 from my side.
> However, I'd like to put a higher priority on infrastructure then, in
> particular docker image/artifact caches.
>
> On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <tr...@apache.org>
> wrote:
>
> > Thanks for bringing this topic to our attention Xintong. I think your
> > proposal makes a lot of sense and we should follow it. It will give us
> > confidence that our changes are working and it might be a good incentive
> to
> > quickly fix build instabilities. Hence, +1.
> >
> > Cheers,
> > Till
> >
> > On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <to...@gmail.com>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > In the past a couple of weeks, I've observed several times that PRs are
> > > merged without a green light from the CI tests, where failure cases are
> > > considered *unrelated*. This may not always cause problems, but would
> > > increase the chance of breaking our code base. In fact, it has occurred
> > to
> > > me twice in the past few weeks that I had to revert a commit which
> breaks
> > > the master branch due to this.
> > >
> > > I think it would be nicer to enforce a stricter rule, that no PRs
> should
> > be
> > > merged without passing CI.
> > >
> > > The problems of merging PRs with "unrelated" test failures are:
> > > - It's not always straightforward to tell whether a test failures are
> > > related or not.
> > > - It prevents subsequent test cases from being executed, which may fail
> > > relating to the PR changes.
> > >
> > > To make things easier for the committers, the following exceptions
> might
> > be
> > > considered acceptable.
> > > - The PR has passed CI in the contributor's personal workspace. Please
> > post
> > > the link in such cases.
> > > - The CI tests have been triggered multiple times, on the same commit,
> > and
> > > each stage has at least passed for once. Please also comment in such
> > cases.
> > >
> > > If we all agree on this, I'd update the community guidelines for
> merging
> > > PRs wrt. this proposal. [1]
> > >
> > > Please let me know what do you think.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > > [1]
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > >
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Arvid Heise <ar...@apache.org>.

I think this is overall a good idea. So +1 from my side.
However, I'd like to put a higher priority on infrastructure then, in
particular docker image/artifact caches.

On Tue, Jun 22, 2021 at 11:50 AM Till Rohrmann <tr...@apache.org> wrote:

> Thanks for bringing this topic to our attention Xintong. I think your
> proposal makes a lot of sense and we should follow it. It will give us
> confidence that our changes are working and it might be a good incentive to
> quickly fix build instabilities. Hence, +1.
>
> Cheers,
> Till
>
> On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <to...@gmail.com>
> wrote:
>
> > Hi everyone,
> >
> > In the past a couple of weeks, I've observed several times that PRs are
> > merged without a green light from the CI tests, where failure cases are
> > considered *unrelated*. This may not always cause problems, but would
> > increase the chance of breaking our code base. In fact, it has occurred
> to
> > me twice in the past few weeks that I had to revert a commit which breaks
> > the master branch due to this.
> >
> > I think it would be nicer to enforce a stricter rule, that no PRs should
> be
> > merged without passing CI.
> >
> > The problems of merging PRs with "unrelated" test failures are:
> > - It's not always straightforward to tell whether a test failures are
> > related or not.
> > - It prevents subsequent test cases from being executed, which may fail
> > relating to the PR changes.
> >
> > To make things easier for the committers, the following exceptions might
> be
> > considered acceptable.
> > - The PR has passed CI in the contributor's personal workspace. Please
> post
> > the link in such cases.
> > - The CI tests have been triggered multiple times, on the same commit,
> and
> > each stage has at least passed for once. Please also comment in such
> cases.
> >
> > If we all agree on this, I'd update the community guidelines for merging
> > PRs wrt. this proposal. [1]
> >
> > Please let me know what do you think.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> >
>

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

Posted by Till Rohrmann <tr...@apache.org>.

Thanks for bringing this topic to our attention Xintong. I think your
proposal makes a lot of sense and we should follow it. It will give us
confidence that our changes are working and it might be a good incentive to
quickly fix build instabilities. Hence, +1.

Cheers,
Till

On Tue, Jun 22, 2021 at 11:12 AM Xintong Song <to...@gmail.com> wrote:

> Hi everyone,
>
> In the past a couple of weeks, I've observed several times that PRs are
> merged without a green light from the CI tests, where failure cases are
> considered *unrelated*. This may not always cause problems, but would
> increase the chance of breaking our code base. In fact, it has occurred to
> me twice in the past few weeks that I had to revert a commit which breaks
> the master branch due to this.
>
> I think it would be nicer to enforce a stricter rule, that no PRs should be
> merged without passing CI.
>
> The problems of merging PRs with "unrelated" test failures are:
> - It's not always straightforward to tell whether a test failures are
> related or not.
> - It prevents subsequent test cases from being executed, which may fail
> relating to the PR changes.
>
> To make things easier for the committers, the following exceptions might be
> considered acceptable.
> - The PR has passed CI in the contributor's personal workspace. Please post
> the link in such cases.
> - The CI tests have been triggered multiple times, on the same commit, and
> each stage has at least passed for once. Please also comment in such cases.
>
> If we all agree on this, I'd update the community guidelines for merging
> PRs wrt. this proposal. [1]
>
> Please let me know what do you think.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
>