You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Dmitrii Ryabov <so...@gmail.com> on 2018/10/22 12:18:07 UTC

Re: Switching to real FailureHandler in tests

I tried to replace default no-op handler by handler stopping node and
failing the test.

I've returned the no-op handler in many classes because critical
situations are expected behavior. But PR still have a lot of failed
tests and suites. In some tests, I can't understand a failure reason.

I'm not finished to check failures, but after several RunAll runs, I
see new flaky tests (1 or 2 fails) appeared because of new handler.

I think we should keep no-op handler as default, but add new handler
for a few classes, where critical situations aren't expected.

пт, 21 сент. 2018 г. в 17:03, Andrey Kuznetsov <st...@gmail.com>:
>
> Thanks to all for participating the discussion.
>
> I've updated [1]: now it requires new handler from [2] for completion.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-9660
> [2] https://issues.apache.org/jira/browse/IGNITE-8227
>
> чт, 20 сент. 2018 г. в 21:56, Vladimir Ozerov <vo...@gridgain.com>:
>
> > Stop node handler is not very good choice. Some test will continue work as
> > usual even if some node failed. E.g. SQL queries with backups may continue
> > function in some cases, especially if these are test with REPLICATED cache.
> >
> > New test-scope handler looks like a better candidate to me.
> >
> > чт, 20 сент. 2018 г. в 21:22, Andrey Kuznetsov <st...@gmail.com>:
> >
> > > I meant the first comment in [1]. We are to decide first whether we'll do
> > > it or not.
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-8227
> > > <
> > >
> > https://issues.apache.org/jira/browse/IGNITE-8227?focusedCommentId=16435298&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16435298
> > > >
> > >
> > > чт, 20 сент. 2018 г. в 21:18, Dmitriy Pavlov <dp...@gmail.com>:
> > >
> > > > Sorry, incomplete message.
> > > >
> > > > Why do you think there is no consensus?
> > > >
> > > > I have no clue what can be a reason for another approach.
> > > > By default failure handler should fail all test.
> > > >
> > > > Failure handlers test will be always a minority of tests, so fail
> > handler
> > > > call is something abnormal.
> > > >
> > > > чт, 20 сент. 2018 г. в 21:15, Dmitriy Pavlov <dp...@gmail.com>:
> > > >
> > > > > Why do you think there is no consensus?
> > > > >
> > > > > I have no clue that by default failure handler should fail all test.
> > > > >
> > > > > чт, 20 сент. 2018 г. в 21:10, Andrey Kuznetsov <st...@gmail.com>:
> > > > >
> > > > >> I've created [1] to address this.
> > > > >>
> > > > >> Dmitriy, I like your idea of creating special test-scope handler.
> > But
> > > > >> there
> > > > >> is no consensus about it, so I don't want to rely on that potential
> > > > >> handler
> > > > >> right now. We can switch to it later, of course.
> > > > >>
> > > > >> [1] https://issues.apache.org/jira/browse/IGNITE-9660
> > > > >>
> > > > >> чт, 20 сент. 2018 г. в 20:03, Maxim Muzafarov <ma...@gmail.com>:
> > > > >>
> > > > >> > Andrey,
> > > > >> >
> > > > >> > I like your idea.
> > > > >> >
> > > > >> > After changing the default node failure handler to the new one we
> > > > should
> > > > >> > carefully review the whole new test failures. For instance,
> > calling
> > > > this
> > > > >> > method in tests should not lead test to the node being stopped:
> > > > >> >
> > > > >> > <strong>FOR TEST ONLY!!!</strong>
> > > > >> > TcpDiscoverySpi#simulateNodeFailure
> > > > >> >
> > > > >> > BTW, I would like to remove this method at all from production
> > code.
> > > > >> >
> > > > >> > On Thu, 20 Sep 2018 at 19:43 Dmitriy Pavlov <
> > dpavlov.spb@gmail.com>
> > > > >> wrote:
> > > > >> >
> > > > >> > > But the totally ideal situation would be finding a way to fail
> > the
> > > > >> test
> > > > >> > by
> > > > >> > > default, not only stopping a node.
> > > > >> > >
> > > > >> > > Some time ago I've created
> > > > >> > > https://issues.apache.org/jira/browse/IGNITE-8227 to
> > > > >> > > find out a way to do so.
> > > > >> > >
> > > > >> > > чт, 20 сент. 2018 г. в 19:40, Dmitriy Pavlov <
> > > dpavlov.spb@gmail.com
> > > > >:
> > > > >> > >
> > > > >> > > > ++1
> > > > >> > > >
> > > > >> > > > чт, 20 сент. 2018 г. в 19:39, Andrey Kuznetsov <
> > > stkuzma@gmail.com
> > > > >:
> > > > >> > > >
> > > > >> > > >> Igniters,
> > > > >> > > >>
> > > > >> > > >> While running tests I see a lot of ignored critical failures
> > > > >> caused by
> > > > >> > > >> {{NoOpFailureHandler}} that we use by default. In some tests,
> > > of
> > > > >> > cource,
> > > > >> > > >> critical failures are the part of normal workflow, but in the
> > > > >> majority
> > > > >> > > of
> > > > >> > > >> tests they indicate bugs. By using {{NoOpFailureHandler}} we
> > > just
> > > > >> hide
> > > > >> > > >> these bugs from ourselves.
> > > > >> > > >>
> > > > >> > > >> What do you think about changing default handler to
> > > > >> > > >> {{StopNodeFailureHandler}} at {{GridAbstractTest}} level?
> > This
> > > > >> could
> > > > >> > be
> > > > >> > > >> overridden in subclasses.
> > > > >> > > >>
> > > > >> > > >> --
> > > > >> > > >> Best regards,
> > > > >> > > >>   Andrey Kuznetsov.
> > > > >> > > >>
> > > > >> > > >
> > > > >> > >
> > > > >> > --
> > > > >> > --
> > > > >> > Maxim Muzafarov
> > > > >> >
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Best regards,
> > > > >>   Andrey Kuznetsov.
> > > > >>
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > >   Andrey Kuznetsov.
> > >
> >
>
>
> --
> Best regards,
>   Andrey Kuznetsov.

Re: Switching to real FailureHandler in tests

Posted by Andrey Kuznetsov <st...@gmail.com>.
Hi, Dmitrii!

As for [1], I think your suggestion is good to complete the change.

As for [2], I tend to disagree: in future tests default no-op handler can
hide bugs of new Ignite functionality. If it is really needed, developer
can explicitly mention that critical failure is a normal part of test
operation.

[1] https://issues.apache.org/jira/browse/IGNITE-8227
[2] https://issues.apache.org/jira/browse/IGNITE-9660

пн, 22 окт. 2018 г. в 15:18, Dmitrii Ryabov <so...@gmail.com>:

> I tried to replace default no-op handler by handler stopping node and
> failing the test.
>
> I've returned the no-op handler in many classes because critical
> situations are expected behavior. But PR still have a lot of failed
> tests and suites. In some tests, I can't understand a failure reason.
>
> I'm not finished to check failures, but after several RunAll runs, I
> see new flaky tests (1 or 2 fails) appeared because of new handler.
>
> I think we should keep no-op handler as default, but add new handler
> for a few classes, where critical situations aren't expected.
>
> пт, 21 сент. 2018 г. в 17:03, Andrey Kuznetsov <st...@gmail.com>:
> >
> > Thanks to all for participating the discussion.
> >
> > I've updated [1]: now it requires new handler from [2] for completion.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-9660
> > [2] https://issues.apache.org/jira/browse/IGNITE-8227
> >
> > чт, 20 сент. 2018 г. в 21:56, Vladimir Ozerov <vo...@gridgain.com>:
> >
> > > Stop node handler is not very good choice. Some test will continue
> work as
> > > usual even if some node failed. E.g. SQL queries with backups may
> continue
> > > function in some cases, especially if these are test with REPLICATED
> cache.
> > >
> > > New test-scope handler looks like a better candidate to me.
> > >
> > > чт, 20 сент. 2018 г. в 21:22, Andrey Kuznetsov <st...@gmail.com>:
> > >
> > > > I meant the first comment in [1]. We are to decide first whether
> we'll do
> > > > it or not.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-8227
> > > > <
> > > >
> > >
> https://issues.apache.org/jira/browse/IGNITE-8227?focusedCommentId=16435298&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16435298
> > > > >
> > > >
> > > > чт, 20 сент. 2018 г. в 21:18, Dmitriy Pavlov <dpavlov.spb@gmail.com
> >:
> > > >
> > > > > Sorry, incomplete message.
> > > > >
> > > > > Why do you think there is no consensus?
> > > > >
> > > > > I have no clue what can be a reason for another approach.
> > > > > By default failure handler should fail all test.
> > > > >
> > > > > Failure handlers test will be always a minority of tests, so fail
> > > handler
> > > > > call is something abnormal.
> > > > >
> > > > > чт, 20 сент. 2018 г. в 21:15, Dmitriy Pavlov <
> dpavlov.spb@gmail.com>:
> > > > >
> > > > > > Why do you think there is no consensus?
> > > > > >
> > > > > > I have no clue that by default failure handler should fail all
> test.
> > > > > >
> > > > > > чт, 20 сент. 2018 г. в 21:10, Andrey Kuznetsov <
> stkuzma@gmail.com>:
> > > > > >
> > > > > >> I've created [1] to address this.
> > > > > >>
> > > > > >> Dmitriy, I like your idea of creating special test-scope
> handler.
> > > But
> > > > > >> there
> > > > > >> is no consensus about it, so I don't want to rely on that
> potential
> > > > > >> handler
> > > > > >> right now. We can switch to it later, of course.
> > > > > >>
> > > > > >> [1] https://issues.apache.org/jira/browse/IGNITE-9660
> > > > > >>
> > > > > >> чт, 20 сент. 2018 г. в 20:03, Maxim Muzafarov <
> maxmuzaf@gmail.com>:
> > > > > >>
> > > > > >> > Andrey,
> > > > > >> >
> > > > > >> > I like your idea.
> > > > > >> >
> > > > > >> > After changing the default node failure handler to the new
> one we
> > > > > should
> > > > > >> > carefully review the whole new test failures. For instance,
> > > calling
> > > > > this
> > > > > >> > method in tests should not lead test to the node being
> stopped:
> > > > > >> >
> > > > > >> > <strong>FOR TEST ONLY!!!</strong>
> > > > > >> > TcpDiscoverySpi#simulateNodeFailure
> > > > > >> >
> > > > > >> > BTW, I would like to remove this method at all from production
> > > code.
> > > > > >> >
> > > > > >> > On Thu, 20 Sep 2018 at 19:43 Dmitriy Pavlov <
> > > dpavlov.spb@gmail.com>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > But the totally ideal situation would be finding a way to
> fail
> > > the
> > > > > >> test
> > > > > >> > by
> > > > > >> > > default, not only stopping a node.
> > > > > >> > >
> > > > > >> > > Some time ago I've created
> > > > > >> > > https://issues.apache.org/jira/browse/IGNITE-8227 to
> > > > > >> > > find out a way to do so.
> > > > > >> > >
> > > > > >> > > чт, 20 сент. 2018 г. в 19:40, Dmitriy Pavlov <
> > > > dpavlov.spb@gmail.com
> > > > > >:
> > > > > >> > >
> > > > > >> > > > ++1
> > > > > >> > > >
> > > > > >> > > > чт, 20 сент. 2018 г. в 19:39, Andrey Kuznetsov <
> > > > stkuzma@gmail.com
> > > > > >:
> > > > > >> > > >
> > > > > >> > > >> Igniters,
> > > > > >> > > >>
> > > > > >> > > >> While running tests I see a lot of ignored critical
> failures
> > > > > >> caused by
> > > > > >> > > >> {{NoOpFailureHandler}} that we use by default. In some
> tests,
> > > > of
> > > > > >> > cource,
> > > > > >> > > >> critical failures are the part of normal workflow, but
> in the
> > > > > >> majority
> > > > > >> > > of
> > > > > >> > > >> tests they indicate bugs. By using
> {{NoOpFailureHandler}} we
> > > > just
> > > > > >> hide
> > > > > >> > > >> these bugs from ourselves.
> > > > > >> > > >>
> > > > > >> > > >> What do you think about changing default handler to
> > > > > >> > > >> {{StopNodeFailureHandler}} at {{GridAbstractTest}} level?
> > > This
> > > > > >> could
> > > > > >> > be
> > > > > >> > > >> overridden in subclasses.
> > > > > >> > > >>
> > > > > >> > > >> --
> > > > > >> > > >> Best regards,
> > > > > >> > > >>   Andrey Kuznetsov.
> > > > > >> > > >>
> > > > > >> > > >
> > > > > >> > >
> > > > > >> > --
> > > > > >> > --
> > > > > >> > Maxim Muzafarov
> > > > > >> >
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> Best regards,
> > > > > >>   Andrey Kuznetsov.
> > > > > >>
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >   Andrey Kuznetsov.
> > > >
> > >
> >
> >
> > --
> > Best regards,
> >   Andrey Kuznetsov.
>


-- 
Best regards,
  Andrey Kuznetsov.