Posted to dev@calcite.apache.org by Vladimir Sitnikov <si...@gmail.com> on 2018/09/10 07:53:35 UTC

[DISCUSS] reasonable duration for tests in the Calcite codebase

Hi all,

I wonder what maximum duration for a newly added test case would still be
"reasonable".
I mean a test case that is executed in Travis to validate pull requests,
and a test case that is executed as part of a regular `mvn test` / `mvn
install`.

Could you please pick the maximum test duration that would still be reasonable
from your point of view?
I would be glad to hear explanations as well.

1sec
5sec <-- I think this is the maximum reasonable test duration (for a single
test)
10sec
30sec
60sec (1m)
120sec (2m)
240sec (4m)
480sec (8m)
960sec (16m)
1920sec (32m)

Note: the duration might depend on CPU performance, however could we just
ignore that?

For instance: suppose I implement a fix for "CALCITE-2506 RexSimplify:
coalesce(unaryPlus(nullInt), unaryPlus(vInt())) results in AssertionError:
result mismatch",
and I add a test case that takes 60 seconds to execute. Is that reasonable?
Then I fix "CALCITE-2527 Simplify OR with comparisons leads to
AssertionError: result mismatch" and add a test case that takes yet another
60 seconds.
Then I fix "CALCITE-2526 Simplify OR with comparisons leads to NPE" and add
a test that takes yet another 60 seconds.

As for me, 60 seconds for a single test is unreasonable, since it would
easily push overall test execution time through the roof, especially in
Travis, which uses only one thread.

Note: there are cases like "it takes 20-30 seconds to start Cassandra";
however, I assume Cassandra is started only once and then all the relevant
tests are run.
In that case, "restart Cassandra for each test method" is unreasonable, while
starting it once is fine.
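For reference, a minimal JUnit 4 sketch of that "start once, reuse for every
test method" pattern is below. The EmbeddedCassandra helper and its
start()/stop() methods are hypothetical stand-ins for whatever the adapter
tests actually use; only the @BeforeClass/@AfterClass shape is the point.

import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class CassandraAdapterTest {
  // Hypothetical wrapper around an embedded Cassandra instance.
  private static EmbeddedCassandra cassandra;

  @BeforeClass public static void startCassandra() {
    // Pay the 20-30 second startup cost once per test class,
    // not once per test method.
    cassandra = EmbeddedCassandra.start();
  }

  @AfterClass public static void stopCassandra() {
    cassandra.stop();
  }

  @Test public void testSelect() {
    // ... run queries against the shared Cassandra instance ...
  }
}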

Vladimir

Re: [DISCUSS] reasonable duration for tests in the Calcite codebase

Posted by Kenneth Knowles <ke...@apache.org>.
FWIW Beam has "pre-commit" and "post-commit" suites [1], with the latter
being quite a lot longer and including lots of integration tests with e.g.
cloud data processing engines and storage systems, so at least the tests
have a home. You get "quick" feedback on pull requests (let's ignore how
slow our pre-commit is...), and for things that take hours, which you only
find out about in post-commit, you can always roll back [2]. You can do a bit of
engineering with the Jenkins groovy DSL to make this not so painful to
administer [3].

Kenn

[1] https://builds.apache.org/view/A-D/view/Beam/
[2] https://beam.apache.org/contribute/postcommits-policies/
[3] https://github.com/apache/beam/tree/master/.test-infra/jenkins

On Wed, Oct 24, 2018 at 8:02 PM Julian Hyde <jh...@apache.org> wrote:

> You exaggerate a little. It does test Calcite code, not just hsqldb.
>
> We should add a “slow regression test”, and add some of the tests in
> LatticeTest to it. We can disable it in the regular suite once that
> regression test is up and running regularly.
>
> Julian
>
>
> > On Oct 24, 2018, at 12:29 PM, Vladimir Sitnikov <
> sitnikov.vladimir@gmail.com> wrote:
> >
> > I'm inclined to do something with LatticeTest (== mark it as disabled or
> > something like that).
> > It fails too often, and the only thing it does is test HSQL's (or H2's?)
> > ability to perform join queries. It has nothing to do with Calcite.
> >
> > The current LatticeTest definitely harms Calcite development and it makes
> > "Calcite-Master - Build # 944 - Still Failing" messages virtually
> > useless.
> >
> > Vladimir
>
>

Re: [DISCUSS] reasonable duration for tests in the Calcite codebase

Posted by Julian Hyde <jh...@apache.org>.
You exaggerate a little. It does test Calcite code, not just hsqldb.

We should add a “slow regression test”, and add some of the tests in LatticeTest to it. We can disable it in the regular suite once that regression test is up and running regularly.
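One way to wire that up would be JUnit 4 categories, roughly as sketched
below; the SlowTests marker interface and the test names are illustrative,
not existing Calcite code, and Surefire can then include or exclude the
category via its groups/excludedGroups settings.

import org.junit.Test;
import org.junit.experimental.categories.Category;

/** Marker for tests that are too slow for the default suite. */
interface SlowTests {
}

public class LatticeRegressionTest {
  // Runs only when the build selects the SlowTests category,
  // e.g. via Surefire's <groups> configuration.
  @Category(SlowTests.class)
  @Test public void testLatticesAgainstFoodmart() {
    // ... long-running checks moved out of LatticeTest ...
  }
}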

Julian


> On Oct 24, 2018, at 12:29 PM, Vladimir Sitnikov <si...@gmail.com> wrote:
> 
> I'm inclined to do something with LatticeTest (== mark it as disabled or
> something like that).
> It fails too often, and the only thing it does is test HSQL's (or H2's?)
> ability to perform join queries. It has nothing to do with Calcite.
> 
> The current LatticeTest definitely harms Calcite development and it makes
> "Calcite-Master - Build # 944 - Still Failing" messages virtually useless.
> 
> Vladimir


Re: [DISCUSS] reasonable duration for tests in the Calcite codebase

Posted by Vladimir Sitnikov <si...@gmail.com>.
I'm inclined to do something with LatticeTest (== mark it as disabled or
something like that).
It fails too often, and the only thing it does is test HSQL's (or H2's?)
ability to perform join queries. It has nothing to do with Calcite.

The current LatticeTest definitely harms Calcite development and it makes
"Calcite-Master - Build # 944 - Still Failing" messages virtually useless.

Vladimir

Re: [DISCUSS] reasonable duration for tests in the Calcite codebase

Posted by Vladimir Sitnikov <si...@gmail.com>.
Andrew>My biggest pain point when working with calcite is the size of the core
Andrew>project. Individual tests are mostly fast, but just building core takes
Andrew>100 seconds. It would also be easier to run just the relevant tests if
Andrew>there were several smaller maven projects.

Let's pick an example.
The current core contains "MaterializationTest 160sec" and "LatticeTest
212sec", and I doubt that functionality changes often.
Suppose we move the features to calcite-materialization and calcite-lattice
submodules (just pretend there are no circular dependencies, etc.).

Will it make "core" build faster? Of course it will.
Will it make testing "hacking Calcite core" faster? I doubt it.
Changes to "core" might ripple to other submodules. For instance, a change
in some rule might defeat calcite-materialization.
That means you would still need to run the calcite-materialization tests even
when you only alter core.
One can cut corners by building "core" only; however, that is effectively the
same as marking certain tests as "to be executed with -Dlong.tests=true
only".

On top of that, Maven builds modules sequentially by default, so moving
materialization and lattice to separate modules could even hurt end-to-end
build and test time.

Gradle might help there as it can build modules concurrently.

Vladimir

Re: [DISCUSS] reasonable duration for tests in the Calcite codebase

Posted by Julian Hyde <jh...@apache.org>.
Calcite is a DBMS. Most DBMSs I know have test suites that take several hours to run. If our test suite took 50x longer I would start to get worried.

There are separate questions about whether developers should be required to run the full suite every commit, whether the CI system should run all tests every commit, and whether we should continue to use a free Travis instance that has strict time limits.

Julian


> On Sep 10, 2018, at 9:56 AM, Andrew Pilloud <ap...@google.com.INVALID> wrote:
> 
> I would expect acceptable test run time to be somewhat bimodal: maximum
> around 100ms for unit tests (which should be the majority of tests) and
> minutes for functional and integration tests. It would be good for Travis
> to run all these tests on every PR, but it would be nice if I could limit
> my local test runs to only the unit tests.
> 
> My biggest pain point when working with calcite is the size of the core
> project. Individual tests are mostly fast, but just building core takes 100
> seconds. It would also be easier to run just the relevant tests if there
> were several smaller maven projects.
> 
> Andrew
> 
> On Mon, Sep 10, 2018 at 4:53 AM Vladimir Sitnikov <
> sitnikov.vladimir@gmail.com> wrote:
> 
>> Michael>just because a test is slow of course doesn't mean it's not useful
>> 
>> Does "slow test" impact your development experience in a bad way?
>> Does "slow test" impact contributor's experience in a bad way?
>> Do things like "MaterializationTest 160sec" and "LatticeTest 212sec" make
>> materialization/lattice much stronger?
>> 
>> Spoiler: yes, yes, no. In the latter case, most of the time is spent not in
>> the materialization/lattice engine but in data loading.
>> That is, the tests cover very few cases, and they spend too much time
>> testing HSQLDB (or whatever DB is used to load the data).
>> In other words, the current MaterializationTest/LatticeTest are much less
>> related to Calcite than they are to H*DB.
>> 
>> Michael>I'm not sure it makes sense to have one single duration that's
>> acceptable
>> for all tests
>> 
>> Of course there are always exceptions, like the "Cassandra startup" case I
>> listed above.
>> 
>> Let me rephrase: "what test duration would raise your eyebrow?"
>> What test duration would raise both of your eyebrows?
>> 
>> Michael>We could also put some slower tests behind a define since
>> just because a test is slow of course doesn't mean it's not useful.
>> 
>> Calcite is an intermediate framework. It is not a database. It is not a
>> training framework for machine learning.
>> Thus Calcite should not take much time to perform its duties.
>> 
>> For instance, if a simple query that fetches 100 rows by primary key
>> literals takes 80 seconds to "prepare" in Calcite, I would definitely call
>> that a bug.
>> That does not mean the case is useful to run day-to-day in the codebase.
>> We can keep it with @Ignore("the test is too slow, see CALCITE-XXXX");
>> however, it makes absolutely no sense to run the test on every test
>> execution.
>> 
>> Michael>If we are going to pick a single duration,
>> 
>> Note: I'm not suggesting we veto tests based on their duration.
>> 
>> Suppose I'm reviewing a PR, and I see that the added test there takes 95
>> seconds on my machine.
>> Is it reasonable to ask the author to reduce test duration?
>> 
>> Michael>then I think it should be much higher than 5 seconds
>> 
>> What's your idea on the reasonable duration for a single test?
>> 
>> Just in case. https://issues.apache.org/jira/browse/CALCITE-2509 shows
>> that
>> a simple
>> coalesce(vInt(0), vInt(1), vInt(2), vInt(3), vInt(4), vInt(5), vInt(6),
>> vInt(7), vInt(8), vInt(9), vIntNotNull(0))
>> takes 30+ seconds in RexSimplifyTest.
>> 
>> Would you mind if I add that kind of enabled-by-default test?
>> 
>> Vladimir
>> 


Re: [DISCUSS] reasonable duration for tests in the Calcite codebase

Posted by Andrew Pilloud <ap...@google.com.INVALID>.
I would expect acceptable test run time to be somewhat bimodal: maximum
around 100ms for unit tests (which should be the majority of tests) and
minutes for functional and integration tests. It would be good for Travis
to run all these tests on every PR, but it would be nice if I could limit
my local test runs to only the unit tests.

My biggest pain point when working with calcite is the size of the core
project. Individual tests are mostly fast, but just building core takes 100
seconds. It would also be easier to run just the relevant tests if there
were several smaller maven projects.

Andrew

On Mon, Sep 10, 2018 at 4:53 AM Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:

> Michael>just because a test is slow of course doesn't mean it's not useful
>
> Does "slow test" impact your development experience in a bad way?
> Does "slow test" impact contributor's experience in a bad way?
> Do things like "MaterializationTest 160sec" and "LatticeTest 212sec" make
> materialization/lattice much stronger?
>
> Spoiler: yes, yes, no. In the latter case, most of the time is spent not in
> the materialization/lattice engine but in data loading.
> That is, the tests cover very few cases, and they spend too much time
> testing HSQLDB (or whatever DB is used to load the data).
> In other words, the current MaterializationTest/LatticeTest are much less
> related to Calcite than they are to H*DB.
>
> Michael>I'm not sure it makes sense to have one single duration that's
> acceptable
> for all tests
>
> Of course there are always exceptions, like the "Cassandra startup" case I
> listed above.
>
> Let me rephrase: "what test duration would raise your eyebrow?"
> What test duration would raise both of your eyebrows?
>
> Michael>We could also put some slower tests behind a define since
> just because a test is slow of course doesn't mean it's not useful.
>
> Calcite is an intermediate framework. It is not a database. It is not a
> training framework for machine learning.
> Thus Calcite should not take much time to perform its duties.
>
> For instance, if a simple query that fetches 100 rows by primary key
> literals takes 80 seconds to "prepare" in Calcite, I would definitely call
> that a bug.
> That does not mean the case is useful to run day-to-day in the codebase.
> We can keep it with @Ignore("the test is too slow, see CALCITE-XXXX");
> however, it makes absolutely no sense to run the test on every test
> execution.
>
> Michael>If we are going to pick a single duration,
>
> Note: I'm not suggesting we veto tests based on their duration.
>
> Suppose I'm reviewing a PR, and I see that the added test there takes 95
> seconds on my machine.
> Is it reasonable to ask the author to reduce test duration?
>
> Michael>then I think it should be much higher than 5 seconds
>
> What's your idea on the reasonable duration for a single test?
>
> Just in case. https://issues.apache.org/jira/browse/CALCITE-2509 shows
> that
> a simple
> coalesce(vInt(0), vInt(1), vInt(2), vInt(3), vInt(4), vInt(5), vInt(6),
> vInt(7), vInt(8), vInt(9), vIntNotNull(0))
> takes 30+ seconds in RexSimplifyTest.
>
> Would you mind if I add that kind of enabled-by-default test?
>
> Vladimir
>

Re: [DISCUSS] reasonable duration for tests in the Calcite codebase

Posted by Michael Mior <mm...@uwaterloo.ca>.
> Calcite is an intermediate framework. It is not a database. It is not a
> training framework for machine learning.
> Thus Calcite should not take much time to perform its duties.

I don't think the runtime of Calcite should necessarily be tied to the
runtime of tests. I agree that Calcite should have minimal overhead. I
don't think that means that the full test suite is necessarily fast. If
the ability of the test suite to catch important bugs scales with the test
runtime, that's a good trade-off.

> Would you mind if I add that kind of enabled-by-default test?

Probably not, but I refer to my earlier statement that if the marginal
ability of that test to catch bugs is significant, then it might be a good
decision.

Bottom line is that I agree with you that test runtime is important and we
should try to minimize it. But the objective obviously isn't to minimize
test runtime or we'd delete all the tests. It seems like what we're really
trying to come to a consensus on is how to judge the value of a test.

--
Michael Mior
mmior@uwaterloo.ca


Le lun. 10 sept. 2018 à 07:53, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> a écrit :

> Michael>just because a test is slow of course doesn't mean it's not useful
>
> Does "slow test" impact your development experience in a bad way?
> Does "slow test" impact contributor's experience in a bad way?
> Do things like "MaterializationTest 160sec" and "LatticeTest 212sec" make
> materialization/lattice much stronger?
>
> Spoiler: yes, yes, no. In the latter case, most of the time is spent not in
> the materialization/lattice engine but in data loading.
> That is, the tests cover very few cases, and they spend too much time
> testing HSQLDB (or whatever DB is used to load the data).
> In other words, the current MaterializationTest/LatticeTest are much less
> related to Calcite than they are to H*DB.
>
> Michael>I'm not sure it makes sense to have one single duration that's
> acceptable
> for all tests
>
> Of course there are always exceptions, like the "Cassandra startup" case I
> listed above.
>
> Let me rephrase: "what test duration would raise your eyebrow?"
> What test duration would raise both of your eyebrows?
>
> Michael>We could also put some slower tests behind a define since
> just because a test is slow of course doesn't mean it's not useful.
>
> Calcite is an intermediate framework. It is not a database. It is not a
> training framework for machine learning.
> Thus Calcite should not take much time to perform its duties.
>
> For instance, if a simple query that fetches 100 rows by primary key
> literals takes 80 seconds to "prepare" in Calcite, I would definitely call
> that a bug.
> That does not mean the case is useful to run day-to-day in the codebase.
> We can keep it with @Ignore("the test is too slow, see CALCITE-XXXX");
> however, it makes absolutely no sense to run the test on every test
> execution.
>
> Michael>If we are going to pick a single duration,
>
> Note: I'm not suggesting we veto tests based on their duration.
>
> Suppose I'm reviewing a PR, and I see that the added test there takes 95
> seconds on my machine.
> Is it reasonable to ask the author to reduce test duration?
>
> Michael>then I think it should be much higher than 5 seconds
>
> What's your idea on the reasonable duration for a single test?
>
> Just in case. https://issues.apache.org/jira/browse/CALCITE-2509 shows
> that
> a simple
> coalesce(vInt(0), vInt(1), vInt(2), vInt(3), vInt(4), vInt(5), vInt(6),
> vInt(7), vInt(8), vInt(9), vIntNotNull(0))
> takes 30+ seconds in RexSimplifyTest.
>
> Would you mind if I add that kind of enabled-by-default test?
>
> Vladimir
>

Re: [DISCUSS] reasonable duration for tests in the Calcite codebase

Posted by Vladimir Sitnikov <si...@gmail.com>.
Michael>just because a test is slow of course doesn't mean it's not useful

Does "slow test" impact your development experience in a bad way?
Does "slow test" impact contributor's experience in a bad way?
Do things like "MaterializationTest 160sec" and "LatticeTest 212sec" make
materialization/lattice much stronger?

Spoiler: yes, yes, no. In the latter case, most of the time is spent not in
the materialization/lattice engine but in data loading.
That is, the tests cover very few cases, and they spend too much time
testing HSQLDB (or whatever DB is used to load the data).
In other words, the current MaterializationTest/LatticeTest are much less
related to Calcite than they are to H*DB.

Michael>I'm not sure it makes sense to have one single duration that's
acceptable
for all tests

Of course there are always exceptions, like the "Cassandra startup" case I
listed above.

Let me rephrase: "what test duration would raise your eyebrow?"
What test duration would raise both of your eyebrows?

Michael>We could also put some slower tests behind a define since
just because a test is slow of course doesn't mean it's not useful.

Calcite is an intermediate framework. It is not a database. It is not a
training framework for machine learning.
Thus Calcite should not take much time to perform its duties.

For instance, if a simple query that fetches 100 rows by primary key literals
takes 80 seconds to "prepare" in Calcite, I would definitely call that a bug.
That does not mean the case is useful to run day-to-day in the codebase.
We can keep it with @Ignore("the test is too slow, see CALCITE-XXXX");
however, it makes absolutely no sense to run the test on every test
execution.
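A minimal sketch of what that could look like (the class and method names are
invented for illustration; only the @Ignore usage matters):

import org.junit.Ignore;
import org.junit.Test;

public class RexSimplifyPerformanceTest {
  // Kept in the codebase so the issue stays reproducible,
  // but skipped on every regular run.
  @Ignore("the test is too slow, see CALCITE-XXXX")
  @Test public void testDeeplyNestedCoalesceSimplification() {
    // ... the slow reproduction case goes here ...
  }
}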

Michael>If we are going to pick a single duration,

Note: I'm not suggesting we veto tests based on their duration.

Suppose I'm reviewing a PR, and I see that the added test there takes 95
seconds on my machine.
Is it reasonable to ask the author to reduce test duration?

Michael>then I think it should be much higher than 5 seconds

What's your idea on the reasonable duration for a single test?

Just in case. https://issues.apache.org/jira/browse/CALCITE-2509 shows that
a simple
coalesce(vInt(0), vInt(1), vInt(2), vInt(3), vInt(4), vInt(5), vInt(6),
vInt(7), vInt(8), vInt(9), vIntNotNull(0))
takes 30+ seconds in RexSimplifyTest.

Would you mind if I add that kind of enabled-by-default test?

Vladimir

Re: [DISCUSS] reasonable duration for tests in the Calcite codebase

Posted by Michael Mior <mm...@apache.org>.
I'm not sure it makes sense to have one single duration that's acceptable
for all tests. This ignores the value of each test in terms of how likely it
is to catch bugs, which is certainly different for each test. If we are going to
pick a single duration, then I think it should be much higher than 5
seconds. Of course, we also need to define what environment these runtimes
are measured in. We could also put some slower tests behind a define since
just because a test is slow of course doesn't mean it's not useful.

--
Michael Mior
mmior@apache.org


Le lun. 10 sept. 2018 à 03:53, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> a écrit :

> Hi all,
>
> I wonder what maximum duration for a newly added test case would still be
> "reasonable".
> I mean a test case that is executed in Travis to validate pull requests,
> and a test case that is executed as part of a regular `mvn test` / `mvn
> install`.
>
> Could you please pick the maximum test duration that would still be reasonable
> from your point of view?
> I would be glad to hear explanations as well.
>
> 1sec
> 5sec <-- I think this is the maximum reasonable test duration (for a single
> test)
> 10sec
> 30sec
> 60sec (1m)
> 120sec (2m)
> 240sec (4m)
> 480sec (8m)
> 960sec (16m)
> 1920sec (32m)
>
> Note: the duration might depend on CPU performance, however could we just
> ignore that?
>
> For instance: suppose I implement a fix for "CALCITE-2506 RexSimplify:
> coalesce(unaryPlus(nullInt), unaryPlus(vInt())) results in AssertionError:
> result mismatch",
> and I add a test case that takes 60 seconds to execute. Is that reasonable?
> Then I fix "CALCITE-2527 Simplify OR with comparisons leads to
> AssertionError: result mismatch" and add a test case that takes yet another
> 60 seconds.
> Then I fix "CALCITE-2526 Simplify OR with comparisons leads to NPE" and add
> a test that takes yet another 60 seconds.
>
> As for me, 60 seconds for a single test is unreasonable, since it would
> easily push overall test execution time through the roof, especially in
> Travis, which uses only one thread.
>
> Note: there are cases like "it takes 20-30 seconds to start Cassandra";
> however, I assume Cassandra is started only once and then all the relevant
> tests are run.
> In that case, "restart Cassandra for each test method" is unreasonable, while
> starting it once is fine.
>
> Vladimir
>