Posted to dev@calcite.apache.org by Vladimir Sitnikov <si...@gmail.com> on 2019/12/28 22:05:22 UTC

Concurrent execution of test methods

Hi,

I've filed a PR to activate concurrent test execution by default:
https://github.com/apache/calcite/pull/1702

It enables concurrent execution of both test methods and test classes.
Note: concurrent test execution was already present in the Maven build, and
now it will be in Gradle as well.

It seems to work on my machine (1 socket, 4 cores, 8 threads); however,
there might still be concurrency issues left.

I noticed two major issues:
1)
https://github.com/apache/calcite/commit/d32ee5c320938b5c34ce09df2276c9570c27a301#diff-2b9a6c719e7c1c69c76dccbbc1654ae8R2593
That commit added Hook.REL_BUILDER_SIMPLIFY.add(Hook.propertyJ(false)),
which is effectively a global hook, so that test caused failures in other
tests running at the same time.
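A minimal sketch of why a process-wide switch like that hurts concurrent tests. It uses a hypothetical flag, not Calcite's actual Hook API; HookScopeDemo, GLOBAL_SIMPLIFY, LOCAL_SIMPLIFY and withLocalSimplify are illustrative names only. A change to the global flag is visible to every thread, while a thread-scoped flag that is restored on close stays local to the test that set it:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical illustration: a process-wide flag versus a thread-scoped one. */
class HookScopeDemo {
  /** Global flag: a change made by any test is seen by every thread. */
  static final AtomicBoolean GLOBAL_SIMPLIFY = new AtomicBoolean(true);

  /** Thread-scoped flag: a change is seen only by the thread that made it. */
  static final ThreadLocal<Boolean> LOCAL_SIMPLIFY =
      ThreadLocal.withInitial(() -> Boolean.TRUE);

  /** An AutoCloseable whose close() does not throw checked exceptions. */
  interface Scope extends AutoCloseable {
    @Override void close();
  }

  /** Sets the thread-scoped flag and restores the previous value on close. */
  static Scope withLocalSimplify(boolean value) {
    final Boolean previous = LOCAL_SIMPLIFY.get();
    LOCAL_SIMPLIFY.set(value);
    return () -> LOCAL_SIMPLIFY.set(previous);
  }

  /** Evaluates the probe on a separate thread and returns what that thread saw. */
  static boolean observeFromOtherThread(BooleanSupplier probe) {
    final boolean[] seen = new boolean[1];
    Thread thread = new Thread(() -> seen[0] = probe.getAsBoolean());
    thread.start();
    try {
      thread.join(); // join() makes the write to seen[0] visible here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return seen[0];
  }

  public static void main(String[] args) {
    // A test flips the global flag and forgets to restore it...
    GLOBAL_SIMPLIFY.set(false);
    // ...so a test running on another thread observes the change: prints false.
    System.out.println(observeFromOtherThread(GLOBAL_SIMPLIFY::get));
    GLOBAL_SIMPLIFY.set(true);

    try (Scope ignored = withLocalSimplify(false)) {
      System.out.println(LOCAL_SIMPLIFY.get());                        // false
      System.out.println(observeFromOtherThread(LOCAL_SIMPLIFY::get)); // true
    }
    System.out.println(LOCAL_SIMPLIFY.get());                          // true (restored)
  }
}
```

The ThreadLocal variant is only one way to isolate state, and it helps only if the code under test reads the flag on the same thread that set it.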

2) [CALCITE-3285] EnumerableMergeJoin should support non-equi join
conditions
https://github.com/apache/calcite/commit/a0931b784c60683eb6845b3087d79f33160fc868#diff-d746ae553342ceeb0806df4a03e1e51eR44
It added a static field to EnumerableConvention, which caused failures under
concurrent execution.
I've reworked CALCITE-3285 (see PR 1702), and hopefully it works
better now.

In the meantime, please feel free to try concurrent execution and let me
know how it works for you.

Vladimir

Re: Concurrent execution of test methods

Posted by Vladimir Sitnikov <si...@gmail.com>.
It turned out to be more complicated than I thought.

The EnumerableMergeJoin fix uncovered a well-known infinite planning
time issue: https://issues.apache.org/jira/browse/CALCITE-2223

The thing is, the rule previously did not even try to sort its inputs, so
it was useful only when the inputs **happened to be** sorted.

I thought it would make sense to consider plans like
MergeJoin(Sort(..), Sort(..)) as well, so I altered the rule
to create Sort nodes when an input is not sorted.
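A rough sketch of the altered rule's idea, using hypothetical simplified plan nodes instead of Calcite's RelNode and RelCollation classes: an input that is already sorted on the join keys is reused, and any other input is wrapped in a Sort. The "sorted on" check is a plain prefix match, which is a simplification:

```java
import java.util.List;

/** Hypothetical, simplified plan nodes used only for this sketch. */
class MergeJoinSortDemo {
  /** A plan node that reports the field indexes its output is sorted on. */
  interface Node {
    List<Integer> sortedOn();
    String describe();
  }

  static Node scan(String table, List<Integer> sortedOn) {
    return new Node() {
      public List<Integer> sortedOn() { return sortedOn; }
      public String describe() { return table; }
    };
  }

  static Node sort(Node input, List<Integer> keys) {
    return new Node() {
      public List<Integer> sortedOn() { return keys; }
      public String describe() { return "Sort(" + input.describe() + ", " + keys + ")"; }
    };
  }

  /** True if the input's sort order starts with the join keys. */
  static boolean isSortedOn(Node input, List<Integer> joinKeys) {
    List<Integer> order = input.sortedOn();
    return order.size() >= joinKeys.size()
        && order.subList(0, joinKeys.size()).equals(joinKeys);
  }

  /** Reuses a suitably sorted input; otherwise adds a Sort on the join keys. */
  static Node ensureSorted(Node input, List<Integer> joinKeys) {
    return isSortedOn(input, joinKeys) ? input : sort(input, joinKeys);
  }

  static String mergeJoin(Node left, List<Integer> leftKeys,
      Node right, List<Integer> rightKeys) {
    return "MergeJoin(" + ensureSorted(left, leftKeys).describe() + ", "
        + ensureSorted(right, rightKeys).describe() + ")";
  }

  public static void main(String[] args) {
    Node emps = scan("emps", List.of(0)); // already sorted on field 0
    Node depts = scan("depts", List.of()); // not sorted
    // prints MergeJoin(emps, Sort(depts, [1]))
    System.out.println(mergeJoin(emps, List.of(0), depts, List.of(1)));
  }
}
```

Every extra Sort alternative registered this way enlarges the search space, which is the cost described in the next paragraph.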

So far so good; however, even a simple case like JdbcTest#testJoinManyWay
degrades significantly because the planner now has many more
nodes to explore.

In the current master, testJoinManyWay completes in 7 seconds (see
https://travis-ci.org/apache/calcite/jobs/630585211#L1624 )

However, the test does not complete if I activate the MergeJoin rule.
I tried to fix that, and I have a couple of commits that make it a bit more
stable (see the commits in https://github.com/apache/calcite/pull/1702).

Basically:
1) I ensure RexNodes are canonicalized at construction time
(e.g. "=($1, $0)" is always created as "=($0, $1)").
This seems to produce lots of test failures; however, it makes test
output more stable, which is good.
2) I added call.rel(0).getConvention() == call.rel(1).getConvention()
checks to a couple of rules like FilterProjectTransposeRule.
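A minimal sketch of construction-time canonicalization, assuming a hypothetical two-operand comparison over input references rather than Calcite's RexCall. The factory puts the lower input index first, and for ordering comparisons it also reverses the operator so the meaning is preserved (e.g. $3 < $2 becomes $2 > $3). Reversing < and > is an assumption that goes beyond the "=" example in the message:

```java
import java.util.Map;

/** Hypothetical mini-expression: a comparison between two input references. */
class CanonicalizeDemo {
  // Operator to use when the operands are swapped; "=" and "<>" are symmetric.
  private static final Map<String, String> REVERSED = Map.of(
      "=", "=", "<>", "<>", "<", ">", ">", "<", "<=", ">=", ">=", "<=");

  static final class Call {
    final String op;
    final int left;
    final int right;

    private Call(String op, int left, int right) {
      this.op = op;
      this.left = left;
      this.right = right;
    }

    @Override public String toString() {
      return op + "($" + left + ", $" + right + ")";
    }
  }

  /** Factory that canonicalizes at construction time: lower index first. */
  static Call makeCall(String op, int left, int right) {
    String reversed = REVERSED.get(op);
    if (reversed == null) {
      throw new IllegalArgumentException("unsupported operator: " + op);
    }
    return left > right ? new Call(reversed, right, left) : new Call(op, left, right);
  }

  public static void main(String[] args) {
    System.out.println(makeCall("=", 1, 0)); // =($0, $1)
    System.out.println(makeCall("<", 3, 2)); // >($2, $3)
  }
}
```

Because the constructor is private, every Call goes through the factory, so two logically equal comparisons always print the same way. That is what makes the expected plan strings in tests stable.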

The net result is that testJoinManyWay takes 140 seconds on my machine. That
is not perfect, but at least it completes for 6 joins.

An alternative option is to have **two** (or three) flavours of
MergeJoinRule:
A) A rule that just reuses sorted inputs if there are any (fewer Sort nodes
=> smaller planning space => faster planning)
B) A rule that tries adding Sort nodes (more Sort nodes => longer planning,
but it might produce creative plans)
C) A rule that tries adding a Sort node only when one of its inputs is
already sorted appropriately (a mix between A and B)
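One way to read the three flavours is as a decision function. The sketch assumes each input is simply "sorted on the join keys" or not. The enum and method names are hypothetical, and reading A as "fire only when both inputs are already sorted" is an interpretation (a merge join needs both sides ordered):

```java
import java.util.OptionalInt;

/** Hypothetical sketch of the three MergeJoinRule flavours (A, B, C). */
class MergeJoinFlavours {
  enum Flavour { REUSE_SORTED /* A */, ALWAYS_SORT /* B */, SORT_IF_ONE_SORTED /* C */ }

  /**
   * Returns how many Sort nodes the rule would add, or empty if the rule
   * would not fire for this combination of input orderings.
   */
  static OptionalInt sortsToAdd(Flavour flavour, boolean leftSorted, boolean rightSorted) {
    int missing = (leftSorted ? 0 : 1) + (rightSorted ? 0 : 1);
    switch (flavour) {
      case REUSE_SORTED:
        // A: fire only when both inputs already happen to be sorted.
        return missing == 0 ? OptionalInt.of(0) : OptionalInt.empty();
      case ALWAYS_SORT:
        // B: always fire, sorting whichever inputs are not sorted yet.
        return OptionalInt.of(missing);
      case SORT_IF_ONE_SORTED:
        // C: fire only if at least one input is already sorted.
        return missing <= 1 ? OptionalInt.of(missing) : OptionalInt.empty();
      default:
        throw new AssertionError(flavour);
    }
  }

  public static void main(String[] args) {
    for (Flavour f : Flavour.values()) {
      System.out.println(f + ": one input sorted -> " + sortsToAdd(f, true, false)
          + ", none sorted -> " + sortsToAdd(f, false, false));
    }
  }
}
```

Under this reading, C never adds two Sort nodes for the same join, which bounds the extra planning space compared with B.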

Any thoughts?

Vladimir