You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Vladimir Sitnikov <si...@gmail.com> on 2020/01/11 09:19:18 UTC

[DISCUSS] Randomize VolcanoRule execution order for better test coverage

Hi,

I've ran into RuleMatchImportanceComparator issue (see
https://issues.apache.org/jira/browse/CALCITE-2356 )

As a fun experiment, I've replaced the comparator with
Random#nextBoolean(), and it identified a bug:
https://github.com/apache/calcite/pull/1747/checks?check_run_id=384628081#step:5:789

The key here is the perfectly valid code produces the wrong SQL, so it is a
bug either in a rule or in rel-to-sql.

What do you think if we add a randomized CI job to the test suite?
Of course, randomized execution might produce slightly different plans, so
we need to have a system property
that would refrain from failing the test if an execution plan is different.
On the other hand, if the test method throws an unexpected exception, then
it is a true bug identified by randomization.


JdbcFrontJdbcBackLinqMiddleTest > testJoinGroupByOrderBy() FAILED

    java.sql.SQLException: Error while executing SQL "select count(*),
c."state_province",
      sum(s."unit_sales") as s
    from "foodmart"."sales_fact_1997" as s
      join "foodmart"."customer" as c
      on s."customer_id" = c."customer_id"
    group by c."state_province"
    order by c."state_province"": While executing SQL [SELECT COUNT(*),
"t2"."state_province", "t2"."S"
    FROM (SELECT "t0"."state_province", COUNT(*),
COALESCE(SUM("t"."unit_sales"), 0) AS "S"
    FROM (SELECT "customer_id", "unit_sales"
    FROM "foodmart"."sales_fact_1997") AS "t"
    INNER JOIN (SELECT "customer_id", "state_province"
    FROM "foodmart"."customer") AS "t0" ON "t"."customer_id" =
"t0"."customer_id"
    GROUP BY "t0"."state_province"
    ORDER BY "t0"."state_province" NULLS LAST) AS "t2"] on JDBC sub-schema


        Caused by:
        java.lang.RuntimeException: While executing SQL [SELECT COUNT(*),
"t2"."state_province", "t2"."S"
        FROM (SELECT "t0"."state_province", COUNT(*),
COALESCE(SUM("t"."unit_sales"), 0) AS "S"
        FROM (SELECT "customer_id", "unit_sales"
        FROM "foodmart"."sales_fact_1997") AS "t"
        INNER JOIN (SELECT "customer_id", "state_province"
        FROM "foodmart"."customer") AS "t0" ON "t"."customer_id" =
"t0"."customer_id"
        GROUP BY "t0"."state_province"
        ORDER BY "t0"."state_province" NULLS LAST) AS "t2"] on JDBC
sub-schema

            java.sql.SQLSyntaxErrorException: expression not in aggregate
or GROUP BY columns: "t2"."state_province"
                at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)

Vladimir

Re: [DISCUSS] Randomize VolcanoRule execution order for better test coverage

Posted by Stamatis Zampetakis <za...@gmail.com>.

Hello,

I like the idea of adding more tests but I am also afraid of having many
false positives.

I was under the impression that some kind of rules should always be
executed in a certain way. For instance, implementation rules (Logical ->
Enumerable) should always go top-down so that trait demand works correctly.
As it came up other projects may indeed write rules that they rely on a
particular order of executing the rules for having an executable plan so in
the end I am not sure if the randomization will help a lot.

I didn't thoroughly think on it so I am leaving the decision up to Vladimir.

Best,
Stamatis

On Sat, Jan 11, 2020 at 9:39 PM Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:

> Michael>I'm not sure I see how the property
> Michael>for passing tests with different plans would work
>
> E.g. @Tag("skipWhenRandomizedRules") and CalciteAssert#explainContains
> could become a no-op when a special property is passed.
>
> Of course, the randomization should be for test purposes, not for
> production.
>
> Vladimir
>

Re: [DISCUSS] Randomize VolcanoRule execution order for better test coverage

Posted by Vladimir Sitnikov <si...@gmail.com>.

Michael>I'm not sure I see how the property
Michael>for passing tests with different plans would work

E.g. @Tag("skipWhenRandomizedRules") and CalciteAssert#explainContains
could become a no-op when a special property is passed.

Of course, the randomization should be for test purposes, not for
production.

Vladimir

Re: [DISCUSS] Randomize VolcanoRule execution order for better test coverage

Posted by Michael Mior <mm...@uwaterloo.ca>.

I could see how this would be helpful. I'm not sure I see how the property
for passing tests with different plans would work. Why not just have
randomization disabled by default? In production, predictability is
probably preferable anyway.

On Sat, Jan 11, 2020, 04:19 Vladimir Sitnikov <si...@gmail.com>
wrote:

> Hi,
>
> I've ran into RuleMatchImportanceComparator issue (see
> https://issues.apache.org/jira/browse/CALCITE-2356 )
>
> As a fun experiment, I've replaced the comparator with
> Random#nextBoolean(), and it identified a bug:
>
> https://github.com/apache/calcite/pull/1747/checks?check_run_id=384628081#step:5:789
>
> The key here is the perfectly valid code produces the wrong SQL, so it is a
> bug either in a rule or in rel-to-sql.
>
> What do you think if we add a randomized CI job to the test suite?
> Of course, randomized execution might produce slightly different plans, so
> we need to have a system property
> that would refrain from failing the test if an execution plan is different.
> On the other hand, if the test method throws an unexpected exception, then
> it is a true bug identified by randomization.
>
>
> JdbcFrontJdbcBackLinqMiddleTest > testJoinGroupByOrderBy() FAILED
>
>     java.sql.SQLException: Error while executing SQL "select count(*),
> c."state_province",
>       sum(s."unit_sales") as s
>     from "foodmart"."sales_fact_1997" as s
>       join "foodmart"."customer" as c
>       on s."customer_id" = c."customer_id"
>     group by c."state_province"
>     order by c."state_province"": While executing SQL [SELECT COUNT(*),
> "t2"."state_province", "t2"."S"
>     FROM (SELECT "t0"."state_province", COUNT(*),
> COALESCE(SUM("t"."unit_sales"), 0) AS "S"
>     FROM (SELECT "customer_id", "unit_sales"
>     FROM "foodmart"."sales_fact_1997") AS "t"
>     INNER JOIN (SELECT "customer_id", "state_province"
>     FROM "foodmart"."customer") AS "t0" ON "t"."customer_id" =
> "t0"."customer_id"
>     GROUP BY "t0"."state_province"
>     ORDER BY "t0"."state_province" NULLS LAST) AS "t2"] on JDBC sub-schema
>
>
>         Caused by:
>         java.lang.RuntimeException: While executing SQL [SELECT COUNT(*),
> "t2"."state_province", "t2"."S"
>         FROM (SELECT "t0"."state_province", COUNT(*),
> COALESCE(SUM("t"."unit_sales"), 0) AS "S"
>         FROM (SELECT "customer_id", "unit_sales"
>         FROM "foodmart"."sales_fact_1997") AS "t"
>         INNER JOIN (SELECT "customer_id", "state_province"
>         FROM "foodmart"."customer") AS "t0" ON "t"."customer_id" =
> "t0"."customer_id"
>         GROUP BY "t0"."state_province"
>         ORDER BY "t0"."state_province" NULLS LAST) AS "t2"] on JDBC
> sub-schema
>
>             java.sql.SQLSyntaxErrorException: expression not in aggregate
> or GROUP BY columns: "t2"."state_province"
>                 at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
>
> Vladimir
>