Posted to issues@calcite.apache.org by "Ruben Q L (Jira)" <ji...@apache.org> on 2022/04/28 15:59:00 UTC

[jira] [Updated] (CALCITE-5003) MergeUnion on types with different collators produces wrong result

     [ https://issues.apache.org/jira/browse/CALCITE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruben Q L updated CALCITE-5003:
-------------------------------
    Fix Version/s: 1.31.0

> MergeUnion on types with different collators produces wrong result
> ------------------------------------------------------------------
>
>                 Key: CALCITE-5003
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5003
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.27.0
>            Reporter: Ruben Q L
>            Assignee: Ruben Q L
>            Priority: Minor
>             Fix For: 1.31.0
>
>
> MergeUnion on types with different collators produces wrong result.
> The problem can be reproduced with the following test (in {{EnumerableStringComparisonTest}}):
> {code}
>   @Test void testMergeUnionOnStringDifferentCollation() {
>     tester()
>         .query("?")
>         .withHook(Hook.PLANNER, (Consumer<RelOptPlanner>) planner ->
>             planner.removeRule(EnumerableRules.ENUMERABLE_UNION_RULE))
>         .withRel(b -> {
>           final RelBuilder builder = b.transform(c -> c.withSimplifyValues(false));
>           return builder
>               .values(builder.getTypeFactory().builder()
>                       .add("name",
>                           builder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR)).build(),
>                   "facilities", "HR", "administration", "Marketing")
>               .values(createRecordVarcharSpecialCollation(builder),
>                   "Marketing", "administration", "presales", "HR")
>               .union(false)
>               .sort(0)
>               .build();
>         })
>         .explainHookMatches("" // It is important that we have MergeUnion in the plan
>             + "EnumerableMergeUnion(all=[false])\n"
>             + "  EnumerableSort(sort0=[$0], dir0=[ASC])\n"
>             + "    EnumerableValues(tuples=[[{ 'facilities' }, { 'HR' }, { 'administration' }, { 'Marketing' }]])\n"
>             + "  EnumerableSort(sort0=[$0], dir0=[ASC])\n"
>             + "    EnumerableValues(tuples=[[{ 'Marketing' }, { 'administration' }, { 'presales' }, { 'HR' }]])\n")
>         .returnsOrdered("name=administration\n"
>             + "name=facilities\n"
>             + "name=HR\n"
>             + "name=Marketing\n"
>             + "name=presales");
>   }
> {code}
> which fails with:
> {noformat}
> java.lang.AssertionError: 
> Expected: "name=administration\nname=facilities\nname=HR\nname=Marketing\nname=presales"
>      but: was "name=administration\nname=HR\nname=Marketing\nname=administration\nname=facilities\nname=Marketing\nname=presales"
> {noformat}
> The problem is that, with different collators, the prerequisite of MergeUnion (sorted inputs) is not fulfilled: each input is technically sorted, but the inputs are not sorted with the same collator, so the MergeUnion algorithm cannot compare them correctly.
> A possible solution could be not applying EnumerableMergeUnionRule in this case.
> A smarter solution would be for the rule to push Sort + Cast + input (rather than just Sort + input) when the input's key type differs in collation from the union's result type.
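
The failure mode described above can be illustrated outside Calcite with a minimal, standalone sketch. It is not Calcite's implementation: the class and method names are hypothetical, and String.CASE_INSENSITIVE_ORDER is only an assumed stand-in for the special collation used in the test. The first input is sorted with the default (binary) string order, the second with the stand-in collation, and a plain sorted-merge with duplicate elimination is then run using the stand-in collation as the union's comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Calcite.
class MergeUnionCollationSketch {

  /** Returns a copy of the values sorted with the given comparator. */
  static List<String> sorted(List<String> values, Comparator<String> cmp) {
    List<String> copy = new ArrayList<>(values);
    copy.sort(cmp);
    return copy;
  }

  /**
   * Sorted-merge union with duplicate elimination (all=false).
   * Correct only if BOTH inputs are already sorted according to cmp.
   */
  static List<String> mergeUnion(List<String> left, List<String> right,
      Comparator<String> cmp) {
    List<String> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < left.size() || j < right.size()) {
      String next;
      if (j >= right.size()
          || (i < left.size() && cmp.compare(left.get(i), right.get(j)) <= 0)) {
        next = left.get(i++);
      } else {
        next = right.get(j++);
      }
      // Duplicates are detected only against the previously emitted row,
      // which is why the sortedness prerequisite matters.
      if (out.isEmpty() || cmp.compare(out.get(out.size() - 1), next) != 0) {
        out.add(next);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Comparator<String> binary = Comparator.naturalOrder();
    // Assumption: stand-in for the "special collation" of the second input.
    Comparator<String> special = String.CASE_INSENSITIVE_ORDER;

    List<String> in1 = Arrays.asList("facilities", "HR", "administration", "Marketing");
    List<String> in2 = Arrays.asList("Marketing", "administration", "presales", "HR");

    // Inputs sorted with different comparators: prerequisite violated.
    System.out.println(mergeUnion(sorted(in1, binary), sorted(in2, special), special));

    // Both inputs re-sorted with the union's comparator: prerequisite holds.
    System.out.println(mergeUnion(sorted(in1, special), sorted(in2, special), special));
  }
}
```

Under these assumptions the first merge emits rows out of order and with duplicates, while the second, where both inputs are sorted with the same comparator, yields the expected five rows. That second call corresponds loosely to the Sort + Cast + input idea, in which each input is brought to the union's collation before it is sorted.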



--
This message was sent by Atlassian Jira
(v8.20.7#820007)