Posted to dev@calcite.apache.org by "Ruben Q L (Jira)" <ji...@apache.org> on 2020/03/06 17:00:05 UTC

[jira] [Created] (CALCITE-3846) EnumerableMergeJoin: wrong comparison of composite key with null values

Ruben Q L created CALCITE-3846:
----------------------------------

             Summary: EnumerableMergeJoin: wrong comparison of composite key with null values
                 Key: CALCITE-3846
                 URL: https://issues.apache.org/jira/browse/CALCITE-3846
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.22.0
            Reporter: Ruben Q L


The problem can be reproduced with the following test in EnumerablesTest.java:
{code}
  @Test public void testMergeJoinWithCompositeKeyAndNull() {
    assertThat(
        EnumerableDefaults.mergeJoin(
            Linq4j.asEnumerable(
                Arrays.asList(
                    new Emp(10, "A"),
                    new Emp(10, "B"),
                    new Emp(10, "C"),
                    new Emp(10, "D"),
                    new Emp(40, "X"),
                    new Emp(50, "A"))),
            Linq4j.asEnumerable(
                Arrays.asList(
                    new Dept(10, "C"),
                    new Dept(10, null),
                    new Dept(30, "A"),
                    new Dept(40, "X"))),
            e -> (Comparable) FlatLists.of(e.deptno, e.name),
            d -> (Comparable) FlatLists.of(d.deptno, d.name),
            (v0, v1) -> v0 + ", " + v1, false, false).toList().toString(),
        equalTo("[Emp(10, C), Dept(10, C),"
            + " Emp(40, X), Dept(40, X)]"));
  }
{code}

The test fails with the following exception:
{code}
java.lang.IllegalStateException: mergeJoin assumes input sorted in ascending order, however [10, C] is greater than [10, null]
{code}

The problem is that the EnumerableMergeJoin implementation (i.e. EnumerableDefaults#mergeJoin) expects its inputs to be sorted in ascending order, nulls last (see EnumerableMergeJoinRule). For a composite key, EnumerableMergeJoin represents keys as JavaRowFormat.LIST, i.e. a comparable list, whose comparison is implemented by FlatLists.ComparableListImpl#compare. This method compares the two lists item by item, but it considers a null item to be less than a non-null item. This is a de facto nulls-first collation, which contradicts the prerequisite of the mergeJoin algorithm. In the test above, the Dept input is correctly sorted with nulls last ([10, C] before [10, null]), but under the nulls-first list comparison [10, C] is greater than [10, null], so mergeJoin concludes that its input is not sorted and throws the exception.
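A minimal standalone Java sketch of the mismatch follows. It does not call Calcite: the class CompositeKeyNullOrder and its methods are hypothetical and only model the two orderings discussed here, an item-by-item comparison where null sorts first (the behavior described for FlatLists.ComparableListImpl#compare) and one where null sorts last (the order mergeJoin assumes). firstOutOfOrder is likewise an illustrative stand-in for the sortedness check behind the IllegalStateException, not its actual implementation.

{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical standalone sketch (not Calcite code): composite keys with nulls. */
public class CompositeKeyNullOrder {

  /** Builds a composite key; Arrays.asList (unlike List.of) permits null items. */
  static List<Object> key(Object... parts) {
    return Arrays.asList(parts);
  }

  /** Compares two non-null Comparable values of the same type. */
  @SuppressWarnings({"unchecked", "rawtypes"})
  private static int compareValues(Object a, Object b) {
    return ((Comparable) a).compareTo(b);
  }

  /** Item-by-item comparison where a null item is less than any non-null item. */
  static int compareNullsFirst(List<Object> k0, List<Object> k1) {
    return compare(k0, k1, true);
  }

  /** Item-by-item comparison where a null item is greater than any non-null item. */
  static int compareNullsLast(List<Object> k0, List<Object> k1) {
    return compare(k0, k1, false);
  }

  private static int compare(List<Object> k0, List<Object> k1, boolean nullsFirst) {
    int n = Math.min(k0.size(), k1.size());
    for (int i = 0; i < n; i++) {
      Object a = k0.get(i);
      Object b = k1.get(i);
      int c;
      if (a == null && b == null) {
        c = 0;
      } else if (a == null) {
        c = nullsFirst ? -1 : 1;
      } else if (b == null) {
        c = nullsFirst ? 1 : -1;
      } else {
        c = compareValues(a, b);
      }
      if (c != 0) {
        return c;
      }
    }
    // If one key is a prefix of the other, the shorter key sorts first.
    return Integer.compare(k0.size(), k1.size());
  }

  /**
   * Returns the index of the first key that is greater-than-preceded, i.e. smaller
   * than its predecessor under cmp, or -1 if the keys are in ascending order.
   */
  static int firstOutOfOrder(List<List<Object>> keys, Comparator<List<Object>> cmp) {
    for (int i = 1; i < keys.size(); i++) {
      if (cmp.compare(keys.get(i - 1), keys.get(i)) > 0) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // Dept keys from the test, sorted ascending with nulls last.
    List<List<Object>> deptKeys = Arrays.asList(
        key(10, "C"), key(10, null), key(30, "A"), key(40, "X"));

    // prints 1: under nulls-first, [10, C] is greater than [10, null]
    System.out.println(firstOutOfOrder(deptKeys, CompositeKeyNullOrder::compareNullsFirst));
    // prints -1: the same input is ascending under nulls-last
    System.out.println(firstOutOfOrder(deptKeys, CompositeKeyNullOrder::compareNullsLast));
  }
}
{code}

Applied to the Dept keys from the test, the nulls-first comparison flags index 1 ([10, C] followed by [10, null]), matching the reported exception, while the nulls-last comparison accepts the same input. For single values, java.util.Comparator.nullsFirst and Comparator.nullsLast make the same distinction.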



--
This message was sent by Atlassian Jira
(v8.3.4#803005)