You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Ruben Q L (Jira)" <ji...@apache.org> on 2019/12/12 16:27:00 UTC
[jira] [Comment Edited] (CALCITE-3598) EnumerableTableScan: wrong JavaRowFormat for elementType String

    [ https://issues.apache.org/jira/browse/CALCITE-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994737#comment-16994737 ] 

Ruben Q L edited comment on CALCITE-3598 at 12/12/19 4:26 PM:
--------------------------------------------------------------

Thanks for the hint [~donnyzone]! it seems you are right.
In these these tests we arrive at {{EnumerableTableScan#format}} with {{elementType=String.class}}, and indeed it returns {{JavaRowFormat.CUSTOM}}
{code}
  private JavaRowFormat format() {
    int fieldCount = getRowType().getFieldCount();
    ...
    if (fieldCount == 1 && (Object.class == elementType
          || Primitive.is(elementType)
          || Number.class.isAssignableFrom(elementType))) {
      return JavaRowFormat.SCALAR;
    }
    return JavaRowFormat.CUSTOM;
  }
{code}

I suspect there is a missing condition for SCALAR (precisely when elementType is a String), by adding it, both tests are executed successfully!
{code}
  private JavaRowFormat format() {
    int fieldCount = getRowType().getFieldCount();
    ...
    if (fieldCount == 1 && (Object.class == elementType
          || Primitive.is(elementType)
          || Number.class.isAssignableFrom(elementType)
          || String.class == elementType)) {  // NEW CODE!
      return JavaRowFormat.SCALAR;
    }
    return JavaRowFormat.CUSTOM;
  }
{code}
In fact, the CUSTOM vs SCALAR situation causes some cascade effects that end up with the "row.CASE_INSENSITIVE_ORDER" situation. Concretely, the problem occurs in the {{JavaRowFormat#field}} method. In the case of SCALAR, it just returns the expression (which seems the right behavior for a String):
{code}
  SCALAR {
    ...
    public Expression field(Expression expression, int field, Type fromType,Type fieldType) {
      assert field == 0;
      return expression;
    }
{code}
Whereas for CUSTOM (with a String inside), it creates an expression to access the nth (in this case the 0-th) field of the String, and it seems that this generates the String.CASE_INSENSITIVE_ORDER:
{code}
  CUSTOM {
    ...
    public MemberExpression field(Expression expression, int field, Type fromType, Type fieldType) {
      ...
      return Expressions.field(expression, Types.nthField(field, type));
    }
{code}
I'll verify that this does not cause any regression, and create a PR for the fix.


was (Author: rubenql):
Thanks for the hint [~donnyzone]! it seems you are right.
In these these tests we arrive at {{EnumerableTableScan#format}} with {{elementType=String.class}}, and indeed it returns {{JavaRowFormat.CUSTOM}}
{code}
  private JavaRowFormat format() {
    int fieldCount = getRowType().getFieldCount();
    ...
    if (fieldCount == 1 && (Object.class == elementType
          || Primitive.is(elementType)
          || Number.class.isAssignableFrom(elementType))) {
      return JavaRowFormat.SCALAR;
    }
    return JavaRowFormat.CUSTOM;
  }
{code}

I suspect there is a missing condition for SCALAR (precisely when elementType is a String), by adding it, both tests are executed successfully!
{code}
  private JavaRowFormat format() {
    int fieldCount = getRowType().getFieldCount();
    ...
    if (fieldCount == 1 && (Object.class == elementType
          || Primitive.is(elementType)
          || Number.class.isAssignableFrom(elementType)
          || String.class == elementType)) {  // NEW CODE!
      return JavaRowFormat.SCALAR;
    }
    return JavaRowFormat.CUSTOM;
  }
{code}
I'll verify that this does not cause any regression, and create a PR for the fix.

> EnumerableTableScan: wrong JavaRowFormat for elementType String
> ---------------------------------------------------------------
>
>                 Key: CALCITE-3598
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3598
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.21.0
>            Reporter: Ruben Q L
>            Assignee: Ruben Q L
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.22.0
>
>         Attachments: codeHashJoin.txt, codeNestedLoopJoin.txt
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Problem unveiled by CALCITE-3535, and also separately by CALCITE-3576.
> When CALCITE-3535 was committed, it made {{MaterializationTest#testJoinMaterialization8}} and {{MaterializationTest#testJoinMaterialization9}} change their execution plan from hashJoin to nestedLoopJoin. This caused an exception
> {code}
> java.lang.ClassCastException: java.lang.String$CaseInsensitiveComparator cannot be cast to java.lang.String
> {code}
> which seems unrelated to CALCITE-3535 (or CALCITE-3576), so the tests were temporarily disabled.
> The goal of this ticket is to investigate the root cause of this issue and re-activate both tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)