Posted to issues@calcite.apache.org by "Ruben Q L (Jira)" <ji...@apache.org> on 2019/12/12 16:27:00 UTC
[jira] [Comment Edited] (CALCITE-3598) EnumerableTableScan: wrong JavaRowFormat for elementType String
[ https://issues.apache.org/jira/browse/CALCITE-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994737#comment-16994737 ]
Ruben Q L edited comment on CALCITE-3598 at 12/12/19 4:26 PM:
--------------------------------------------------------------
Thanks for the hint [~donnyzone]! It seems you are right.
In these tests we arrive at {{EnumerableTableScan#format}} with {{elementType=String.class}}, and indeed it returns {{JavaRowFormat.CUSTOM}}:
{code}
private JavaRowFormat format() {
  int fieldCount = getRowType().getFieldCount();
  ...
  if (fieldCount == 1 && (Object.class == elementType
      || Primitive.is(elementType)
      || Number.class.isAssignableFrom(elementType))) {
    return JavaRowFormat.SCALAR;
  }
  return JavaRowFormat.CUSTOM;
}
{code}
I suspect there is a missing condition for SCALAR (precisely when elementType is a String). After adding it, both tests run successfully:
{code}
private JavaRowFormat format() {
  int fieldCount = getRowType().getFieldCount();
  ...
  if (fieldCount == 1 && (Object.class == elementType
      || Primitive.is(elementType)
      || Number.class.isAssignableFrom(elementType)
      || String.class == elementType)) { // NEW CODE!
    return JavaRowFormat.SCALAR;
  }
  return JavaRowFormat.CUSTOM;
}
{code}
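To make the effect of the extra condition easier to see in isolation, here is a minimal, self-contained sketch of the same decision. It is not the Calcite code: {{RowFormatSketch}}, {{RowFormat}} and {{chooseFormat}} are hypothetical names, {{Class#isPrimitive}} is only an assumed approximation of {{Primitive.is}}, and the part elided by "..." in the method above is not modeled at all.

```java
/** Hypothetical, simplified stand-in for the SCALAR-vs-CUSTOM decision. */
public class RowFormatSketch {
  /** Only the two formats discussed in this comment are modeled. */
  public enum RowFormat { SCALAR, CUSTOM }

  /**
   * Mirrors the condition quoted above, including the proposed String case.
   * Assumption: elementType.isPrimitive() approximates Primitive.is(elementType).
   */
  public static RowFormat chooseFormat(int fieldCount, Class<?> elementType) {
    if (fieldCount == 1
        && (Object.class == elementType
            || elementType.isPrimitive()
            || Number.class.isAssignableFrom(elementType)
            || String.class == elementType)) { // the proposed extra condition
      return RowFormat.SCALAR;
    }
    return RowFormat.CUSTOM;
  }

  public static void main(String[] args) {
    // A single String column is now treated like a single numeric column.
    System.out.println(chooseFormat(1, String.class));
    System.out.println(chooseFormat(1, Integer.class));
    // More than one field never qualifies as SCALAR in this sketch.
    System.out.println(chooseFormat(2, String.class));
  }
}
```

Without the {{String.class == elementType}} line, the first call would fall through to CUSTOM, which is the behavior observed in the failing tests.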
In fact, choosing CUSTOM instead of SCALAR has cascading effects that end up with the "row.CASE_INSENSITIVE_ORDER" situation. Concretely, the problem occurs in the {{JavaRowFormat#field}} method. In the case of SCALAR, it just returns the expression (which seems to be the right behavior for a String):
{code}
SCALAR {
  ...
  public Expression field(Expression expression, int field, Type fromType, Type fieldType) {
    assert field == 0;
    return expression;
  }
{code}
Whereas for CUSTOM (with a String inside), it creates an expression to access the nth (in this case the 0-th) field of the String, and it seems that this is what generates the access to {{String.CASE_INSENSITIVE_ORDER}}:
{code}
CUSTOM {
  ...
  public MemberExpression field(Expression expression, int field, Type fromType, Type fieldType) {
    ...
    return Expressions.field(expression, Types.nthField(field, type));
  }
{code}
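The following JDK-only sketch illustrates why "field 0 of a String" would plausibly be {{CASE_INSENSITIVE_ORDER}}, and why using it as a String fails with a ClassCastException. It assumes that {{Types.nthField}} ultimately resolves the field from the public fields reported by reflection; that is an assumption, not something checked here. {{StringFieldSketch}} and its methods are hypothetical names.

```java
import java.lang.reflect.Field;

/** Hypothetical demo: what "the n-th field of String" looks like via reflection. */
public class StringFieldSketch {
  /** Returns the n-th public field of a class, as plain reflection reports it. */
  public static Field nthPublicField(Class<?> type, int n) {
    return type.getFields()[n];
  }

  /** Reads public field 0 of String and tries to treat its value as a String. */
  public static String readFieldZeroAsString() {
    Field field = nthPublicField(String.class, 0);
    Object value;
    try {
      value = field.get(null); // the field is static, so no receiver is needed
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
    // The value is a Comparator, not a String, so this cast fails at runtime.
    return (String) value;
  }

  public static void main(String[] args) {
    System.out.println(nthPublicField(String.class, 0).getName());
    try {
      readFieldZeroAsString();
    } catch (ClassCastException e) {
      System.out.println("ClassCastException: " + e.getMessage());
    }
  }
}
```

{{String}} exposes a single public field, so index 0 can only be the case-insensitive comparator. This matches the exception seen in the tests, and it is why returning the expression itself (the SCALAR behavior) looks like the right choice for a String element.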
I'll verify that this does not cause any regression, and create a PR for the fix.
> EnumerableTableScan: wrong JavaRowFormat for elementType String
> ---------------------------------------------------------------
>
> Key: CALCITE-3598
> URL: https://issues.apache.org/jira/browse/CALCITE-3598
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.21.0
> Reporter: Ruben Q L
> Assignee: Ruben Q L
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.22.0
>
> Attachments: codeHashJoin.txt, codeNestedLoopJoin.txt
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Problem unveiled by CALCITE-3535, and also separately by CALCITE-3576.
> When CALCITE-3535 was committed, it made {{MaterializationTest#testJoinMaterialization8}} and {{MaterializationTest#testJoinMaterialization9}} change their execution plan from hashJoin to nestedLoopJoin. This caused an exception
> {code}
> java.lang.ClassCastException: java.lang.String$CaseInsensitiveComparator cannot be cast to java.lang.String
> {code}
> which seems unrelated to CALCITE-3535 (or CALCITE-3576), so the tests were temporarily disabled.
> The goal of this ticket is to investigate the root cause of this issue and re-activate both tests.