Posted to issues@calcite.apache.org by "Ruben Q L (Jira)" <ji...@apache.org> on 2020/10/01 14:16:00 UTC

[jira] [Updated] (CALCITE-4300) EnumerableBatchNestedLoopJoin dynamic code generation can lead to variable name issues if two EBNLJ are nested

     [ https://issues.apache.org/jira/browse/CALCITE-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruben Q L updated CALCITE-4300:
-------------------------------
    Description: 
The {{EnumerableBatchNestedLoopJoin#implement}} method declares a variable named {{corrList}} in the generated code (it stores the correlating variables of the EBNLJ operator). Under certain circumstances (virtually impossible to reproduce in Calcite core, but feasible in downstream projects with further optimizations such as IndexScan), this fixed variable name causes problems when two EBNLJ operators are nested:
{code}
/*   5 */   final com.onwbp.org.apache.calcite.linq4j.Enumerable _inputEnumerable = com.onwbp.org.apache.calcite.linq4j.EnumerableDefaults.correlateBatchJoin(..., ..., new com.onwbp.org.apache.calcite.linq4j.function.Function1() {
/*   6 */     public com.onwbp.org.apache.calcite.linq4j.AbstractEnumerable apply(final java.util.List corrList) { // corrList1
/*   7 */       {
...
/*  11 */         final com.onwbp.org.apache.calcite.linq4j.Enumerable _inputEnumerable = com.onwbp.org.apache.calcite.linq4j.EnumerableDefaults.correlateBatchJoin(..., ..., new com.onwbp.org.apache.calcite.linq4j.function.Function1() {
/*  12 */           public com.onwbp.org.apache.calcite.linq4j.Enumerable apply(final java.util.List corrList) { // corrList2
/*  13 */             {
...
/*  16 */                 myContext.putCorrelatingValue("$cor10.0", ((Object[]) corrList.get(0))[0]); // meant for corrList1, but resolves to corrList2: problem!
/*  17 */                 myContext.putCorrelatingValue("$cor11.0", ((Object[]) corrList.get(1))[0]); // meant for corrList1, but resolves to corrList2: problem!
/*  18 */                 myContext.putCorrelatingValue("$cor34.0", (String) corrList.get(0)); // meant for corrList2, resolves to corrList2: works by chance
/*  19 */                 myContext.putCorrelatingValue("$cor35.0", (String) corrList.get(1)); // meant for corrList2, resolves to corrList2: works by chance
...
{code}

Notice that the generated code declares two parameters named {{corrList}} (lines 6 and 12). Since they share the same name, the inner one shadows the outer one, so every reference inside the inner function (lines 16 to 19) resolves to the second {{corrList}}, even where the first one was intended.
The fix is simple: each {{EnumerableBatchNestedLoopJoin}} must guarantee a unique name for its {{corrList}} variable in the dynamic code.
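The shadowing can be reproduced in plain Java, independently of Calcite. The following is a minimal sketch, not Calcite's generated code nor its actual fix: it uses {{java.util.function.Function}} anonymous classes in place of linq4j's {{Function1}}, and the class {{CorrListShadowingDemo}}, its methods and the {{NameGenerator}} helper are all hypothetical. It contrasts two nested functions that both name their parameter {{corrList}} with a version whose parameters receive distinct, counter-based names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class CorrListShadowingDemo {

  // Buggy shape: both nested functions name their parameter "corrList",
  // mirroring lines 6 and 12 of the generated code in the report.
  static String shadowed(List<?> outerBatch, List<?> innerBatch) {
    Function<List<?>, String> outer = new Function<List<?>, String>() {
      @Override
      public String apply(final List<?> corrList) { // corrList1
        Function<List<?>, String> inner = new Function<List<?>, String>() {
          @Override
          public String apply(final List<?> corrList) { // corrList2, shadows corrList1
            Object meantForOuter = corrList.get(0); // resolves to corrList2
            Object meantForInner = corrList.get(0); // resolves to corrList2
            return meantForOuter + "," + meantForInner;
          }
        };
        return inner.apply(innerBatch);
      }
    };
    return outer.apply(outerBatch);
  }

  // Fixed shape: each nested function gets its own parameter name, as a code
  // generator would produce if it appended a per-operator counter to "corrList".
  static String uniquelyNamed(List<?> outerBatch, List<?> innerBatch) {
    Function<List<?>, String> outer = new Function<List<?>, String>() {
      @Override
      public String apply(final List<?> corrList0) {
        Function<List<?>, String> inner = new Function<List<?>, String>() {
          @Override
          public String apply(final List<?> corrList1) {
            Object fromOuter = corrList0.get(0); // really the outer batch now
            Object fromInner = corrList1.get(0);
            return fromOuter + "," + fromInner;
          }
        };
        return inner.apply(innerBatch);
      }
    };
    return outer.apply(outerBatch);
  }

  // Hypothetical helper: hands out "corrList0", "corrList1", ... so that no
  // two nested operators declare the same variable name.
  static final class NameGenerator {
    private int counter = 0;

    String newName(String prefix) {
      return prefix + counter++;
    }
  }

  public static void main(String[] args) {
    List<String> outer = Arrays.asList("a");
    List<String> inner = Arrays.asList("b");
    System.out.println(shadowed(outer, inner));      // b,b
    System.out.println(uniquelyNamed(outer, inner)); // a,b
    NameGenerator names = new NameGenerator();
    System.out.println(names.newName("corrList") + " " + names.newName("corrList")); // corrList0 corrList1
  }
}
```

Java permits a method parameter of an anonymous inner class to shadow a parameter of the enclosing method, so the buggy shape compiles without any warning and silently reads the inner batch twice. A counter is only one way to obtain unique names; any scheme that guarantees a distinct identifier per operator would serve.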


  was:EnumerableBatchNestedLoopJoin dynamic code generation can lead to variable name issues if two EBNLJ are nested


> EnumerableBatchNestedLoopJoin dynamic code generation can lead to variable name issues if two EBNLJ are nested
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-4300
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4300
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Ruben Q L
>            Assignee: Ruben Q L
>            Priority: Major
>             Fix For: 1.26.0
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)