Posted to issues@calcite.apache.org by "Jinfeng Ni (JIRA)" <ji...@apache.org> on 2017/06/12 22:43:01 UTC

[jira] [Commented] (CALCITE-1584) ProjectRemoveRule loses field names

    [ https://issues.apache.org/jira/browse/CALCITE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16047157#comment-16047157 ] 

Jinfeng Ni commented on CALCITE-1584:
-------------------------------------

[~julianhyde] is right that Drill relies heavily on field names during execution: Drill's planner has no knowledge of the table schema, so execution has to use name-based resolution. However, this particular issue is not caused by Drill's reliance on field names. It seems to be caused by a difference between how Calcite's converter works and how Drill's planner works.

Here is what Calcite's plan looks like using Calcite's sqlline (on today's master branch).
{code}
explain plan for select "full_name" as MYNAME from "foodmart"."employee";
+------+
| PLAN |
+------+
| JdbcToEnumerableConverter
  JdbcProject(full_name=[$1])
    JdbcTableScan(table=[[foodmart, employee]])
{code}

Notice that JdbcProject projects $1 to a field named {{full_name}}, not {{MYNAME}}. From Drill's perspective, such a plan would cause exactly the problem reported in DRILL-5538, while Calcite seems to produce the correct result.

It turns out that Calcite's converter returns not only the converted RelNode (which may have no Project with {{MYNAME}}, or may have a Project with a different name), but also the {{validatedNodeType}} \[1\]. The {{validatedNodeType}} contains the field name {{MYNAME}}. The underlying assumption when using Calcite's SqlToRelConverter is that the caller uses both the converted RelNode and the {{validatedNodeType}}. Using only the former loses the field name, just as we saw in DRILL-5538.
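
To make the point concrete, here is a toy sketch in plain Java. It does not use Calcite's API: {{Node}}, {{withValidatedNames}} and the scan's column list are hypothetical stand-ins I made up for illustration. It only shows the idea of putting the validated field names back on top of a converted plan whose own names differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model, NOT Calcite's API: a plan node is reduced to an operator name and its output field names. */
final class ValidatedNamesSketch {

    /** Hypothetical stand-in for a RelNode; only the output field names matter here. */
    static final class Node {
        final String op;
        final List<String> fieldNames;
        final Node input; // null for a leaf

        Node(String op, List<String> fieldNames, Node input) {
            this.op = op;
            this.fieldNames = fieldNames;
            this.input = input;
        }
    }

    /**
     * Returns root unchanged if its field names already match the validated ones;
     * otherwise wraps it in a renaming "TopProject" that carries the validated names.
     */
    static Node withValidatedNames(Node root, List<String> validatedNames) {
        if (root.fieldNames.size() != validatedNames.size()) {
            throw new IllegalArgumentException("field count mismatch");
        }
        if (root.fieldNames.equals(validatedNames)) {
            return root;
        }
        return new Node("TopProject", new ArrayList<>(validatedNames), root);
    }

    public static void main(String[] args) {
        // Column list is illustrative only.
        Node scan = new Node("JdbcTableScan", Arrays.asList("employee_id", "full_name"), null);
        // The converted plan kept the column name, as in the EXPLAIN output above.
        Node project = new Node("JdbcProject", Arrays.asList("full_name"), scan);
        // The validated row type remembers the alias from the SQL text.
        Node top = withValidatedNames(project, Arrays.asList("MYNAME"));
        System.out.println(top.op + top.fieldNames); // prints TopProject[MYNAME]
    }
}
```

A caller that keeps only {{project}} and drops the validated names would report {{full_name}}, which is the symptom in DRILL-5538.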

In Drill, because of this assumption in Calcite, after the converter returns a converted RelNode we also create a TopProject based on the {{validatedNodeType}} during logical planning. I believe DRILL-5538 exposes the problem because {{ProjectRemoveRule}} was added to physical planning, so we may lose that TopProject. In that sense, the assumption in Calcite's converter also implies that people who use PlannerImpl \[2\] for planning, rather than Calcite's default planner, have to remember to apply the {{validatedNodeType}} after all planning phases have completed. Otherwise, they might run into problems like DRILL-5538.

For DRILL-5538, it seems to me the fix is to move the creation of the TopProject from the {{validatedNodeType}} to after physical planning is done. That would probably solve the problem.
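
A rough sketch of why the ordering matters, again as a toy model in plain Java and not Calcite or Drill code. {{Node}}, {{addTopProject}} and {{removeRenameOnlyProjects}} are hypothetical names; the last one is a simplified stand-in for a removal rule that ignores field names, which is my assumption about the behaviour behind this issue.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model, NOT Calcite/Drill code: shows when a rename-only top project survives. */
final class TopProjectOrderingSketch {

    /** Hypothetical plan node; renameOnly marks a project that changes nothing but field names. */
    static final class Node {
        final String op;
        final List<String> fieldNames;
        final Node input; // null for a leaf
        final boolean renameOnly;

        Node(String op, List<String> fieldNames, Node input, boolean renameOnly) {
            this.op = op;
            this.fieldNames = fieldNames;
            this.input = input;
            this.renameOnly = renameOnly;
        }
    }

    /** Adds a project that only renames the output fields to the validated names. */
    static Node addTopProject(Node root, List<String> validatedNames) {
        return new Node("TopProject", validatedNames, root, true);
    }

    /** Simplified stand-in for a removal rule that drops rename-only projects, ignoring names. */
    static Node removeRenameOnlyProjects(Node node) {
        if (node == null) {
            return null;
        }
        if (node.renameOnly) {
            return removeRenameOnlyProjects(node.input);
        }
        return new Node(node.op, node.fieldNames, removeRenameOnlyProjects(node.input), false);
    }

    public static void main(String[] args) {
        // Column list is illustrative only.
        Node scan = new Node("Scan", Arrays.asList("employee_id", "full_name"), null, false);
        Node plan = new Node("Project", Arrays.asList("full_name"), scan, false);
        List<String> validated = Arrays.asList("MYNAME");

        // TopProject added before the removal phase: the alias is lost.
        Node early = removeRenameOnlyProjects(addTopProject(plan, validated));
        // TopProject added after all planning is done: the alias survives.
        Node late = addTopProject(removeRenameOnlyProjects(plan), validated);

        System.out.println(early.fieldNames); // prints [full_name]
        System.out.println(late.fieldNames);  // prints [MYNAME]
    }
}
```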


1. https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L576-L578
2. https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/PlannerImpl.java



> ProjectRemoveRule loses field names
> -----------------------------------
>
>                 Key: CALCITE-1584
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1584
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Jess Balint
>            Assignee: Julian Hyde
>            Priority: Minor
>
> the rule doesn't properly identify a child {{Project}} node


