Posted to issues@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2019/11/20 07:00:00 UTC

[jira] [Updated] (DRILL-7451) Planner inserts "trivial" top project node for simple query

     [ https://issues.apache.org/jira/browse/DRILL-7451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-7451:
-------------------------------
    Summary: Planner inserts "trivial" top project node for simple query  (was: Planner inserts project node even if scan handles project push-down)

> Planner inserts "trivial" top project node for simple query
> -----------------------------------------------------------
>
>                 Key: DRILL-7451
>                 URL: https://issues.apache.org/jira/browse/DRILL-7451
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Paul Rogers
>            Priority: Minor
>
> I created a "dummy" storage plugin for testing. The test does a simple query:
> {code:sql}
> SELECT a, b, c FROM dummy.myTable
> {code}
> The first test marks the plugin's group scan as supporting projection push-down. However, Drill still creates a project node in the logical plan:
> {code:json}
>   "graph" : [ {
>     "pop" : "DummyGroupScan",
>     "@id" : 2,
>     "columns" : [ "`**`" ],
>     "userName" : "progers",
>     "cost" : {
>       "memoryCost" : 1.6777216E7,
>       "outputRowCount" : 10000.0
>     }
>   }, {
>     "pop" : "project",
>     "@id" : 1,
>     "exprs" : [ {
>       "ref" : "`a`",
>       "expr" : "`a`"
>     }, {
>       "ref" : "`b`",
>       "expr" : "`b`"
>     }, {
>       "ref" : "`c`",
>       "expr" : "`c`"
>     } ],
>     "child" : 2,
>     "outputProj" : true,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : {
>       "memoryCost" : 1.6777216E7,
>       "outputRowCount" : 10000.0
>     }
>   }, {
>     "pop" : "screen",
>     "@id" : 0,
>     "child" : 1,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : {
>       "memoryCost" : 1.6777216E7,
>       "outputRowCount" : 10000.0
>     }
>   } ]
> {code}
> There is [a comment in the code|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java#L109] that suggests the project should be removed:
> {code:java}
>         // project above scan may be removed in ProjectRemoveRule for
>         // the case when it is trivial
> {code}
> As shown in the example, the project is trivial. There is a subtlety: the scan may, unknown to the planner, produce additional columns, say {{d}} and {{e}}, which the project operator is then needed to remove.
> If this is the reason the project remains, perhaps we can add a flag of some kind by which the group scan can declare that it not only handles projection but also will not add extra columns. With that guarantee, the project is completely unnecessary in this case.
> This is not a functional bug, just a performance issue: we exercise the machinery of the project operator to do exactly nothing.
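
The flag idea described in the issue could be sketched roughly as follows. This is a minimal illustration in plain Java, not Drill code: `NamedExpr`, `ScanInfo`, `exactProjection`, and `canRemoveProject` are hypothetical names invented for this sketch, and it assumes the scan was handed the explicit column list a, b, c rather than the wildcard that appears in the plan above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Drill classes.
final class TrivialProjectSketch {

  /** One project expression: the output name ("ref") and the input expression ("expr"). */
  static final class NamedExpr {
    final String ref;
    final String expr;

    NamedExpr(String ref, String expr) {
      this.ref = ref;
      this.expr = expr;
    }
  }

  /** Hypothetical description of what a group scan promises to produce. */
  static final class ScanInfo {
    final List<String> columns;
    // Assumed flag: the scan emits exactly `columns`, with no extras such as d or e.
    final boolean exactProjection;

    ScanInfo(List<String> columns, boolean exactProjection) {
      this.columns = columns;
      this.exactProjection = exactProjection;
    }
  }

  /** True when every expression passes a column through under its own name. */
  static boolean isPassThrough(List<NamedExpr> exprs) {
    for (NamedExpr e : exprs) {
      if (!e.ref.equals(e.expr)) {
        return false;
      }
    }
    return true;
  }

  /** The project is redundant only if it is pass-through AND the scan guarantees the same columns. */
  static boolean canRemoveProject(ScanInfo scan, List<NamedExpr> exprs) {
    if (!scan.exactProjection || !isPassThrough(exprs)) {
      return false;
    }
    List<String> projected = new ArrayList<>();
    for (NamedExpr e : exprs) {
      projected.add(e.ref);
    }
    // Same columns in the same order, so the project would change nothing.
    return projected.equals(scan.columns);
  }

  public static void main(String[] args) {
    List<NamedExpr> exprs = Arrays.asList(
        new NamedExpr("a", "a"), new NamedExpr("b", "b"), new NamedExpr("c", "c"));
    List<String> cols = Arrays.asList("a", "b", "c");

    System.out.println(canRemoveProject(new ScanInfo(cols, true), exprs));  // true
    System.out.println(canRemoveProject(new ScanInfo(cols, false), exprs)); // false
  }
}
```

Without the guarantee, the second call keeps the project, which matches the conservative behavior the issue suspects: the planner cannot rule out extra columns, so it leaves the project in place.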
