Posted to issues@drill.apache.org by "Jacques Nadeau (JIRA)" <ji...@apache.org> on 2016/03/04 12:20:41 UTC

[jira] [Comment Edited] (DRILL-4467) Invalid projection created using PrelUtil.getColumns

    [ https://issues.apache.org/jira/browse/DRILL-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179759#comment-15179759 ] 

Jacques Nadeau edited comment on DRILL-4467 at 3/4/16 11:20 AM:
----------------------------------------------------------------

This lack of stability is also causing incorrect plans. For example, the expected plan for this regression test is invalid (though it may still execute correctly, because Drill resolves fields by name rather than by ordinal):

https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/hbase/hbase_pushdown/plan/pushdown_p3.e_tsv

{code:title=Plan in current test framework (wrong, current master)}
    Screen
      Project(EXPR$0=[/(CAST($1):INTEGER, CAST($2):FLOAT)])
        Project(row_key=[$1], ITEM=[ITEM($2, 'age')], ITEM2=[ITEM($0, 'gpa')])
          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=student, startRow=750\x00, stopRow=800, filter=FilterList AND (2/2): [RowFilter (LESS, 800), RowFilter (GREATER, 750)]], columns=[`row_key`, `twocf`.`age`, `threecf`.`gpa`]]])
{code}

Once we apply the desiredFields LinkedHashSet fix, the ordinals in the Project above the Scan are stable and correct:
{code:title=Plan using LinkedHashSet fix}
00-00    Screen
00-01      Project(EXPR$0=[/(CAST($1):INTEGER, CAST($2):FLOAT)])
00-02        Project(row_key=[$0], ITEM=[ITEM($1, 'age')], ITEM2=[ITEM($2, 'gpa')])
00-03          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=student, startRow=750\x00, stopRow=800, filter=FilterList AND (2/2): [RowFilter (LESS, 800), RowFilter (GREATER, 750)]], columns=[`row_key`, `twocf`.`age`, `threecf`.`gpa`]]])
{code}

Note how the columns list in the scan is now consistent with the field indices in the project.
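
To make the ordinal mismatch concrete, here is a minimal standalone sketch (plain Java, not Drill code). The class and method names are hypothetical, and the "unstable" case is simulated with a differently ordered LinkedHashSet, since the iteration order of a HashSet is unspecified and cannot be relied on in an example. It assigns each field the ordinal at which the set yields it, then checks those ordinals against the scan's column list:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names are Drill classes.
class OrdinalStabilityDemo {

    // Assigns each field the ordinal at which the set iterates over it,
    // roughly how a rewriter might map old field references to new ones.
    static Map<String, Integer> ordinalsByIteration(Set<String> desiredFields) {
        Map<String, Integer> ordinals = new LinkedHashMap<>();
        int next = 0;
        for (String field : desiredFields) {
            ordinals.put(field, next++);
        }
        return ordinals;
    }

    // True when every scan column's ordinal equals its position in the scan's column list.
    static boolean matchesScanColumns(List<String> scanColumns, Map<String, Integer> ordinals) {
        for (int i = 0; i < scanColumns.size(); i++) {
            Integer ordinal = ordinals.get(scanColumns.get(i));
            if (ordinal == null || ordinal != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> scanColumns = Arrays.asList("row_key", "twocf.age", "threecf.gpa");

        // Insertion order preserved: ordinals line up with the scan columns.
        Set<String> stable = new LinkedHashSet<>(scanColumns);
        System.out.println(matchesScanColumns(scanColumns, ordinalsByIteration(stable)));

        // Simulated unstable iteration order (gpa first), as in the wrong plan above:
        // row_key -> 1, twocf.age -> 2, threecf.gpa -> 0.
        Set<String> shuffled = new LinkedHashSet<>(
                Arrays.asList("threecf.gpa", "row_key", "twocf.age"));
        System.out.println(matchesScanColumns(scanColumns, ordinalsByIteration(shuffled)));
    }
}
```

The second case reproduces the $1/$2/$0 references of the wrong plan: the fields are all present, but their ordinals no longer agree with the scan's column order.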


was (Author: jnadeau):
This lack of stability also is causing incorrect plans, for example, the plan for this regression test is invalid (but may execute correctly because Drill resolves using names rather than ordinals):

https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/hbase/hbase_pushdown/plan/pushdown_p3.e_tsv

{code:title=PlanWithoutStablity (wrong)}
    Screen
      Project(EXPR$0=[/(CAST($1):INTEGER, CAST($2):FLOAT)])
        Project(row_key=[$1], ITEM=[ITEM($2, 'age')], ITEM2=[ITEM($0, 'gpa')])
          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=student, startRow=750\x00, stopRow=800, filter=FilterList AND (2/2): [RowFilter (LESS, 800), RowFilter (GREATER, 750)]], columns=[`row_key`, `twocf`.`age`, `threecf`.`gpa`]]])
{code}

But once we apply the desiredFields LinkedHashSet fix, we see stability/correct ordinals in the project above the Scan:
{code:title=PlanWithStability}
00-00    Screen
00-01      Project(EXPR$0=[/(CAST($1):INTEGER, CAST($2):FLOAT)])
00-02        Project(row_key=[$0], ITEM=[ITEM($1, 'age')], ITEM2=[ITEM($2, 'gpa')])
00-03          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=student, startRow=750\x00, stopRow=800, filter=FilterList AND (2/2): [RowFilter (LESS, 800), RowFilter (GREATER, 750)]], columns=[`row_key`, `twocf`.`age`, `threecf`.`gpa`]]])
{code}


> Invalid projection created using PrelUtil.getColumns
> ----------------------------------------------------
>
>                 Key: DRILL-4467
>                 URL: https://issues.apache.org/jira/browse/DRILL-4467
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Laurent Goujon
>            Assignee: Jacques Nadeau
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> In {{DrillPushProjIntoScan}}, a new scan and a new projection are created using {{PrelUtil#getColumns(RelDataType, List<RexNode>)}}.
> The returned {{ProjectPushInfo}} instance has several fields; one of them, {{desiredFields}}, is the list of projected fields. There is one instance per {{RexNode}}, but because the instances are first added to a set, they may not come back in the order in which they were created.
> The issue happens in the following code:
> {code:java}
>       List<RexNode> newProjects = Lists.newArrayList();
>       for (RexNode n : proj.getChildExps()) {
>         newProjects.add(n.accept(columnInfo.getInputRewriter()));
>       }
> {code}
> This code creates a new list of projects from the initial ones by mapping the indices of the old projects to the new ones, but the indices of the new RexNode instances may be out of order (because of the ordering of desiredFields). If the indices are out of order, the check {{ProjectRemoveRule.isTrivial(newProj)}} fails.
> My guess is that desiredFields ordering should be preserved when instances are added, to satisfy the condition above.
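
As a rough illustration of why out-of-order indices defeat the trivial-projection check, the sketch below models a projection as a plain list of input ordinals and tests whether it is the identity mapping. This is a simplified stand-in written for this note, not the Calcite implementation: the real check works on {{RexNode}} expressions and also considers types, and the class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model: a projection is just the list of input ordinals it references.
class TrivialProjectionSketch {

    // A projection is treated as trivial when output i is exactly input i
    // and it covers every input field.
    static boolean isIdentity(List<Integer> inputRefs, int inputFieldCount) {
        if (inputRefs.size() != inputFieldCount) {
            return false;
        }
        for (int i = 0; i < inputRefs.size(); i++) {
            if (inputRefs.get(i) != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Ordinals rewritten against an insertion-ordered desiredFields: $0, $1, $2.
        System.out.println(isIdentity(Arrays.asList(0, 1, 2), 3));

        // Same fields, but rewritten against a reordered desiredFields: $1, $2, $0.
        System.out.println(isIdentity(Arrays.asList(1, 2, 0), 3));
    }
}
```

Under this model, the reordered case references exactly the same fields yet is no longer recognized as trivial, which matches the failure described above.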


