Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/12/21 00:50:58 UTC

[jira] [Created] (DRILL-5143) Planner creates redundant sort for nested query on single fragment

Paul Rogers created DRILL-5143:
----------------------------------

             Summary: Planner creates redundant sort for nested query on single fragment
                 Key: DRILL-5143
                 URL: https://issues.apache.org/jira/browse/DRILL-5143
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers
            Priority: Minor


The unit test {{TestWindowFrame.testUnboundedFollowing}} identified an optimization opportunity in the planner. It uses the following query:
{code}
SELECT
  position_id,
  employee_id,
  MAX(employee_id) OVER(PARTITION BY position_id) AS `last_value`
FROM (
  SELECT *
  FROM dfs_test.`%s/window/b4.p4`
  ORDER BY position_id, employee_id )
{code}
This query produces the following (heavily elided) plan:
{code}
  "graph" : [ {
    "pop" : "fs-scan",
    ...
  }, {
    "pop" : "project",
    ...
  }, {
    "pop" : "external-sort",
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`position_id`",
      "nullDirection" : "UNSPECIFIED"
    }, {
      "order" : "ASC",
      "expr" : "`employee_id`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    ...
  }, {
    "pop" : "selection-vector-remover",
    ...
  }, {
    "pop" : "project",
    ...
    "exprs" : [ {
      "ref" : "`$0`",
      "expr" : "`T0¦¦position_id`"
    }...
  }, {
    "pop" : "external-sort",
    ...
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`$0`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    ...
  }, {
    "pop" : "selection-vector-remover",
    ...
  }, {
    "pop" : "window",
    ...
    "aggregations" : [ {
      "ref" : "`w0$o0`",
      "expr" : "max(`$1`) "
    } ],
    ...
  }, {
    "pop" : "project",
    ...
  }, {
    "pop" : "screen",
    ...
  } ]
{code}
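Note that two sorts are stacked one atop the other, separated only by a selection-vector-remover and a project. The first sort orders by {{position_id}}, then {{employee_id}}; the second orders by {{$0}} alone, which the intervening project maps to {{position_id}}, so its input already arrives in the required order. This is a "degenerate" plan: in a distributed query, a shuffle operation would normally occur between the two sorts. Because this query is so small, it runs on a single node, and the second sort is redundant.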
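
As a rough illustration of the pattern, the sketch below scans a plan's operator list for two sorts with nothing between them that could change row order. It is not Drill code. It assumes the {{graph}} entries form a linear chain in execution order (as in the elided plan above), and the function name {{find_stacked_sorts}} and the set of order-preserving operators are hypothetical choices made for this example.

```python
# Sketch only: not part of Drill. Assumes the plan "graph" is a linear
# chain of operators listed in execution order.

# Operators treated here (an assumption of this sketch) as keeping row order.
ORDER_PRESERVING = {"project", "selection-vector-remover"}


def find_stacked_sorts(graph):
    """Return (upstream_index, downstream_index) pairs of sorts that have
    only order-preserving operators between them."""
    pairs = []
    last_sort = None  # index of the latest sort whose ordering still holds
    for i, op in enumerate(graph):
        pop = op["pop"]
        if pop == "external-sort":
            if last_sort is not None:
                pairs.append((last_sort, i))
            last_sort = i
        elif pop not in ORDER_PRESERVING:
            # Anything else (scan, exchange, window, ...) is handled
            # conservatively: stop assuming the earlier ordering holds.
            last_sort = None
    return pairs


# Operator names taken from the elided plan in this report.
plan = [
    {"pop": "fs-scan"},
    {"pop": "project"},
    {"pop": "external-sort"},
    {"pop": "selection-vector-remover"},
    {"pop": "project"},
    {"pop": "external-sort"},
    {"pop": "selection-vector-remover"},
    {"pop": "window"},
    {"pop": "project"},
    {"pop": "screen"},
]

stacked = find_stacked_sorts(plan)
```

For the plan in this report, the only pair found is the two {{external-sort}} operators. A shuffle between them would reset the tracking, so the distributed form of the plan would not be flagged.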

The planner should either:
* Recognize that the data is already sorted and omit the downstream sort, or
* Recognize that downstream operators ignore the inner query's sort order and discard the inner sort.
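
The first option amounts to checking whether the ordering the downstream sort requires is a leading prefix of the ordering the upstream sort already provides. A minimal sketch of that check follows, using the {{orderings}} entries from the plan above. This is an illustration, not the planner's actual logic: the function {{is_prefix_ordering}} and the {{renames}} map (standing in for the column renaming done by the intervening project) are hypothetical.

```python
# Sketch only: not Drill's planner logic. Names here are hypothetical.

def is_prefix_ordering(required, provided, renames=None):
    """Return True if the `required` sort keys are a leading prefix of the
    `provided` sort keys, so that a sort on `required` would be redundant.

    `renames` maps a downstream column reference to the upstream column it
    was projected from (an assumption standing in for the project operator).
    """
    renames = renames or {}
    if len(required) > len(provided):
        return False
    for need, have in zip(required, provided):
        need_expr = renames.get(need["expr"], need["expr"])
        if need_expr != have["expr"]:
            return False
        if need["order"] != have["order"]:
            return False
        if need["nullDirection"] != have["nullDirection"]:
            return False
    return True


# Orderings copied from the two external-sort operators in the plan.
inner_sort = [
    {"order": "ASC", "expr": "`position_id`", "nullDirection": "UNSPECIFIED"},
    {"order": "ASC", "expr": "`employee_id`", "nullDirection": "UNSPECIFIED"},
]
outer_sort = [
    {"order": "ASC", "expr": "`$0`", "nullDirection": "UNSPECIFIED"},
]

# The project between the sorts exposes position_id as `$0`.
renames = {"`$0`": "`position_id`"}

redundant = is_prefix_ordering(outer_sort, inner_sort, renames)
```

With these inputs the downstream sort is reported as redundant. The check is deliberately strict: a different direction, a different null ordering, or a key that is not in leading position makes it return False, in which case the downstream sort would have to stay.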


