Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/01/16 18:58:26 UTC
[jira] [Created] (DRILL-5199) Planner inserts three projects when one will do

Paul Rogers created DRILL-5199:
----------------------------------

             Summary: Planner inserts three projects when one will do
                 Key: DRILL-5199
                 URL: https://issues.apache.org/jira/browse/DRILL-5199
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.9.0
            Reporter: Paul Rogers
            Priority: Minor


See the query and description for DRILL-5198. The plan for that query has a number of opportunities for improvement. This bug covers a minor one: the plan contains three consecutive project operators where a single project would probably work just as well (and would be somewhat more efficient).

Here is the subset of the plan in question:

{code}
02-01                        UnorderedMuxExchange : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8, cumulative cost = {7.17696212E8 rows, 1.973664583E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 449
03-01                          Project(T0¦¦*=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8, cumulative cost = {5.38272159E8 rows, 1.79424053E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 448
03-02                            Project(T0¦¦*=[$0], EXPR$1=[ITEM($1, 0)]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = {3.58848106E8 rows, 1.076544318E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 447
03-03                              Project(T0¦¦*=[$0], columns=[$1]) : rowType = RecordType(ANY T0¦¦*, ANY columns): rowcount = 1.79424053E8, cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
03-04                                Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/resource-manager/descending-col-length-8k.tbl, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/resource-manager/descending-col-length-8k.tbl]]]) : rowType = (DrillRecordRow[*, columns]): rowcount = 1.79424053E8, cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 445
{code}

This issue is minor because project is a relatively inexpensive operation: it inserts or removes a vector batch-by-batch rather than operating row-by-row. Still, every little bit of optimization helps.
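
To illustrate why one project should suffice, here is a minimal sketch of folding the three projects above into one by inlining each input reference. It is not Drill's planner code: the list-of-pairs representation and the merge_projects helper are hypothetical, and expressions are treated as plain strings in which $N means the N-th input column.

```python
import re

# Hypothetical model: a project is an ordered list of (output_name, expression)
# pairs, and "$N" inside an expression refers to the N-th column of its input.

def merge_projects(top, bottom):
    """Fold `top` into `bottom` by replacing each $N in `top` with
    the expression that `bottom` computes for its N-th output."""
    def inline(expr):
        return re.sub(r"\$(\d+)", lambda m: bottom[int(m.group(1))][1], expr)
    return [(name, inline(expr)) for name, expr in top]

# The three projects from the plan, bottom (03-03) to top (03-01).
p_03_03 = [("T0¦¦*", "$0"), ("columns", "$1")]
p_03_02 = [("T0¦¦*", "$0"), ("EXPR$1", "ITEM($1, 0)")]
p_03_01 = [
    ("T0¦¦*", "$0"),
    ("EXPR$1", "$1"),
    ("E_X_P_R_H_A_S_H_F_I_E_L_D", "hash32AsDouble($1)"),
]

# Merge 03-02 into 03-03 first, then 03-01 into that result.
merged = merge_projects(p_03_01, merge_projects(p_03_02, p_03_03))

for name, expr in merged:
    print(f"{name}=[{expr}]")
```

Under these assumptions the single merged project reads the scan's two columns directly. One trade-off is that the inlined form mentions ITEM($1, 0) twice, so a real planner would need to evaluate that subexpression only once to keep the merge a net win.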


