Posted to issues@drill.apache.org by "Robert Hou (JIRA)" <ji...@apache.org> on 2019/03/19 23:15:00 UTC

[jira] [Created] (DRILL-7121) TPCH 4 takes longer

Robert Hou created DRILL-7121:
---------------------------------

             Summary: TPCH 4 takes longer
                 Key: DRILL-7121
                 URL: https://issues.apache.org/jira/browse/DRILL-7121
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.16.0
            Reporter: Robert Hou
            Assignee: Gautam Parai
             Fix For: 1.16.0


Here is TPCH 4 with sf 100:
{noformat}
select
  o.o_orderpriority,
  count(*) as order_count
from
  orders o
where
  o.o_orderdate >= date '1996-10-01'
  and o.o_orderdate < date '1996-10-01' + interval '3' month
  and exists (
    select
      *
    from
      lineitem l
    where
      l.l_orderkey = o.o_orderkey
      and l.l_commitdate < l.l_receiptdate
  )
group by
  o.o_orderpriority
order by
  o.o_orderpriority;
{noformat}

The plan has changed when statistics are disabled. A Hash Agg and a Broadcast Exchange have been added. Together, these two operators expand the number of rows coming from the lineitem table from 137M to 9B, which forces the hash join to use 6 GB of memory instead of 30 MB.
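
To put the reported figures in proportion, here is a minimal Python sketch that computes the growth factors implied by the numbers above. The variable names are illustrative only, the inputs are the rounded values quoted in this report, and 1 GB is assumed to be 1024 MB. The report does not say how many fragments receive the broadcast, so the sketch does not try to derive that.

```python
# Rounded figures quoted in the report; all names here are illustrative.
rows_before = 137_000_000       # lineitem rows before the two added operators
rows_after = 9_000_000_000      # rows after the Hash Agg + Broadcast Exchange
mem_before_mb = 30              # hash join memory with the previous plan
mem_after_mb = 6 * 1024         # 6 GB, assuming 1 GB = 1024 MB


def growth_factor(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


row_growth = growth_factor(rows_before, rows_after)
mem_growth = growth_factor(mem_before_mb, mem_after_mb)

print(f"rows feeding the hash join grew about {row_growth:.1f}x")
print(f"hash join memory grew about {mem_growth:.1f}x")
```

With these inputs, the row count grows by roughly 66x and the hash join memory by roughly 200x.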



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)