Posted to dev@drill.apache.org by "Robert Hou (JIRA)" <ji...@apache.org> on 2019/03/27 20:48:00 UTC

[jira] [Created] (DRILL-7136) Num_buckets for HashAgg in profile may be inaccurate

Robert Hou created DRILL-7136:
---------------------------------

             Summary: Num_buckets for HashAgg in profile may be inaccurate
                 Key: DRILL-7136
                 URL: https://issues.apache.org/jira/browse/DRILL-7136
             Project: Apache Drill
          Issue Type: Bug
          Components: Tools, Build & Test
    Affects Versions: 1.16.0
            Reporter: Robert Hou
            Assignee: Pritesh Maker
             Fix For: 1.16.0
         Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill

I ran TPC-H query 17 at scale factor 1000 (sf 1000).  Here is the query:
{noformat}
select
  sum(l.l_extendedprice) / 7.0 as avg_yearly
from
  lineitem l,
  part p
where
  p.p_partkey = l.l_partkey
  and p.p_brand = 'Brand#13'
  and p.p_container = 'JUMBO CAN'
  and l.l_quantity < (
    select
      0.2 * avg(l2.l_quantity)
    from
      lineitem l2
    where
      l2.l_partkey = p.p_partkey
  );
{noformat}

One of the hash agg operators resized 6 times.  Assuming the table starts at 64K buckets and doubles on each resize, it should have 4M buckets (64K x 2^6), but the profile shows only 64K buckets.
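
The expected bucket count can be checked with a quick calculation. This is only a sketch: it assumes the hash table starts at 65,536 buckets and doubles on every resize, which is my reading of the numbers above rather than something the profile states. The names below are made up for illustration and are not Drill APIs.

```python
# Assumption: the table starts at 65,536 buckets and doubles on each resize.
INITIAL_BUCKETS = 65_536    # assumed starting size
NUM_RESIZING = 6            # NUM_RESIZING reported in the profile
REPORTED_BUCKETS = 65_536   # NUM_BUCKETS reported in the profile


def expected_buckets(initial: int, resizes: int) -> int:
    """Bucket count after `resizes` doublings of a table with `initial` buckets."""
    return initial << resizes


expected = expected_buckets(INITIAL_BUCKETS, NUM_RESIZING)
print(f"expected buckets: {expected:,}")           # 4,194,304 (4M)
print(f"reported buckets: {REPORTED_BUCKETS:,}")   # 65,536 (64K)
print(f"expected / reported: {expected // REPORTED_BUCKETS}x")
```

Under these assumptions the reported value is the same as the assumed starting size, as if the metric were never updated after a resize.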



I have attached a sample profile.  In this profile, the hash agg operator is (04-02).
{noformat}
Operator Metrics
Minor Fragment	NUM_BUCKETS	NUM_ENTRIES	NUM_RESIZING	RESIZING_TIME_MS	NUM_PARTITIONS	SPILLED_PARTITIONS	SPILL_MB	SPILL_CYCLE	INPUT_BATCH_COUNT	AVG_INPUT_BATCH_BYTES	AVG_INPUT_ROW_BYTES	INPUT_RECORD_COUNT	OUTPUT_BATCH_COUNT	AVG_OUTPUT_BATCH_BYTES	AVG_OUTPUT_ROW_BYTES	OUTPUT_RECORD_COUNT
04-00-02	65,536	748,746	6	364	1		582	0	813	582,653	18	26,316,456	401	1,631,943	25	26,176,350
{noformat}





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)