Posted to issues@drill.apache.org by "Boaz Ben-Zvi (Jira)" <ji...@apache.org> on 2019/11/05 03:31:00 UTC

[jira] [Commented] (DRILL-7136) Num_buckets for HashAgg in profile may be inaccurate

    [ https://issues.apache.org/jira/browse/DRILL-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967202#comment-16967202 ] 

Boaz Ben-Zvi commented on DRILL-7136:
-------------------------------------

When a Hash-Aggr partition is spilled, its hash table is reset (i.e., reallocated at the default size of 64K), but the number of resizes recorded so far is left as is, and so is the resizing time; hence these stats are totals across possibly multiple iterations of reset-build-spill. So when the stats are reported, they show the *current* hash-table size next to the *accumulated* resizing stats. (Hash-Join does not have this issue, as its hash table is built only when the whole partition is in memory.)
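
To make the mismatch concrete, here is a minimal sketch of the behavior described above. It is not Drill's actual code: the class and method names (PartitionStatsSketch, resize, spillAndReset) are hypothetical, and it assumes each resize doubles the table, which is what the report's "6 resizes should give 4M buckets" implies.

```java
// Illustrative sketch only; names are hypothetical and this is not Drill's
// hash-aggregate implementation.
final class PartitionStatsSketch {
    // Default table size cited in the comment (64K).
    static final int DEFAULT_BUCKETS = 64 * 1024;

    private int numBuckets = DEFAULT_BUCKETS; // current table size
    private int numResizing = 0;              // accumulated; not cleared on reset

    /** Assumption: a resize doubles the number of buckets. */
    void resize() {
        numBuckets *= 2;
        numResizing++;
    }

    /** Models a spill: the table goes back to the default size, the counter is kept. */
    void spillAndReset() {
        numBuckets = DEFAULT_BUCKETS;
    }

    int numBuckets() { return numBuckets; }

    int numResizing() { return numResizing; }

    public static void main(String[] args) {
        PartitionStatsSketch p = new PartitionStatsSketch();
        for (int i = 0; i < 6; i++) {
            p.resize();
        }
        // 64K doubled six times is 4M buckets.
        System.out.println(p.numBuckets() + " buckets after " + p.numResizing() + " resizes");

        p.spillAndReset();
        // After the spill the size is back to 64K, but the resize count still says 6.
        System.out.println(p.numBuckets() + " buckets, " + p.numResizing() + " resizes reported");
    }
}
```

In this sketch the final state is NUM_BUCKETS = 65,536 with NUM_RESIZING = 6, the same combination as in the attached profile. In a real run the six resizes may be spread over several reset-build-spill iterations rather than happening in one.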

[~rhou] - should these stats be reported differently?

 

> Num_buckets for HashAgg in profile may be inaccurate
> ----------------------------------------------------
>
>                 Key: DRILL-7136
>                 URL: https://issues.apache.org/jira/browse/DRILL-7136
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Tools, Build & Test
>    Affects Versions: 1.16.0
>            Reporter: Robert Hou
>            Assignee: Boaz Ben-Zvi
>            Priority: Major
>         Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill
>
>
> I ran TPCH query 17 with sf 1000.  Here is the query:
> {noformat}
> select
>   sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
>   lineitem l,
>   part p
> where
>   p.p_partkey = l.l_partkey
>   and p.p_brand = 'Brand#13'
>   and p.p_container = 'JUMBO CAN'
>   and l.l_quantity < (
>     select
>       0.2 * avg(l2.l_quantity)
>     from
>       lineitem l2
>     where
>       l2.l_partkey = p.p_partkey
>   );
> {noformat}
> One of the hash agg operators has resized 6 times.  It should have 4M buckets.  But the profile shows it has 64K buckets.
> I have attached a sample profile.  In this profile, the hash agg operator is (04-02).
> {noformat}
> Operator Metrics
> Minor Fragment	NUM_BUCKETS	NUM_ENTRIES	NUM_RESIZING	RESIZING_TIME_MS	NUM_PARTITIONS	SPILLED_PARTITIONS	SPILL_MB	SPILL_CYCLE	INPUT_BATCH_COUNT	AVG_INPUT_BATCH_BYTES	AVG_INPUT_ROW_BYTES	INPUT_RECORD_COUNT	OUTPUT_BATCH_COUNT	AVG_OUTPUT_BATCH_BYTES	AVG_OUTPUT_ROW_BYTES	OUTPUT_RECORD_COUNT
> 04-00-02	65,536	           748,746	6	364	1		582	0	813	582,653	18	26,316,456	401	1,631,943	25	26,176,350
> {noformat}


