Posted to issues@trafodion.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/02/21 18:27:00 UTC

[jira] [Commented] (TRAFODION-2965) Hash partial groupby does not report a row count in operator statistics

    [ https://issues.apache.org/jira/browse/TRAFODION-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371813#comment-16371813 ] 

ASF GitHub Bot commented on TRAFODION-2965:
-------------------------------------------

GitHub user zellerh opened a pull request:

    https://github.com/apache/trafodion/pull/1451

    [TRAFODION-2965] Fix row count stats for partial groupby

    Hash partial groupbys now report their row count in operator-level
    statistics. These are not considered BMOs, so they need to use the
    generic stats entry, not the BMO stats, to report the row count.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zellerh/trafodion bug/R23

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafodion/pull/1451.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1451
    
----
commit 39a3f2047a847768ddcc84b7a549eb322b7b4c48
Author: Hans Zeller <hz...@...>
Date:   2018-02-21T18:23:20Z

    [TRAFODION-2965] Fix row count stats for partial groupby
    
    Hash partial groupbys now report their row count in operator-level
    statistics. These are not considered BMOs, so they need to use the
    generic stats entry, not the BMO stats, to report the row count.

----


> Hash partial groupby does not report a row count in operator statistics
> -----------------------------------------------------------------------
>
>                 Key: TRAFODION-2965
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2965
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-exe
>    Affects Versions: 2.0-incubating
>         Environment: any
>            Reporter: Hans Zeller
>            Assignee: Hans Zeller
>            Priority: Major
>             Fix For: 2.4
>
>
> Here is a test case that demonstrates this:
> {noformat}
> update statistics for table hive.hive.time_dim on every column;
> control query shape groupby(exchange(groupby(exchange(groupby(scan)))));
> prepare s from
> select count(distinct t_time_id) from hive.hive.time_dim; 
> explain options 'f' s;
> execute s;
> get statistics for qid current default;
> {noformat}
> The actual statistics show a "0" as the row count (ActRowsUsed) for the lowest EX_HASH_GRBY with id 2:
> {noformat}
>    LC   RC   Id PaId ExId Frag TDB Name                   DOP   Dispatches        OperCpuTime        EstRowsUsed        ActRowsUsed        ActDataUsed    Details
>    13    .   14    .    8    0 EX_ROOT                      1            2                 37                  0                  1                  8 3658
>    12    .   13   14    7    0 EX_SORT_GRBY                 1            5                 61                  1                  1                  8
>    11    .   12   13    6    0 EX_SPLIT_TOP                 1            8                171                  1                  4                 32
>    10    .   11   12    6    0 EX_SEND_TOP                  4           16              3,389                  1                  4                 64
>     9    .   10   11    6    2 EX_SEND_BOTTOM               4           12                683                  1                  4                 64
>     8    .    9   10    6    2 EX_SPLIT_BOTTOM              4           19                597                  1                  4                 32 833874
>     7    .    8    9    5    2 EX_SORT_GRBY                 4        2,259            289,200                  1                  4                 32
>     6    .    7    8    4    2 EX_HASH_GRBY                 4        2,259            394,235               1451             86,400      2,765,059,200 0|0|0
>     5    .    6    7    3    2 EX_SPLIT_TOP                 4        2,211             51,034             110007             86,400      2,765,059,200
>     4    .    5    6    3    2 EX_SEND_TOP                  8        2,436             98,125             110007             86,400      2,766,528,000
>     3    .    4    5    3    3 EX_SEND_BOTTOM               8       10,658            196,714             110007             86,400      2,766,528,000
>     2    .    3    4    3    3 EX_SPLIT_BOTTOM              2        2,663             95,246             110007             86,400      2,765,059,200 1547521
>     1    .    2    3    2    3 EX_HASH_GRBY                 2        4,521            303,319            55003.5                  0                  0
>     .    .    1    2    1    3 EX_HDFS_SCAN                 2        2,650            952,242             116085             86,400      2,765,059,200 HIVE.HIVE.TIME_DIM|86400|5288324
> {noformat}
> The reason is that the hash groupby reports its row count in the BMO (Big Memory Operator) stats. However, a partial hash groupby is not considered a BMO, so it has no BMO stats entry and no row count gets reported. The fix is to increment the row count in the generic stats entry, which is present in both partial and full groupby operators.
>   
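The behavior described in the issue can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the Trafodion executor code: the types GenericStats, BmoStats and HashGroupby, the Mode enum, and the function simulateReportedRows are all hypothetical names invented for this example. The model assumes that a full hash groupby has both a generic and a BMO stats entry, while a partial hash groupby has only the generic one.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <memory>

// All names below are hypothetical stand-ins, not Trafodion's real classes.

// Stats entry assumed to exist for every operator.
struct GenericStats {
    std::int64_t actualRows = 0;
};

// Stats entry assumed to exist only for Big Memory Operators (BMOs).
struct BmoStats {
    std::int64_t actualRows = 0;
};

// Old: count rows in the BMO stats. Fixed: count rows in the generic stats.
enum class Mode { Old, Fixed };

class HashGroupby {
public:
    explicit HashGroupby(bool isPartial) {
        // A partial groupby is not treated as a BMO, so it gets no BMO entry.
        if (!isPartial) bmo_.reset(new BmoStats());
    }

    // Called once for every row the operator returns to its parent.
    void rowReturned(Mode mode) {
        if (mode == Mode::Old) {
            if (bmo_) ++bmo_->actualRows;   // silently skipped when partial
        } else {
            ++generic_.actualRows;          // always available
        }
    }

    // Row count that the statistics display would show in this model.
    std::int64_t reportedRows(Mode mode) const {
        if (mode == Mode::Old) return bmo_ ? bmo_->actualRows : 0;
        return generic_.actualRows;
    }

private:
    GenericStats generic_;
    std::unique_ptr<BmoStats> bmo_;         // null for a partial groupby
};

// Runs `rows` rows through one operator and returns the reported count.
std::int64_t simulateReportedRows(bool isPartial, Mode mode, int rows) {
    assert(rows >= 0);
    HashGroupby op(isPartial);
    for (int i = 0; i < rows; ++i) op.rowReturned(mode);
    return op.reportedRows(mode);
}

// Prints the four combinations of partial/full and old/fixed counting.
void printDemo() {
    const int rows = 5;
    std::cout << "partial, old:   " << simulateReportedRows(true,  Mode::Old,   rows) << "\n";
    std::cout << "full,    old:   " << simulateReportedRows(false, Mode::Old,   rows) << "\n";
    std::cout << "partial, fixed: " << simulateReportedRows(true,  Mode::Fixed, rows) << "\n";
    std::cout << "full,    fixed: " << simulateReportedRows(false, Mode::Fixed, rows) << "\n";
}
```

In this model, simulateReportedRows(true, Mode::Old, 5) returns 0, which mirrors the "0" shown for the lowest EX_HASH_GRBY in the statistics output above, while the other three combinations return 5. Counting in the entry that every operator has removes the need for a special case for partial groupbys.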


