Posted to issues@hive.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/11/21 11:56:00 UTC

[jira] [Commented] (HIVE-20954) Vector RS operator is not using uniform hash function for TPC-DS query 95

    [ https://issues.apache.org/jira/browse/HIVE-20954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694601#comment-16694601 ] 

ASF GitHub Bot commented on HIVE-20954:
---------------------------------------

GitHub user pudidic opened a pull request:

    https://github.com/apache/hive/pull/492

    HIVE-20954: Vector RS operator is not using uniform hash function for…

    … TPC-DS query 95 (Teddy Choi)
    
    Change-Id: Ia23b5ddefc2b35cda9ed7d817bdbd767ec7f7671

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pudidic/hive HIVE-20954

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/492.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #492
    
----
commit 17dd49c160eeea7b5a511e9a6801e2e9b298ac1a
Author: Teddy Choi <tc...@...>
Date:   2018-11-21T11:55:04Z

    HIVE-20954: Vector RS operator is not using uniform hash function for TPC-DS query 95 (Teddy Choi)
    
    Change-Id: Ia23b5ddefc2b35cda9ed7d817bdbd767ec7f7671

----


> Vector RS operator is not using uniform hash function for TPC-DS query 95
> -------------------------------------------------------------------------
>
>                 Key: HIVE-20954
>                 URL: https://issues.apache.org/jira/browse/HIVE-20954
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-20954.1.patch
>
>
> Row distribution is skewed in the dynamically partitioned hash join (DHJ), which slows the query down.
> The two RS operators produce the same output, but one branch uses VectorReduceSinkObjectHashOperator and the other uses VectorReduceSinkLongOperator.
> {code}
> |                     Select Operator                |
> |                       expressions: ws_warehouse_sk (type: bigint), ws_order_number (type: bigint) |
> |                       outputColumnNames: _col0, _col1 |
> |                       Select Vectorization:        |
> |                           className: VectorSelectOperator |
> |                           native: true             |
> |                           projectedOutputColumnNums: [14, 16] |
> |                       Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
> |                       Reduce Output Operator       |
> |                         key expressions: _col1 (type: bigint) |
> |                         sort order: +              |
> |                         Map-reduce partition columns: _col1 (type: bigint) |
> |                         Reduce Sink Vectorization: |
> |                             className: VectorReduceSinkObjectHashOperator |
> |                             keyColumnNums: [16]    |
> |                             native: true           |
> |                             nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true |
> |                             partitionColumnNums: [16] |
> |                             valueColumnNums: [14]  |
> |                         Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
> |                         value expressions: _col0 (type: bigint) |
> |                       Reduce Output Operator       |
> |                         key expressions: _col1 (type: bigint) |
> |                         sort order: +              |
> |                         Map-reduce partition columns: _col1 (type: bigint) |
> |                         Reduce Sink Vectorization: |
> |                             className: VectorReduceSinkLongOperator |
> |                             keyColumnNums: [16]    |
> |                             native: true           |
> |                             nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true |
> |                             valueColumnNums: [14]  |
> |                         Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
> |                         value expressions: _col0 (type: bigint) |
> |             Execution mode: vectorized, llap       |
> {code}
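
To illustrate why it matters that the two reduce-sink branches hash the same key in different ways, here is a minimal, self-contained sketch. It does not use Hive code, and it does not claim to reproduce what VectorReduceSinkObjectHashOperator or VectorReduceSinkLongOperator actually compute. The class PartitionSketch, its method names, and the choice of Long.hashCode versus Arrays.hashCode are assumptions made only for illustration. The sketch routes the same bigint keys through two different hash paths and compares the resulting partition histograms.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Hive.
class PartitionSketch {
    // Assumed "long" path: hash the primitive key directly.
    public static int hashLong(long key) {
        return Long.hashCode(key);
    }

    // Assumed "row" path: hash the key as a one-element row.
    public static int hashRow(long key) {
        return Arrays.hashCode(new long[] {key});
    }

    // Map a hash code to a partition index in [0, numPartitions).
    public static int partition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    // Count rows per partition for one branch.
    public static int[] histogram(long[] keys, int numPartitions, boolean rowPath) {
        int[] counts = new int[numPartitions];
        for (long key : keys) {
            int h = rowPath ? hashRow(key) : hashLong(key);
            counts[partition(h, numPartitions)]++;
        }
        return counts;
    }

    // Count keys that the two branches would send to different partitions.
    public static int mismatches(long[] keys, int numPartitions) {
        int count = 0;
        for (long key : keys) {
            int a = partition(hashLong(key), numPartitions);
            int b = partition(hashRow(key), numPartitions);
            if (a != b) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up keys that share a stride of 8, standing in for order numbers.
        long[] keys = new long[1000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = 8L * i;
        }
        System.out.println("long path: " + Arrays.toString(histogram(keys, 8, false)));
        System.out.println("row path:  " + Arrays.toString(histogram(keys, 8, true)));
        System.out.println("mismatched keys: " + mismatches(keys, 8));
    }
}
```

With these made-up keys, every row lands in a single partition on each path, and the two paths pick different partitions for every key. A per-partition histogram like this is one simple way to check whether both branches of a join agree on routing and whether the rows are spread evenly.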



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)