Posted to dev@hive.apache.org by "Teddy Choi (JIRA)" <ji...@apache.org> on 2018/11/21 11:27:00 UTC

[jira] [Created] (HIVE-20954) Vector RS operator is not using uniform hash function for TPC-DS query 95

Teddy Choi created HIVE-20954:
---------------------------------

             Summary: Vector RS operator is not using uniform hash function for TPC-DS query 95
                 Key: HIVE-20954
                 URL: https://issues.apache.org/jira/browse/HIVE-20954
             Project: Hive
          Issue Type: Improvement
            Reporter: Teddy Choi
            Assignee: Teddy Choi


Row distribution is skewed in the dynamically partitioned hash join (DHJ), which slows the query down.

The two ReduceSink (RS) branches emit the same output, with the same key and partition column (_col1), but one branch uses VectorReduceSinkObjectHashOperator and the other uses VectorReduceSinkLongOperator.

{code}
|                     Select Operator                |
|                       expressions: ws_warehouse_sk (type: bigint), ws_order_number (type: bigint) |
|                       outputColumnNames: _col0, _col1 |
|                       Select Vectorization:        |
|                           className: VectorSelectOperator |
|                           native: true             |
|                           projectedOutputColumnNums: [14, 16] |
|                       Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
|                       Reduce Output Operator       |
|                         key expressions: _col1 (type: bigint) |
|                         sort order: +              |
|                         Map-reduce partition columns: _col1 (type: bigint) |
|                         Reduce Sink Vectorization: |
|                             className: VectorReduceSinkObjectHashOperator |
|                             keyColumnNums: [16]    |
|                             native: true           |
|                             nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true |
|                             partitionColumnNums: [16] |
|                             valueColumnNums: [14]  |
|                         Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
|                         value expressions: _col0 (type: bigint) |
|                       Reduce Output Operator       |
|                         key expressions: _col1 (type: bigint) |
|                         sort order: +              |
|                         Map-reduce partition columns: _col1 (type: bigint) |
|                         Reduce Sink Vectorization: |
|                             className: VectorReduceSinkLongOperator |
|                             keyColumnNums: [16]    |
|                             native: true           |
|                             nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true |
|                             valueColumnNums: [14]  |
|                         Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
|                         value expressions: _col0 (type: bigint) |
|             Execution mode: vectorized, llap       |
{code}
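
To illustrate why it matters which hash function each RS branch uses, here is a minimal Java sketch. It does not reproduce Hive's code: the issue does not say which hash each operator applies, so plainHash, mixedHash, partition and histogram are hypothetical helpers chosen only for illustration. The sketch shows that the same bigint keys can land in very different partitions, with very different balance, depending on the hash function.

```java
import java.util.Arrays;

public class HashSkewSketch {
    // Hypothetical "plain" hash (an assumption, not Hive's): Java's Long.hashCode,
    // which is (int) (key ^ (key >>> 32)).
    static int plainHash(long key) {
        return Long.hashCode(key);
    }

    // Hypothetical "mixing" hash (an assumption, not Hive's): the 64-bit
    // finalizer of MurmurHash3, truncated to an int.
    static int mixedHash(long key) {
        long h = key;
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return (int) h;
    }

    // Map a hash code to a partition index in [0, numPartitions).
    static int partition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    // Count rows per partition for the keys 0, stride, 2 * stride, ...
    static int[] histogram(boolean mixed, int numKeys, long stride, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < numKeys; i++) {
            long key = i * stride;
            int hash = mixed ? mixedHash(key) : plainHash(key);
            counts[partition(hash, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        // Keys that are multiples of the partition count: the plain hash
        // sends all 8000 rows to partition 0.
        System.out.println("plain: " + Arrays.toString(histogram(false, 8000, 8, numPartitions)));
        // The mixing hash scrambles the low bits before the modulo is taken.
        System.out.println("mixed: " + Arrays.toString(histogram(true, 8000, 8, numPartitions)));
    }
}
```

Comparing the per-partition row counts of the two branches in this way is one possible check for whether an RS operator distributes its keys uniformly.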


