Posted to dev@hive.apache.org by "Teddy Choi (JIRA)" <ji...@apache.org> on 2018/11/21 11:27:00 UTC
[jira] [Created] (HIVE-20954) Vector RS operator is not using uniform hash function for TPC-DS query 95
Teddy Choi created HIVE-20954:
---------------------------------
Summary: Vector RS operator is not using uniform hash function for TPC-DS query 95
Key: HIVE-20954
URL: https://issues.apache.org/jira/browse/HIVE-20954
Project: Hive
Issue Type: Improvement
Reporter: Teddy Choi
Assignee: Teddy Choi
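Row distribution is skewed in the dynamically partitioned hash join (DHJ), causing a slowdown.
Both Reduce Sink (RS) operators produce the same output, but the two branches use different operator classes: VectorReduceSinkObjectHashOperator and VectorReduceSinkLongOperator.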
{code}
| Select Operator |
| expressions: ws_warehouse_sk (type: bigint), ws_order_number (type: bigint) |
| outputColumnNames: _col0, _col1 |
| Select Vectorization: |
| className: VectorSelectOperator |
| native: true |
| projectedOutputColumnNums: [14, 16] |
| Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| key expressions: _col1 (type: bigint) |
| sort order: + |
| Map-reduce partition columns: _col1 (type: bigint) |
| Reduce Sink Vectorization: |
| className: VectorReduceSinkObjectHashOperator |
| keyColumnNums: [16] |
| native: true |
| nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true |
| partitionColumnNums: [16] |
| valueColumnNums: [14] |
| Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col0 (type: bigint) |
| Reduce Output Operator |
| key expressions: _col1 (type: bigint) |
| sort order: + |
| Map-reduce partition columns: _col1 (type: bigint) |
| Reduce Sink Vectorization: |
| className: VectorReduceSinkLongOperator |
| keyColumnNums: [16] |
| native: true |
| nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true |
| valueColumnNums: [14] |
| Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col0 (type: bigint) |
| Execution mode: vectorized, llap |
{code}
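To illustrate why the choice of hash function matters for row distribution, here is a minimal sketch. It does not reproduce the hashing code of either Hive operator. The class name, both partitioning functions, and the key pattern are assumptions made only for this illustration. It compares a plain `Long.hashCode` partitioner with one that first applies a 64-bit mixing step (the SplitMix64 finalizer) on bigint keys that share a common stride.

```java
public class PartitionSkewSketch {

    // Hypothetical partitioner: uses Long.hashCode directly, with no bit mixing.
    static int partitionByPlainHash(long key, int numPartitions) {
        int h = Long.hashCode(key);
        return (h & Integer.MAX_VALUE) % numPartitions;
    }

    // SplitMix64 finalizer: spreads the influence of every input bit
    // across the whole 64-bit result.
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Hypothetical partitioner: mixes the key before taking the remainder.
    static int partitionByMixedHash(long key, int numPartitions) {
        return (int) Long.remainderUnsigned(mix64(key), numPartitions);
    }

    // Counts how many keys land in each partition.
    static int[] histogram(long[] keys, int numPartitions, boolean mixed) {
        int[] counts = new int[numPartitions];
        for (long key : keys) {
            int p = mixed
                    ? partitionByMixedHash(key, numPartitions)
                    : partitionByPlainHash(key, numPartitions);
            counts[p]++;
        }
        return counts;
    }

    // Builds keys 0, stride, 2*stride, ... to mimic IDs with a regular pattern.
    static long[] stridedKeys(int count, long stride) {
        long[] keys = new long[count];
        for (int i = 0; i < count; i++) {
            keys[i] = i * stride;
        }
        return keys;
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        long[] keys = stridedKeys(1000, 8L);
        System.out.println("plain: "
                + java.util.Arrays.toString(histogram(keys, numPartitions, false)));
        System.out.println("mixed: "
                + java.util.Arrays.toString(histogram(keys, numPartitions, true)));
    }
}
```

With keys that are all multiples of the partition count, the plain partitioner sends every row to partition 0, while the mixed partitioner spreads the rows over several partitions. Whether the skew in query 95 arises in a comparable way is not established by this sketch.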
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)