Posted to issues@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2016/12/15 18:28:59 UTC
[jira] [Updated] (HIVE-15122) Hive: Upcasting types should not obscure stats (min/max/ndv)
[ https://issues.apache.org/jira/browse/HIVE-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jesus Camacho Rodriguez updated HIVE-15122:
-------------------------------------------
Attachment: HIVE-15122.patch
> Hive: Upcasting types should not obscure stats (min/max/ndv)
> ------------------------------------------------------------
>
> Key: HIVE-15122
> URL: https://issues.apache.org/jira/browse/HIVE-15122
> Project: Hive
> Issue Type: Bug
> Reporter: Siddharth Seth
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15122.patch
>
>
> A UDFToLong cast on a join key breaks PK/FK inference and causes join cardinality mis-estimation in LLAP.
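The title's premise rests on a simple property. A widening integer cast such as int to bigint maps every value to itself. It is therefore injective (so ndv is unchanged) and order-preserving (so min and max are unchanged), and the null count is also unchanged. The minimal Python sketch below propagates stats under that rule. `ColStats`, `is_widening` and `stats_after_cast` are hypothetical names for illustration only. They are not Hive's API or the attached patch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColStats:
    """Hypothetical column-statistics holder (not Hive's own class)."""
    min_val: int
    max_val: int
    ndv: int
    num_nulls: int

# Value ranges of Hive's INT and BIGINT types (32- and 64-bit signed).
INT_RANGE = (-(2**31), 2**31 - 1)
BIGINT_RANGE = (-(2**63), 2**63 - 1)

def is_widening(src_range, dst_range):
    """A cast is widening when every source value is representable in the target."""
    return dst_range[0] <= src_range[0] and src_range[1] <= dst_range[1]

def stats_after_cast(stats, src_range, dst_range):
    """Propagate column stats through an integer cast.

    Widening casts are the identity on values, so min/max/ndv/nulls carry
    over unchanged. This sketch is conservative: for any other cast it
    returns None ("stats unknown"), since narrowing may wrap or truncate.
    """
    if is_widening(src_range, dst_range):
        return stats
    return None

# Example: an int join key such as l_suppkey upcast to bigint.
l_suppkey = ColStats(min_val=1, max_val=10_000_000, ndv=10_000_000, num_nulls=0)
print(stats_after_cast(l_suppkey, INT_RANGE, BIGINT_RANGE) == l_suppkey)  # True
```

The example stats are made up for illustration. Because the upcast column keeps its min, max and ndv, a key-based inference made on the int column can still apply to UDFToLong(column).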
> Snippet from the bad plan.
> {code}
> | STAGE PLANS: |
> | Stage: Stage-1 |
> | Tez |
> | DagId: hive_20161031222730_a700058f-78eb-40d6-a67d-43add60a50e2:6 |
> | Edges: |
> | Map 2 <- Map 1 (BROADCAST_EDGE) |
> | Map 3 <- Map 2 (BROADCAST_EDGE) |
> | Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE), Map 7 (CUSTOM_SIMPLE_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE) |
> | Reducer 5 <- Reducer 4 (SIMPLE_EDGE) |
> | Reducer 6 <- Reducer 5 (SIMPLE_EDGE) |
> | DagName: |
> | Vertices: |
> | Map 1 |
> | Map Operator Tree: |
> | TableScan |
> | alias: supplier |
> | filterExpr: (s_suppkey is not null and s_nationkey is not null) (type: boolean) |
> | Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE |
> | Filter Operator |
> | predicate: (s_suppkey is not null and s_nationkey is not null) (type: boolean) |
> | Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE |
> | Select Operator |
> | expressions: s_suppkey (type: bigint), s_nationkey (type: bigint) |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> | key expressions: _col0 (type: bigint) |
> | sort order: + |
> | Map-reduce partition columns: _col0 (type: bigint) |
> | Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE |
> | value expressions: _col1 (type: bigint) |
> | Execution mode: vectorized, llap |
> | LLAP IO: all inputs |
> | Map 2 |
> | Map Operator Tree: |
> | TableScan |
> | alias: lineitem |
> | filterExpr: (l_suppkey is not null and l_orderkey is not null) (type: boolean) |
> | Statistics: Num rows: 2285121364 Data size: 63983407882 Basic stats: COMPLETE Column stats: PARTIAL |
> | Filter Operator |
> | predicate: (l_suppkey is not null and l_orderkey is not null) (type: boolean) |
> | Statistics: Num rows: 2285121364 Data size: 127966796384 Basic stats: COMPLETE Column stats: PARTIAL |
> | Select Operator |
> | expressions: l_orderkey (type: bigint), l_suppkey (type: int), l_extendedprice (type: double), l_discount (type: double), l_shipdate (type: date) |
> | outputColumnNames: _col0, _col1, _col2, _col3, _col4 |
> | Statistics: Num rows: 2285121364 Data size: 127966796384 Basic stats: COMPLETE Column stats: PARTIAL |
> | Map Join Operator |
> | condition map: |
> | Inner Join 0 to 1 |
> | keys: |
> | 0 _col0 (type: bigint) |
> | 1 UDFToLong(_col1) (type: bigint) |
> | outputColumnNames: _col1, _col2, _col4, _col5, _col6 |
> | input vertices: |
> | 0 Map 1 |
> | Statistics: Num rows: 10000000 Data size: 880000000 Basic stats: COMPLETE Column stats: PARTIAL |
> | Reduce Output Operator |
> | key expressions: _col2 (type: bigint) |
> | sort order: + |
> | Map-reduce partition columns: _col2 (type: bigint) |
> | Statistics: Num rows: 10000000 Data size: 880000000 Basic stats: COMPLETE Column stats: PARTIAL |
> | value expressions: _col1 (type: bigint), _col4 (type: double), _col5 (type: double), _col6 (type: date) |
> | Execution mode: vectorized, llap |
> | LLAP IO: all inputs |
> | Map 3 |
> | Map Operator Tree: |
> | TableScan |
> | alias: orders |
> | filterExpr: (o_orderkey is not null and o_custkey is not null) (type: boolean) |
> | Statistics: Num rows: 4318801126 Data size: 51825626753 Basic stats: COMPLETE Column stats: NONE |
> | Filter Operator |
> | predicate: (o_orderkey is not null and o_custkey is not null) (type: boolean) |
> | Statistics: Num rows: 4318801126 Data size: 51825626753 Basic stats: COMPLETE Column stats: NONE |
> | Select Operator |
> | expressions: o_orderkey (type: int), o_custkey (type: bigint) |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 4318801126 Data size: 51825626753 Basic stats: COMPLETE Column stats: NONE |
> | Map Join Operator |
> | condition map: |
> | Inner Join 0 to 1 |
> | keys: |
> | 0 _col2 (type: bigint) |
> | 1 UDFToLong(_col0) (type: bigint) |
> | outputColumnNames: _col1, _col4, _col5, _col6, _col8 |
> | input vertices: |
> | 0 Map 2 |
> | Statistics: Num rows: 4750681341 Data size: 57008190663 Basic stats: COMPLETE Column stats: NONE |
> | Reduce Output Operator |
> | key expressions: _col8 (type: bigint) |
> | sort order: + |
> | Map-reduce partition columns: _col8 (type: bigint) |
> | Statistics: Num rows: 4750681341 Data size: 57008190663 Basic stats: COMPLETE Column stats: NONE |
> | value expressions: _col1 (type: bigint), _col4 (type: double), _col5 (type: double), _col6 (type: date) |
> | Execution mode: vectorized, llap |
> | LLAP IO: all inputs |
> | Map 7
> {code}
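> Note the output of Map 2, which is broadcast to Map 3: its join is estimated at 10000000 rows (Data size: 880000000), even though the lineitem scan feeding it has 2285121364 rows.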
> This causes a rather large join (120GB) to be categorized as a map-join.
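The sketch below shows why the collapsed estimate matters. It applies two textbook equi-join estimates to the row counts from the plan above and adds a simplified size check for map-join conversion. The formulas, helper names and threshold value are illustrative assumptions, not Hive's implementation. The sketch also assumes that l_suppkey has no more distinct values than supplier's 10000000 keys, as in TPC-H. In Hive, the real broadcast limit comes from configuration such as `hive.auto.convert.join.noconditionaltask.size`.

```python
# Row counts taken from the plan above.
LINEITEM_ROWS = 2_285_121_364   # Map 2 TableScan (lineitem)
SUPPLIER_ROWS = 10_000_000      # Map 1 TableScan (supplier)
PLAN_ESTIMATE = 10_000_000      # Map 2 Map Join Operator output estimate

def join_rows_ndv(left_rows, right_rows, left_ndv, right_ndv):
    """Textbook equi-join estimate: |L| * |R| / max(ndv(L.key), ndv(R.key))."""
    return left_rows * right_rows // max(left_ndv, right_ndv)

def join_rows_pk_fk(fk_rows, pk_rows_after_filter, pk_rows_total):
    """PK/FK estimate: each FK row matches at most one PK row, so the output
    is the FK side scaled by the fraction of PK rows surviving filters."""
    return fk_rows * pk_rows_after_filter // pk_rows_total

def fits_map_join(small_side_bytes, threshold_bytes):
    """Simplified conversion rule: broadcast a side only if its estimated
    size is at or under the threshold."""
    return small_side_bytes <= threshold_bytes

# Assumption: ndv(l_suppkey) <= ndv(s_suppkey) = SUPPLIER_ROWS, so the
# max() in the ndv formula is SUPPLIER_ROWS.
ndv_estimate = join_rows_ndv(LINEITEM_ROWS, SUPPLIER_ROWS, SUPPLIER_ROWS, SUPPLIER_ROWS)
pkfk_estimate = join_rows_pk_fk(LINEITEM_ROWS, SUPPLIER_ROWS, SUPPLIER_ROWS)
print(ndv_estimate, pkfk_estimate)        # 2285121364 2285121364
print(ndv_estimate // PLAN_ESTIMATE)      # 228

HYPOTHETICAL_THRESHOLD = 1_000_000_000    # bytes; illustrative only
print(fits_map_join(880_000_000, HYPOTHETICAL_THRESHOLD))   # True: plan's estimate
print(fits_map_join(120 * 10**9, HYPOTHETICAL_THRESHOLD))   # False: ~120GB reported
```

With the key stats intact, both estimates come out at about 2.29 billion rows. That is roughly 228 times the 10000000 in the plan, and a gap of this size can let an oversized join slip under a map-join size limit.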
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)