Posted to issues@hive.apache.org by "Matt McCline (JIRA)" <ji...@apache.org> on 2015/04/25 02:23:38 UTC
[jira] [Commented] (HIVE-10484) Vectorization : RuntimeException "Big Table Retained Mapping duplicate column"
[ https://issues.apache.org/jira/browse/HIVE-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512088#comment-14512088 ]
Matt McCline commented on HIVE-10484:
-------------------------------------
I was able to vectorize the query; the EXPLAIN output is below. I'm wondering which environment variables differ between our setups and cause the issue you reported.
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 2 (BROADCAST_EDGE)
        Map 3 <- Map 1 (BROADCAST_EDGE)
        Reducer 4 <- Map 3 (SIMPLE_EDGE)
        Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  Filter Operator
                    predicate: (ss_store_sk is not null and ss_sold_date_sk is not null) (type: boolean)
                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      keys:
                        0 ss_store_sk (type: int)
                        1 s_store_sk (type: int)
                      outputColumnNames: _col0, _col21, _col22, _col26, _col50
                      input vertices:
                        1 Map 2
                      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                      HybridGraceHashJoin: true
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                        value expressions: _col21 (type: decimal(7,2)), _col22 (type: int), _col26 (type: int), _col50 (type: string)
            Execution mode: vectorized
        Map 2
            Map Operator Tree:
                TableScan
                  alias: store
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  Filter Operator
                    predicate: s_store_sk is not null (type: boolean)
                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                    Reduce Output Operator
                      key expressions: s_store_sk (type: int)
                      sort order: +
                      Map-reduce partition columns: s_store_sk (type: int)
                      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                      value expressions: s_state (type: string)
            Execution mode: vectorized
        Map 3
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: (d_date_sk is not null and d_month_seq BETWEEN 1193 AND 1204) (type: boolean)
                    Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      keys:
                        0 _col0 (type: int)
                        1 d_date_sk (type: int)
                      outputColumnNames: _col0, _col21, _col22, _col26, _col50, _col58, _col61
                      input vertices:
                        0 Map 1
                      Statistics: Num rows: 20088 Data size: 22478696 Basic stats: COMPLETE Column stats: NONE
                      HybridGraceHashJoin: true
                      Filter Operator
                        predicate: ((_col61 BETWEEN 1193 AND 1204 and (_col58 = _col0)) and (_col26 = _col22)) (type: boolean)
                        Statistics: Num rows: 2511 Data size: 2809837 Basic stats: COMPLETE Column stats: NONE
                        Select Operator
                          expressions: _col50 (type: string), _col21 (type: decimal(7,2))
                          outputColumnNames: _col50, _col21
                          Statistics: Num rows: 2511 Data size: 2809837 Basic stats: COMPLETE Column stats: NONE
                          Group By Operator
                            aggregations: sum(_col21)
                            keys: _col50 (type: string)
                            mode: hash
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 2511 Data size: 2809837 Basic stats: COMPLETE Column stats: NONE
                            Reduce Output Operator
                              key expressions: _col0 (type: string)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: string)
                              Statistics: Num rows: 2511 Data size: 2809837 Basic stats: COMPLETE Column stats: NONE
                              value expressions: _col1 (type: decimal(17,2))
            Execution mode: vectorized
        Reducer 4
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1255 Data size: 1404358 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string), _col1 (type: decimal(17,2))
                  sort order: +-
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 1255 Data size: 1404358 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: decimal(17,2))
            Execution mode: vectorized
        Reducer 5
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: decimal(17,2))
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1255 Data size: 1404358 Basic stats: COMPLETE Column stats: NONE
                PTF Operator
                  Function definitions:
                      Input definition
                        input alias: ptf_0
                        output shape: _col0: string, _col1: decimal(17,2)
                        type: WINDOWING
                      Windowing table definition
                        input alias: ptf_1
                        name: windowingtablefunction
                        order by: _col1(DESC)
                        partition by: _col0
                        raw input shape:
                        window functions:
                            window function definition
                              alias: rank_window_0
                              arguments: _col1
                              name: rank
                              window function: GenericUDAFRankEvaluator
                              window frame: PRECEDING(MAX)~FOLLOWING(MAX)
                              isPivotResult: true
                  Statistics: Num rows: 1255 Data size: 1404358 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: (rank_window_0 <= 5) (type: boolean)
                    Statistics: Num rows: 418 Data size: 467746 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: _col0 (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 418 Data size: 467746 Basic stats: COMPLETE Column stats: NONE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 418 Data size: 467746 Basic stats: COMPLETE Column stats: NONE
                        table:
                            input format: org.apache.hadoop.mapred.TextInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{noformat}
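Both Map Join Operators in the plan above are broadcast joins: the smaller input (Map 2's store rows for Map 1, and Map 1's output for Map 3) is shipped over a BROADCAST_EDGE, and the other input streams past it. As a rough, non-authoritative illustration of that pattern, here is a minimal in-memory hash inner join. The class and record names are hypothetical, rows are cut down to a few columns, and keys are plain ints because the plan filters out null join keys before each join. This is not Hive's VectorMapJoin code, which works on column batches and, per the plan, uses a hybrid grace hash table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a broadcast (map-side) inner hash join, loosely
// modeled on Map 1 above: "store" is the small side hashed on s_store_sk,
// and "store_sales" rows stream through and probe on ss_store_sk.
class BroadcastHashJoinSketch {
    // Simplified rows; the real tables have many more columns.
    record StoreSale(int ssStoreSk, int ssSoldDateSk, double ssNetProfit) {}
    record Store(int sStoreSk, String sState) {}
    // Columns retained from the streamed side plus s_state from the small side.
    record Joined(int ssSoldDateSk, double ssNetProfit, String sState) {}

    static List<Joined> join(List<Store> smallTable, List<StoreSale> bigTable) {
        // Build phase: hash the broadcast input on its join key.
        // A list per key keeps every small-side row when keys repeat.
        Map<Integer, List<Store>> hashTable = new HashMap<>();
        for (Store s : smallTable) {
            hashTable.computeIfAbsent(s.sStoreSk(), k -> new ArrayList<>()).add(s);
        }
        // Probe phase: one pass over the streamed input, one output row per match.
        List<Joined> out = new ArrayList<>();
        for (StoreSale row : bigTable) {
            for (Store match : hashTable.getOrDefault(row.ssStoreSk(), List.of())) {
                out.add(new Joined(row.ssSoldDateSk(), row.ssNetProfit(), match.sState()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Store> store = List.of(new Store(1, "TN"), new Store(2, "GA"));
        List<StoreSale> sales = List.of(
                new StoreSale(1, 2451000, 12.50),
                new StoreSale(3, 2451001, 4.00),   // no matching store: dropped by the inner join
                new StoreSale(2, 2451002, -1.25));
        join(store, sales).forEach(System.out::println);
    }
}
```

Running `main` prints two joined rows; the sale with store key 3 has no matching store row and is dropped, as an inner join requires. Hashing the smaller side is the design choice that makes broadcasting it to every task of the larger side affordable.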
> Vectorization : RuntimeException "Big Table Retained Mapping duplicate column"
> ------------------------------------------------------------------------------
>
> Key: HIVE-10484
> URL: https://issues.apache.org/jira/browse/HIVE-10484
> Project: Hive
> Issue Type: Bug
> Components: Tez, Vectorization
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Matt McCline
> Fix For: 1.2.0
>
>
> With vectorization and Tez enabled, TPC-DS Q70 fails with:
> {code}
> Caused by: java.lang.RuntimeException: Big Table Retained Mapping duplicate column 6 in ordered column map {6=(value column: 6, type name: int), 21=(value column: 21, type name: float), 22=(value column: 22, type name: int)} when adding value column 6, type int
> at org.apache.hadoop.hive.ql.exec.vector.VectorColumnOrderedMap.add(VectorColumnOrderedMap.java:97)
> at org.apache.hadoop.hive.ql.exec.vector.VectorColumnOutputMapping.add(VectorColumnOutputMapping.java:40)
> at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.determineCommonInfo(VectorMapJoinCommonOperator.java:320)
> at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.<init>(VectorMapJoinCommonOperator.java:254)
> at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.<init>(VectorMapJoinGenerateResultOperator.java:89)
> at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerGenerateResultOperator.<init>(VectorMapJoinInnerGenerateResultOperator.java:97)
> at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.<init>(VectorMapJoinInnerLongOperator.java:79)
> ... 49 more
> {code}
> Query
> {code}
> select s_state
> from (select s_state as s_state, sum(ss_net_profit),
>              rank() over ( partition by s_state order by sum(ss_net_profit) desc) as ranking
>       from store_sales, store, date_dim
>       where d_month_seq between 1193 and 1193+11
>         and date_dim.d_date_sk = store_sales.ss_sold_date_sk
>         and store.s_store_sk = store_sales.ss_store_sk
>       group by s_state
>      ) tmp1
> where ranking <= 5
> {code}
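The reported trace fails in VectorColumnOrderedMap.add while VectorMapJoinCommonOperator is building its "Big Table Retained Mapping": column 6 is added to an ordered column map that already contains it. A minimal sketch of that kind of guard follows, assuming a TreeMap keyed by column number. OrderedColumnMapSketch and its methods are hypothetical illustrations, not Hive's source; only the message text follows the format of the reported error.

```java
import java.util.TreeMap;

// Hypothetical sketch (not Hive's VectorColumnOrderedMap): an ordered map
// from a column number to the value column and type it carries, which
// rejects a second registration of the same column number.
class OrderedColumnMapSketch {

    // Payload stored per column, printed in the same style as the report.
    static final class Value {
        final int valueColumn;
        final String typeName;

        Value(int valueColumn, String typeName) {
            this.valueColumn = valueColumn;
            this.typeName = typeName;
        }

        @Override
        public String toString() {
            return "(value column: " + valueColumn + ", type name: " + typeName + ")";
        }
    }

    private final String name;
    // TreeMap keeps the entries sorted by column number.
    private final TreeMap<Integer, Value> orderedTreeMap = new TreeMap<>();

    OrderedColumnMapSketch(String name) {
        this.name = name;
    }

    void add(int orderedColumn, int valueColumn, String typeName) {
        if (orderedTreeMap.containsKey(orderedColumn)) {
            // Fail fast: two entries for one column would make the mapping ambiguous.
            throw new RuntimeException(name + " duplicate column " + orderedColumn
                    + " in ordered column map " + orderedTreeMap
                    + " when adding value column " + valueColumn + ", type " + typeName);
        }
        orderedTreeMap.put(orderedColumn, new Value(valueColumn, typeName));
    }

    int size() {
        return orderedTreeMap.size();
    }

    public static void main(String[] args) {
        OrderedColumnMapSketch retained =
                new OrderedColumnMapSketch("Big Table Retained Mapping");
        // The three entries already present in the reported map.
        retained.add(6, 6, "int");
        retained.add(21, 21, "float");
        retained.add(22, 22, "int");
        try {
            // Registering column 6 a second time triggers the duplicate-column error.
            retained.add(6, 6, "int");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints the same message text as in the report. The sketch only shows where such an exception comes from; why column 6 is added twice for this query is not modeled here.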
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)