Posted to issues@hive.apache.org by "Wei Zheng (JIRA)" <ji...@apache.org> on 2015/04/22 23:20:58 UTC
[jira] [Commented] (HIVE-10446) Hybrid Hybrid Grace Hash Join : java.lang.IllegalArgumentException in Kryo while spilling big table
[ https://issues.apache.org/jira/browse/HIVE-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507926#comment-14507926 ]
Wei Zheng commented on HIVE-10446:
----------------------------------
Will take a look shortly
> Hybrid Hybrid Grace Hash Join : java.lang.IllegalArgumentException in Kryo while spilling big table
> ---------------------------------------------------------------------------------------------------
>
> Key: HIVE-10446
> URL: https://issues.apache.org/jira/browse/HIVE-10446
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Wei Zheng
> Fix For: 1.2.0
>
>
> TPC-DS Q85 fails with a Kryo exception when spilling big table data.
> Query
> {code}
> select substr(r_reason_desc,1,20) as r
> ,avg(wr_return_ship_cost) wq
> ,avg(wr_refunded_cash) ref
> ,avg(wr_fee) fee
> from web_returns, customer_demographics cd1,
> customer_demographics cd2, customer_address, date_dim, reason
> where
> cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk
> and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
> and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
> and reason.r_reason_sk = web_returns.wr_reason_sk
> and cd1.cd_marital_status = cd2.cd_marital_status
> and cd1.cd_education_status = cd2.cd_education_status
> group by r_reason_desc
> order by r, wq, ref, fee
> limit 100
> {code}
> Plan
> {code}
> OK
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Map 1 <- Map 4 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> DagName: mmokhtar_20150422165209_d8eb5634-c19f-4576-9525-cad248c7ca37:5
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: web_returns
> filterExpr: (((wr_refunded_addr_sk is not null and wr_reason_sk is not null) and wr_refunded_cdemo_sk is not null) and wr_returning_cdemo_sk is not null) (type: boolean)
> Statistics: Num rows: 2062802370 Data size: 185695406284 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (((wr_refunded_addr_sk is not null and wr_reason_sk is not null) and wr_refunded_cdemo_sk is not null) and wr_returning_cdemo_sk is not null) (type: boolean)
> Statistics: Num rows: 1875154723 Data size: 51267313780 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: wr_refunded_cdemo_sk (type: int), wr_refunded_addr_sk (type: int), wr_returning_cdemo_sk (type: int), wr_reason_sk (type: int), wr_fee (type: float), wr_return_ship_cost (type: float), wr_refunded_cash (type: float)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
> Statistics: Num rows: 1875154723 Data size: 51267313780 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col1 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6
> input vertices:
> 1 Map 4
> Statistics: Num rows: 1875154688 Data size: 45003712512 Basic stats: COMPLETE Column stats: COMPLETE
> HybridGraceHashJoin: true
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col3 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col0, _col2, _col4, _col5, _col6, _col9
> input vertices:
> 1 Map 5
> Statistics: Num rows: 1875154688 Data size: 219393098496 Basic stats: COMPLETE Column stats: COMPLETE
> HybridGraceHashJoin: true
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col2, _col4, _col5, _col6, _col9, _col11, _col12
> input vertices:
> 1 Map 6
> Statistics: Num rows: 1875154688 Data size: 547545168896 Basic stats: COMPLETE Column stats: COMPLETE
> HybridGraceHashJoin: true
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col2 (type: int), _col11 (type: string), _col12 (type: string)
> 1 _col0 (type: int), _col1 (type: string), _col2 (type: string)
> outputColumnNames: _col4, _col5, _col6, _col9
> input vertices:
> 1 Map 7
> Statistics: Num rows: 402058172 Data size: 43824340748 Basic stats: COMPLETE Column stats: COMPLETE
> HybridGraceHashJoin: true
> Select Operator
> expressions: _col9 (type: string), _col5 (type: float), _col6 (type: float), _col4 (type: float)
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 402058172 Data size: 43824340748 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> aggregations: avg(_col1), avg(_col2), avg(_col3)
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 10975 Data size: 1064575 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 10975 Data size: 1064575 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: struct<count:bigint,sum:double,input:float>), _col2 (type: struct<count:bigint,sum:double,input:float>), _col3 (type: struct<count:bigint,sum:double,input:float>)
> Execution mode: vectorized
> Map 4
> Map Operator Tree:
> TableScan
> alias: customer_address
> filterExpr: ca_address_sk is not null (type: boolean)
> Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ca_address_sk is not null (type: boolean)
> Statistics: Num rows: 40000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: ca_address_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 40000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 40000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Map 5
> Map Operator Tree:
> TableScan
> alias: reason
> filterExpr: r_reason_sk is not null (type: boolean)
> Statistics: Num rows: 72 Data size: 14400 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: r_reason_sk is not null (type: boolean)
> Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: r_reason_sk (type: int), r_reason_desc (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string)
> Execution mode: vectorized
> Map 6
> Map Operator Tree:
> TableScan
> alias: cd1
> filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
> Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
> Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string), _col2 (type: string)
> Execution mode: vectorized
> Map 7
> Map Operator Tree:
> TableScan
> alias: cd1
> filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
> Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
> Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string)
> sort order: +++
> Map-reduce partition columns: _col0 (type: int), _col1 (type: string), _col2 (type: string)
> Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Reducer 2
> Reduce Operator Tree:
> Group By Operator
> aggregations: avg(VALUE._col0), avg(VALUE._col1), avg(VALUE._col2)
> keys: KEY._col0 (type: string)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 25 Data size: 3025 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: substr(_col0, 1, 20) (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double)
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double)
> sort order: ++++
> Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
> TopN Hash Memory Usage: 0.04
> Reducer 3
> Reduce Operator Tree:
> Select Operator
> expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: double), KEY.reducesinkkey2 (type: double), KEY.reducesinkkey3 (type: double)
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
> Limit
> Number of rows: 100
> Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: 100
> Processor Tree:
> ListSink
> {code}
> Exception
> {code}
> ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
> ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
> at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
> ... 17 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: output cannot be null.
> at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:411)
> at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.process(VectorMapJoinOperator.java:287)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:114)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
> at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
> ... 18 more
> Caused by: java.lang.IllegalArgumentException: output cannot be null.
> at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:601)
> at org.apache.hadoop.hive.ql.exec.persistence.ObjectContainer.add(ObjectContainer.java:101)
> at org.apache.hadoop.hive.ql.exec.MapJoinOperator.spillBigTableRow(MapJoinOperator.java:425)
> at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.spillBigTableRow(VectorMapJoinOperator.java:307)
> at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:390)
> ... 27 more
> ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1426707664723_3652_3_04 [Map 1] killed/failed due to:null]Vertex killed, vertexName=Reducer 3, vertexId=vertex_1426707664723_3652_3_06, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1426707664723_3652_3_06 [Reducer 3] killed/failed due to:null]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1426707664
> {code}
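For context on where the failure sits: the stack trace shows MapJoinOperator.spillBigTableRow handing a row to ObjectContainer.add, which calls Kryo's writeClassAndObject with a null Output stream, hence the "output cannot be null." IllegalArgumentException. In a hybrid grace hash join, small-table partitions that do not fit in memory are written to disk, and big-table rows that hash to a spilled partition are serialized and replayed in a second pass. A minimal sketch of that spill-and-replay flow, using plain dicts and pickle as stand-ins for Hive's hash tables and Kryo-backed containers (all names here are illustrative, not Hive APIs):

```python
import pickle
import tempfile

def hybrid_grace_hash_join(build_rows, probe_rows, key, num_partitions=4,
                           in_memory_partitions=frozenset({0, 1})):
    """Sketch of a hybrid grace hash join.

    Partitions listed in `in_memory_partitions` are joined directly; rows for
    the other partitions are spilled to disk (serialized, analogous to Hive's
    Kryo-backed ObjectContainer) and joined in a second pass.
    """
    part = lambda row: hash(row[key]) % num_partitions

    # Build phase: hash tables for in-memory partitions, spill files otherwise.
    tables = {p: {} for p in in_memory_partitions}
    build_spill = {}
    for row in build_rows:
        p = part(row)
        if p in tables:
            tables[p].setdefault(row[key], []).append(row)
        else:
            f = build_spill.setdefault(p, tempfile.TemporaryFile())
            pickle.dump(row, f)  # would fail here if the file handle were None,
                                 # analogous to Kryo's null Output in this bug

    results = []

    # Probe phase: join in-memory partitions now, spill the rest of the big table.
    probe_spill = {}
    for row in probe_rows:
        p = part(row)
        if p in tables:
            for match in tables[p].get(row[key], []):
                results.append({**match, **row})
        else:
            f = probe_spill.setdefault(p, tempfile.TemporaryFile())
            pickle.dump(row, f)

    def replay(f):
        # Re-read every serialized row from a spill file.
        f.seek(0)
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

    # Second pass: reload each spilled partition pair and join it in memory.
    for p, bf in build_spill.items():
        table = {}
        for row in replay(bf):
            table.setdefault(row[key], []).append(row)
        for row in replay(probe_spill.get(p, tempfile.TemporaryFile())):
            for match in table.get(row[key], []):
                results.append({**match, **row})
    return results
```

The crash in this report happens at the equivalent of the `pickle.dump` call on the spill path: the serializer is invoked before its output stream exists, so every big-table row routed to a spilled partition fails immediately.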
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)