You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "zhihai xu (JIRA)" <ji...@apache.org> on 2016/07/21 03:17:20 UTC
[jira] [Created] (HIVE-14303) CommonJoinOperator.checkAndGenObject
should return directly in CLOSE state to avoid NPE if ExecReducer.close is
called twice.
zhihai xu created HIVE-14303:
--------------------------------
Summary: CommonJoinOperator.checkAndGenObject should return directly in CLOSE state to avoid NPE if ExecReducer.close is called twice.
Key: HIVE-14303
URL: https://issues.apache.org/jira/browse/HIVE-14303
Project: Hive
Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
Fix For: 2.1.0
CommonJoinOperator.checkAndGenObject should return directly in CLOSE state to avoid NPE if ExecReducer.close is called twice. ExecReducer.close implements Closeable interface and ExecReducer.close can be called multiple time. We saw the following NPE which hide the real exception due to this bug.
{code}
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: null
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
... 8 more
{code}
The code from ReduceTask.runOldReducer:
{code}
reducer.close(); //line 453
reducer = null;
out.close(reporter);
out = null;
} finally {
IOUtils.cleanup(LOG, reducer);// line 459
closeQuietly(out, reporter);
}
{code}
Based on the above stack trace and code, reducer.close() is called twice because the exception happened when reducer.close() is called for the first time at line 453, the code exit before reducer was set to null. NullPointerException is triggered when reducer.close() is called for the second time in IOUtils.cleanup. NullPointerException hide the real exception which happened when reducer.close() is called for the first time at line 453.
The reason for NPE is:
The first reducer.close called CommonJoinOperator.closeOp which clear {{storage}}
{code}
Arrays.fill(storage, null);
{code}
the second reduce.close generated NPE due to null {{storage[alias]}} which is set to null by first reducer.close.
The following reducer log can give more proof:
{code}
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing...
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing...
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing...
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing...
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing...
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing...
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[4]: records written - 53466
2016-07-14 22:25:11,555 ERROR [main] ExecReducer: Hit error while closing operators - failing tree
2016-07-14 22:25:11,649 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Hive Runtime Error while closing operators: null
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
... 8 more
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)