Posted to issues@hive.apache.org by "Oleksiy Sayankin (JIRA)" <ji...@apache.org> on 2018/04/24 13:16:00 UTC
[jira] [Comment Edited] (HIVE-19286) NPE in MERGE operator on MR mode
[ https://issues.apache.org/jira/browse/HIVE-19286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449820#comment-16449820 ]
Oleksiy Sayankin edited comment on HIVE-19286 at 4/24/18 1:15 PM:
------------------------------------------------------------------
My findings after some debugging. The NPE happens because {{inspector.getStructFieldRef(names[0])}} returns {{null}}
{code}
@Override
public ObjectInspector initialize(ObjectInspector rowInspector) throws HiveException {
  // We need to support field names like KEY.0, VALUE.1 between
  // map-reduce boundary.
  String[] names = expr.getColumn().split("\\.");
  String[] unionfields = names[0].split("\\:");
  if (names.length == 1 && unionfields.length == 1) {
    simpleCase = true;
    inspector = (StructObjectInspector) rowInspector;
    // Returns null when the column is not known to the inspector (here: ROW__ID)
    field = inspector.getStructFieldRef(names[0]);
    // NPE is thrown here because field is null
    return outputOI = field.getFieldObjectInspector();
  }
  // ... rest of the method omitted
}
{code}
in {{ExprNodeColumnEvaluator}}. Here {{names[0]}} == {{"ROW__ID"}}. Class {{OrcStruct}} contains the method
{code}
@Override
public StructField getStructFieldRef(String s) {
  for (StructField field : fields) {
    if (field.getFieldName().equalsIgnoreCase(s)) {
      return field;
    }
  }
  return null;
}
{code}
and the {{fields}} array is initialized with a {{StructField}} for only four columns: {{id}}, {{first_name}}, {{last_name}}, {{age}}. So it returns {{null}} when {{s}} == {{"ROW__ID"}}. Hive wants to insert {{"ROW__ID"}} because it transforms {{MERGE}} into multiple {{INSERT}} statements:
{code}
FROM
  `default`.`customer_target` `trg`
  RIGHT OUTER JOIN
  `default`.`customer_source` `src`
  ON `src`.`id` = `trg`.`id`
INSERT INTO `default`.`customer_target` -- update clause
  select `trg`.ROW__ID, `trg`.`id`, `src`.`first_name`, `src`.`last_name`, `trg`.`age`
  WHERE `src`.`id` = `trg`.`id`
  sort by `trg`.ROW__ID
INSERT INTO `default`.`customer_target` -- insert clause
  select `src`.`id`, `src`.`first_name`, `src`.`last_name`, `src`.`age`
  WHERE `trg`.`id` IS NULL
INSERT INTO merge_tmp_table
  SELECT cardinality_violation(`trg`.ROW__ID)
  WHERE `src`.`id` = `trg`.`id` GROUP BY `trg`.ROW__ID HAVING count(*) > 1
{code}
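The failure mode described above can be illustrated outside of Hive. The following is only a minimal sketch, not Hive code and not a proposed fix: the class {{FieldLookupSketch}}, the method {{requireField}} and the use of plain strings in place of {{StructField}} are all hypothetical names introduced for illustration. It shows that a lookup over the four known columns returns {{null}} for {{ROW__ID}}, and how a guarded lookup would at least report the missing column instead of failing with a bare NPE.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the struct inspector lookup; this is NOT Hive code.
class FieldLookupSketch {
    // The four columns the inspector was initialized with, as listed in the comment.
    static final List<String> FIELDS =
        Arrays.asList("id", "first_name", "last_name", "age");

    // Mirrors the quoted lookup: case-insensitive match, null when the field is absent.
    static String getStructFieldRef(String name) {
        for (String field : FIELDS) {
            if (field.equalsIgnoreCase(name)) {
                return field;
            }
        }
        return null;
    }

    // Guarded variant (an assumption, not the actual Hive behavior):
    // fail with a descriptive message instead of dereferencing null.
    static String requireField(String name) {
        String field = getStructFieldRef(name);
        if (field == null) {
            throw new IllegalStateException(
                "Cannot find field " + name + " in " + FIELDS);
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(getStructFieldRef("ID"));      // known column, matched ignoring case
        System.out.println(getStructFieldRef("ROW__ID")); // unknown column, lookup yields null
        try {
            requireField("ROW__ID");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A guard like this would only make the error easier to diagnose. The underlying problem, that {{ROW__ID}} is not among the fields known to the inspector on this code path, would still need to be addressed separately.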
> NPE in MERGE operator on MR mode
> --------------------------------
>
> Key: HIVE-19286
> URL: https://issues.apache.org/jira/browse/HIVE-19286
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.3.3
> Reporter: Oleksiy Sayankin
> Assignee: Oleksiy Sayankin
> Priority: Blocker
>
> *General Info*
> Hive version : 2.3.3
> {code}
> commit 3f7dde31aed44b5440563d3f9d8a8887beccf0be
> Author: Daniel Dai <da...@hortonworks.com>
> Date: Wed Mar 28 16:46:29 2018 -0700
> Preparing for 2.3.3 release
> {code}
> Hadoop version: 2.7.2.
> Engine
> {code}
> hive> set hive.execution.engine;
> hive.execution.engine=mr
> {code}
> *STEP 1. Create test data*
> {code}
> DROP TABLE IF EXISTS customer_target;
> DROP TABLE IF EXISTS customer_source;
> {code}
> {code}
> CREATE TABLE customer_target (id STRING, first_name STRING, last_name STRING, age INT) clustered by (id) into 2 buckets stored as ORC TBLPROPERTIES ('transactional'='true');
> {code}
> {code}
> insert into customer_target values ('001', 'John', 'Smith', 45), ('002', 'Michael', 'Watson', 27), ('003', 'Den', 'Brown', 33);
> SELECT id, first_name, last_name, age FROM customer_target;
> {code}
> {code}
> +------+-------------+------------+------+
> | id | first_name | last_name | age |
> +------+-------------+------------+------+
> | 002 | Michael | Watson | 27 |
> | 001 | John | Smith | 45 |
> | 003 | Den | Brown | 33 |
> +------+-------------+------------+------+
> {code}
> {code}
> CREATE TABLE customer_source (id STRING, first_name STRING, last_name STRING, age INT);
> insert into customer_source values ('001', 'Dorothi', 'Hogward', 77), ('007', 'Alex', 'Bowee', 1), ('088', 'Robert', 'Dowson', 25);
> SELECT id, first_name, last_name, age FROM customer_source;
> {code}
> {code}
> +------+-------------+------------+------+
> | id | first_name | last_name | age |
> +------+-------------+------------+------+
> | 088 | Robert | Dowson | 25 |
> | 001 | Dorothi | Hogward | 77 |
> | 007 | Alex | Bowee | 1 |
> +------+-------------+------------+------+
> {code}
> *STEP 2. Merge data*
> {code}
> merge into customer_target trg using customer_source src on src.id = trg.id when matched then update set first_name = src.first_name, last_name = src.last_name when not matched then insert values (src.id, src.first_name, src.last_name, src.age);
> {code}
> *ACTUAL RESULT*
> {code}
> 2018-04-24T07:11:44,448 DEBUG [main] log.PerfLogger: <PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.SerializationUtilities>
> 2018-04-24T07:11:44,448 INFO [main] exec.SerializationUtilities: Deserializing MapredLocalWork using kryo
> 2018-04-24T07:11:44,463 DEBUG [main] exec.Utilities: Hive Conf not found or Session not initiated, use thread based class loader instead
> 2018-04-24T07:11:44,538 DEBUG [main] log.PerfLogger: </PERFLOG method=deserializePlan start=1524568304448 end=1524568304538 duration=90 from=org.apache.hadoop.hive.ql.exec.SerializationUtilities>
> 2018-04-24T07:11:44,545 INFO [main] mr.MapredLocalTask: 2018-04-24 07:11:44 Starting to launch local task to process map join; maximum memory = 477626368
> 2018-04-24T07:11:44,545 DEBUG [main] mr.MapredLocalTask: initializeOperators: trg, children = [HASHTABLESINK[37]]
> 2018-04-24T07:11:44,656 DEBUG [main] exec.Utilities: Hive Conf not found or Session not initiated, use thread based class loader instead
> 2018-04-24T07:11:44,676 INFO [main] mr.MapredLocalTask: fetchoperator for trg created
> 2018-04-24T07:11:44,676 INFO [main] exec.TableScanOperator: Initializing operator TS[0]
> 2018-04-24T07:11:44,676 DEBUG [main] exec.TableScanOperator: Initialization Done 0 TS
> 2018-04-24T07:11:44,676 DEBUG [main] exec.TableScanOperator: Operator 0 TS initialized
> 2018-04-24T07:11:44,676 DEBUG [main] exec.TableScanOperator: Initializing children of 0 TS
> 2018-04-24T07:11:44,676 DEBUG [main] exec.HashTableSinkOperator: Initializing child 37 HASHTABLESINK
> 2018-04-24T07:11:44,676 INFO [main] exec.HashTableSinkOperator: Initializing operator HASHTABLESINK[37]
> 2018-04-24T07:11:44,677 INFO [main] mapjoin.MapJoinMemoryExhaustionHandler: JVM Max Heap Size: 477626368
> 2018-04-24T07:11:44,680 ERROR [main] mr.MapredLocalTask: Hive Runtime Error: Map local work failed
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57) ~[hive-exec-2.3.3.jar:2.3.3]
> at org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:91) ~[hive-exec-2.3.3.jar:2.3.3]
> at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:153) ~[hive-exec-2.3.3.jar:2.3.3]
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:366) ~[hive-exec-2.3.3.jar:2.3.3]
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:556) ~[hive-exec-2.3.3.jar:2.3.3]
> at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:508) ~[hive-exec-2.3.3.jar:2.3.3]
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) ~[hive-exec-2.3.3.jar:2.3.3]
> at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.initializeOperators(MapredLocalTask.java:508) ~[hive-exec-2.3.3.jar:2.3.3]
> at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:411) ~[hive-exec-2.3.3.jar:2.3.3]
> at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:391) ~[hive-exec-2.3.3.jar:2.3.3]
> at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:764) ~[hive-exec-2.3.3.jar:2.3.3]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_161]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_161]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_161]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_161]
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221) ~[hadoop-common-2.7.2.jar:?]
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136) ~[hadoop-common-2.7.2.jar:?]
> {code}
> FYI: [~ekoifman], [~eugene.koifman]