Posted to dev@hive.apache.org by "Tim Robertson (JIRA)" <ji...@apache.org> on 2016/04/18 21:17:25 UTC
[jira] [Created] (HIVE-13539) HiveHFileOutputFormat searching the wrong directory for HFiles?
Tim Robertson created HIVE-13539:
------------------------------------
Summary: HiveHFileOutputFormat searching the wrong directory for HFiles?
Key: HIVE-13539
URL: https://issues.apache.org/jira/browse/HIVE-13539
Project: Hive
Issue Type: Bug
Components: HBase Handler
Affects Versions: 1.1.0
Environment: Built into CDH 5.4.7
Reporter: Tim Robertson
Assignee: Sushanth Sowmyan
Priority: Blocker
When creating HFiles for an HBase bulk load, I believe HiveHFileOutputFormat looks in the wrong directory for the HFiles, which results in the following exception:
{code}
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:188)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:958)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
... 7 more
Caused by: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:158)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:185)
... 11 more
{code}
The issue is that it looks for the HFiles in {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary}} when I believe it should be looking in the task attempt subfolder, such as {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary/attempt_1461004169450_0002_r_000000_1000}}.
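To illustrate the suspected behaviour, here is a minimal sketch that simulates the directory layout on the local filesystem with java.nio.file. It is not the Hive source. It assumes that the close() logic descends from its starting directory for as long as each level contains exactly one subdirectory, and stops at the directory named after the column family. The class name, method names and the second attempt directory are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FamilyDirSearch {

    /** Returns the immediate subdirectories of dir. */
    static List<Path> subdirs(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isDirectory).collect(Collectors.toList());
        }
    }

    /**
     * Assumed search: walk down from start while each level has exactly one
     * subdirectory, and return the directory named after the column family.
     */
    static Path findFamilyDir(Path start, String family) throws IOException {
        Path current = start;
        while (true) {
            List<Path> children = subdirs(current);
            if (children.isEmpty()) {
                throw new IOException("No family directories found in " + current);
            }
            if (children.size() != 1) {
                throw new IOException("Multiple family directories found in " + current);
            }
            current = children.get(0);
            if (current.getFileName().toString().equals(family)) {
                return current;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Local stand-in for .../coords_hbase/_temporary/2/_temporary
        Path pending = Files.createTempDirectory("coords_hbase")
                .resolve("_temporary").resolve("2").resolve("_temporary");

        // First attempt name is from the report; the second is hypothetical.
        Path attemptA = pending.resolve("attempt_1461004169450_0002_r_000000_1000");
        Path attemptB = pending.resolve("attempt_1461004169450_0002_r_000001_1000");
        Files.createDirectories(attemptA.resolve("o"));
        Files.createDirectories(attemptB.resolve("o"));

        // Starting at the task attempt directory finds the single family directory.
        System.out.println("From attempt dir: " + findFamilyDir(attemptA, "o"));

        // Starting one level up sees both attempt directories and fails.
        try {
            findFamilyDir(pending, "o");
        } catch (IOException e) {
            System.out.println("From pending dir: " + e.getMessage());
        }
    }
}
```

If this assumption holds, the search only succeeds from the shared {{_temporary}} directory when a single attempt directory exists beneath it, which would explain why starting from the task attempt subfolder looks like the correct behaviour.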
This can be reproduced with any HBase load, such as:
{code:sql}
CREATE TABLE coords_hbase(id INT, x DOUBLE, y DOUBLE)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
'hbase.columns.mapping' = ':key,o:x,o:y',
'hbase.table.default.storage.type' = 'binary');
SET hfile.family.path=/tmp/coords_hfiles/o;
SET hive.hbase.generatehfiles=true;
INSERT OVERWRITE TABLE coords_hbase
SELECT id, decimalLongitude, decimalLatitude
FROM source
CLUSTER BY id;
{code}
Any advice would be greatly appreciated.