Posted to user@hive.apache.org by Ben West <bw...@yahoo.com> on 2011/11/17 19:59:43 UTC

Hive HBase wiki

Hey all,

I'm having some trouble with the HBase bulk load, following the instructions from https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad. In the last step ("Sort Data") I get:

java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: No files found in hdfs://localhost/tmp/hive-cloudera/hive_2011-11-17_10-30-11_023_3494196694520237582/_tmp.-ext-10000/_tmp.000001_2
    at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:311)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:479)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: No files found in hdfs://localhost/tmp/hive-cloudera/hive_2011-11-17_10-30-11_023_3494196694520237582/_tmp.-ext-10000/_tmp.000001_2
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:171)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:642)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:303)
    ... 7 more
Caused by: java.io.IOException: No files found in hdfs://localhost/tmp/hive-cloudera/hive_2011-11-17_10-30-11_023_3494196694520237582/_tmp.-ext-10000/_tmp.000001_2
    at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$2.close(HiveHFileOutputFormat.java:144)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:168)
    ... 11 more
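
For context, my "Sort Data" step follows the wiki's example pretty closely; schematically it's the below, with my own table and column names swapped in (the names here are the wiki's illustrative ones, not mine):

set mapred.reduce.tasks=12;
set hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner;
set total.order.partitioner.path=/tmp/hb_range_key_list;

create table hbsort(transaction_id string, user_name string, amount double)
stored as
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat'
TBLPROPERTIES ('hfile.family.path' = '/tmp/hbsort/cf');

insert overwrite table hbsort
select transaction_id, user_name, amount
from transactions
cluster by transaction_id;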

When I look at the source of HiveHFileOutputFormat.java, it has:

// Move the region file(s) from the task output directory
// to the location specified by the user.  There should
// actually only be one (each reducer produces one HFile),
// but we don't know what its name is.
FileSystem fs = outputdir.getFileSystem(jc);
fs.mkdirs(columnFamilyPath);
Path srcDir = outputdir;
for (;;) {
  FileStatus [] files = fs.listStatus(srcDir);
  if ((files == null) || (files.length == 0)) {
    throw new IOException("No files found in " + srcDir);
  }

So I'm hitting the case where the "task output directory" is empty. I assume that's because an earlier task attempt failed, but I'm not sure how to check that.
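
The only check I've thought of is listing that temp directory straight from the Hive CLI while the job is running (path copied from the trace above; I'd guess it gets cleaned up once the job dies):

dfs -ls hdfs://localhost/tmp/hive-cloudera/hive_2011-11-17_10-30-11_023_3494196694520237582/_tmp.-ext-10000;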

Does anyone know what is going on or how I can find the error log of whatever was supposed to populate this directory?

Thanks!
-Ben

Re: Hive HBase wiki

Posted by Ben West <bw...@yahoo.com>.
Thanks, John. I tried it again and the error didn't occur. So who knows.

Now that I've got a full run-through, I'll try to update the wiki with what I needed.

I'm currently doing a prototype, but if we move forward I'll look more into HIVE-2365. The current method is, as you point out, not great :-)




Re: Hive HBase wiki

Posted by John Sichi <js...@fb.com>.
It has been quite a while since those instructions were written, so maybe something has broken. There is a unit test for it (hbase-handler/src/test/queries/hbase_bulk.m), which is still passing.

If you're running via the CLI, logs go in /tmp/<username> by default.
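
Judging from the paths in your stack trace, on your box that's probably /tmp/cloudera, and hive.log is the default file name. You can read it without leaving the CLI via the shell escape:

!tail -100 /tmp/cloudera/hive.log;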

Long-term, energy on this would be best spent here:

https://issues.apache.org/jira/browse/HIVE-2365

JVS
