Posted to mapreduce-user@hadoop.apache.org by Nichole Treadway <kn...@gmail.com> on 2011/03/17 21:25:06 UTC

Fwd: FileNotFoundException in Reduce step when running HBase importtsv program

I sent this to the HBase mailing list, but thought I would also send this
here in case anyone has any idea what might be going on.

Thanks

---------- Forwarded message ----------
From: Nichole Treadway <kn...@gmail.com>
Date: Thu, Mar 17, 2011 at 2:58 PM
Subject: FileNotFoundException in Reduce step when running importtsv program
To: user <us...@hbase.apache.org>


Hi all,

I am attempting to bulk load data into HBase using the importtsv map-reduce
program (<http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html>). I
have a very wide table (about 200 columns across 2 column families), and
right now I'm trying to load data from a single data file with 1 million
rows.

Importtsv works fine for this data when I am writing directly to the table
(all map tasks, no reduce tasks). However, I would like the import to write
to an output file instead, using the 'importtsv.bulk.output' option. I have
applied the HBASE-1861 patch (
https://issues.apache.org/jira/browse/HBASE-1861) to allow bulk loads with
multiple column families.
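
For reference, the invocation looks roughly like this (the table name,
column mapping, and input path below are placeholders rather than my actual
values, and the jar name depends on the HBase build):

  hadoop jar hbase-0.90.2.jar importtsv \
      -Dimporttsv.columns=HBASE_ROW_KEY,f1:c1,f2:c2 \
      -Dimporttsv.bulk.output=/awardsData \
      my_table /path/to/input.tsv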

When I run the bulk upload program with the output file option on my data,
it always fails in the reduce step. A large number of reduce tasks (2956)
are created; they all get to about 35% completion and then fail with the
following error:


2011-03-17 11:52:48,095 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.FileNotFoundException: File does not exist: hdfs://master:9000/awardsData/_temporary/_attempt_201103151859_0066_r_000000_0
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:468)
    at org.apache.hadoop.hbase.regionserver.StoreFile.getUniqueFile(StoreFile.java:580)
    at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.writeMetaData(HFileOutputFormat.java:186)
    at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.close(HFileOutputFormat.java:247)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-03-17 11:52:48,100 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task



I've put the full output of the reduce task attempt here:
http://pastebin.com/WMfqUwqC
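
Regarding the number of reduce tasks: my understanding is that when
importtsv.bulk.output is set, importtsv configures the job through
HFileOutputFormat.configureIncrementalLoad, which creates one reduce task
per region of the target table. A minimal sketch of that setup, assuming the
HBase 0.90 mapreduce API (the table name is a placeholder, and the mapper
and input configuration are left out):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkOutputSetupSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "importtsv-bulk-output-sketch");
    HTable table = new HTable(conf, "my_table");  // placeholder table name

    // HFiles are written to this directory instead of going straight to the
    // region servers; it corresponds to the -Dimporttsv.bulk.output path.
    FileOutputFormat.setOutputPath(job, new Path("/awardsData"));

    // Sets HFileOutputFormat as the output format, installs a
    // TotalOrderPartitioner keyed on the table's region boundaries, and sets
    // the number of reduce tasks to the number of regions.
    HFileOutputFormat.configureIncrementalLoad(job, table);

    // Mapper, input format, and job submission are omitted in this sketch.
  }
}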

I've tried running the program on a small table (3 column families,
inserting 3 values per row for 1 million rows) and it works fine, though it
only creates 1 reduce task in that case.

Any idea what the problem could be?

FYI, my cluster has 4 nodes, all acting as datanodes/regionservers, running
on 64-bit Red Hat Linux. I'm running the hadoop-0.20-append branch of Hadoop
and, for HBase, the latest revision of the 0.90.2 branch.



Thanks for your help,
Nichole