Posted to user@hbase.apache.org by Mark Snow <ma...@yahoo.com> on 2008/07/25 20:02:06 UTC
failure after importing 42million rows
I'm running an HBase data import on 0.1.3. After 42 million rows, the import fails with an RPC timeout exception. I've tried twice, once on a 2-node cluster and once on a 10-node cluster (EC2 with the same configuration), and it failed both times in the same spot, somewhere between 42 and 43 million rows. Where should I look to debug this?
From the hbase shell, I can query the table and see that the rows have been inserted, but when I do a 'hadoop dfs -ls' I don't see the /hbase dir I specified, so I suspect it's not storing the data in DFS, and I'm unsure where it is storing this data.
hbase root last log entries
2008-07-25 13:46:10,196 INFO org.apache.hadoop.hbase.HMaster: HMaster.rootScanner scanning meta region {regionname: -ROOT-,,0, startKey: <>, server: 10.254.171.22:60020}
2008-07-25 13:46:10,213 DEBUG org.apache.hadoop.hbase.HMaster: HMaster.rootScanner regioninfo: {regionname: .META.,,1, startKey: <>, endKey: <>, encodedName: 1028785192, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: 10.254.243.146:60020, startCode: 1216947114706
2008-07-25 13:46:10,214 INFO org.apache.hadoop.hbase.HMaster: HMaster.rootScanner scan of meta region {regionname: -ROOT-,,0, startKey: <>, server: 10.254.171.22:60020} complete
last log entries from one of the region servers
2008-07-25 13:44:28,190 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region relations,,1216948402123. Current region memcache size 0.0
2008-07-25 13:44:28,190 DEBUG org.apache.hadoop.hbase.HRegion: Finished memcache flush for region relations,,1216948402123 in 0ms, sequence id=32
2008-07-25 13:44:28,190 DEBUG org.apache.hadoop.hbase.HRegionServer: Compaction requested for region: relations,,1216948402123
2008-07-25 13:44:28,190 INFO org.apache.hadoop.hbase.HRegion: checking compaction on region relations,,1216948402123
2008-07-25 13:44:28,192 INFO org.apache.hadoop.hbase.HRegion: checking compaction completed on region relations,,1216948402123; status: false; 0sec
last lines from one of the data nodes
2008-07-25 10:10:33,398 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 28 blocks got processed in 3 msecs
2008-07-25 11:08:15,040 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 28 blocks got processed in 2 msecs
2008-07-25 12:05:56,871 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 28 blocks got processed in 2 msecs
2008-07-25 13:03:38,503 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 28 blocks got processed in 2 msecs
The relevant portion of my hbase-site.xml:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://domU-12-31-39-00-E9-23:50001/hbase</value>
  <description>The directory shared by region servers.</description>
</property>
Any ideas on where I can look to find an error message to help make sense of this?
Re: failure after importing 42million rows
Posted by stack <st...@duboce.net>.
Mark Snow wrote:
> I'm running an HBase data import on 0.1.3. After 42 million rows, the import fails with an RPC timeout exception. I've tried twice, once on a 2-node cluster and once on a 10-node cluster (EC2 with the same configuration), and it failed both times in the same spot, somewhere between 42 and 43 million rows.
Small, medium, or X-large instances?
> Where should I look to debug this?
>
> From the hbase shell, I can query the table and see the rows have been inserted, but when I do a 'hadoop dfs -ls' I don't see the /hbase dir I specified, so I'm suspicious it's not storing the data into dfs, and unsure where it is storing this data.
>
Does the conf file of the $HADOOP_HOME you are running the 'hadoop dfs -ls'
under have hdfs://domU-12-31-39-00-E9-23:50001/ as its fs.default.name?
Perhaps 'hadoop dfs -fs hdfs://domU-12-31-39-00-E9-23:50001/ -lsr
/hbase' works?
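For reference, a minimal sketch of the entry in question, assuming it lives in conf/hadoop-site.xml under that $HADOOP_HOME. The host and port are copied from the hbase.rootdir value quoted earlier in the thread, not from a verified working config. If the client's fs.default.name points somewhere else (the local filesystem, for instance), 'hadoop dfs -ls' would list a different filesystem than the one HBase writes to.

```xml
<!-- Sketch only: host and port taken from the hbase.rootdir in this thread -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://domU-12-31-39-00-E9-23:50001/</value>
</property>
```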
Otherwise, nothing untoward in what you sent in email. What's the RPC
error you're seeing? Try things like upping your lease periods: try
doubling hbase.regionserver.lease.period and hbase.master.lease.period.
Are you loading via MR or via a custom script? If the former, are
TaskTrackers running on all nodes alongside the RegionServers and DataNodes?
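
A sketch of what the lease override could look like in hbase-site.xml. The numbers below are placeholders, not confirmed 0.1.3 defaults: look up the current values of both properties in hbase-default.xml and double those. The values are assumed to be in milliseconds.

```xml
<!-- Placeholder values: replace with twice whatever hbase-default.xml lists -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>60000</value>
</property>
<property>
  <name>hbase.master.lease.period</name>
  <value>60000</value>
</property>
```

The servers presumably need a restart to pick up the change.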
St.Ack