Posted to user@hbase.apache.org by Mark Snow <ma...@yahoo.com> on 2008/07/25 20:02:06 UTC

failure after importing 42million rows

I'm running an HBase data import on 0.1.3. After 42 million rows, the import fails with an RPC timeout exception. I've tried twice, once on a 2-node cluster and once on a 10-node cluster (EC2, same configuration), and it failed both times in the same spot, somewhere between 42 and 43 million rows. Where should I look to debug this?

From the hbase shell, I can query the table and see the rows have been inserted, but when I do a 'hadoop dfs -ls' I don't see the /hbase dir I specified, so I suspect the data isn't being stored in DFS, and I'm unsure where it is being stored.

hbase root last log entries
2008-07-25 13:46:10,196 INFO org.apache.hadoop.hbase.HMaster: HMaster.rootScanner scanning meta region {regionname: -ROOT-,,0, startKey: <>, server: 10.254.171.22:60020}
2008-07-25 13:46:10,213 DEBUG org.apache.hadoop.hbase.HMaster: HMaster.rootScanner regioninfo: {regionname: .META.,,1, startKey: <>, endKey: <>, encodedName: 1028785192, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: 10.254.243.146:60020, startCode: 1216947114706
2008-07-25 13:46:10,214 INFO org.apache.hadoop.hbase.HMaster: HMaster.rootScanner scan of meta region {regionname: -ROOT-,,0, startKey: <>, server: 10.254.171.22:60020} complete

last log entries from one of the region servers
2008-07-25 13:44:28,190 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region relations,,1216948402123. Current region memcache size 0.0
2008-07-25 13:44:28,190 DEBUG org.apache.hadoop.hbase.HRegion: Finished memcache flush for region relations,,1216948402123 in 0ms, sequence id=32
2008-07-25 13:44:28,190 DEBUG org.apache.hadoop.hbase.HRegionServer: Compaction requested for region: relations,,1216948402123
2008-07-25 13:44:28,190 INFO org.apache.hadoop.hbase.HRegion: checking compaction on region relations,,1216948402123
2008-07-25 13:44:28,192 INFO org.apache.hadoop.hbase.HRegion: checking compaction completed on region relations,,1216948402123; status: false; 0sec

last lines from one of the data nodes
2008-07-25 10:10:33,398 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 28 blocks got processed in 3 msecs
2008-07-25 11:08:15,040 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 28 blocks got processed in 2 msecs
2008-07-25 12:05:56,871 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 28 blocks got processed in 2 msecs
2008-07-25 13:03:38,503 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 28 blocks got processed in 2 msecs

The relevant portion of my hbase-site.xml:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://domU-12-31-39-00-E9-23:50001/hbase</value>
  <description>The directory shared by region servers.</description>
</property>


Any ideas on where I can look to find an error message to help make sense of this?

Re: failure after importing 42million rows

Posted by stack <st...@duboce.net>.

Mark Snow wrote:
> I'm running a hbase data import on 0.1.3. After 42million rows, the import fails with an RPC timeout exception. I've tried twice- once on a 2 node cluster and once on a 10 node cluster (ec2 with the same configuration) and it failed both times in the same spot, somewhere between 42 and 43 million rows. 
Small, medium, or X-large instances?

> Where should I look to debug this?
>
> From the hbase shell, I can query the table and see the rows have been inserted, but when I do a 'hadoop dfs -ls' I don't see the /hbase dir I specified, so I'm suspicious it's not storing the data into dfs, and unsure where it is storing this data.
>   

Does the $HADOOP_HOME you are running 'hadoop dfs -ls' under have
hdfs://domU-12-31-39-00-E9-23:50001/ as the fs.default.name in its conf file?

Perhaps 'hadoop dfs -fs hdfs://domU-12-31-39-00-E9-23:50001/ -lsr /hbase' works?
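
For reference, a minimal sketch of the entry the client-side Hadoop conf would need for a plain 'hadoop dfs -ls /hbase' to reach the same filesystem. It assumes the site config file is hadoop-site.xml; the host and port are copied from the hbase.rootdir value quoted in the original post.

```xml
<!-- Assumed location: $HADOOP_HOME/conf/hadoop-site.xml -->
<property>
  <name>fs.default.name</name>
  <!-- Same namenode host and port as hbase.rootdir, without the /hbase path -->
  <value>hdfs://domU-12-31-39-00-E9-23:50001/</value>
</property>
```

If this value differs from the hbase.rootdir namenode, the shell command is listing a different filesystem than the one HBase writes to.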

Otherwise, nothing untoward in what you sent in email.  What's the RPC
error you're seeing?  Try things like upping your lease periods: try
doubling hbase.regionserver.lease.period and hbase.master.lease.period.
Are you loading via MR or via a custom script?  If the former, are
TaskTrackers running on all nodes beside the Regionservers and Datanodes?

St.Ack
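
A sketch of the lease change suggested above, as it might appear in hbase-site.xml. The property names come from the reply; the values are placeholders that assume the current setting is 30000 ms for both, which is an assumption. Check hbase-default.xml for the values your version actually ships with and double those.

```xml
<!-- Lease periods are in milliseconds. 60000 is a placeholder:
     it assumes a current value of 30000, which may not match your install. -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>60000</value>
</property>
<property>
  <name>hbase.master.lease.period</name>
  <value>60000</value>
</property>
```

The master and region servers would most likely need a restart to pick up the new values.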