Posted to user@hbase.apache.org by Chris Tarnas <cf...@email.com> on 2012/04/14 18:49:25 UTC

manually splitting Hlogs

Hello all,

We had a node die on us, and the master could not recover the HLogs due to timeout issues, but HBase stayed up and all of the regions were re-assigned. The node crashed hard (IT is investigating why) so we were not able to just restart it.

I looked into org.apache.hadoop.hbase.regionserver.wal.HLog --split and didn't see any notes against running it on a live cluster, so I ran it and it ran fine. Was it safe to run with HBase up? Were the newly created files correctly added to the existing regions?

thanks,
-chris

Re: manually splitting Hlogs

Posted by Stack <st...@duboce.net>.
And what was the timeout issue?
St.Ack

On Sat, Apr 14, 2012 at 3:23 PM, Stack <st...@duboce.net> wrote:
> On Sat, Apr 14, 2012 at 9:49 AM, Chris Tarnas <cf...@email.com> wrote:
>> I looked into org.apache.hadoop.hbase.regionserver.wal.HLog --split  and didn't see any notes about not running on a live cluster so I ran it and it ran fine.  Was it safe to run with hbase up? Were the newly created files correctly added to the existing regions?
>>
>
> Should be fine w/ hbase up -- that's how the split is usually done.
>
> My guess is the newly created files were not added to the regions,
> since we only check for their presence on region open.
>
> Can you see what new files were made and where?  Reassign those
> regions and that should pick up the edits made by your split.
>
> What version of HBase, Chris?
>
> St.Ack

Re: manually splitting Hlogs

Posted by Chris Tarnas <cf...@email.com>.
Hi Stack,

Thanks, I have all the regions picked up now.

This particular cluster is on CDH3b4 (long story) but it is slated to be upgraded next week.

Here is a clip from the master log on the timeout:

2012-04-14 09:44:24,110 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-4506327502711501968_8582436 failed  because recovery from primary datanode 192.168.1.12:50010 failed 1 times.  Pipeline was 192.168.1.24:50010, 192.168.1.12:50010, 192.168.1.22:50010. Will retry...
2012-04-14 09:45:25,118 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #1 from primary datanode 192.168.1.12:50010
java.net.SocketTimeoutException: Call to /192.168.1.12:50020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.31:35014 remote=/192.168.1.12:50020]


The corresponding datanode, 192.168.1.12, logged this:

2012-04-14 09:44:24,171 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to getBlockMetaDataInfo for block (=blk_-4506327502711501968_8582436) from datanode (=192.168.1.22:50010)
java.net.SocketTimeoutException: Call to /192.168.1.22:50020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.12:36626 remote=/192.168.1.22:50020]

192.168.1.22 was the node that died. I can put more logs on pastebin if needed.

thanks,
-chris
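
The chain in the log excerpts above (the master at 192.168.1.31 timing out on 192.168.1.12, which in turn timed out on 192.168.1.22) can be traced mechanically. The following is only a rough sketch, assuming the SocketTimeoutException lines keep the "local=/ip:port remote=/ip:port" form shown above. The function names are made up for illustration, and the heuristic (a callee that never appears as a caller is the likely dead node) is an assumption, not something HDFS provides.

```python
import re

# Matches the channel description at the end of a SocketTimeoutException line.
TIMEOUT_RE = re.compile(
    r"SocketChannel\[connected local=/(?P<local>[\d.]+):\d+ "
    r"remote=/(?P<remote>[\d.]+):\d+\]"
)

# The two exception lines quoted in the message above.
SAMPLE_LOG = """\
java.net.SocketTimeoutException: Call to /192.168.1.12:50020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.31:35014 remote=/192.168.1.12:50020]
java.net.SocketTimeoutException: Call to /192.168.1.22:50020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.12:36626 remote=/192.168.1.22:50020]
"""


def timeout_edges(log_text):
    """Return (caller_ip, callee_ip) for every socket-timeout channel found."""
    return [(m.group("local"), m.group("remote"))
            for m in TIMEOUT_RE.finditer(log_text)]


def unresponsive_nodes(edges):
    """Hypothetical heuristic: callees that never appear as callers."""
    callers = {local for local, _ in edges}
    callees = {remote for _, remote in edges}
    return sorted(callees - callers)


if __name__ == "__main__":
    print(unresponsive_nodes(timeout_edges(SAMPLE_LOG)))
```

On the two excerpts this singles out 192.168.1.22, matching the node that died. With logs gathered from only some hosts the heuristic can mislead, since a healthy node whose logs were not collected also never shows up as a caller.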




Re: manually splitting Hlogs

Posted by Stack <st...@duboce.net>.
On Sat, Apr 14, 2012 at 9:49 AM, Chris Tarnas <cf...@email.com> wrote:
> I looked into org.apache.hadoop.hbase.regionserver.wal.HLog --split  and didn't see any notes about not running on a live cluster so I ran it and it ran fine.  Was it safe to run with hbase up? Were the newly created files correctly added to the existing regions?
>

Should be fine w/ hbase up -- that's how the split is usually done.

My guess is the newly created files were not added to the regions,
since we only check for their presence on region open.

Can you see what new files were made and where?  Reassign those
regions and that should pick up the edits made by your split.

What version of HBase, Chris?

St.Ack
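
One way to answer "what new files were made and where" is to scan a recursive HDFS listing for the split output. The sketch below is a guess at how that could look, assuming the split writes under a per-region directory named recovered.edits (layout /hbase/<table>/<region>/recovered.edits/<file>); verify that layout on your version before relying on it. The function name, the table names, and the sample listing are all invented for illustration.

```python
def regions_with_recovered_edits(listing_text, marker="recovered.edits"):
    """Return sorted (table, region) pairs that have files under `marker`.

    `listing_text` is assumed to be the text of a recursive HDFS listing,
    one entry per line, with the path as the last whitespace-separated field.
    """
    regions = set()
    for line in listing_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        parts = fields[-1].split("/")
        if marker not in parts:
            continue
        idx = parts.index(marker)
        # Need <table>/<region> before the marker and a file name after it,
        # so an empty recovered.edits directory is not reported.
        if idx >= 2 and idx < len(parts) - 1:
            regions.add((parts[idx - 2], parts[idx - 1]))
    return sorted(regions)


# Invented sample listing, not taken from the cluster in this thread.
SAMPLE_LISTING = """\
drwxr-xr-x   - hbase supergroup          0 2012-04-14 10:01 /hbase/t1/aaa111/recovered.edits
-rw-r--r--   3 hbase supergroup       4096 2012-04-14 10:01 /hbase/t1/aaa111/recovered.edits/0000000000000000123
-rw-r--r--   3 hbase supergroup       2048 2012-04-14 10:01 /hbase/t1/bbb222/cf/1234567890
drwxr-xr-x   - hbase supergroup          0 2012-04-14 10:01 /hbase/t2/ccc333/recovered.edits
"""

if __name__ == "__main__":
    for table, region in regions_with_recovered_edits(SAMPLE_LISTING):
        print(table, region)
```

The regions it reports are the candidates to reassign so that the region open picks up the edits. How to trigger the reassignment depends on the HBase version in use.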