Posted to user@hbase.apache.org by Michael Dagaev <mi...@gmail.com> on 2009/03/19 09:18:23 UTC

Region Servers are down

Hi, all

    We are running a small HBase 0.18 cluster.
Today the HBase region servers went down.
They all aborted at approximately the same time.

Has anybody run into a problem like this?
See the exceptions below.

Thank you for your cooperation,
M.

region server 1:
--------------------

2009-03-19 00:31:12,105 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block blk_6091846120190716081_2833042 bad datanode[1]
2009-03-19 00:31:12,105 FATAL org.apache.hadoop.hbase.regionserver.Flusher: Replay of hlog required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: <region name>
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1071)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:967)
        at org.apache.hadoop.hbase.regionserver.Flusher.flushRegion(Flusher.java:172)
        at org.apache.hadoop.hbase.regionserver.Flusher.run(Flusher.java:90)
Caused by: java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

region server 2:
--------------------

2009-03-19 00:35:03,334 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block blk_4372454425667060106_2834420 bad datanode[0]
2009-03-19 00:35:03,336 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region <region name>
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

region server 3:
--------------------

2009-03-19 00:35:03,334 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block blk_4372454425667060106_2834420 bad datanode[0]
2009-03-19 00:35:03,336 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region <region name>
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

On region server #3 we also noticed the following error shortly before the abort:

2009-03-19 00:34:35,956 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink <slave #2>:50010

Re: Region Servers are down

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Michael,

Is there anything in the datanode logs around that time?
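
If it helps, here is a rough sketch of one way to pull out the datanode log entries around a given moment. It assumes the datanode logs use the same log4j-style timestamp prefix as the region server logs quoted in this thread; the function name `lines_near` and the default window of 5 minutes are made up for illustration and are not part of Hadoop or HBase.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the logs in this thread,
# e.g. "2009-03-19 00:31:12,105 WARN ...". Assumed to hold for datanode logs too.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ")


def lines_near(lines, center, minutes=5):
    """Yield log lines stamped within +/- `minutes` of `center`.

    Lines without a timestamp (stack frames, wrapped text) inherit the
    decision made for the most recent timestamped line, so a stack trace
    stays attached to the entry that produced it.
    """
    window = timedelta(minutes=minutes)
    keep = False
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            keep = abs(stamp - center) <= window
        if keep:
            yield line
```

You would open each datanode log file, pass the file object as `lines`, and use the time of the first abort (00:31:12 on 2009-03-19 in the logs posted here) as `center`.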

J-D
