Posted to user@hbase.apache.org by Michael Dagaev <mi...@gmail.com> on 2009/03/19 09:18:23 UTC
Region Servers are down
Hi, all
We are running a small HBase 0.18 cluster.
Today the HBase region servers went down.
They aborted at approximately the same time.
Has anybody run into a problem like this?
See the exceptions below.
Thank you for your cooperation,
M.
region server 1:
--------------------
2009-03-19 00:31:12,105 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block blk_6091846120190716081_2833042 bad datanode[1]
2009-03-19 00:31:12,105 FATAL org.apache.hadoop.hbase.regionserver.Flusher: Replay of hlog required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: <region name>
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1071)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:967)
    at org.apache.hadoop.hbase.regionserver.Flusher.flushRegion(Flusher.java:172)
    at org.apache.hadoop.hbase.regionserver.Flusher.run(Flusher.java:90)
Caused by: java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
region server 2:
--------------------
2009-03-19 00:35:03,334 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block blk_4372454425667060106_2834420 bad datanode[0]
2009-03-19 00:35:03,336 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region <region name>
java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
region server 3:
--------------------
2009-03-19 00:35:03,334 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block blk_4372454425667060106_2834420 bad datanode[0]
2009-03-19 00:35:03,336 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region <region name>
java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
On region server #3 we also noticed the following errors before the abort:
2009-03-19 00:34:35,956 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink <slave #2>:50010
Re: Region Servers are down
Posted by Jean-Daniel Cryans <jd...@apache.org>.
Michael,
Is there anything in the datanode logs around that time?
J-D
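For anyone following up on this question, one way to pull the relevant window out of a datanode log is sketched below. It is only a sketch and not something posted in the thread: the function name `lines_near` and the five-minute window are made up for illustration, and it assumes the datanode log uses the same `yyyy-MM-dd HH:mm:ss,SSS` timestamp prefix as the region server excerpts in the original message.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the region server excerpts,
# e.g. "2009-03-19 00:31:12,105 WARN ...". Assumed to match the
# datanode log format as well.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ")
TS_FMT = "%Y-%m-%d %H:%M:%S"


def lines_near(lines, when, minutes=5):
    """Yield log lines stamped within +/- `minutes` of `when`.

    `lines` is any iterable of strings (an open file works).
    `when` is a string such as "2009-03-19 00:31:12".
    Lines without a timestamp (stack frames, "Caused by:" lines)
    are kept if the last timestamped line before them was kept.
    """
    center = datetime.strptime(when, TS_FMT)
    window = timedelta(minutes=minutes)
    keep = False
    for line in lines:
        match = TS_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), TS_FMT)
            keep = abs(stamp - center) <= window
        if keep:
            yield line.rstrip("\n")
```

Passing an open datanode log file and the time of the first abort, `"2009-03-19 00:31:12"`, would print only the entries around the failure, which makes it easier to line them up against the `bad datanode` and `firstBadLink` messages on the region servers.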