Posted to user@hbase.apache.org by schubert zhang <zs...@gmail.com> on 2009/03/19 11:43:24 UTC

NotReplicatedYetException, LeaseExpiredException and one RegionServer Down

Reporting several issues I have run into:

Testbed: 1 master + 3 slaves.

1. After running for a long time (batch-inserting data via MapReduce), I sometimes see
the following WARN.
    Is it caused by a network issue or something else?

2009-03-19 13:33:43,630 INFO org.apache.hadoop.hbase.regionserver.HLog:
removing old log file
/hbase/log_10.24.1.14_1237428361729_60020/hlog.dat.1237440285652 whose
highest sequence/edit id is 301219308
2009-03-19 13:33:52,688 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([B@29a925de,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@1ff52730) from 10.24.1.14:40269:
output error
2009-03-19 13:33:52,746 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 11 on 60020 caught: java.nio.channels.ClosedChannelException
        at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(Unknown
Source)
        at org.apache.hadoop.hbase.ipc.HBaseServer.access$2000(Unknown
Source)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(Unknown
Source)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(Unknown Source)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(Unknown
Source)
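
For what it's worth, a ClosedChannelException in the Responder usually just means the client hung up (or its RPC call timed out) before the server finished writing the response, so the server's write hits an already-closed channel. A minimal, self-contained sketch of how java.nio reports this (plain JDK behavior, not HBase code):

```java
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.SocketChannel;

public class ClosedChannelDemo {
    public static void main(String[] args) throws Exception {
        SocketChannel ch = SocketChannel.open();
        ch.close(); // simulate the peer having gone away before we respond
        try {
            // Any write to a closed channel fails the same way the
            // HBaseServer Responder does in the log above.
            ch.write(ByteBuffer.wrap("response".getBytes()));
            System.out.println("write succeeded");
        } catch (ClosedChannelException e) {
            System.out.println("caught ClosedChannelException");
        }
    }
}
```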

2. Maybe the following is caused by a DFS issue? Since my cluster is quite small, I
configured the replication factor to 2.
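
(For reference, that replication setting would typically look like the sketch below in hdfs-site.xml; just illustrating the configuration mentioned, not recommending a value.)

```xml
<!-- hdfs-site.xml: default block replication.
     On a 3-datanode cluster, 2 leaves little margin
     when a datanode is slow or temporarily down. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```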

2009-03-19 13:37:43,232 INFO org.apache.hadoop.hdfs.DFSClient:
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated
yet:/hbase/log_10.24.1.14_1237428361729_60020/hlog.dat.1237441061021
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown
Source)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown
Source)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
        at org.apache.hadoop.ipc.Server$Handler.run(Unknown Source)

        at org.apache.hadoop.ipc.Client.call(Unknown Source)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(Unknown Source)
        at $Proxy1.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Unknown
Source)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Unknown
Source)
        at $Proxy1.addBlock(Unknown Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(Unknown
Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(Unknown
Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(Unknown Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(Unknown
Source)

2009-03-19 13:37:43,232 WARN org.apache.hadoop.hdfs.DFSClient:
NotReplicatedYetException sleeping
/hbase/log_10.24.1.14_1237428361729_60020/hlog.dat.1237441061021 retries
left 4
2009-03-19 13:37:43,637 INFO org.apache.hadoop.hdfs.DFSClient:
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated
yet:/hbase/log_10.24.1.14_1237428361729_60020/hlog.dat.1237441061021
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown
Source)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown
Source)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
        at org.apache.hadoop.ipc.Server$Handler.run(Unknown Source)

        at org.apache.hadoop.ipc.Client.call(Unknown Source)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(Unknown Source)
        at $Proxy1.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)

        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Unknown
Source)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Unknown
Source)
        at $Proxy1.addBlock(Unknown Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(Unknown
Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(Unknown
Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(Unknown Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(Unknown
Source)
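
As an aside, the "retries left 4" lines below come from the DFS client retrying addBlock with a sleep between attempts. The shape of that pattern is roughly the following generic sketch (my own illustration with made-up names, not the actual DFSClient code):

```java
// Generic retry-with-sleep sketch, illustrating the "retries left N"
// pattern seen in the DFSClient log. Not the real DFSClient implementation.
public class RetrySketch {
    interface Op {
        void run() throws Exception;
    }

    // Runs op, retrying up to maxRetries times with a fixed sleep between tries.
    static void runWithRetries(Op op, int maxRetries, long sleepMs) throws Exception {
        int retriesLeft = maxRetries;
        while (true) {
            try {
                op.run();
                return;
            } catch (Exception e) {
                if (retriesLeft == 0) {
                    throw e; // out of retries: surface the failure to the caller
                }
                System.out.println("sleeping, retries left " + retriesLeft);
                retriesLeft--;
                Thread.sleep(sleepMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] attempts = {0};
        // Fails twice, then succeeds on the third attempt.
        runWithRetries(() -> {
            attempts[0]++;
            if (attempts[0] < 3) throw new Exception("not replicated yet");
        }, 4, 10);
        System.out.println("succeeded after " + attempts[0] + " attempts");
    }
}
```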


3. 2009-03-19 16:33:55,254 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
java.io.IOException: Cannot open filename
/hbase/CDR/1546147041/wap/mapfiles/6210665186830995164/data
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(Unknown
Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(Unknown
Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(Unknown
Source)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at org.apache.hadoop.hbase.io.DataOutputBuffer$Buffer.write(Unknown
Source)
        at org.apache.hadoop.hbase.io.DataOutputBuffer.write(Unknown Source)
        at org.apache.hadoop.hbase.io.SequenceFile$Reader.readBuffer(Unknown
Source)
        at
org.apache.hadoop.hbase.io.SequenceFile$Reader.seekToCurrentValue(Unknown
Source)
        at
org.apache.hadoop.hbase.io.SequenceFile$Reader.getCurrentValue(Unknown
Source)
        at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(Unknown
Source)
        at org.apache.hadoop.hbase.io.MapFile$Reader.next(Unknown Source)
        at org.apache.hadoop.hbase.regionserver.HStore.compact(Unknown
Source)
        at org.apache.hadoop.hbase.regionserver.HStore.compact(Unknown
Source)
        at
org.apache.hadoop.hbase.regionserver.HRegion.compactStores(Unknown Source)
        at
org.apache.hadoop.hbase.regionserver.HRegion.compactStores(Unknown Source)
        at
org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(Unknown Source)

2009-03-19 16:33:55,800 ERROR
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split
failed for region CDR,13776342680@2009-03-19 10:24:56.416,1237438713117
java.io.IOException: java.io.IOException: Could not complete write to file
/hbase/CDR/compaction.dir/1546147041/wap/mapfiles/4755271422260171357/data
by DFSClient_-577718137
        at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(Unknown
Source)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
        at org.apache.hadoop.ipc.Server$Handler.run(Unknown Source)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(Unknown
Source)
        at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(Unknown
Source)
        at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(Unknown
Source)
        at
org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(Unknown Source)
2009-03-19 16:33:55,801 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver/0:0:0:0:0:0:0:0:60020.compactor exiting


4. And then, after many NotReplicatedYetExceptions like those in item 2:

2009-03-19 16:34:30,104 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block null bad datanode[0] nodes == null
2009-03-19 16:34:30,105 WARN org.apache.hadoop.hdfs.DFSClient: Could not get
block locations. Source file
"/hbase/CDR/compaction.dir/574480317/wap/mapfiles/6533441212405796349/index"
- Aborting...
2009-03-19 16:34:30,105 ERROR org.apache.hadoop.hdfs.DFSClient: Exception
closing file
/hbase/CDR/compaction.dir/574480317/wap/mapfiles/6533441212405796349/index :
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/hbase/CDR/compaction.dir/574480317/wap/mapfiles/6533441212405796349/index
File does not exist. Holder DFSClient_-577718137 does not have any open
files.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(Unknown
Source)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(Unknown
Source)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown
Source)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown
Source)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
        at org.apache.hadoop.ipc.Server$Handler.run(Unknown Source)

org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/hbase/CDR/compaction.dir/574480317/wap/mapfiles/6533441212405796349/index
File does not exist. Holder DFSClient_-577718137 does not have any open
files.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(Unknown
Source)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(Unknown
Source)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown
Source)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown
Source)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
        at org.apache.hadoop.ipc.Server$Handler.run(Unknown Source)

        at org.apache.hadoop.ipc.Client.call(Unknown Source)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(Unknown Source)
        at $Proxy1.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Unknown
Source)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Unknown
Source)
        at $Proxy1.addBlock(Unknown Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(Unknown
Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(Unknown
Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(Unknown Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(Unknown
Source)
2009-03-19 16:34:31,061 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer:
regionserver/0:0:0:0:0:0:0:0:60020 exiting
2009-03-19 16:34:32,527 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
thread.
2009-03-19 16:34:32,528 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete

The region server went down.

Re: NotReplicatedYetException, LeaseExpiredException and one RegionServer Down

Posted by schubert zhang <zs...@gmail.com>.
Yes, it is a very small cluster, since my two other nodes are borrowed by
another application test. Now I have stopped the whole HBase cluster, added the
two nodes back, and restarted it. After about 10 minutes, everything seems OK,
and the data appears to be safe.

Schubert

On Thu, Mar 19, 2009 at 9:04 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> I've seen HDFS being overwhelmed on very small clusters, like yours,
> the way you describe it. The region server probably shut down to not
> make things worse with the data.
>
> J-D
>

Re: NotReplicatedYetException, LeaseExpiredException and one RegionServer Down

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I've seen HDFS being overwhelmed on very small clusters, like yours,
the way you describe it. The region server probably shut down to not
make things worse with the data.

J-D

>        at
> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(Unknown Source)
>        at
> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(Unknown Source)
>        at
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(Unknown Source)
>
> 2009-03-19 16:33:55,800 ERROR
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split
> failed for region CDR,13776342680@2009-03-19 10:24:56.416,1237438713117
> java.io.IOException: java.io.IOException: Could not complete write to file
> /hbase/CDR/compaction.dir/1546147041/wap/mapfiles/4755271422260171357/data
> by DFSClient_-577718137
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(Unknown
> Source)
>        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
>        at org.apache.hadoop.ipc.Server$Handler.run(Unknown Source)
>
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>        at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>        at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>        at
> org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(Unknown
> Source)
>        at
> org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(Unknown
> Source)
>        at
> org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(Unknown
> Source)
>        at
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(Unknown Source)
> 2009-03-19 16:33:55,801 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> regionserver/0:0:0:0:0:0:0:0:60020.compactor exiting
>
>
> 4. And then, after many exceptions like the NotReplicatedYetException in
>    item 2, the following appeared:
>
> 2009-03-19 16:34:30,104 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block null bad datanode[0] nodes == null
> 2009-03-19 16:34:30,105 WARN org.apache.hadoop.hdfs.DFSClient: Could not get
> block locations. Source file
> "/hbase/CDR/compaction.dir/574480317/wap/mapfiles/6533441212405796349/index"
> - Aborting...
> 2009-03-19 16:34:30,105 ERROR org.apache.hadoop.hdfs.DFSClient: Exception
> closing file
> /hbase/CDR/compaction.dir/574480317/wap/mapfiles/6533441212405796349/index :
> org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
> /hbase/CDR/compaction.dir/574480317/wap/mapfiles/6533441212405796349/index
> File does not exist. Holder DFSClient_-577718137 does not have any open
> files.
>        at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(Unknown
> Source)
>        at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(Unknown
> Source)
>        at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown
> Source)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown
> Source)
>        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
>        at org.apache.hadoop.ipc.Server$Handler.run(Unknown Source)
>
> org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
> /hbase/CDR/compaction.dir/574480317/wap/mapfiles/6533441212405796349/index
> File does not exist. Holder DFSClient_-577718137 does not have any open
> files.
>        at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(Unknown
> Source)
>        at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(Unknown
> Source)
>        at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown
> Source)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown
> Source)
>        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
>        at org.apache.hadoop.ipc.Server$Handler.run(Unknown Source)
>
>        at org.apache.hadoop.ipc.Client.call(Unknown Source)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(Unknown Source)
>        at $Proxy1.addBlock(Unknown Source)
>        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Unknown
> Source)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Unknown
> Source)
>        at $Proxy1.addBlock(Unknown Source)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(Unknown
> Source)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(Unknown
> Source)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(Unknown Source)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(Unknown
> Source)
> 2009-03-19 16:34:31,061 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> regionserver/0:0:0:0:0:0:0:0:60020 exiting
> 2009-03-19 16:34:32,527 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
> thread.
> 2009-03-19 16:34:32,528 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
>
> The region server went down.
>

Re: NotReplicatedYetException, LeaseExpiredException and one RegionServer Down

Posted by schubert zhang <zs...@gmail.com>.
And then, after I re-started this region server, the meta table was wrong:
2009-03-19 18:46:28,104 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 9 region(s) that are in transition states
2009-03-19 18:46:29,368 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {regionname: .META.,,1,
startKey: <>, server: 10.24.1.12:60020}
2009-03-19 18:46:30,363 WARN org.apache.hadoop.hbase.master.BaseScanner:
Scan one META region: {regionname: .META.,,1, startKey: <>, server:
10.24.1.12:60020}
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(Unknown Source)
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(Unknown
Source)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(Unknown Source)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(Unknown
Source)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(Unknown
Source)
        at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(Unknown
Source)
        at
org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(Unknown Source)
        at
org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(Unknown Source)
        at org.apache.hadoop.hbase.master.BaseScanner.chore(Unknown Source)
        at org.apache.hadoop.hbase.Chore.run(Unknown Source)
2009-03-19 18:46:30,365 INFO org.apache.hadoop.hbase.master.BaseScanner: All
1 .META. region(s) scanned
2009-03-19 18:46:31,108 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 10 region(s) that are in transition states
2009-03-19 18:46:34,112 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 10 region(s) that are in transition states
2009-03-19 18:46:37,116 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 10 region(s) that are in transition states
2009-03-19 18:46:40,120 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 10 region(s) that are in transition states
2009-03-19 18:46:43,123 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 10 region(s) that are in transition states
2009-03-19 18:46:46,127 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 10 region(s) that are in transition states
2009-03-19 18:46:46,154 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
CDR,13875542308@2009-03-19 11:12:47.142,1237459467379: safeMode=false
from 10.24.1.14:60020
2009-03-19 18:46:49,130 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 10 region(s) that are in transition states
[... same message repeated every 3 seconds through 18:47:28 ...]
2009-03-19 18:47:29,378 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {regionname: .META.,,1,
startKey: <>, server: 10.24.1.12:60020}
2009-03-19 18:47:31,183 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 10 region(s) that are in transition states
[... same message repeated every 3 seconds through 18:47:52 ...]

and HBase is inaccessible.
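For what it's worth, the NotReplicatedYetException / LeaseExpiredException pattern above usually points at HDFS rather than HBase itself. On small HBase-on-HDFS clusters of this era, the usual first precaution was raising the datanode transceiver limit from its low default of 256. The fragment below is only an illustrative hdfs-site.xml sketch: the replication factor of 2 comes from this thread, while the 4096 value is the commonly recommended setting, not something reported here.

```xml
<!-- hdfs-site.xml: illustrative fragment, not taken from this thread -->
<property>
  <name>dfs.replication</name>
  <value>2</value> <!-- as configured on this 1-master / 3-slave testbed -->
</property>
<property>
  <!-- note the historical misspelling "xcievers"; the default of 256
       is widely considered too low for HBase region servers -->
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```

Datanodes must be restarted for the new transceiver limit to take effect.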
