You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Alok Singh <al...@urbanairship.com> on 2011/12/30 00:11:42 UTC

NotReplicatedYetException when gracefully shutting down a region server.

When attempting to gracefully shutdown a regionserver, I saw a couple
of NotReplicatedYet exceptions in the logs (below). Can't find the
file that is causing this exception in on the HDFS filesystem. Have we
potentially lost the data, or is this exception benign?


Alok

hbase: 0.90.3
hadoop: 0.20.2-cdh3u0

Gracefull shutdown process:
hbase(main):001:0> balance_switch false

hbase org.jruby.Main current/bin/region_mover.rb unload <IP>

After the region count for the regionserver is 0, we kill the
regionserver process.

------------------Log-------------------------
2011-12-29 17:11:03,768 - INFO  [PRI IPC Server handler 6 on
7040:HRegionServer@2142] - Received close region:
rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
2011-12-29 17:11:03,787 - WARN
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
using octal or symbolic umask specifications.
2011-12-29 17:11:03,803 - WARN
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
using octal or symbolic umask specifications.
2011-12-29 17:11:03,911 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Writer@868]
- Bloom added to HFile
(hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990):
7.1k, 5225/5276 (99%)
2011-12-29 17:11:03,944 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@494] -
Renaming flushed file at
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990
to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
2011-12-29 17:11:03,961 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Reader@1027]
- Loaded col bloom filter metadata for
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
2011-12-29 17:11:03,961 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@504] - Added
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965,
entries=5276, sequenceid=1219931822, memsize=2.0m, filesize=384.4k
2011-12-29 17:11:03,964 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:HRegion@554] -
Closed rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
2011-12-29 17:11:05,706 - INFO  [PRI IPC Server handler 7 on
7040:HRegionServer@2142] - Received close region:
rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
2011-12-29 17:11:05,716 - WARN
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
using octal or symbolic umask specifications.
2011-12-29 17:11:05,735 - WARN
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
using octal or symbolic umask specifications.
2011-12-29 17:11:05,808 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Writer@868]
- Bloom added to HFile
(hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097):
6.5k, 4727/4737 (100%)
2011-12-29 17:11:05,834 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@494] -
Renaming flushed file at
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097
to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
2011-12-29 17:11:05,852 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Reader@1027]
- Loaded col bloom filter metadata for
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
2011-12-29 17:11:05,852 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@504] - Added
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492,
entries=4737, sequenceid=1219931829, memsize=1.4m, filesize=246.5k
2011-12-29 17:11:05,856 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:HRegion@554] -
Closed rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
2011-12-29 17:14:58,942 - INFO
[Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@100] -
Shutdown hook starting; hbase.shutdown.hook=true;
fsShutdownHook=Thread[Thread-15,5,main]
2011-12-29 17:14:58,942 - INFO
[Shutdownhook:regionserver7040:HRegionServer@1342] - STOPPED: Shutdown
hook
2011-12-29 17:14:59,476 - INFO  [regionserver7040:HBaseServer@1234] -
Stopping server on 7040
2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 2 on
7040:HBaseServer$Handler@1104] - PRI IPC Server handler 2 on 7040:
exiting
2011-12-29 17:14:59,476 - INFO  [IPC Server handler 9 on
7040:HBaseServer$Handler@1104] - IPC Server handler 9 on 7040: exiting
2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 4 on
7040:HBaseServer$Handler@1104] - PRI IPC Server handler 4 on 7040:
exiting
2011-12-29 17:14:59,476 - INFO  [IPC Server handler 4 on
7040:HBaseServer$Handler@1104] - IPC Server handler 4 on 7040: exiting
2011-12-29 17:14:59,477 - INFO  [IPC Server handler 5 on
7040:HBaseServer$Handler@1104] - IPC Server handler 5 on 7040: exiting
2011-12-29 17:14:59,477 - INFO  [IPC Server handler 6 on
7040:HBaseServer$Handler@1104] - IPC Server handler 6 on 7040: exiting
2011-12-29 17:14:59,477 - INFO  [IPC Server handler 7 on
7040:HBaseServer$Handler@1104] - IPC Server handler 7 on 7040: exiting
2011-12-29 17:14:59,477 - INFO  [IPC Server handler 8 on
7040:HBaseServer$Handler@1104] - IPC Server handler 8 on 7040: exiting
2011-12-29 17:14:59,477 - INFO  [PRI IPC Server handler 0 on
7040:HBaseServer$Handler@1104] - PRI IPC Server handler 0 on 7040:
exiting
2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 1 on
7040:HBaseServer$Handler@1104] - PRI IPC Server handler 1 on 7040:
exiting
2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 3 on
7040:HBaseServer$Handler@1104] - PRI IPC Server handler 3 on 7040:
exiting
2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 5 on
7040:HBaseServer$Handler@1104] - PRI IPC Server handler 5 on 7040:
exiting
2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 6 on
7040:HBaseServer$Handler@1104] - PRI IPC Server handler 6 on 7040:
exiting
2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 7 on
7040:HBaseServer$Handler@1104] - PRI IPC Server handler 7 on 7040:
exiting
2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 9 on
7040:HBaseServer$Handler@1104] - PRI IPC Server handler 9 on 7040:
exiting
2011-12-29 17:14:59,476 - INFO  [IPC Server handler 0 on
7040:HBaseServer$Handler@1104] - IPC Server handler 0 on 7040: exiting
2011-12-29 17:14:59,480 - INFO  [IPC Server
Responder:HBaseServer$Responder@649] - Stopping IPC Server Responder
2011-12-29 17:14:59,479 - INFO  [IPC Server listener on
7040:HBaseServer$Listener@450] - Stopping IPC Server listener on 7040
2011-12-29 17:14:59,477 - INFO  [regionserver7040:HRegionServer@636] -
Stopping infoServer
2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 8 on
7040:HBaseServer$Handler@1104] - PRI IPC Server handler 8 on 7040:
exiting
2011-12-29 17:14:59,476 - INFO  [IPC Server handler 3 on
7040:HBaseServer$Handler@1104] - IPC Server handler 3 on 7040: exiting
2011-12-29 17:14:59,476 - INFO  [IPC Server handler 2 on
7040:HBaseServer$Handler@1104] - IPC Server handler 2 on 7040: exiting
2011-12-29 17:14:59,476 - INFO  [IPC Server handler 1 on
7040:HBaseServer$Handler@1104] - IPC Server handler 1 on 7040: exiting
2011-12-29 17:14:59,481 - INFO  [regionserver7040:Slf4jLog@67] -
Stopped SelectChannelConnector@0.0.0.0:7041
2011-12-29 17:14:59,517 - INFO
[regionserver7040.logRoller:LogRoller@114] - LogRoller exiting.
2011-12-29 17:14:59,517 - INFO
[regionserver7040.cacheFlusher:MemStoreFlusher@266] -
regionserver7040.cacheFlusher exiting
2011-12-29 17:14:59,517 - INFO
[regionserver7040.logSyncer:HLog$LogSyncer@966] -
regionserver7040.logSyncer exiting
2011-12-29 17:14:59,518 - INFO
[regionserver7040.majorCompactionChecker:Chore@79] -
regionserver7040.majorCompactionChecker exiting
2011-12-29 17:14:59,518 - INFO
[regionserver7040.compactor:CompactSplitThread@113] -
regionserver7040.compactor exiting
2011-12-29 17:14:59,891 - INFO  [regionserver7040:HRegionServer@668] -
stopping server at: 10.129.1.230,7040,1323990853621
2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@124] -
regionserver7040 closing leases
2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@131] -
regionserver7040 closed leases
2011-12-29 17:14:59,891 - INFO
[regionserver7040:HConnectionManager$HConnectionImplementation@1067] -
Closed zookeeper sessionid=0x4343f266cbc0012
2011-12-29 17:14:59,894 - INFO  [regionserver7040:ZooKeeper@538] -
Session: 0x4343f266cbc0012 closed
2011-12-29 17:14:59,894 - INFO
[main-EventThread:ClientCnxn$EventThread@520] - EventThread shut down
2011-12-29 17:14:59,897 - INFO  [regionserver7040:ZooKeeper@538] -
Session: 0x1343f2669140013 closed
2011-12-29 17:14:59,898 - INFO  [regionserver7040:HRegionServer@686] -
regionserver7040 exiting
2011-12-29 17:14:59,899 - INFO
[regionserver7040-EventThread:ClientCnxn$EventThread@520] -
EventThread shut down
2011-12-29 17:14:59,900 - INFO
[Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@106] -
Starting fs shutdown hook thread.
2011-12-29 17:14:59,900 - ERROR
[Thread-15:DFSClient$LeaseChecker@1135] - Exception closing file
/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
	at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)

org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
	at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)

	at org.apache.hadoop.ipc.Client.call(Client.java:1104)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
	at $Proxy5.addBlock(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at $Proxy5.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3185)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3055)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2305)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2500)
2011-12-29 17:14:59,913 - INFO
[Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@112] -
Shutdown hook finished.
------------------------

Re: NotReplicatedYetException when gracefully shutting down a region server.

Posted by Jean-Daniel Cryans <jd...@apache.org>.
That's very interesting, did you look at that region server's log to
see what was going on when creating that file? Normally if a region
server encounters a FATAL fs error (like not being able to create a
file) it will abort itself.

J-D

On Tue, Jan 3, 2012 at 4:14 PM, Alok Singh <al...@urbanairship.com> wrote:
> J-D,
>
> On the regionserver, I see these lines:
> -------------
> 2011-12-28 18:56:37,821 - INFO
> [regionserver7040.cacheFlusher:Store@494] - Renaming flushed file at
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/745069841898422651
> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/8857592033950378228
> 2011-12-28 18:56:37,867 - INFO
> [regionserver7040.cacheFlusher:StoreFile$Reader@1027] - Loaded row
> bloom filter metadata for
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/8857592033950378228
> 2011-12-28 18:56:37,867 - INFO
> [regionserver7040.cacheFlusher:Store@504] - Added
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/8857592033950378228,
> entries=15990, sequenceid=1213107064, memsize=3.8m, filesize=200.1k
> 2011-12-28 19:22:57,674 - INFO  [PRI IPC Server handler 9 on
> 7040:HRegionServer@2142] - Received close region:
> rich_push.user,4e3b951f2325b4695800679d,1317431456933.0fd5a7b472fb092a4470e9505c3a421a.
> 2011-12-28 19:22:57,702 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:StoreFile$Writer@868]
> - Bloom added to HFile
> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/8759993386901964725):
> 32.0, 14/26 (54%)
> 2011-12-28 19:22:57,776 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:Store@494] -
> Renaming flushed file at
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/8759993386901964725
> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/4476093570994008279
> 2011-12-28 19:22:57,893 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:StoreFile$Reader@1027]
> - Loaded row bloom filter metadata for
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/4476093570994008279
> 2011-12-28 19:22:57,894 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:Store@504] - Added
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/4476093570994008279,
> entries=210, sequenceid=1213258636, memsize=50.8k, filesize=3.6k
> 2011-12-28 19:22:57,896 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:HRegion@554] -
> Closed rich_push.user,4e3b951f2325b4695800679d,1317431456933.0fd5a7b472fb092a4470e9505c3a421a.
> 2011-12-29 17:14:59,900 - ERROR
> [Thread-15:DFSClient$LeaseChecker@1135] - Exception closing file
> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> : org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> ----------------------------------------
> We waited overnight after moving the regions from the regionserver
> before shutting it down. That is why the timestamp between the
> RS_CLOSE_REGION and ERROR are off by 22 hours.
>
> But, looking on the namenode, I found some interesting log entries.
> Note the timestamp on these entries, they are over a week before the
> regionserver errors occurred.
> -------namenode-logs---------
> rm=null
> 2011-12-20 03:47:41,208 - INFO  [IPC Server handler 8 on
> 7080:FSNamesystem@125] -
> ugi=hbaseregion ip=/10.129.1.230        cmd=create      src=/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423       dst=null        perm=hbaseregion:hadoop:rw-rw-rw-
> 2011-12-20 03:47:41,240 - INFO  [IPC Server handler 60 on
> 7080:FSNamesystem@1646] - BLOCK* NameSystem.allocateBlock:
> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423.
> blk_4226819991054427111_66443
> 2011-12-20 03:48:10,096 - INFO  [IPC Server handler 40 on
> 7080:FSNamesystem@1646] - BLOCK* NameSystem.allocateBlock:
> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423.
> blk_51125950084620321_66443
> 2011-12-20 03:48:38,808 - INFO  [IPC Server handler 1 on
> 7080:Server$Handler@1423] - IPC Server handler 1 on 7080, call
> addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
> DFSClient_201969766, null) from 10.129.1.230:54052: error:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> 2011-12-20 03:48:39,245 - INFO  [IPC Server handler 22 on
> 7080:Server$Handler@1423] - IPC Server handler 22 on 7080, call
> addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
> DFSClient_201969766, null) from 10.129.1.230:54052: error:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> 2011-12-20 03:48:40,055 - INFO  [IPC Server handler 42 on
> 7080:Server$Handler@1423] - IPC Server handler 42 on 7080, call
> addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
> DFSClient_201969766, null) from 10.129.1.230:54052: error:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> 2011-12-20 03:48:41,665 - INFO  [IPC Server handler 63 on
> 7080:Server$Handler@1423] - IPC Server handler 63 on 7080, call
> addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
> DFSClient_201969766, null) from 10.129.1.230:54052: error:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> 2011-12-20 03:48:44,875 - INFO  [IPC Server handler 26 on
> 7080:Server$Handler@1423] - IPC Server handler 26 on 7080, call
> addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
> DFSClient_201969766, null) from 10.129.1.230:54052: error:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> 2011-12-20 03:48:51,285 - INFO  [IPC Server handler 11 on
> 7080:Server$Handler@1423] - IPC Server handler 11 on 7080, call
> addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
> DFSClient_201969766, null) from 10.129.1.230:54052: error:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> 2011-12-20 06:34:21,299 - INFO  [IPC Server handler 10 on
> 7080:FSNamesystem@125] -
> ugi=hbaseregion ip=/10.129.1.230        cmd=listStatus  src=/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/attribute      dst=null        perm=null
> 2011-12-20 06:34:21,303 - INFO  [IPC Server handler 19 on
> 7080:FSNamesystem@125] -
> ugi=hbaseregion ip=/10.129.1.230        cmd=listStatus  src=/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user   dst=null        perm=null
> ....
> -----namenode-logs------------------
>
> Thanks for helping me figure out this error.
>
> Alok
>
> On Tue, Jan 3, 2012 at 2:06 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
>> Since you did the region mover thing then all the region should be off
>> from that region server, so data loss is highly unlikely.
>>
>> My theory would be that a compaction was underway and got cancelled
>> when the region server was told to close the region, but the
>> DFSOutputStream thread hasn't been cleanup yet. Probably the file was
>> even already deleted and the data is still safe in the un-compacted
>> files.
>>
>> Ways to confirm this:
>>
>>  - Look in the region server log in the preceding lines what happens
>> to that region (named 0fd5a7b472fb092a4470e9505c3a421a) when it's
>> closing. Do you see a compaction being cancelled?
>>  - Look in the Namenode log for that file, is it being asked to delete
>> it or something before it sends a NotReplicatedYetException?
>>
>> Hope this helps,
>>
>> J-D
>>
>> On Thu, Dec 29, 2011 at 3:11 PM, Alok Singh <al...@urbanairship.com> wrote:
>>> When attempting to gracefully shutdown a regionserver, I saw a couple
>>> of NotReplicatedYet exceptions in the logs (below). Can't find the
>>> file that is causing this exception in on the HDFS filesystem. Have we
>>> potentially lost the data, or is this exception benign?
>>>
>>>
>>> Alok
>>>
>>> hbase: 0.90.3
>>> hadoop: 0.20.2-cdh3u0
>>>
>>> Gracefull shutdown process:
>>> hbase(main):001:0> balance_switch false
>>>
>>> hbase org.jruby.Main current/bin/region_mover.rb unload <IP>
>>>
>>> After the region count for the regionserver is 0, we kill the
>>> regionserver process.
>>>
>>> ------------------Log-------------------------
>>> 2011-12-29 17:11:03,768 - INFO  [PRI IPC Server handler 6 on
>>> 7040:HRegionServer@2142] - Received close region:
>>> rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
>>> 2011-12-29 17:11:03,787 - WARN
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
>>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>>> using octal or symbolic umask specifications.
>>> 2011-12-29 17:11:03,803 - WARN
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
>>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>>> using octal or symbolic umask specifications.
>>> 2011-12-29 17:11:03,911 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Writer@868]
>>> - Bloom added to HFile
>>> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990):
>>> 7.1k, 5225/5276 (99%)
>>> 2011-12-29 17:11:03,944 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@494] -
>>> Renaming flushed file at
>>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990
>>> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
>>> 2011-12-29 17:11:03,961 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Reader@1027]
>>> - Loaded col bloom filter metadata for
>>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
>>> 2011-12-29 17:11:03,961 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@504] - Added
>>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965,
>>> entries=5276, sequenceid=1219931822, memsize=2.0m, filesize=384.4k
>>> 2011-12-29 17:11:03,964 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:HRegion@554] -
>>> Closed rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
>>> 2011-12-29 17:11:05,706 - INFO  [PRI IPC Server handler 7 on
>>> 7040:HRegionServer@2142] - Received close region:
>>> rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
>>> 2011-12-29 17:11:05,716 - WARN
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
>>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>>> using octal or symbolic umask specifications.
>>> 2011-12-29 17:11:05,735 - WARN
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
>>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>>> using octal or symbolic umask specifications.
>>> 2011-12-29 17:11:05,808 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Writer@868]
>>> - Bloom added to HFile
>>> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097):
>>> 6.5k, 4727/4737 (100%)
>>> 2011-12-29 17:11:05,834 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@494] -
>>> Renaming flushed file at
>>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097
>>> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
>>> 2011-12-29 17:11:05,852 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Reader@1027]
>>> - Loaded col bloom filter metadata for
>>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
>>> 2011-12-29 17:11:05,852 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@504] - Added
>>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492,
>>> entries=4737, sequenceid=1219931829, memsize=1.4m, filesize=246.5k
>>> 2011-12-29 17:11:05,856 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:HRegion@554] -
>>> Closed rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
>>> 2011-12-29 17:14:58,942 - INFO
>>> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@100] -
>>> Shutdown hook starting; hbase.shutdown.hook=true;
>>> fsShutdownHook=Thread[Thread-15,5,main]
>>> 2011-12-29 17:14:58,942 - INFO
>>> [Shutdownhook:regionserver7040:HRegionServer@1342] - STOPPED: Shutdown
>>> hook
>>> 2011-12-29 17:14:59,476 - INFO  [regionserver7040:HBaseServer@1234] -
>>> Stopping server on 7040
>>> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 2 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 2 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 9 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 9 on 7040: exiting
>>> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 4 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 4 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 4 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 4 on 7040: exiting
>>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 5 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 5 on 7040: exiting
>>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 6 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 6 on 7040: exiting
>>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 7 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 7 on 7040: exiting
>>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 8 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 8 on 7040: exiting
>>> 2011-12-29 17:14:59,477 - INFO  [PRI IPC Server handler 0 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 0 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 1 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 1 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 3 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 3 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 5 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 5 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 6 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 6 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 7 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 7 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 9 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 9 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 0 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 0 on 7040: exiting
>>> 2011-12-29 17:14:59,480 - INFO  [IPC Server
>>> Responder:HBaseServer$Responder@649] - Stopping IPC Server Responder
>>> 2011-12-29 17:14:59,479 - INFO  [IPC Server listener on
>>> 7040:HBaseServer$Listener@450] - Stopping IPC Server listener on 7040
>>> 2011-12-29 17:14:59,477 - INFO  [regionserver7040:HRegionServer@636] -
>>> Stopping infoServer
>>> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 8 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 8 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 3 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 3 on 7040: exiting
>>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 2 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 2 on 7040: exiting
>>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 1 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 1 on 7040: exiting
>>> 2011-12-29 17:14:59,481 - INFO  [regionserver7040:Slf4jLog@67] -
>>> Stopped SelectChannelConnector@0.0.0.0:7041
>>> 2011-12-29 17:14:59,517 - INFO
>>> [regionserver7040.logRoller:LogRoller@114] - LogRoller exiting.
>>> 2011-12-29 17:14:59,517 - INFO
>>> [regionserver7040.cacheFlusher:MemStoreFlusher@266] -
>>> regionserver7040.cacheFlusher exiting
>>> 2011-12-29 17:14:59,517 - INFO
>>> [regionserver7040.logSyncer:HLog$LogSyncer@966] -
>>> regionserver7040.logSyncer exiting
>>> 2011-12-29 17:14:59,518 - INFO
>>> [regionserver7040.majorCompactionChecker:Chore@79] -
>>> regionserver7040.majorCompactionChecker exiting
>>> 2011-12-29 17:14:59,518 - INFO
>>> [regionserver7040.compactor:CompactSplitThread@113] -
>>> regionserver7040.compactor exiting
>>> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:HRegionServer@668] -
>>> stopping server at: 10.129.1.230,7040,1323990853621
>>> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@124] -
>>> regionserver7040 closing leases
>>> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@131] -
>>> regionserver7040 closed leases
>>> 2011-12-29 17:14:59,891 - INFO
>>> [regionserver7040:HConnectionManager$HConnectionImplementation@1067] -
>>> Closed zookeeper sessionid=0x4343f266cbc0012
>>> 2011-12-29 17:14:59,894 - INFO  [regionserver7040:ZooKeeper@538] -
>>> Session: 0x4343f266cbc0012 closed
>>> 2011-12-29 17:14:59,894 - INFO
>>> [main-EventThread:ClientCnxn$EventThread@520] - EventThread shut down
>>> 2011-12-29 17:14:59,897 - INFO  [regionserver7040:ZooKeeper@538] -
>>> Session: 0x1343f2669140013 closed
>>> 2011-12-29 17:14:59,898 - INFO  [regionserver7040:HRegionServer@686] -
>>> regionserver7040 exiting
>>> 2011-12-29 17:14:59,899 - INFO
>>> [regionserver7040-EventThread:ClientCnxn$EventThread@520] -
>>> EventThread shut down
>>> 2011-12-29 17:14:59,900 - INFO
>>> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@106] -
>>> Starting fs shutdown hook thread.
>>> 2011-12-29 17:14:59,900 - ERROR
>>> [Thread-15:DFSClient$LeaseChecker@1135] - Exception closing file
>>> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>>> : org.apache.hadoop.ipc.RemoteException:
>>> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
>>> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>>>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>>>
>>> org.apache.hadoop.ipc.RemoteException:
>>> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
>>> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>>>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>>>
>>>        at org.apache.hadoop.ipc.Client.call(Client.java:1104)
>>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>>>        at $Proxy5.addBlock(Unknown Source)
>>>        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>>        at $Proxy5.addBlock(Unknown Source)
>>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3185)
>>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3055)
>>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2305)
>>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2500)
>>> 2011-12-29 17:14:59,913 - INFO
>>> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@112] -
>>> Shutdown hook finished.
>>> ------------------------

Re: NotReplicatedYetException when gracefully shutting down a region server.

Posted by Alok Singh <al...@urbanairship.com>.
J-D,

On the regionserver, I see these lines:
-------------
2011-12-28 18:56:37,821 - INFO
[regionserver7040.cacheFlusher:Store@494] - Renaming flushed file at
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/745069841898422651
to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/8857592033950378228
2011-12-28 18:56:37,867 - INFO
[regionserver7040.cacheFlusher:StoreFile$Reader@1027] - Loaded row
bloom filter metadata for
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/8857592033950378228
2011-12-28 18:56:37,867 - INFO
[regionserver7040.cacheFlusher:Store@504] - Added
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/8857592033950378228,
entries=15990, sequenceid=1213107064, memsize=3.8m, filesize=200.1k
2011-12-28 19:22:57,674 - INFO  [PRI IPC Server handler 9 on
7040:HRegionServer@2142] - Received close region:
rich_push.user,4e3b951f2325b4695800679d,1317431456933.0fd5a7b472fb092a4470e9505c3a421a.
2011-12-28 19:22:57,702 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:StoreFile$Writer@868]
- Bloom added to HFile
(hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/8759993386901964725):
32.0, 14/26 (54%)
2011-12-28 19:22:57,776 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:Store@494] -
Renaming flushed file at
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/8759993386901964725
to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/4476093570994008279
2011-12-28 19:22:57,893 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:StoreFile$Reader@1027]
- Loaded row bloom filter metadata for
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/4476093570994008279
2011-12-28 19:22:57,894 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:Store@504] - Added
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/4476093570994008279,
entries=210, sequenceid=1213258636, memsize=50.8k, filesize=3.6k
2011-12-28 19:22:57,896 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:HRegion@554] -
Closed rich_push.user,4e3b951f2325b4695800679d,1317431456933.0fd5a7b472fb092a4470e9505c3a421a.
2011-12-29 17:14:59,900 - ERROR
[Thread-15:DFSClient$LeaseChecker@1135] - Exception closing file
/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
----------------------------------------
We waited overnight after moving the regions from the regionserver
before shutting it down. That is why the timestamp between the
RS_CLOSE_REGION and ERROR are off by 22 hours.

But, looking on the namenode, I found some interesting log entries.
Note the timestamp on these entries, they are over a week before the
regionserver errors occurred.
-------namenode-logs---------
rm=null
2011-12-20 03:47:41,208 - INFO  [IPC Server handler 8 on
7080:FSNamesystem@125] -
ugi=hbaseregion	ip=/10.129.1.230	cmd=create	src=/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423	dst=null	perm=hbaseregion:hadoop:rw-rw-rw-
2011-12-20 03:47:41,240 - INFO  [IPC Server handler 60 on
7080:FSNamesystem@1646] - BLOCK* NameSystem.allocateBlock:
/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423.
blk_4226819991054427111_66443
2011-12-20 03:48:10,096 - INFO  [IPC Server handler 40 on
7080:FSNamesystem@1646] - BLOCK* NameSystem.allocateBlock:
/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423.
blk_51125950084620321_66443
2011-12-20 03:48:38,808 - INFO  [IPC Server handler 1 on
7080:Server$Handler@1423] - IPC Server handler 1 on 7080, call
addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
DFSClient_201969766, null) from 10.129.1.230:54052: error:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
2011-12-20 03:48:39,245 - INFO  [IPC Server handler 22 on
7080:Server$Handler@1423] - IPC Server handler 22 on 7080, call
addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
DFSClient_201969766, null) from 10.129.1.230:54052: error:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
2011-12-20 03:48:40,055 - INFO  [IPC Server handler 42 on
7080:Server$Handler@1423] - IPC Server handler 42 on 7080, call
addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
DFSClient_201969766, null) from 10.129.1.230:54052: error:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
2011-12-20 03:48:41,665 - INFO  [IPC Server handler 63 on
7080:Server$Handler@1423] - IPC Server handler 63 on 7080, call
addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
DFSClient_201969766, null) from 10.129.1.230:54052: error:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
2011-12-20 03:48:44,875 - INFO  [IPC Server handler 26 on
7080:Server$Handler@1423] - IPC Server handler 26 on 7080, call
addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
DFSClient_201969766, null) from 10.129.1.230:54052: error:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
2011-12-20 03:48:51,285 - INFO  [IPC Server handler 11 on
7080:Server$Handler@1423] - IPC Server handler 11 on 7080, call
addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
DFSClient_201969766, null) from 10.129.1.230:54052: error:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
2011-12-20 06:34:21,299 - INFO  [IPC Server handler 10 on
7080:FSNamesystem@125] -
ugi=hbaseregion	ip=/10.129.1.230	cmd=listStatus	src=/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/attribute	dst=null	perm=null
2011-12-20 06:34:21,303 - INFO  [IPC Server handler 19 on
7080:FSNamesystem@125] -
ugi=hbaseregion	ip=/10.129.1.230	cmd=listStatus	src=/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user	dst=null	perm=null
....
-----namenode-logs------------------

Thanks for helping me figure out this error.

Alok

On Tue, Jan 3, 2012 at 2:06 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> Since you did the region mover thing then all the region should be off
> from that region server, so data loss is highly unlikely.
>
> My theory would be that a compaction was underway and got cancelled
> when the region server was told to close the region, but the
> DFSOutputStream thread hasn't been cleanup yet. Probably the file was
> even already deleted and the data is still safe in the un-compacted
> files.
>
> Ways to confirm this:
>
>  - Look in the region server log in the preceding lines what happens
> to that region (named 0fd5a7b472fb092a4470e9505c3a421a) when it's
> closing. Do you see a compaction being cancelled?
>  - Look in the Namenode log for that file, is it being asked to delete
> it or something before it sends a NotReplicatedYetException?
>
> Hope this helps,
>
> J-D
>
> On Thu, Dec 29, 2011 at 3:11 PM, Alok Singh <al...@urbanairship.com> wrote:
>> When attempting to gracefully shutdown a regionserver, I saw a couple
>> of NotReplicatedYet exceptions in the logs (below). Can't find the
>> file that is causing this exception in on the HDFS filesystem. Have we
>> potentially lost the data, or is this exception benign?
>>
>>
>> Alok
>>
>> hbase: 0.90.3
>> hadoop: 0.20.2-cdh3u0
>>
>> Gracefull shutdown process:
>> hbase(main):001:0> balance_switch false
>>
>> hbase org.jruby.Main current/bin/region_mover.rb unload <IP>
>>
>> After the region count for the regionserver is 0, we kill the
>> regionserver process.
>>
>> ------------------Log-------------------------
>> 2011-12-29 17:11:03,768 - INFO  [PRI IPC Server handler 6 on
>> 7040:HRegionServer@2142] - Received close region:
>> rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
>> 2011-12-29 17:11:03,787 - WARN
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>> using octal or symbolic umask specifications.
>> 2011-12-29 17:11:03,803 - WARN
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>> using octal or symbolic umask specifications.
>> 2011-12-29 17:11:03,911 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Writer@868]
>> - Bloom added to HFile
>> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990):
>> 7.1k, 5225/5276 (99%)
>> 2011-12-29 17:11:03,944 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@494] -
>> Renaming flushed file at
>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990
>> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
>> 2011-12-29 17:11:03,961 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Reader@1027]
>> - Loaded col bloom filter metadata for
>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
>> 2011-12-29 17:11:03,961 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@504] - Added
>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965,
>> entries=5276, sequenceid=1219931822, memsize=2.0m, filesize=384.4k
>> 2011-12-29 17:11:03,964 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:HRegion@554] -
>> Closed rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
>> 2011-12-29 17:11:05,706 - INFO  [PRI IPC Server handler 7 on
>> 7040:HRegionServer@2142] - Received close region:
>> rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
>> 2011-12-29 17:11:05,716 - WARN
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>> using octal or symbolic umask specifications.
>> 2011-12-29 17:11:05,735 - WARN
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>> using octal or symbolic umask specifications.
>> 2011-12-29 17:11:05,808 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Writer@868]
>> - Bloom added to HFile
>> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097):
>> 6.5k, 4727/4737 (100%)
>> 2011-12-29 17:11:05,834 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@494] -
>> Renaming flushed file at
>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097
>> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
>> 2011-12-29 17:11:05,852 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Reader@1027]
>> - Loaded col bloom filter metadata for
>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
>> 2011-12-29 17:11:05,852 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@504] - Added
>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492,
>> entries=4737, sequenceid=1219931829, memsize=1.4m, filesize=246.5k
>> 2011-12-29 17:11:05,856 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:HRegion@554] -
>> Closed rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
>> 2011-12-29 17:14:58,942 - INFO
>> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@100] -
>> Shutdown hook starting; hbase.shutdown.hook=true;
>> fsShutdownHook=Thread[Thread-15,5,main]
>> 2011-12-29 17:14:58,942 - INFO
>> [Shutdownhook:regionserver7040:HRegionServer@1342] - STOPPED: Shutdown
>> hook
>> 2011-12-29 17:14:59,476 - INFO  [regionserver7040:HBaseServer@1234] -
>> Stopping server on 7040
>> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 2 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 2 on 7040:
>> exiting
>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 9 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 9 on 7040: exiting
>> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 4 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 4 on 7040:
>> exiting
>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 4 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 4 on 7040: exiting
>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 5 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 5 on 7040: exiting
>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 6 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 6 on 7040: exiting
>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 7 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 7 on 7040: exiting
>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 8 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 8 on 7040: exiting
>> 2011-12-29 17:14:59,477 - INFO  [PRI IPC Server handler 0 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 0 on 7040:
>> exiting
>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 1 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 1 on 7040:
>> exiting
>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 3 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 3 on 7040:
>> exiting
>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 5 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 5 on 7040:
>> exiting
>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 6 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 6 on 7040:
>> exiting
>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 7 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 7 on 7040:
>> exiting
>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 9 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 9 on 7040:
>> exiting
>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 0 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 0 on 7040: exiting
>> 2011-12-29 17:14:59,480 - INFO  [IPC Server
>> Responder:HBaseServer$Responder@649] - Stopping IPC Server Responder
>> 2011-12-29 17:14:59,479 - INFO  [IPC Server listener on
>> 7040:HBaseServer$Listener@450] - Stopping IPC Server listener on 7040
>> 2011-12-29 17:14:59,477 - INFO  [regionserver7040:HRegionServer@636] -
>> Stopping infoServer
>> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 8 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 8 on 7040:
>> exiting
>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 3 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 3 on 7040: exiting
>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 2 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 2 on 7040: exiting
>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 1 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 1 on 7040: exiting
>> 2011-12-29 17:14:59,481 - INFO  [regionserver7040:Slf4jLog@67] -
>> Stopped SelectChannelConnector@0.0.0.0:7041
>> 2011-12-29 17:14:59,517 - INFO
>> [regionserver7040.logRoller:LogRoller@114] - LogRoller exiting.
>> 2011-12-29 17:14:59,517 - INFO
>> [regionserver7040.cacheFlusher:MemStoreFlusher@266] -
>> regionserver7040.cacheFlusher exiting
>> 2011-12-29 17:14:59,517 - INFO
>> [regionserver7040.logSyncer:HLog$LogSyncer@966] -
>> regionserver7040.logSyncer exiting
>> 2011-12-29 17:14:59,518 - INFO
>> [regionserver7040.majorCompactionChecker:Chore@79] -
>> regionserver7040.majorCompactionChecker exiting
>> 2011-12-29 17:14:59,518 - INFO
>> [regionserver7040.compactor:CompactSplitThread@113] -
>> regionserver7040.compactor exiting
>> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:HRegionServer@668] -
>> stopping server at: 10.129.1.230,7040,1323990853621
>> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@124] -
>> regionserver7040 closing leases
>> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@131] -
>> regionserver7040 closed leases
>> 2011-12-29 17:14:59,891 - INFO
>> [regionserver7040:HConnectionManager$HConnectionImplementation@1067] -
>> Closed zookeeper sessionid=0x4343f266cbc0012
>> 2011-12-29 17:14:59,894 - INFO  [regionserver7040:ZooKeeper@538] -
>> Session: 0x4343f266cbc0012 closed
>> 2011-12-29 17:14:59,894 - INFO
>> [main-EventThread:ClientCnxn$EventThread@520] - EventThread shut down
>> 2011-12-29 17:14:59,897 - INFO  [regionserver7040:ZooKeeper@538] -
>> Session: 0x1343f2669140013 closed
>> 2011-12-29 17:14:59,898 - INFO  [regionserver7040:HRegionServer@686] -
>> regionserver7040 exiting
>> 2011-12-29 17:14:59,899 - INFO
>> [regionserver7040-EventThread:ClientCnxn$EventThread@520] -
>> EventThread shut down
>> 2011-12-29 17:14:59,900 - INFO
>> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@106] -
>> Starting fs shutdown hook thread.
>> 2011-12-29 17:14:59,900 - ERROR
>> [Thread-15:DFSClient$LeaseChecker@1135] - Exception closing file
>> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>> : org.apache.hadoop.ipc.RemoteException:
>> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
>> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>>
>> org.apache.hadoop.ipc.RemoteException:
>> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
>> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>>
>>        at org.apache.hadoop.ipc.Client.call(Client.java:1104)
>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>>        at $Proxy5.addBlock(Unknown Source)
>>        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>        at $Proxy5.addBlock(Unknown Source)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3185)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3055)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2305)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2500)
>> 2011-12-29 17:14:59,913 - INFO
>> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@112] -
>> Shutdown hook finished.
>> ------------------------

Re: NotReplicatedYetException when gracefully shutting down a region server.

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Since you did the region mover thing then all the region should be off
from that region server, so data loss is highly unlikely.

My theory would be that a compaction was underway and got cancelled
when the region server was told to close the region, but the
DFSOutputStream thread hasn't been cleanup yet. Probably the file was
even already deleted and the data is still safe in the un-compacted
files.

Ways to confirm this:

 - Look in the region server log in the preceding lines what happens
to that region (named 0fd5a7b472fb092a4470e9505c3a421a) when it's
closing. Do you see a compaction being cancelled?
 - Look in the Namenode log for that file, is it being asked to delete
it or something before it sends a NotReplicatedYetException?

Hope this helps,

J-D

On Thu, Dec 29, 2011 at 3:11 PM, Alok Singh <al...@urbanairship.com> wrote:
> When attempting to gracefully shutdown a regionserver, I saw a couple
> of NotReplicatedYet exceptions in the logs (below). Can't find the
> file that is causing this exception in on the HDFS filesystem. Have we
> potentially lost the data, or is this exception benign?
>
>
> Alok
>
> hbase: 0.90.3
> hadoop: 0.20.2-cdh3u0
>
> Gracefull shutdown process:
> hbase(main):001:0> balance_switch false
>
> hbase org.jruby.Main current/bin/region_mover.rb unload <IP>
>
> After the region count for the regionserver is 0, we kill the
> regionserver process.
>
> ------------------Log-------------------------
> 2011-12-29 17:11:03,768 - INFO  [PRI IPC Server handler 6 on
> 7040:HRegionServer@2142] - Received close region:
> rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
> 2011-12-29 17:11:03,787 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:03,803 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:03,911 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Writer@868]
> - Bloom added to HFile
> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990):
> 7.1k, 5225/5276 (99%)
> 2011-12-29 17:11:03,944 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@494] -
> Renaming flushed file at
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990
> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
> 2011-12-29 17:11:03,961 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Reader@1027]
> - Loaded col bloom filter metadata for
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
> 2011-12-29 17:11:03,961 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@504] - Added
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965,
> entries=5276, sequenceid=1219931822, memsize=2.0m, filesize=384.4k
> 2011-12-29 17:11:03,964 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:HRegion@554] -
> Closed rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
> 2011-12-29 17:11:05,706 - INFO  [PRI IPC Server handler 7 on
> 7040:HRegionServer@2142] - Received close region:
> rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
> 2011-12-29 17:11:05,716 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:05,735 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:05,808 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Writer@868]
> - Bloom added to HFile
> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097):
> 6.5k, 4727/4737 (100%)
> 2011-12-29 17:11:05,834 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@494] -
> Renaming flushed file at
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097
> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
> 2011-12-29 17:11:05,852 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Reader@1027]
> - Loaded col bloom filter metadata for
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
> 2011-12-29 17:11:05,852 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@504] - Added
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492,
> entries=4737, sequenceid=1219931829, memsize=1.4m, filesize=246.5k
> 2011-12-29 17:11:05,856 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:HRegion@554] -
> Closed rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
> 2011-12-29 17:14:58,942 - INFO
> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@100] -
> Shutdown hook starting; hbase.shutdown.hook=true;
> fsShutdownHook=Thread[Thread-15,5,main]
> 2011-12-29 17:14:58,942 - INFO
> [Shutdownhook:regionserver7040:HRegionServer@1342] - STOPPED: Shutdown
> hook
> 2011-12-29 17:14:59,476 - INFO  [regionserver7040:HBaseServer@1234] -
> Stopping server on 7040
> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 2 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 2 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 9 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 9 on 7040: exiting
> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 4 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 4 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 4 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 4 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 5 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 5 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 6 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 6 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 7 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 7 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 8 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 8 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [PRI IPC Server handler 0 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 0 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 1 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 1 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 3 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 3 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 5 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 5 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 6 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 6 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 7 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 7 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 9 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 9 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 0 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 0 on 7040: exiting
> 2011-12-29 17:14:59,480 - INFO  [IPC Server
> Responder:HBaseServer$Responder@649] - Stopping IPC Server Responder
> 2011-12-29 17:14:59,479 - INFO  [IPC Server listener on
> 7040:HBaseServer$Listener@450] - Stopping IPC Server listener on 7040
> 2011-12-29 17:14:59,477 - INFO  [regionserver7040:HRegionServer@636] -
> Stopping infoServer
> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 8 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 8 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 3 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 3 on 7040: exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 2 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 2 on 7040: exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 1 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 1 on 7040: exiting
> 2011-12-29 17:14:59,481 - INFO  [regionserver7040:Slf4jLog@67] -
> Stopped SelectChannelConnector@0.0.0.0:7041
> 2011-12-29 17:14:59,517 - INFO
> [regionserver7040.logRoller:LogRoller@114] - LogRoller exiting.
> 2011-12-29 17:14:59,517 - INFO
> [regionserver7040.cacheFlusher:MemStoreFlusher@266] -
> regionserver7040.cacheFlusher exiting
> 2011-12-29 17:14:59,517 - INFO
> [regionserver7040.logSyncer:HLog$LogSyncer@966] -
> regionserver7040.logSyncer exiting
> 2011-12-29 17:14:59,518 - INFO
> [regionserver7040.majorCompactionChecker:Chore@79] -
> regionserver7040.majorCompactionChecker exiting
> 2011-12-29 17:14:59,518 - INFO
> [regionserver7040.compactor:CompactSplitThread@113] -
> regionserver7040.compactor exiting
> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:HRegionServer@668] -
> stopping server at: 10.129.1.230,7040,1323990853621
> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@124] -
> regionserver7040 closing leases
> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@131] -
> regionserver7040 closed leases
> 2011-12-29 17:14:59,891 - INFO
> [regionserver7040:HConnectionManager$HConnectionImplementation@1067] -
> Closed zookeeper sessionid=0x4343f266cbc0012
> 2011-12-29 17:14:59,894 - INFO  [regionserver7040:ZooKeeper@538] -
> Session: 0x4343f266cbc0012 closed
> 2011-12-29 17:14:59,894 - INFO
> [main-EventThread:ClientCnxn$EventThread@520] - EventThread shut down
> 2011-12-29 17:14:59,897 - INFO  [regionserver7040:ZooKeeper@538] -
> Session: 0x1343f2669140013 closed
> 2011-12-29 17:14:59,898 - INFO  [regionserver7040:HRegionServer@686] -
> regionserver7040 exiting
> 2011-12-29 17:14:59,899 - INFO
> [regionserver7040-EventThread:ClientCnxn$EventThread@520] -
> EventThread shut down
> 2011-12-29 17:14:59,900 - INFO
> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@106] -
> Starting fs shutdown hook thread.
> 2011-12-29 17:14:59,900 - ERROR
> [Thread-15:DFSClient$LeaseChecker@1135] - Exception closing file
> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> : org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>
> org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:1104)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>        at $Proxy5.addBlock(Unknown Source)
>        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy5.addBlock(Unknown Source)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3185)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3055)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2305)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2500)
> 2011-12-29 17:14:59,913 - INFO
> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@112] -
> Shutdown hook finished.
> ------------------------