You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Jean-Daniel Cryans <jd...@apache.org> on 2012/01/03 23:06:03 UTC

Re: NotReplicatedYetException when gracefully shutting down a region server.

Since you did the region mover thing then all the region should be off
from that region server, so data loss is highly unlikely.

My theory would be that a compaction was underway and got cancelled
when the region server was told to close the region, but the
DFSOutputStream thread hasn't been cleanup yet. Probably the file was
even already deleted and the data is still safe in the un-compacted
files.

Ways to confirm this:

 - Look in the region server log in the preceding lines what happens
to that region (named 0fd5a7b472fb092a4470e9505c3a421a) when it's
closing. Do you see a compaction being cancelled?
 - Look in the Namenode log for that file, is it being asked to delete
it or something before it sends a NotReplicatedYetException?

Hope this helps,

J-D

On Thu, Dec 29, 2011 at 3:11 PM, Alok Singh <al...@urbanairship.com> wrote:
> When attempting to gracefully shutdown a regionserver, I saw a couple
> of NotReplicatedYet exceptions in the logs (below). Can't find the
> file that is causing this exception in on the HDFS filesystem. Have we
> potentially lost the data, or is this exception benign?
>
>
> Alok
>
> hbase: 0.90.3
> hadoop: 0.20.2-cdh3u0
>
> Gracefull shutdown process:
> hbase(main):001:0> balance_switch false
>
> hbase org.jruby.Main current/bin/region_mover.rb unload <IP>
>
> After the region count for the regionserver is 0, we kill the
> regionserver process.
>
> ------------------Log-------------------------
> 2011-12-29 17:11:03,768 - INFO  [PRI IPC Server handler 6 on
> 7040:HRegionServer@2142] - Received close region:
> rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
> 2011-12-29 17:11:03,787 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:03,803 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:03,911 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Writer@868]
> - Bloom added to HFile
> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990):
> 7.1k, 5225/5276 (99%)
> 2011-12-29 17:11:03,944 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@494] -
> Renaming flushed file at
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990
> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
> 2011-12-29 17:11:03,961 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Reader@1027]
> - Loaded col bloom filter metadata for
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
> 2011-12-29 17:11:03,961 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@504] - Added
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965,
> entries=5276, sequenceid=1219931822, memsize=2.0m, filesize=384.4k
> 2011-12-29 17:11:03,964 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:HRegion@554] -
> Closed rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
> 2011-12-29 17:11:05,706 - INFO  [PRI IPC Server handler 7 on
> 7040:HRegionServer@2142] - Received close region:
> rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
> 2011-12-29 17:11:05,716 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:05,735 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:05,808 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Writer@868]
> - Bloom added to HFile
> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097):
> 6.5k, 4727/4737 (100%)
> 2011-12-29 17:11:05,834 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@494] -
> Renaming flushed file at
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097
> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
> 2011-12-29 17:11:05,852 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Reader@1027]
> - Loaded col bloom filter metadata for
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
> 2011-12-29 17:11:05,852 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@504] - Added
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492,
> entries=4737, sequenceid=1219931829, memsize=1.4m, filesize=246.5k
> 2011-12-29 17:11:05,856 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:HRegion@554] -
> Closed rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
> 2011-12-29 17:14:58,942 - INFO
> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@100] -
> Shutdown hook starting; hbase.shutdown.hook=true;
> fsShutdownHook=Thread[Thread-15,5,main]
> 2011-12-29 17:14:58,942 - INFO
> [Shutdownhook:regionserver7040:HRegionServer@1342] - STOPPED: Shutdown
> hook
> 2011-12-29 17:14:59,476 - INFO  [regionserver7040:HBaseServer@1234] -
> Stopping server on 7040
> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 2 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 2 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 9 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 9 on 7040: exiting
> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 4 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 4 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 4 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 4 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 5 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 5 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 6 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 6 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 7 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 7 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 8 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 8 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [PRI IPC Server handler 0 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 0 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 1 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 1 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 3 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 3 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 5 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 5 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 6 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 6 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 7 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 7 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 9 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 9 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 0 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 0 on 7040: exiting
> 2011-12-29 17:14:59,480 - INFO  [IPC Server
> Responder:HBaseServer$Responder@649] - Stopping IPC Server Responder
> 2011-12-29 17:14:59,479 - INFO  [IPC Server listener on
> 7040:HBaseServer$Listener@450] - Stopping IPC Server listener on 7040
> 2011-12-29 17:14:59,477 - INFO  [regionserver7040:HRegionServer@636] -
> Stopping infoServer
> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 8 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 8 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 3 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 3 on 7040: exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 2 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 2 on 7040: exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 1 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 1 on 7040: exiting
> 2011-12-29 17:14:59,481 - INFO  [regionserver7040:Slf4jLog@67] -
> Stopped SelectChannelConnector@0.0.0.0:7041
> 2011-12-29 17:14:59,517 - INFO
> [regionserver7040.logRoller:LogRoller@114] - LogRoller exiting.
> 2011-12-29 17:14:59,517 - INFO
> [regionserver7040.cacheFlusher:MemStoreFlusher@266] -
> regionserver7040.cacheFlusher exiting
> 2011-12-29 17:14:59,517 - INFO
> [regionserver7040.logSyncer:HLog$LogSyncer@966] -
> regionserver7040.logSyncer exiting
> 2011-12-29 17:14:59,518 - INFO
> [regionserver7040.majorCompactionChecker:Chore@79] -
> regionserver7040.majorCompactionChecker exiting
> 2011-12-29 17:14:59,518 - INFO
> [regionserver7040.compactor:CompactSplitThread@113] -
> regionserver7040.compactor exiting
> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:HRegionServer@668] -
> stopping server at: 10.129.1.230,7040,1323990853621
> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@124] -
> regionserver7040 closing leases
> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@131] -
> regionserver7040 closed leases
> 2011-12-29 17:14:59,891 - INFO
> [regionserver7040:HConnectionManager$HConnectionImplementation@1067] -
> Closed zookeeper sessionid=0x4343f266cbc0012
> 2011-12-29 17:14:59,894 - INFO  [regionserver7040:ZooKeeper@538] -
> Session: 0x4343f266cbc0012 closed
> 2011-12-29 17:14:59,894 - INFO
> [main-EventThread:ClientCnxn$EventThread@520] - EventThread shut down
> 2011-12-29 17:14:59,897 - INFO  [regionserver7040:ZooKeeper@538] -
> Session: 0x1343f2669140013 closed
> 2011-12-29 17:14:59,898 - INFO  [regionserver7040:HRegionServer@686] -
> regionserver7040 exiting
> 2011-12-29 17:14:59,899 - INFO
> [regionserver7040-EventThread:ClientCnxn$EventThread@520] -
> EventThread shut down
> 2011-12-29 17:14:59,900 - INFO
> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@106] -
> Starting fs shutdown hook thread.
> 2011-12-29 17:14:59,900 - ERROR
> [Thread-15:DFSClient$LeaseChecker@1135] - Exception closing file
> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> : org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>
> org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:1104)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>        at $Proxy5.addBlock(Unknown Source)
>        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy5.addBlock(Unknown Source)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3185)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3055)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2305)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2500)
> 2011-12-29 17:14:59,913 - INFO
> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@112] -
> Shutdown hook finished.
> ------------------------

Re: NotReplicatedYetException when gracefully shutting down a region server.

Posted by Jean-Daniel Cryans <jd...@apache.org>.
That's very interesting, did you look at that region server's log to
see what was going on when creating that file? Normally if a region
server encounters a FATAL fs error (like not being able to create a
file) it will abort itself.

J-D

On Tue, Jan 3, 2012 at 4:14 PM, Alok Singh <al...@urbanairship.com> wrote:
> J-D,
>
> On the regionserver, I see these lines:
> -------------
> 2011-12-28 18:56:37,821 - INFO
> [regionserver7040.cacheFlusher:Store@494] - Renaming flushed file at
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/745069841898422651
> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/8857592033950378228
> 2011-12-28 18:56:37,867 - INFO
> [regionserver7040.cacheFlusher:StoreFile$Reader@1027] - Loaded row
> bloom filter metadata for
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/8857592033950378228
> 2011-12-28 18:56:37,867 - INFO
> [regionserver7040.cacheFlusher:Store@504] - Added
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/8857592033950378228,
> entries=15990, sequenceid=1213107064, memsize=3.8m, filesize=200.1k
> 2011-12-28 19:22:57,674 - INFO  [PRI IPC Server handler 9 on
> 7040:HRegionServer@2142] - Received close region:
> rich_push.user,4e3b951f2325b4695800679d,1317431456933.0fd5a7b472fb092a4470e9505c3a421a.
> 2011-12-28 19:22:57,702 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:StoreFile$Writer@868]
> - Bloom added to HFile
> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/8759993386901964725):
> 32.0, 14/26 (54%)
> 2011-12-28 19:22:57,776 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:Store@494] -
> Renaming flushed file at
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/8759993386901964725
> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/4476093570994008279
> 2011-12-28 19:22:57,893 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:StoreFile$Reader@1027]
> - Loaded row bloom filter metadata for
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/4476093570994008279
> 2011-12-28 19:22:57,894 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:Store@504] - Added
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/4476093570994008279,
> entries=210, sequenceid=1213258636, memsize=50.8k, filesize=3.6k
> 2011-12-28 19:22:57,896 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:HRegion@554] -
> Closed rich_push.user,4e3b951f2325b4695800679d,1317431456933.0fd5a7b472fb092a4470e9505c3a421a.
> 2011-12-29 17:14:59,900 - ERROR
> [Thread-15:DFSClient$LeaseChecker@1135] - Exception closing file
> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> : org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> ----------------------------------------
> We waited overnight after moving the regions from the regionserver
> before shutting it down. That is why the timestamp between the
> RS_CLOSE_REGION and ERROR are off by 22 hours.
>
> But, looking on the namenode, I found some interesting log entries.
> Note the timestamp on these entries, they are over a week before the
> regionserver errors occurred.
> -------namenode-logs---------
> rm=null
> 2011-12-20 03:47:41,208 - INFO  [IPC Server handler 8 on
> 7080:FSNamesystem@125] -
> ugi=hbaseregion ip=/10.129.1.230        cmd=create      src=/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423       dst=null        perm=hbaseregion:hadoop:rw-rw-rw-
> 2011-12-20 03:47:41,240 - INFO  [IPC Server handler 60 on
> 7080:FSNamesystem@1646] - BLOCK* NameSystem.allocateBlock:
> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423.
> blk_4226819991054427111_66443
> 2011-12-20 03:48:10,096 - INFO  [IPC Server handler 40 on
> 7080:FSNamesystem@1646] - BLOCK* NameSystem.allocateBlock:
> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423.
> blk_51125950084620321_66443
> 2011-12-20 03:48:38,808 - INFO  [IPC Server handler 1 on
> 7080:Server$Handler@1423] - IPC Server handler 1 on 7080, call
> addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
> DFSClient_201969766, null) from 10.129.1.230:54052: error:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> 2011-12-20 03:48:39,245 - INFO  [IPC Server handler 22 on
> 7080:Server$Handler@1423] - IPC Server handler 22 on 7080, call
> addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
> DFSClient_201969766, null) from 10.129.1.230:54052: error:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> 2011-12-20 03:48:40,055 - INFO  [IPC Server handler 42 on
> 7080:Server$Handler@1423] - IPC Server handler 42 on 7080, call
> addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
> DFSClient_201969766, null) from 10.129.1.230:54052: error:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> 2011-12-20 03:48:41,665 - INFO  [IPC Server handler 63 on
> 7080:Server$Handler@1423] - IPC Server handler 63 on 7080, call
> addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
> DFSClient_201969766, null) from 10.129.1.230:54052: error:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> 2011-12-20 03:48:44,875 - INFO  [IPC Server handler 26 on
> 7080:Server$Handler@1423] - IPC Server handler 26 on 7080, call
> addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
> DFSClient_201969766, null) from 10.129.1.230:54052: error:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> 2011-12-20 03:48:51,285 - INFO  [IPC Server handler 11 on
> 7080:Server$Handler@1423] - IPC Server handler 11 on 7080, call
> addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
> DFSClient_201969766, null) from 10.129.1.230:54052: error:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> 2011-12-20 06:34:21,299 - INFO  [IPC Server handler 10 on
> 7080:FSNamesystem@125] -
> ugi=hbaseregion ip=/10.129.1.230        cmd=listStatus  src=/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/attribute      dst=null        perm=null
> 2011-12-20 06:34:21,303 - INFO  [IPC Server handler 19 on
> 7080:FSNamesystem@125] -
> ugi=hbaseregion ip=/10.129.1.230        cmd=listStatus  src=/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user   dst=null        perm=null
> ....
> -----namenode-logs------------------
>
> Thanks for helping me figure out this error.
>
> Alok
>
> On Tue, Jan 3, 2012 at 2:06 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
>> Since you did the region mover thing then all the region should be off
>> from that region server, so data loss is highly unlikely.
>>
>> My theory would be that a compaction was underway and got cancelled
>> when the region server was told to close the region, but the
>> DFSOutputStream thread hasn't been cleanup yet. Probably the file was
>> even already deleted and the data is still safe in the un-compacted
>> files.
>>
>> Ways to confirm this:
>>
>>  - Look in the region server log in the preceding lines what happens
>> to that region (named 0fd5a7b472fb092a4470e9505c3a421a) when it's
>> closing. Do you see a compaction being cancelled?
>>  - Look in the Namenode log for that file, is it being asked to delete
>> it or something before it sends a NotReplicatedYetException?
>>
>> Hope this helps,
>>
>> J-D
>>
>> On Thu, Dec 29, 2011 at 3:11 PM, Alok Singh <al...@urbanairship.com> wrote:
>>> When attempting to gracefully shutdown a regionserver, I saw a couple
>>> of NotReplicatedYet exceptions in the logs (below). Can't find the
>>> file that is causing this exception in on the HDFS filesystem. Have we
>>> potentially lost the data, or is this exception benign?
>>>
>>>
>>> Alok
>>>
>>> hbase: 0.90.3
>>> hadoop: 0.20.2-cdh3u0
>>>
>>> Gracefull shutdown process:
>>> hbase(main):001:0> balance_switch false
>>>
>>> hbase org.jruby.Main current/bin/region_mover.rb unload <IP>
>>>
>>> After the region count for the regionserver is 0, we kill the
>>> regionserver process.
>>>
>>> ------------------Log-------------------------
>>> 2011-12-29 17:11:03,768 - INFO  [PRI IPC Server handler 6 on
>>> 7040:HRegionServer@2142] - Received close region:
>>> rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
>>> 2011-12-29 17:11:03,787 - WARN
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
>>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>>> using octal or symbolic umask specifications.
>>> 2011-12-29 17:11:03,803 - WARN
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
>>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>>> using octal or symbolic umask specifications.
>>> 2011-12-29 17:11:03,911 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Writer@868]
>>> - Bloom added to HFile
>>> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990):
>>> 7.1k, 5225/5276 (99%)
>>> 2011-12-29 17:11:03,944 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@494] -
>>> Renaming flushed file at
>>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990
>>> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
>>> 2011-12-29 17:11:03,961 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Reader@1027]
>>> - Loaded col bloom filter metadata for
>>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
>>> 2011-12-29 17:11:03,961 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@504] - Added
>>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965,
>>> entries=5276, sequenceid=1219931822, memsize=2.0m, filesize=384.4k
>>> 2011-12-29 17:11:03,964 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:HRegion@554] -
>>> Closed rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
>>> 2011-12-29 17:11:05,706 - INFO  [PRI IPC Server handler 7 on
>>> 7040:HRegionServer@2142] - Received close region:
>>> rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
>>> 2011-12-29 17:11:05,716 - WARN
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
>>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>>> using octal or symbolic umask specifications.
>>> 2011-12-29 17:11:05,735 - WARN
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
>>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>>> using octal or symbolic umask specifications.
>>> 2011-12-29 17:11:05,808 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Writer@868]
>>> - Bloom added to HFile
>>> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097):
>>> 6.5k, 4727/4737 (100%)
>>> 2011-12-29 17:11:05,834 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@494] -
>>> Renaming flushed file at
>>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097
>>> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
>>> 2011-12-29 17:11:05,852 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Reader@1027]
>>> - Loaded col bloom filter metadata for
>>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
>>> 2011-12-29 17:11:05,852 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@504] - Added
>>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492,
>>> entries=4737, sequenceid=1219931829, memsize=1.4m, filesize=246.5k
>>> 2011-12-29 17:11:05,856 - INFO
>>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:HRegion@554] -
>>> Closed rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
>>> 2011-12-29 17:14:58,942 - INFO
>>> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@100] -
>>> Shutdown hook starting; hbase.shutdown.hook=true;
>>> fsShutdownHook=Thread[Thread-15,5,main]
>>> 2011-12-29 17:14:58,942 - INFO
>>> [Shutdownhook:regionserver7040:HRegionServer@1342] - STOPPED: Shutdown
>>> hook
>>> 2011-12-29 17:14:59,476 - INFO  [regionserver7040:HBaseServer@1234] -
>>> Stopping server on 7040
>>> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 2 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 2 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 9 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 9 on 7040: exiting
>>> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 4 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 4 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 4 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 4 on 7040: exiting
>>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 5 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 5 on 7040: exiting
>>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 6 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 6 on 7040: exiting
>>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 7 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 7 on 7040: exiting
>>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 8 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 8 on 7040: exiting
>>> 2011-12-29 17:14:59,477 - INFO  [PRI IPC Server handler 0 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 0 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 1 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 1 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 3 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 3 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 5 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 5 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 6 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 6 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 7 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 7 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 9 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 9 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 0 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 0 on 7040: exiting
>>> 2011-12-29 17:14:59,480 - INFO  [IPC Server
>>> Responder:HBaseServer$Responder@649] - Stopping IPC Server Responder
>>> 2011-12-29 17:14:59,479 - INFO  [IPC Server listener on
>>> 7040:HBaseServer$Listener@450] - Stopping IPC Server listener on 7040
>>> 2011-12-29 17:14:59,477 - INFO  [regionserver7040:HRegionServer@636] -
>>> Stopping infoServer
>>> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 8 on
>>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 8 on 7040:
>>> exiting
>>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 3 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 3 on 7040: exiting
>>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 2 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 2 on 7040: exiting
>>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 1 on
>>> 7040:HBaseServer$Handler@1104] - IPC Server handler 1 on 7040: exiting
>>> 2011-12-29 17:14:59,481 - INFO  [regionserver7040:Slf4jLog@67] -
>>> Stopped SelectChannelConnector@0.0.0.0:7041
>>> 2011-12-29 17:14:59,517 - INFO
>>> [regionserver7040.logRoller:LogRoller@114] - LogRoller exiting.
>>> 2011-12-29 17:14:59,517 - INFO
>>> [regionserver7040.cacheFlusher:MemStoreFlusher@266] -
>>> regionserver7040.cacheFlusher exiting
>>> 2011-12-29 17:14:59,517 - INFO
>>> [regionserver7040.logSyncer:HLog$LogSyncer@966] -
>>> regionserver7040.logSyncer exiting
>>> 2011-12-29 17:14:59,518 - INFO
>>> [regionserver7040.majorCompactionChecker:Chore@79] -
>>> regionserver7040.majorCompactionChecker exiting
>>> 2011-12-29 17:14:59,518 - INFO
>>> [regionserver7040.compactor:CompactSplitThread@113] -
>>> regionserver7040.compactor exiting
>>> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:HRegionServer@668] -
>>> stopping server at: 10.129.1.230,7040,1323990853621
>>> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@124] -
>>> regionserver7040 closing leases
>>> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@131] -
>>> regionserver7040 closed leases
>>> 2011-12-29 17:14:59,891 - INFO
>>> [regionserver7040:HConnectionManager$HConnectionImplementation@1067] -
>>> Closed zookeeper sessionid=0x4343f266cbc0012
>>> 2011-12-29 17:14:59,894 - INFO  [regionserver7040:ZooKeeper@538] -
>>> Session: 0x4343f266cbc0012 closed
>>> 2011-12-29 17:14:59,894 - INFO
>>> [main-EventThread:ClientCnxn$EventThread@520] - EventThread shut down
>>> 2011-12-29 17:14:59,897 - INFO  [regionserver7040:ZooKeeper@538] -
>>> Session: 0x1343f2669140013 closed
>>> 2011-12-29 17:14:59,898 - INFO  [regionserver7040:HRegionServer@686] -
>>> regionserver7040 exiting
>>> 2011-12-29 17:14:59,899 - INFO
>>> [regionserver7040-EventThread:ClientCnxn$EventThread@520] -
>>> EventThread shut down
>>> 2011-12-29 17:14:59,900 - INFO
>>> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@106] -
>>> Starting fs shutdown hook thread.
>>> 2011-12-29 17:14:59,900 - ERROR
>>> [Thread-15:DFSClient$LeaseChecker@1135] - Exception closing file
>>> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>>> : org.apache.hadoop.ipc.RemoteException:
>>> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
>>> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>>>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>>>
>>> org.apache.hadoop.ipc.RemoteException:
>>> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
>>> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>>>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>>>
>>>        at org.apache.hadoop.ipc.Client.call(Client.java:1104)
>>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>>>        at $Proxy5.addBlock(Unknown Source)
>>>        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>>        at $Proxy5.addBlock(Unknown Source)
>>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3185)
>>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3055)
>>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2305)
>>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2500)
>>> 2011-12-29 17:14:59,913 - INFO
>>> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@112] -
>>> Shutdown hook finished.
>>> ------------------------

Re: NotReplicatedYetException when gracefully shutting down a region server.

Posted by Alok Singh <al...@urbanairship.com>.
J-D,

On the regionserver, I see these lines:
-------------
2011-12-28 18:56:37,821 - INFO
[regionserver7040.cacheFlusher:Store@494] - Renaming flushed file at
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/745069841898422651
to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/8857592033950378228
2011-12-28 18:56:37,867 - INFO
[regionserver7040.cacheFlusher:StoreFile$Reader@1027] - Loaded row
bloom filter metadata for
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/8857592033950378228
2011-12-28 18:56:37,867 - INFO
[regionserver7040.cacheFlusher:Store@504] - Added
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/8857592033950378228,
entries=15990, sequenceid=1213107064, memsize=3.8m, filesize=200.1k
2011-12-28 19:22:57,674 - INFO  [PRI IPC Server handler 9 on
7040:HRegionServer@2142] - Received close region:
rich_push.user,4e3b951f2325b4695800679d,1317431456933.0fd5a7b472fb092a4470e9505c3a421a.
2011-12-28 19:22:57,702 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:StoreFile$Writer@868]
- Bloom added to HFile
(hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/8759993386901964725):
32.0, 14/26 (54%)
2011-12-28 19:22:57,776 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:Store@494] -
Renaming flushed file at
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/8759993386901964725
to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/4476093570994008279
2011-12-28 19:22:57,893 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:StoreFile$Reader@1027]
- Loaded row bloom filter metadata for
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/4476093570994008279
2011-12-28 19:22:57,894 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:Store@504] - Added
hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user/4476093570994008279,
entries=210, sequenceid=1213258636, memsize=50.8k, filesize=3.6k
2011-12-28 19:22:57,896 - INFO
[RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-2:HRegion@554] -
Closed rich_push.user,4e3b951f2325b4695800679d,1317431456933.0fd5a7b472fb092a4470e9505c3a421a.
2011-12-29 17:14:59,900 - ERROR
[Thread-15:DFSClient$LeaseChecker@1135] - Exception closing file
/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
----------------------------------------
We waited overnight after moving the regions from the regionserver
before shutting it down. That is why the timestamp between the
RS_CLOSE_REGION and ERROR are off by 22 hours.

But, looking on the namenode, I found some interesting log entries.
Note the timestamp on these entries, they are over a week before the
regionserver errors occurred.
-------namenode-logs---------
rm=null
2011-12-20 03:47:41,208 - INFO  [IPC Server handler 8 on
7080:FSNamesystem@125] -
ugi=hbaseregion	ip=/10.129.1.230	cmd=create	src=/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423	dst=null	perm=hbaseregion:hadoop:rw-rw-rw-
2011-12-20 03:47:41,240 - INFO  [IPC Server handler 60 on
7080:FSNamesystem@1646] - BLOCK* NameSystem.allocateBlock:
/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423.
blk_4226819991054427111_66443
2011-12-20 03:48:10,096 - INFO  [IPC Server handler 40 on
7080:FSNamesystem@1646] - BLOCK* NameSystem.allocateBlock:
/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423.
blk_51125950084620321_66443
2011-12-20 03:48:38,808 - INFO  [IPC Server handler 1 on
7080:Server$Handler@1423] - IPC Server handler 1 on 7080, call
addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
DFSClient_201969766, null) from 10.129.1.230:54052: error:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
2011-12-20 03:48:39,245 - INFO  [IPC Server handler 22 on
7080:Server$Handler@1423] - IPC Server handler 22 on 7080, call
addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
DFSClient_201969766, null) from 10.129.1.230:54052: error:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
2011-12-20 03:48:40,055 - INFO  [IPC Server handler 42 on
7080:Server$Handler@1423] - IPC Server handler 42 on 7080, call
addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
DFSClient_201969766, null) from 10.129.1.230:54052: error:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
2011-12-20 03:48:41,665 - INFO  [IPC Server handler 63 on
7080:Server$Handler@1423] - IPC Server handler 63 on 7080, call
addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
DFSClient_201969766, null) from 10.129.1.230:54052: error:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
2011-12-20 03:48:44,875 - INFO  [IPC Server handler 26 on
7080:Server$Handler@1423] - IPC Server handler 26 on 7080, call
addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
DFSClient_201969766, null) from 10.129.1.230:54052: error:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
2011-12-20 03:48:51,285 - INFO  [IPC Server handler 11 on
7080:Server$Handler@1423] - IPC Server handler 11 on 7080, call
addBlock(/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423,
DFSClient_201969766, null) from 10.129.1.230:54052: error:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
2011-12-20 06:34:21,299 - INFO  [IPC Server handler 10 on
7080:FSNamesystem@125] -
ugi=hbaseregion	ip=/10.129.1.230	cmd=listStatus	src=/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/attribute	dst=null	perm=null
2011-12-20 06:34:21,303 - INFO  [IPC Server handler 19 on
7080:FSNamesystem@125] -
ugi=hbaseregion	ip=/10.129.1.230	cmd=listStatus	src=/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/user	dst=null	perm=null
....
-----namenode-logs------------------

Thanks for helping me figure out this error.

Alok

On Tue, Jan 3, 2012 at 2:06 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> Since you did the region mover thing then all the region should be off
> from that region server, so data loss is highly unlikely.
>
> My theory would be that a compaction was underway and got cancelled
> when the region server was told to close the region, but the
> DFSOutputStream thread hasn't been cleanup yet. Probably the file was
> even already deleted and the data is still safe in the un-compacted
> files.
>
> Ways to confirm this:
>
>  - Look in the region server log in the preceding lines what happens
> to that region (named 0fd5a7b472fb092a4470e9505c3a421a) when it's
> closing. Do you see a compaction being cancelled?
>  - Look in the Namenode log for that file, is it being asked to delete
> it or something before it sends a NotReplicatedYetException?
>
> Hope this helps,
>
> J-D
>
> On Thu, Dec 29, 2011 at 3:11 PM, Alok Singh <al...@urbanairship.com> wrote:
>> When attempting to gracefully shutdown a regionserver, I saw a couple
>> of NotReplicatedYet exceptions in the logs (below). Can't find the
>> file that is causing this exception in on the HDFS filesystem. Have we
>> potentially lost the data, or is this exception benign?
>>
>>
>> Alok
>>
>> hbase: 0.90.3
>> hadoop: 0.20.2-cdh3u0
>>
>> Gracefull shutdown process:
>> hbase(main):001:0> balance_switch false
>>
>> hbase org.jruby.Main current/bin/region_mover.rb unload <IP>
>>
>> After the region count for the regionserver is 0, we kill the
>> regionserver process.
>>
>> ------------------Log-------------------------
>> 2011-12-29 17:11:03,768 - INFO  [PRI IPC Server handler 6 on
>> 7040:HRegionServer@2142] - Received close region:
>> rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
>> 2011-12-29 17:11:03,787 - WARN
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>> using octal or symbolic umask specifications.
>> 2011-12-29 17:11:03,803 - WARN
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>> using octal or symbolic umask specifications.
>> 2011-12-29 17:11:03,911 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Writer@868]
>> - Bloom added to HFile
>> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990):
>> 7.1k, 5225/5276 (99%)
>> 2011-12-29 17:11:03,944 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@494] -
>> Renaming flushed file at
>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990
>> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
>> 2011-12-29 17:11:03,961 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Reader@1027]
>> - Loaded col bloom filter metadata for
>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
>> 2011-12-29 17:11:03,961 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@504] - Added
>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965,
>> entries=5276, sequenceid=1219931822, memsize=2.0m, filesize=384.4k
>> 2011-12-29 17:11:03,964 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:HRegion@554] -
>> Closed rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
>> 2011-12-29 17:11:05,706 - INFO  [PRI IPC Server handler 7 on
>> 7040:HRegionServer@2142] - Received close region:
>> rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
>> 2011-12-29 17:11:05,716 - WARN
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>> using octal or symbolic umask specifications.
>> 2011-12-29 17:11:05,735 - WARN
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
>> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
>> using octal or symbolic umask specifications.
>> 2011-12-29 17:11:05,808 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Writer@868]
>> - Bloom added to HFile
>> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097):
>> 6.5k, 4727/4737 (100%)
>> 2011-12-29 17:11:05,834 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@494] -
>> Renaming flushed file at
>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097
>> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
>> 2011-12-29 17:11:05,852 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Reader@1027]
>> - Loaded col bloom filter metadata for
>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
>> 2011-12-29 17:11:05,852 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@504] - Added
>> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492,
>> entries=4737, sequenceid=1219931829, memsize=1.4m, filesize=246.5k
>> 2011-12-29 17:11:05,856 - INFO
>> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:HRegion@554] -
>> Closed rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
>> 2011-12-29 17:14:58,942 - INFO
>> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@100] -
>> Shutdown hook starting; hbase.shutdown.hook=true;
>> fsShutdownHook=Thread[Thread-15,5,main]
>> 2011-12-29 17:14:58,942 - INFO
>> [Shutdownhook:regionserver7040:HRegionServer@1342] - STOPPED: Shutdown
>> hook
>> 2011-12-29 17:14:59,476 - INFO  [regionserver7040:HBaseServer@1234] -
>> Stopping server on 7040
>> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 2 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 2 on 7040:
>> exiting
>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 9 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 9 on 7040: exiting
>> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 4 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 4 on 7040:
>> exiting
>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 4 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 4 on 7040: exiting
>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 5 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 5 on 7040: exiting
>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 6 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 6 on 7040: exiting
>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 7 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 7 on 7040: exiting
>> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 8 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 8 on 7040: exiting
>> 2011-12-29 17:14:59,477 - INFO  [PRI IPC Server handler 0 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 0 on 7040:
>> exiting
>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 1 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 1 on 7040:
>> exiting
>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 3 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 3 on 7040:
>> exiting
>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 5 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 5 on 7040:
>> exiting
>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 6 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 6 on 7040:
>> exiting
>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 7 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 7 on 7040:
>> exiting
>> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 9 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 9 on 7040:
>> exiting
>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 0 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 0 on 7040: exiting
>> 2011-12-29 17:14:59,480 - INFO  [IPC Server
>> Responder:HBaseServer$Responder@649] - Stopping IPC Server Responder
>> 2011-12-29 17:14:59,479 - INFO  [IPC Server listener on
>> 7040:HBaseServer$Listener@450] - Stopping IPC Server listener on 7040
>> 2011-12-29 17:14:59,477 - INFO  [regionserver7040:HRegionServer@636] -
>> Stopping infoServer
>> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 8 on
>> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 8 on 7040:
>> exiting
>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 3 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 3 on 7040: exiting
>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 2 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 2 on 7040: exiting
>> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 1 on
>> 7040:HBaseServer$Handler@1104] - IPC Server handler 1 on 7040: exiting
>> 2011-12-29 17:14:59,481 - INFO  [regionserver7040:Slf4jLog@67] -
>> Stopped SelectChannelConnector@0.0.0.0:7041
>> 2011-12-29 17:14:59,517 - INFO
>> [regionserver7040.logRoller:LogRoller@114] - LogRoller exiting.
>> 2011-12-29 17:14:59,517 - INFO
>> [regionserver7040.cacheFlusher:MemStoreFlusher@266] -
>> regionserver7040.cacheFlusher exiting
>> 2011-12-29 17:14:59,517 - INFO
>> [regionserver7040.logSyncer:HLog$LogSyncer@966] -
>> regionserver7040.logSyncer exiting
>> 2011-12-29 17:14:59,518 - INFO
>> [regionserver7040.majorCompactionChecker:Chore@79] -
>> regionserver7040.majorCompactionChecker exiting
>> 2011-12-29 17:14:59,518 - INFO
>> [regionserver7040.compactor:CompactSplitThread@113] -
>> regionserver7040.compactor exiting
>> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:HRegionServer@668] -
>> stopping server at: 10.129.1.230,7040,1323990853621
>> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@124] -
>> regionserver7040 closing leases
>> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@131] -
>> regionserver7040 closed leases
>> 2011-12-29 17:14:59,891 - INFO
>> [regionserver7040:HConnectionManager$HConnectionImplementation@1067] -
>> Closed zookeeper sessionid=0x4343f266cbc0012
>> 2011-12-29 17:14:59,894 - INFO  [regionserver7040:ZooKeeper@538] -
>> Session: 0x4343f266cbc0012 closed
>> 2011-12-29 17:14:59,894 - INFO
>> [main-EventThread:ClientCnxn$EventThread@520] - EventThread shut down
>> 2011-12-29 17:14:59,897 - INFO  [regionserver7040:ZooKeeper@538] -
>> Session: 0x1343f2669140013 closed
>> 2011-12-29 17:14:59,898 - INFO  [regionserver7040:HRegionServer@686] -
>> regionserver7040 exiting
>> 2011-12-29 17:14:59,899 - INFO
>> [regionserver7040-EventThread:ClientCnxn$EventThread@520] -
>> EventThread shut down
>> 2011-12-29 17:14:59,900 - INFO
>> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@106] -
>> Starting fs shutdown hook thread.
>> 2011-12-29 17:14:59,900 - ERROR
>> [Thread-15:DFSClient$LeaseChecker@1135] - Exception closing file
>> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>> : org.apache.hadoop.ipc.RemoteException:
>> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
>> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>>
>> org.apache.hadoop.ipc.RemoteException:
>> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
>> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>>
>>        at org.apache.hadoop.ipc.Client.call(Client.java:1104)
>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>>        at $Proxy5.addBlock(Unknown Source)
>>        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>        at $Proxy5.addBlock(Unknown Source)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3185)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3055)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2305)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2500)
>> 2011-12-29 17:14:59,913 - INFO
>> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@112] -
>> Shutdown hook finished.
>> ------------------------