You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "stack (JIRA)" <ji...@apache.org> on 2007/12/09 01:11:45 UTC
[jira] Commented: (HADOOP-2283) [hbase]
AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver
edits)
[ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549767 ]
stack commented on HADOOP-2283:
-------------------------------
Today over on pauls' machines I see this around a new failure to rerun edits:
{code}
2007-12-08 00:03:47,981 DEBUG hbase.HLog - Splitting 0 of 11: hdfs://img671:9000/hbase/log_XX.XX.XX.101_1197080909551_60020/hlog.dat.810
2007-12-08 00:03:48,243 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_666176028/oldlogfile.log
2007-12-08 00:03:48,249 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_1820335982/oldlogfile.log
2007-12-08 00:03:48,255 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_-1253552729/oldlogfile.log
....
2007-12-08 00:03:49,599 DEBUG hbase.HLog - Applied 5500 edits
2007-12-08 00:03:49,612 DEBUG hbase.HLog - Applied 5600 edits
2007-12-08 00:03:49,627 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_1117809784/oldlogfile.log
2007-12-08 00:03:49,646 DEBUG hbase.HLog - Applied 5700 edits
2007-12-08 00:03:49,660 DEBUG hbase.HLog - Applied 5800 edits
2007-12-08 00:03:49,673 DEBUG hbase.HLog - Applied 5900 edits
....
2007-12-08 00:03:52,035 DEBUG hbase.HLog - Applied 30000 edits
2007-12-08 00:03:52,036 DEBUG hbase.HLog - Applied 30006 total edits
2007-12-08 00:03:52,036 DEBUG hbase.HLog - Splitting 1 of 11: hdfs://img671:9000/hbase/log_XX.XX.XX.101_1197080909551_60020/hlog.dat.811
2007-12-08 00:03:52,064 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_1117809784/oldlogfile.log
2007-12-08 00:04:34,397 INFO hbase.HMaster - HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, server: XX.XX.XX.103:60020}
{code}
and then eventually....
{code}
2007-12-08 00:04:52,069 DEBUG retry.RetryInvocationHandler - Exception while invoking create of class org.apache.hadoop.dfs.$Proxy0. Retrying.org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_1117809784/oldlogfile.log for DFSClient_863222988 on client XX.XX.XX.248 because current leaseholder is trying to recreate file. at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:861) at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:817) at org.apache.hadoop.dfs.NameNode.create(NameNode.java:272) at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:389) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:644) at org.apache.hadoop.ipc.Client.call(Client.java:507) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:186)
at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1424)
at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:354) at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:122) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:390)
{code}
There's a double attempt at creating the file hregion_1117809784/oldlogfile.log. Should be easy to fix.
> [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-2283
> URL: https://issues.apache.org/jira/browse/HADOOP-2283
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/hbase
> Reporter: stack
> Assignee: stack
> Priority: Minor
> Fix For: 0.16.0
>
> Attachments: compaction.patch, OP_READ.patch
>
>
> Looking in master for a cluster of ~90 regionservers, the regionserver carrying the ROOT went down (because it hadn't talked to the master in 30 seconds).
> Master notices the downed regionserver because its lease timesout. It then goes to run the shutdown server sequence only splitting the regionserver's edit log, it gets stuck trying to split the second of three log files. Eventually, after ~5minutes, the second log split throws:
> 34974 2007-11-26 01:21:23,999 WARN hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
> 34975 org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because curren t leaseholder is trying to recreate file.
> 34976 at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
> 34977 at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
> 34978 at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
> 34979 at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> 34980 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 34981 at java.lang.reflect.Method.invoke(Method.java:597)
> 34982 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 34983 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 34984
> 34985 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 34986 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> 34987 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> 34988 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> 34989 at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
> 34990 at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
> And so on every 5 minutes.
> Because the regionserver that went down had ROOT region, and because we are stuck in this eternal loop, ROOT never gets reallocated.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.