Posted to issues@hbase.apache.org by "wuchang (Jira)" <ji...@apache.org> on 2022/08/31 06:38:00 UTC
[jira] [Assigned] (HBASE-27349) HBase FileNotFound Exception After Region Transitioned
[ https://issues.apache.org/jira/browse/HBASE-27349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wuchang reassigned HBASE-27349:
-------------------------------
Assignee: Duo Zhang
> HBase FileNotFound Exception After Region Transitioned
> -------------------------------------------------------
>
> Key: HBASE-27349
> URL: https://issues.apache.org/jira/browse/HBASE-27349
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: wuchang
> Assignee: Duo Zhang
> Priority: Critical
> Attachments: image-2022-08-31-11-39-58-549.png
>
>
> We have exactly the same issue as https://issues.apache.org/jira/browse/HBASE-13651:
> * A scan gets a FileNotFoundException (FNFE) after the RS goes through a Full GC and the region is transitioned and opened on another RS.
> * During this period, taking a snapshot also reports an FNFE.
> * The issue can be resolved by manually moving the problem region.
> We found that HBASE-13651 was later reverted by https://issues.apache.org/jira/browse/HBASE-18786 because, according to the comment in HBASE-18786, it was no longer considered a problem:
> !image-2022-08-31-11-39-58-549.png!
> Basic Timeline of my issue:
> {code:java}
> 2022-08-27 05:26:35 Snapshot TestingSnapshot is taken successfully
> 2022-08-27 15:21:51 The target hfile fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16 is generated by a compaction in regionserver-67
> 2022-08-27 17:26:36 041e9aeb8cdb46f991459c92f8581e16 is compacted to fd53b8e6b4874eb38712ad2d04389fff
> 2022-08-27 17:35:56 A Full GC happened and regionserver-67 is forcibly shut down
> 2022-08-27 17:35:50 Region fafb8f91bd20b1adfe15e2a64a39557e is re-opened in regionserver-11
> 2022-08-27 17:35:57 File fafb8f91bd20b1adfe15e2a64a39557e is archived
> 2022-08-27 18:00:00 The hfile is removed by HMaster's CleanerChore
> 2022-08-27 19:48:10 User's Spark job on HBase reports an error that the file is missing
> 2022-08-27 20:26:04 Re-taking snapshot TestingSnapshot also fails because 041e9aeb8cdb46f991459c92f8581e16 is missing{code}
> The exception thrown when scanning after the region was transitioned:
>
> {code:java}
> java.io.FileNotFoundException: File does not exist:/hbase/prod/hbase-prod/data/default/mdm/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:85)
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
> at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:735)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:415)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
> at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:861)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:848)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:837)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1005)
> at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:317)
> at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:313)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:325)
> at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:163)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:898)
> at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:125)
> at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:102)
> at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:269)
> at org.apache.hadoop.hbase.regionserver.HStoreFile.createStreamReader(HStoreFile.java:491)
> at org.apache.hadoop.hbase.regionserver.HStoreFile.getStreamScanner(HStoreFile.java:516)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:149)
> at org.apache.hadoop.hbase.regionserver.HStore.getScanners(HStore.java:1309)
> at org.apache.hadoop.hbase.regionserver.HStore.recreateScanners(HStore.java:2042)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.trySwitchToStreamRead(StoreScanner.java:1064)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.shipped(StoreScanner.java:1198)
> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.shipped(KeyValueHeap.java:437)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.shipped(HRegion.java:6959)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run(RSRpcServices.java:388)
> at org.apache.hadoop.hbase.ipc.ServerCall.setResponse(ServerCall.java:289)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:161)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
>
> The exception thrown when taking a snapshot after the region was transitioned:
> {code:java}
> 2022-08-27 20:26:03,794 ERROR org.apache.hadoop.hbase.procedure.Subprocedure: Subprocedure 'TaggingSegmentationSnapshot' aborting due to a ForeignException!
> java.io.FileNotFoundException via regionserver-11.**,60020,1653373878295:java.io.FileNotFoundException: File does not exist: hdfs://test-hbase/hbase/prod/hbase-prod/data/default/mdm/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
> at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:349)
> at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:173)
> at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:193)
> at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:189)
> at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://beaconstore/hbase/prod/hbase-prod/data/ap/mdm_user_segments/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
> at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1500)
> at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1493)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1508)
> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
> at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:368)
> at org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:129)
> at org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:68)
> at org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:249)
> at org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:218)
> at org.apache.hadoop.hbase.regionserver.HRegion.addRegionToSnapshot(HRegion.java:4285)
> at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:134)
> at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:77)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ... 4 more
> {code}
>
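Applying the workaround of manually moving the region requires the encoded region name, which appears in the FileNotFoundException path. A minimal Java sketch for pulling it out of such a message is given here. The class StoreFilePath is a hypothetical helper and not part of HBase, and it assumes the path layout seen in the traces above (.../data/<namespace>/<table>/<encoded region>/<family>/<hfile>) with 32-character hex region and hfile names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of HBase): splits a store file path into its parts. */
final class StoreFilePath {
    // Assumed layout, taken from the paths in the traces above:
    //   .../data/<namespace>/<table>/<encoded region>/<family>/<hfile>
    // Assumption: region and hfile names are exactly 32 lowercase hex characters.
    private static final Pattern LAYOUT = Pattern.compile(
        "/data/([^/]+)/([^/]+)/([0-9a-f]{32})/([^/]+)/([0-9a-f]{32})(?![0-9a-f])");

    final String namespace;
    final String table;
    final String region;
    final String family;
    final String hfile;

    private StoreFilePath(String namespace, String table, String region,
                          String family, String hfile) {
        this.namespace = namespace;
        this.table = table;
        this.region = region;
        this.family = family;
        this.hfile = hfile;
    }

    /** Returns the first store file path found in the message, or null if none matches. */
    static StoreFilePath parse(String message) {
        Matcher m = LAYOUT.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new StoreFilePath(m.group(1), m.group(2), m.group(3), m.group(4), m.group(5));
    }

    public static void main(String[] args) {
        // Message copied from the scan exception in this report.
        String msg = "File does not exist:/hbase/prod/hbase-prod/data/default/mdm/"
            + "fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16";
        StoreFilePath p = parse(msg);
        if (p == null) {
            System.out.println("no store file path found");
        } else {
            System.out.println("region=" + p.region + " family=" + p.family
                + " hfile=" + p.hfile);
        }
    }
}
```

The extracted region name can then be passed to the HBase shell move command, for example: move 'fafb8f91bd20b1adfe15e2a64a39557e'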
--
This message was sent by Atlassian Jira
(v8.20.10#820010)