Posted to issues@hbase.apache.org by "wuchang (Jira)" <ji...@apache.org> on 2022/08/31 06:38:00 UTC

[jira] [Assigned] (HBASE-27349) HBase FileNotFound Exception After Region Transitioned

     [ https://issues.apache.org/jira/browse/HBASE-27349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wuchang reassigned HBASE-27349:
-------------------------------

    Assignee: Duo Zhang

> HBase FileNotFound Exception After Region Transitioned 
> -------------------------------------------------------
>
>                 Key: HBASE-27349
>                 URL: https://issues.apache.org/jira/browse/HBASE-27349
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.0
>            Reporter: wuchang
>            Assignee: Duo Zhang
>            Priority: Critical
>         Attachments: image-2022-08-31-11-39-58-549.png
>
>
> We have exactly the same issue as https://issues.apache.org/jira/browse/HBASE-13651:
>  * A SCAN gets an FNFE after the RS hits a Full GC and the region is transitioned and opened on another RS.
>  * During that time, taking a snapshot also reports an FNFE.
>  * The issue can be resolved by manually moving the problem region.
> We found that HBASE-13651 was later reverted by https://issues.apache.org/jira/browse/HBASE-18786, since it was thought to no longer be a problem, per the comment in HBASE-18786:
> !image-2022-08-31-11-39-58-549.png!
> Basic Timeline of my issue:
> {code:java}
> 2022-08-27 05:26:35    Snapshot TestingSnapshot is taken successfully
> 2022-08-27 15:21:51    The target hfile fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16 is generated by a compaction in regionserver-67
> 2022-08-27 17:26:36    041e9aeb8cdb46f991459c92f8581e16 is compacted to fd53b8e6b4874eb38712ad2d04389fff
> 2022-08-27 17:35:56    A Full GC happened and regionserver-67 is forcefully shut down
> 2022-08-27 17:35:50    Region fafb8f91bd20b1adfe15e2a64a39557e is re-opened in regionserver-11
> 2022-08-27 17:35:57    File fafb8f91bd20b1adfe15e2a64a39557e is archived
> 2022-08-27 18:00:00    The hfile is removed by HMaster's CleanerChore
> 2022-08-27 19:48:10    User's Spark job on HBase shows an error that the file is missing
> 2022-08-27 20:26:04    Re-taking snapshot TestingSnapshot also failed because 041e9aeb8cdb46f991459c92f8581e16 is missing{code}
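The timeline entries above are not listed in strict timestamp order. A minimal sketch, assuming the timestamps are accurate as reported, sorts them so the sequence is explicit. The class and field names (`TimelineOrder`, `Event`) are hypothetical and only for illustration, and the event labels are shortened from the entries above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sorts the reported timeline by timestamp. Names here are illustrative only. */
final class TimelineOrder {
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** One timeline entry: when it happened and a shortened label. */
    static final class Event {
        final LocalDateTime at;
        final String what;

        Event(String at, String what) {
            this.at = LocalDateTime.parse(at, FMT);
            this.what = what;
        }
    }

    /** Returns the events from the report, sorted by timestamp. */
    static List<Event> sorted() {
        List<Event> events = new ArrayList<>();
        // Entries are added in the order they appear in the report.
        events.add(new Event("2022-08-27 05:26:35", "snapshot taken"));
        events.add(new Event("2022-08-27 15:21:51", "target hfile generated by compaction"));
        events.add(new Event("2022-08-27 17:26:36", "target hfile compacted into a new hfile"));
        events.add(new Event("2022-08-27 17:35:56", "regionserver-67 shut down after Full GC"));
        events.add(new Event("2022-08-27 17:35:50", "region re-opened on regionserver-11"));
        events.add(new Event("2022-08-27 17:35:57", "file archived"));
        events.add(new Event("2022-08-27 18:00:00", "hfile removed by CleanerChore"));
        events.add(new Event("2022-08-27 19:48:10", "Spark job reports missing file"));
        events.add(new Event("2022-08-27 20:26:04", "re-taking snapshot fails"));
        events.sort(Comparator.comparing((Event e) -> e.at));
        return events;
    }

    public static void main(String[] args) {
        for (Event e : sorted()) {
            System.out.println(e.at.format(FMT) + "  " + e.what);
        }
    }
}
```

In sorted order, the re-open on regionserver-11 (17:35:50) comes before the regionserver-67 shutdown (17:35:56) and the archival (17:35:57), so the region may have been opened on the new server while the already-compacted hfile was still in place.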
> The exception from scanning after the region is transitioned:
>  
> {code:java}
> java.io.FileNotFoundException: File does not exist:/hbase/prod/hbase-prod/data/default/mdm/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:85)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:735)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:415)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>         at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
>         at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:861)
>         at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:848)
>         at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:837)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1005)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:317)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:313)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:325)
>         at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:163)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:898)
>         at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:125)
>         at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:102)
>         at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:269)
>         at org.apache.hadoop.hbase.regionserver.HStoreFile.createStreamReader(HStoreFile.java:491)
>         at org.apache.hadoop.hbase.regionserver.HStoreFile.getStreamScanner(HStoreFile.java:516)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:149)
>         at org.apache.hadoop.hbase.regionserver.HStore.getScanners(HStore.java:1309)
>         at org.apache.hadoop.hbase.regionserver.HStore.recreateScanners(HStore.java:2042)
>         at org.apache.hadoop.hbase.regionserver.StoreScanner.trySwitchToStreamRead(StoreScanner.java:1064)
>         at org.apache.hadoop.hbase.regionserver.StoreScanner.shipped(StoreScanner.java:1198)
>         at org.apache.hadoop.hbase.regionserver.KeyValueHeap.shipped(KeyValueHeap.java:437)
>         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.shipped(HRegion.java:6959)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run(RSRpcServices.java:388)
>         at org.apache.hadoop.hbase.ipc.ServerCall.setResponse(ServerCall.java:289)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:161)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
>  
> The exception from taking a snapshot after the region is transitioned:
> {code:java}
> 2022-08-27 20:26:03,794 ERROR org.apache.hadoop.hbase.procedure.Subprocedure: Subprocedure 'TaggingSegmentationSnapshot' aborting due to a ForeignException!
> java.io.FileNotFoundException via regionserver-11.**,60020,1653373878295:java.io.FileNotFoundException: File does not exist: hdfs://test-hbase/hbase/prod/hbase-prod/data/default/mdm/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
>         at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:349)
>         at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:173)
>         at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:193)
>         at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:189)
>         at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://beaconstore/hbase/prod/hbase-prod/data/ap/mdm_user_segments/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
>         at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1500)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1493)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1508)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
>         at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:368)
>         at org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:129)
>         at org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:68)
>         at org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:249)
>         at org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:218)
>         at org.apache.hadoop.hbase.regionserver.HRegion.addRegionToSnapshot(HRegion.java:4285)
>         at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:134)
>         at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:77)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         ... 4 more
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)