Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/12/06 10:36:00 UTC

[jira] [Reopened] (IOTDB-4027) ERROR o.a.i.d.e.s.SnapshotLoader:94 - Exception occurs when creating links from snapshot directory to data directory

     [ https://issues.apache.org/jira/browse/IOTDB-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

刘珍 reopened IOTDB-4027:
-----------------------

rel/1.0 2022-12-06_cbf7291
On ip6 (a follower), stop the datanode service and start it again 5 minutes later; synchronization fails:
2022-12-06 16:24:23,343 [grpc-default-executor-25] ERROR o.a.r.s.i.SnapshotInstallationHandler:96 - 7@group-000100000004: installSnapshot failed
java.nio.file.FileAlreadyExistsException: /data/iotdb/rel_1206_cbf7291_issue/data/datanode/data/snapshot/.tmp.group-000100000004/snapshot-7b3e6a19-3184-4351-8d0f-fa996882675a/24_372652/sequence/root.ip4.g_0/4/2543/1670313965072-25-0-0.tsfile
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
        at java.base/sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:389)
        at java.base/java.nio.file.Files.createDirectory(Files.java:690)
        at java.base/java.nio.file.Files.createAndCheckIsDirectory(Files.java:797)
        at java.base/java.nio.file.Files.createDirectories(Files.java:743)
        at org.apache.ratis.util.FileUtils.lambda$createDirectories$4(FileUtils.java:70)
        at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
        at org.apache.ratis.util.FileUtils.createDirectories(FileUtils.java:69)
        at org.apache.ratis.util.FileUtils.createDirectories(FileUtils.java:65)
        at org.apache.ratis.server.storage.SnapshotManager.installSnapshot(SnapshotManager.java:106)
        at org.apache.ratis.server.impl.ServerState.installSnapshot(ServerState.java:480)
        at org.apache.ratis.server.impl.SnapshotInstallationHandler.checkAndInstallSnapshot(SnapshotInstallationHandler.java:181)
        at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:120)
        at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:94)
        at org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1517)
        at org.apache.ratis.server.impl.RaftServerProxy.installSnapshot(RaftServerProxy.java:640)
        at org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:242)
        at org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:239)
        at org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.onNext(GrpcServerProtocolService.java:124)
        at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
        at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
        at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:332)
        at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:315)
        at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:834)
        at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
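
A note on reading this trace: the exception passes through Files.createAndCheckIsDirectory, which rethrows FileAlreadyExistsException only when the path already exists and is not a directory. That suggests a regular file already sits at the .tsfile path inside the .tmp snapshot directory, possibly left over from an earlier install attempt; this is an inference from the trace, not a confirmed root cause. The JDK behavior itself can be checked with a minimal sketch (the class name, method name, and file names below are made up for illustration and are not IoTDB or Ratis code):

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CreateDirectoriesDemo {

    /**
     * Returns true if Files.createDirectories(target) fails with
     * FileAlreadyExistsException, false if the call succeeds.
     */
    static boolean failsWithFileAlreadyExists(Path target) throws IOException {
        try {
            Files.createDirectories(target);
            return false;
        } catch (FileAlreadyExistsException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Scratch directory standing in for the snapshot .tmp directory (hypothetical).
        Path tmp = Files.createTempDirectory("snapshot-demo");

        // Case 1: the target already exists as a directory, so the call is a no-op.
        Path dir = Files.createDirectories(tmp.resolve("sequence"));
        System.out.println("existing directory fails: " + failsWithFileAlreadyExists(dir));

        // Case 2: the target already exists as a regular file, so the call throws.
        Path file = Files.createFile(tmp.resolve("leftover.tsfile"));
        System.out.println("existing regular file fails: " + failsWithFileAlreadyExists(file));
    }
}
```

The first case prints false and the second prints true, which matches the shape of the failure above: createDirectories tolerates an existing directory but not an existing regular file at the same path.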

Test environment: private cloud phase 1, 3 replicas, 3C9D
172.16.2.2~10 
/data/iotdb/rel_1206_cbf7291_issue


>  ERROR o.a.i.d.e.s.SnapshotLoader:94 - Exception occurs when creating links from snapshot directory to data directory
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: IOTDB-4027
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4027
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Song Ziyang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>         Attachments: image-2022-08-03-09-39-10-230.png, image-2022-08-03-09-39-48-739.png, image-2022-09-06-17-05-21-387.png, ip18_befor_stop_datanode_log.tar.gz, ip18_restart_with-error_log.tar.gz, ip4_2000_config.properties, screenshot-1.png
>
>
> master_0801_55b5b17
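> Problem description
> RatisConsensus, 3 replicas, 3C9D. One bm connects to one datanode and performs concurrent writes. Stop one follower node and start it 5 minutes later; {color:#DE350B}*then stop another follower node and start it 10 minutes later. This node reports an error during startup and ends up missing data*{color}: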
> 2022-08-02 18:04:17,376 [pool-4-thread-1] ERROR o.a.i.d.e.s.SnapshotLoader:94 - Exception occurs when creating links from snapshot directory to data directory
> java.io.IOException: Cannot find /data/iotdb/master_0801_2de0dd8/datanode/./sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/sm/1_354536/sequence/root.ip4.g_0 or /data/iotdb/master_0801_2de0dd8/datanode/./sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/sm/1_354536/unsequence/root.ip4.g_0
>         at org.apache.iotdb.db.engine.snapshot.SnapshotLoader.createLinksFromSnapshotDirToDataDir(SnapshotLoader.java:163)
>         at org.apache.iotdb.db.engine.snapshot.SnapshotLoader.loadSnapshotForStateMachine(SnapshotLoader.java:91)
>         at org.apache.iotdb.db.consensus.statemachine.DataRegionStateMachine.loadSnapshot(DataRegionStateMachine.java:93)
>         at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.loadSnapshot(ApplicationStateMachineProxy.java:188)
>         at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.lambda$initialize$0(ApplicationStateMachineProxy.java:73)
>         at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270)
>         at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.initialize(ApplicationStateMachineProxy.java:69)
>         at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:136)
>         at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:201)
>         at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$5(RaftServerProxy.java:274)
>         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2022-08-02 18:04:17,376 [pool-4-thread-1] ERROR o.a.i.d.c.s.DataRegionStateMachine:95 - Fail to load snapshot from /data/iotdb/master_0801_2de0dd8/datanode/./sbin/../data/consensus/data_region/47474747-4747-4747-4747-000100000001/sm/1_354536
> ip18 is missing data; the expected count for the series is 20000 points
>  !screenshot-1.png! 
> 1. Reproduction procedure
> Private cloud: 172.20.70.2/3/4/5/13/14/16/18/19
> The benchmark runs on ip15 (connected to ip4)
> Stop ip4 / start ip4, then stop ip18 / start ip18; ip18 reports the error
>  !image-2022-08-03-09-39-10-230.png! 
>  !image-2022-08-03-09-39-48-739.png! 
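> 2. Start the benchmark
> 2022-08-02 17:34:57 start the bm
> 3. Stop the datanode on ip4
> 2022-08-02 17:45:42 stop the datanode
> sleep 300
> Start ip4
> 4. Stop the datanode on ip18
> 2022-08-02 17:54:11 stop the datanode on ip18
> sleep 600
> Start ip18
> {color:#DE350B}*An error is reported during startup*{color}:
> See the problem description
> After the bm finishes writing and all nodes finish synchronizing, {color:#DE350B}*the ip18 node is missing data*{color}; the data on ip16 and ip4 is correct.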



--
This message was sent by Atlassian Jira
(v8.20.10#820010)