Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/12/20 02:04:00 UTC

[jira] [Created] (IOTDB-5244) [ratis][remove datanode]installSnapshot failed

刘珍 created IOTDB-5244:
-------------------------

             Summary: [ratis][remove datanode]installSnapshot failed
                 Key: IOTDB-5244
                 URL: https://issues.apache.org/jira/browse/IOTDB-5244
             Project: Apache IoTDB
          Issue Type: Bug
          Components: mpp-cluster
    Affects Versions: master branch, 1.0.0
            Reporter: 刘珍
            Assignee: Song Ziyang


rel/1.0 1216_c92440f
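1. Start a 3-replica 3C5D cluster (3 ConfigNodes, 5 DataNodes); config, schema, and data all use the Ratis protocol.
2. Write data with BM; the write completes.
See the attachment for the configuration.
3. On the node to be removed (ip73), run stop-datanode.sh, then start the DataNode,
then run stop-datanode.sh again, then start the DataNode again.
Run the scale-in (remove the DataNode).
4. The DataNode on ip68 reports this error: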
2022-12-19 20:25:22,705 [grpc-default-executor-4936] ERROR o.a.r.s.i.SnapshotInstallationHandler:96 - 5@group-00010000001E: installSnapshot failed
org.apache.ratis.io.CorruptedFileException: File /data/liuzhen_test/master_1216_d426f7a/data/datanode/data/snapshot/.tmp.group-00010000001E/snapshot-c01d9ca8-3f9a-4b02-9fb4-fa680eae89e0/66_158230/sequence/root.test.g_3/30/2538/1671443330163-38-0-0.tsfile.resource (exist? false, length=0) is corrupted: MD5 mismatch for snapshot-158230 installation.  Renamed temporary snapshot file /data/liuzhen_test/master_1216_d426f7a/data/datanode/data/snapshot/.tmp.group-00010000001E/snapshot-c01d9ca8-3f9a-4b02-9fb4-fa680eae89e0/66_158230/sequence/root.test.g_3/30/2538/1671443330163-38-0-0.tsfile.resource to /data/liuzhen_test/master_1216_d426f7a/data/datanode/data/snapshot/.tmp.group-00010000001E/snapshot-c01d9ca8-3f9a-4b02-9fb4-fa680eae89e0/66_158230/sequence/root.test.g_3/30/2538/1671443330163-38-0-0.tsfile.resource.corrupt20221219-202522_690
        at org.apache.ratis.server.storage.SnapshotManager.installSnapshot(SnapshotManager.java:155)
        at org.apache.ratis.server.impl.ServerState.installSnapshot(ServerState.java:480)
        at org.apache.ratis.server.impl.SnapshotInstallationHandler.checkAndInstallSnapshot(SnapshotInstallationHandler.java:181)
        at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:120)
        at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:94)
        at org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1517)
        at org.apache.ratis.server.impl.RaftServerProxy.installSnapshot(RaftServerProxy.java:640)
        at org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:242)
        at org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:239)
        at org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.onNext(GrpcServerProtocolService.java:124)
        at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
        at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
        at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:332)
        at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:315)
        at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:834)
        at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
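
For readers unfamiliar with the failing check: the exception above reports an MD5 mismatch on a snapshot file that does not exist on disk (exist? false, length=0). The following is a minimal Python sketch of that kind of digest verification. It is not the Ratis implementation; the function names md5_of_file and verify_snapshot_file are hypothetical and used only for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> bytes:
    """Return the raw MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()


def verify_snapshot_file(path: Path, expected_md5: bytes) -> bool:
    """Hypothetical check: does the received file match the sender's digest?

    A file that was never written (as in the log above) cannot match,
    so it is reported as failing verification.
    """
    if not path.exists():
        return False
    return md5_of_file(path) == expected_md5
```

Under these assumptions, a receiver would call verify_snapshot_file on each transferred file and treat a False result as a corrupted transfer, which is the situation the log line describes.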

Test environment
1. 192.168.10.62/66/68   3 ConfigNodes    72 CPUs, 256 GB
192.168.10.62/66/68/64/73  5 DataNodes
Machine 73: 48 CPUs, 384 GB

2. Database configuration parameters
COMMON configuration
schema_replication_factor=3
data_replication_factor=3
data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
query_timeout_threshold=3600000
ConfigNode configuration
cn_connection_timeout_ms=120000
MAX_HEAP_SIZE="8G"

DataNode configuration
MAX_HEAP_SIZE="192G"
MAX_DIRECT_MEMORY_SIZE="32G"
dn_max_connection_for_internal_service=300
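3. See the attachment for the BM configuration.
The write completes.
4. On ip73:
stop-datanode.sh
Clear the cache, then start the DataNode
stop-datanode.sh
Start the DataNode
Run the scale-in (remove the DataNode). Check the node status and logs.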




--
This message was sent by Atlassian Jira
(v8.20.10#820010)