Posted to notifications@iotdb.apache.org by "Song Ziyang (Jira)" <ji...@apache.org> on 2023/01/03 12:42:00 UTC

[jira] [Reopened] (IOTDB-5244) [ratis][remove datanode]installSnapshot failed

     [ https://issues.apache.org/jira/browse/IOTDB-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Song Ziyang reopened IOTDB-5244:
--------------------------------

> [ratis][remove datanode]installSnapshot failed
> ----------------------------------------------
>
>                 Key: IOTDB-5244
>                 URL: https://issues.apache.org/jira/browse/IOTDB-5244
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: master branch, 1.0.0
>            Reporter: 刘珍
>            Assignee: Song Ziyang
>            Priority: Major
>         Attachments: iotdb_5244.conf
>
>
> rel/1.0 1216_c92440f
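> 1. Start a 3-replica cluster with 3 ConfigNodes and 5 DataNodes (3C5D); config, schema, and data all use the ratis protocol.
> 2. Write data with BM; the write completes.
> See the attachment for the configuration.
> 3. On the node to be removed (ip73), run stop-datanode.sh, then start the DataNode;
> run stop-datanode.sh again, then start it again.
> Perform the DataNode removal (scale-in).
> 4. The DataNode on ip68 reports an error: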
> 2022-12-19 20:25:22,705 [grpc-default-executor-4936] ERROR o.a.r.s.i.SnapshotInstallationHandler:96 - 5@group-00010000001E: installSnapshot failed
> org.apache.ratis.io.CorruptedFileException: File /data/liuzhen_test/master_1216_d426f7a/data/datanode/data/snapshot/.tmp.group-00010000001E/snapshot-c01d9ca8-3f9a-4b02-9fb4-fa680eae89e0/66_158230/sequence/root.test.g_3/30/2538/1671443330163-38-0-0.tsfile.resource (exist? false, length=0) is corrupted: MD5 mismatch for snapshot-158230 installation.  Renamed temporary snapshot file /data/liuzhen_test/master_1216_d426f7a/data/datanode/data/snapshot/.tmp.group-00010000001E/snapshot-c01d9ca8-3f9a-4b02-9fb4-fa680eae89e0/66_158230/sequence/root.test.g_3/30/2538/1671443330163-38-0-0.tsfile.resource to /data/liuzhen_test/master_1216_d426f7a/data/datanode/data/snapshot/.tmp.group-00010000001E/snapshot-c01d9ca8-3f9a-4b02-9fb4-fa680eae89e0/66_158230/sequence/root.test.g_3/30/2538/1671443330163-38-0-0.tsfile.resource.corrupt20221219-202522_690
>         at org.apache.ratis.server.storage.SnapshotManager.installSnapshot(SnapshotManager.java:155)
>         at org.apache.ratis.server.impl.ServerState.installSnapshot(ServerState.java:480)
>         at org.apache.ratis.server.impl.SnapshotInstallationHandler.checkAndInstallSnapshot(SnapshotInstallationHandler.java:181)
>         at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:120)
>         at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:94)
>         at org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1517)
>         at org.apache.ratis.server.impl.RaftServerProxy.installSnapshot(RaftServerProxy.java:640)
>         at org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:242)
>         at org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:239)
>         at org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.onNext(GrpcServerProtocolService.java:124)
>         at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
>         at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
>         at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:332)
>         at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:315)
>         at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:834)
>         at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>         at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Test environment
> 1. 192.168.10.62/66/68   3 ConfigNodes    72 CPUs, 256 GB
> 192.168.10.62/66/68/64/73  5 DataNodes
> Machine 73: 48 CPUs, 384 GB
> 2. Database configuration parameters
> COMMON configuration
> schema_replication_factor=3
> data_replication_factor=3
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> query_timeout_threshold=3600000
> ConfigNode configuration
> cn_connection_timeout_ms=120000
> MAX_HEAP_SIZE="8G"
> DataNode configuration
> MAX_HEAP_SIZE="192G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> dn_max_connection_for_internal_service=300
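> 3. See the attachment for the BM configuration.
> The write completes.
> 4. On ip73:
> stop-datanode.sh
> Clear the cache, then start the DataNode
> stop-datanode.sh
> Start the DataNode
> Perform the DataNode removal (scale-in). Check the node status and logs.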
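
For readers unfamiliar with the error in the log above: the follower reports an MD5 mismatch for a received snapshot file and renames the file with a ".corrupt<timestamp>" suffix. The sketch below only illustrates that general check-then-quarantine pattern in Python. It is not the Ratis implementation; the function names md5_of_file and verify_or_quarantine are hypothetical, and treating a missing file as a failed check is an assumption made here to mirror the "(exist? false, length=0)" detail in the log.

```python
import hashlib
import os
import time


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_or_quarantine(path, expected_md5):
    """Return True if the file exists and its MD5 matches expected_md5.

    On a mismatch the file is renamed with a ".corrupt<timestamp>" suffix
    so it is not mistaken for a valid snapshot file. A missing file is
    reported as a failed check (assumption for this sketch).
    """
    if not os.path.exists(path):
        return False
    if md5_of_file(path) == expected_md5:
        return True
    suffix = time.strftime(".corrupt%Y%m%d-%H%M%S")
    os.rename(path, path + suffix)
    return False
```

When comparing the sender's and receiver's copies of a file such as the .tsfile.resource named in the log, running md5_of_file on both sides is one way to tell whether the file changed or disappeared between snapshot creation and transfer.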



--
This message was sent by Atlassian Jira
(v8.20.10#820010)