Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/11/27 09:10:00 UTC

[jira] [Created] (IOTDB-5061) Failed to rename mtree.snapshot.tmp to mtree.snapshot while creating mtree snapshot

刘珍 created IOTDB-5061:
-------------------------

             Summary: Failed to rename mtree.snapshot.tmp to mtree.snapshot while creating mtree snapshot
                 Key: IOTDB-5061
                 URL: https://issues.apache.org/jira/browse/IOTDB-5061
             Project: Apache IoTDB
          Issue Type: Bug
          Components: mpp-cluster
    Affects Versions: 0.14.0-SNAPSHOT
            Reporter: 刘珍
            Assignee: Jinrui Zhang
         Attachments: iotdb_4593.conf

m_1127_ffbdaf3
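1. Start a 3-replica cluster with 3 ConfigNodes and 5 DataNodes (3C5D).
2. Write data with the benchmark (BM); after 1 hour, scale in by removing the DataNode on IP72.
3. After scale-in starts, at about 1 hour 40 minutes, IP72 logs a large number of ERRORs (308 ERROR log files, NPE):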
2022-11-27 15:42:25,876 [3@group-000200000006-StateMachineUpdater] ERROR o.a.i.d.m.m.s.MemMTreeSnapshotUtil:89 - Failed to rename mtree.snapshot.tmp to mtree.snapshot while creating mtree snapshot.
2022-11-27 15:42:26,157 [3@group-000200000006-StateMachineUpdater] ERROR o.a.r.s.i.StateMachineUpdater:194 - 3@group-000200000006-StateMachineUpdater caught a Throwable.
java.lang.NullPointerException: null
        at org.apache.iotdb.db.metadata.tag.TagManager.createSnapshot(TagManager.java:79)
        at org.apache.iotdb.db.metadata.schemaregion.SchemaRegionMemoryImpl.createSnapshot(SchemaRegionMemoryImpl.java:456)
        at org.apache.iotdb.db.consensus.statemachine.SchemaRegionStateMachine.takeSnapshot(SchemaRegionStateMachine.java:62)
        at org.apache.iotdb.consensus.IStateMachine.takeSnapshot(IStateMachine.java:82)
        at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.takeSnapshot(ApplicationStateMachineProxy.java:212)
        at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:270)
        at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:262)
        at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:186)
        at java.base/java.lang.Thread.run(Thread.java:834)
2022-11-27 15:42:26,158 [3@group-000200000006-StateMachineUpdater] ERROR o.a.r.s.i.StateMachineUpdater:194 - 3@group-000200000006-StateMachineUpdater caught a Throwable.
java.lang.NullPointerException: null
        at org.apache.iotdb.db.metadata.tag.TagManager.createSnapshot(TagManager.java:79)
        at org.apache.iotdb.db.metadata.schemaregion.SchemaRegionMemoryImpl.createSnapshot(SchemaRegionMemoryImpl.java:456)
        at org.apache.iotdb.db.consensus.statemachine.SchemaRegionStateMachine.takeSnapshot(SchemaRegionStateMachine.java:62)
        at org.apache.iotdb.consensus.IStateMachine.takeSnapshot(IStateMachine.java:82)
        at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.takeSnapshot(ApplicationStateMachineProxy.java:212)
        at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:270)
        at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:262)
        at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:183)
        at java.base/java.lang.Thread.run(Thread.java:834)
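
The first ERROR line above reports only that the rename failed, not why. As a minimal sketch, assuming the snapshot is written to a .tmp file and then renamed (the actual MemMTreeSnapshotUtil code is not shown in this report), the following illustrates the write-then-rename pattern with java.nio.file.Files.move, which throws an IOException describing the cause, whereas java.io.File.renameTo only returns false. The class and method names here are hypothetical and are not IoTDB APIs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch for illustration; not the IoTDB implementation.
class SnapshotRenameSketch {

  /**
   * Writes content to "<name>.tmp" inside dir, then renames it to "<name>".
   * If the rename fails, the temporary file is removed and the underlying
   * cause is kept in the thrown exception so that it shows up in the log.
   */
  static Path commitSnapshot(Path dir, String name, byte[] content) throws IOException {
    Files.createDirectories(dir);
    Path tmp = dir.resolve(name + ".tmp");
    Path target = dir.resolve(name);
    Files.write(tmp, content);
    try {
      // REPLACE_EXISTING overwrites a snapshot left by an earlier run.
      return Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    } catch (IOException e) {
      Files.deleteIfExists(tmp);
      throw new IOException("Failed to rename " + tmp + " to " + target, e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("snapshot-sketch");
    Path snapshot =
        commitSnapshot(dir, "mtree.snapshot", "demo".getBytes(StandardCharsets.UTF_8));
    System.out.println("snapshot exists: " + Files.exists(snapshot));
  }
}
```

With a pattern like this, a caller that sees the exception can skip later snapshot steps instead of continuing with state that was never written, which may or may not be related to the NullPointerException in TagManager.createSnapshot above.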


Test environment
1. 192.168.10.72~76
ConfigNode
MAX_HEAP_SIZE="8G"
cn_connection_timeout_ms=120000

Common
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus
schema_replication_factor=3
data_replication_factor=3
connection_timeout_ms=120000
max_connection_for_internal_service=200
max_waiting_time_when_insert_blocked=600000
query_timeout_threshold=36000000

DataNode
MAX_HEAP_SIZE="256G"
MAX_DIRECT_MEMORY_SIZE="32G"

2. For the benchmark (BM) configuration, see the attachment.

3. Script run from ${iotdb_dir} on IP72:
sleep 1h
./sbin/start-cli.sh -h 192.168.10.72 -e "show cluster" > bef_remove.out
./sbin/start-cli.sh -h 192.168.10.72 -e "show regions" >> bef_remove.out
./sbin/start-cli.sh -h 192.168.10.72 -e "show storage group" >> bef_remove.out
./sbin/remove-datanode.sh "192.168.10.72:6667" >> remove_ip72.out

4. Check the scale-in result and the logs on each node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)