Posted to notifications@iotdb.apache.org by "Gaofei Cao (Jira)" <ji...@apache.org> on 2022/10/25 08:17:00 UTC

[jira] [Commented] (IOTDB-4631) [ remove datanode ] The number of nodes that can be removed is not determined

    [ https://issues.apache.org/jira/browse/IOTDB-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623646#comment-17623646 ] 

Gaofei Cao commented on IOTDB-4631:
-----------------------------------

Executing remove-datanode.sh concurrently leads to this problem; we need to add a lock to resolve it.
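
A minimal sketch of the locking idea follows. It is not the actual ConfigNode code: the class RemoveDataNodeGuard, its method names, and the integer node ids are hypothetical, introduced only for illustration. It assumes that a removal request is accepted only if the remaining DataNodes still satisfy the replication factor, and that the check and the state update happen under one lock so that concurrent requests cannot both pass the check.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch; NOT the IoTDB implementation. */
public class RemoveDataNodeGuard {
  // One lock serializes every "check, then accept" sequence.
  private final ReentrantLock removeLock = new ReentrantLock();
  // DataNodes that are still available (not removed and not being removed).
  private final Set<Integer> availableDataNodes = new HashSet<>();
  private final int replicationFactor;

  public RemoveDataNodeGuard(Set<Integer> dataNodeIds, int replicationFactor) {
    this.availableDataNodes.addAll(dataNodeIds);
    this.replicationFactor = replicationFactor;
  }

  /** Returns true if the removal is accepted, false if it must be rejected. */
  public boolean tryRemove(int dataNodeId) {
    removeLock.lock();
    try {
      if (!availableDataNodes.contains(dataNodeId)) {
        return false; // unknown node, or already being removed
      }
      if (availableDataNodes.size() - 1 < replicationFactor) {
        return false; // too few DataNodes would remain to hold all replicas
      }
      // Take the node out of the available set before releasing the lock,
      // so the next request sees the reduced count.
      availableDataNodes.remove(dataNodeId);
      return true;
    } finally {
      removeLock.unlock();
    }
  }

  public int availableCount() {
    removeLock.lock();
    try {
      return availableDataNodes.size();
    } finally {
      removeLock.unlock();
    }
  }
}
```

With 5 DataNodes and a replication factor of 3, as in the report, the first two calls to tryRemove are accepted and the third is rejected, no matter how the requests interleave.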

> [ remove datanode ] The number of nodes that can be removed is not determined
> -----------------------------------------------------------------------------
>
>                 Key: IOTDB-4631
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4631
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Gaofei Cao
>            Priority: Major
>         Attachments: more_dev.conf, screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> m_1012_d7ed1c1
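> 3 replicas, 3C5D, multiLeader
> The number of nodes that can be removed is not checked. Removing 3 datanodes succeeded in all three cases, leaving only 2 available datanodes, fewer than the replication factor (3).
> The data on the nodes targeted by the 2nd and 3rd removal operations was not migrated successfully. After the removal, querying the 50,000 devices (select count(s_0),count(s_599) from root.** align by device;) returns only 1000 records.
> Cluster status after removing 3 nodes: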
>  !screenshot-2.png! 
>  !screenshot-1.png! 
> Test environment
> 1. 192.168.10.72/73/74/75/76: 5 physical machines, 48 CPUs, 384 GB
> 3C: 72/73/74
> 5D: 72/73/74/75/76
> bm is on ip1
> ConfigNode
> MAX_HEAP_SIZE="8G"
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=3
> data_replication_factor=3
> DataNode
> MAX_HEAP_SIZE="256G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> max_connection_for_internal_service=300
> enable_timed_flush_seq_memtable=true
> seq_memtable_flush_interval_in_ms=3600000
> seq_memtable_flush_check_interval_in_ms=600000
> enable_timed_flush_unseq_memtable=true
> unseq_memtable_flush_interval_in_ms=3600000
> unseq_memtable_flush_check_interval_in_ms=600000
> query_timeout_threshold=36000000
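> Start the 3C5D cluster
> 2. bm write completes
> See the attachment for the configuration
> 3. Run the removal script
> The script is on ip72 and removes 3 datanodes. The 3rd removal should have reported an error, but it was actually processed and the process exited (data not migrated).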
> liuzhen@fit-72:/data/mpp_test/m_1012_d7ed1c1/datanode$ cat rm.sh 
> #!/bin.bash
> ./sbin/remove-datanode.sh 192.168.10.72:6667 > rm_ip72.out
> sleep 2h
> ./sbin/remove-datanode.sh 192.168.10.73:6667 > rm_ip73.out
> ./sbin/remove-datanode.sh  5 > rm_ip74.out
> Log on ip74:
> 2022-10-13 00:23:37,887 [pool-22-IoTDB-DataNodeInternalRPC-Processor-67] INFO  o.a.i.c.conf.CommonConfig:315 -{color:#DE350B}* Change system status to Removing! The current Node is being removed from cluster!*{color}
> In addition, the data on ip73 was not migrated successfully either. Data on ip73 after removal:
>  !screenshot-3.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)