Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/07/20 06:05:00 UTC
[jira] [Created] (IOTDB-3896) Shrink,ERROR o.a.i.d.s.RegionMigrateService$RegionMigrateTask:349 - add new peer TEndPoint(ip:172.20.70.4, port:40010) for region DataRegion[1] failed
刘珍 created IOTDB-3896:
-------------------------
Summary: Shrink,ERROR o.a.i.d.s.RegionMigrateService$RegionMigrateTask:349 - add new peer TEndPoint(ip:172.20.70.4, port:40010) for region DataRegion[1] failed
Key: IOTDB-3896
URL: https://issues.apache.org/jira/browse/IOTDB-3896
Project: Apache IoTDB
Issue Type: Bug
Components: mpp-cluster
Affects Versions: 0.14.0-SNAPSHOT
Reporter: 刘珍
Assignee: Song Ziyang
Attachments: image-2022-07-20-14-01-59-727.png, ip3_config.properties, ip4_config.properties, ip5-shrink_log_all.tar.gz
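Using RatisConsensus with 3 replicas on a 3C7D cluster (3 ConfigNodes, 7 DataNodes), shrinking the cluster by 1 node causes the DataNode being removed to report this error: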
2022-07-20 11:32:04,775 [pool-16-IoTDB-Region-Migrate-Pool-1] ERROR o.a.i.d.s.RegionMigrateService$RegionMigrateTask:349 - add new peer TEndPoint(ip:172.20.70.4, port:40010) for region DataRegion[1] failed
ip4 (the node being added as a peer) keeps printing:
[20220720_033135_85031_3.1.0-141] WARN o.a.i.c.r.RatisConsensus:506 - group-000100000003: leader is still not ready after 20010ms
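To gauge how long ip4 stays in this state, the warning can be pulled out of a DataNode log with a small script. The following is a minimal sketch: the regular expression is derived only from the single sample line above, so the group id and duration formats are assumptions, and `leader_not_ready_events` is a hypothetical helper name, not part of IoTDB.

```python
import re

# Pattern for the RatisConsensus warning quoted in this report.
# Assumption: other occurrences follow the same layout as the sample line.
NOT_READY = re.compile(
    r"WARN\s+o\.a\.i\.c\.r\.RatisConsensus:\d+ - "
    r"(?P<group>group-[0-9A-Fa-f]+): "
    r"leader is still not ready after (?P<ms>\d+)ms"
)


def leader_not_ready_events(lines):
    """Return (group_id, waited_ms) for every matching log line."""
    events = []
    for line in lines:
        match = NOT_READY.search(line)
        if match:
            events.append((match.group("group"), int(match.group("ms"))))
    return events


sample = [
    "[20220720_033135_85031_3.1.0-141] WARN o.a.i.c.r.RatisConsensus:506 - "
    "group-000100000003: leader is still not ready after 20010ms",
    "an unrelated log line",
]
print(leader_not_ready_events(sample))
```

Passing an open log file instead of `sample` works the same way, because the function only iterates over lines.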
Steps to reproduce:
1. Prepare the environment (machines with 8 cores and 32 GB of memory):
master_0718_967cde6
ip2/3/4/5/13/14/16: 7 DataNodes in total
ip3/4/5: 3 ConfigNodes in total
Ratis protocol, 3 replicas, 3C3D
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_replication_factor=3
schema_replication_factor=3
max_waiting_time_when_insert_blocked=3600000
query_timeout_threshold=3600000
ConfigNode:
MAX_HEAP_SIZE="4G"
DataNode:
MAX_HEAP_SIZE="16G"
wal_buffer_size_in_byte=1048576
ConfigNode: enable debug logging for DataNodeRemoveManager:
<logger level="debug" name="org.apache.iotdb.confignode.manager.DataNodeRemoveManager"/>
DataNode: enable debug logging for RegionMigrateService:
<logger level="debug" name="org.apache.iotdb.db.service.RegionMigrateService"/>
Benchmark test machine:
172.20.70.15
bm_0620_7ec96c1
Start 2 benchmark instances, one connected to ip3 and the other to ip4; see the attachments for their configuration.
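Before starting the run, it can help to confirm that every node really carries the consensus and replication settings listed in step 1. The sketch below assumes those settings live in a plain `key=value` properties file; `parse_properties` and `mismatches` are hypothetical helper names, and the expected values are copied from this report.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


RATIS = "org.apache.iotdb.consensus.ratis.RatisConsensus"

# Expected values taken from the environment description in step 1.
EXPECTED = {
    "schema_region_consensus_protocol_class": RATIS,
    "data_region_consensus_protocol_class": RATIS,
    "data_replication_factor": "3",
    "schema_replication_factor": "3",
}


def mismatches(props, expected=EXPECTED):
    """Return {key: (expected, actual)} for every setting that differs."""
    return {
        key: (want, props.get(key))
        for key, want in expected.items()
        if props.get(key) != want
    }


sample_text = """
# settings as listed in the report
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_replication_factor=3
schema_replication_factor=3
"""
print(mismatches(parse_properties(sample_text)))
```

An empty result means the file agrees with the reported setup; any entry shows the expected value next to the one actually found (or `None` if the key is missing).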
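2. Run for 1 hour. ip5 holds 2 data regions, and each has taken 1 snapshot.
When the shrink (removing ip5) was issued, there were roughly 600,000 log entries (DataRegion 1).
The ip5 logs are attached.
ip4 keeps printing: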
!image-2022-07-20-14-01-59-727.png!