Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/07/20 06:05:00 UTC

[jira] [Created] (IOTDB-3896) Shrink,ERROR o.a.i.d.s.RegionMigrateService$RegionMigrateTask:349 - add new peer TEndPoint(ip:172.20.70.4, port:40010) for region DataRegion[1] failed

刘珍 created IOTDB-3896:
-------------------------

             Summary: Shrink,ERROR o.a.i.d.s.RegionMigrateService$RegionMigrateTask:349 - add new peer TEndPoint(ip:172.20.70.4, port:40010) for region DataRegion[1] failed
                 Key: IOTDB-3896
                 URL: https://issues.apache.org/jira/browse/IOTDB-3896
             Project: Apache IoTDB
          Issue Type: Bug
          Components: mpp-cluster
    Affects Versions: 0.14.0-SNAPSHOT
            Reporter: 刘珍
            Assignee: Song Ziyang
         Attachments: image-2022-07-20-14-01-59-727.png, ip3_config.properties, ip4_config.properties, ip5-shrink_log_all.tar.gz

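RatisConsensus,
3 replicas, 3C7D (3 ConfigNodes, 7 DataNodes). After shrinking the cluster by 1 node, the DataNode on the node being removed reports: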
2022-07-20 11:32:04,775 [pool-16-IoTDB-Region-Migrate-Pool-1] ERROR o.a.i.d.s.RegionMigrateService$RegionMigrateTask:349 - add new peer TEndPoint(ip:172.20.70.4, port:40010) for region DataRegion[1] failed

ip4 (the node being added as a new peer) keeps printing:
[20220720_033135_85031_3.1.0-141] WARN  o.a.i.c.r.RatisConsensus:506 - group-000100000003: leader is still not ready after 20010ms
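The warning above reads like the output of a bounded wait for the Raft group's leader that gives up after a fixed interval (here about 20 s) and is then retried. The sketch below only illustrates that pattern; it is NOT RatisConsensus's actual code, and `wait_for_leader`, `is_ready`, `now_ms` and `sleep_ms` are hypothetical names. The clock and sleep are injectable so the loop can be exercised without real waiting.

```python
import logging
import time
from typing import Callable

# Hypothetical logger name; not the one IoTDB uses.
log = logging.getLogger("leader_wait_sketch")


def wait_for_leader(
    is_ready: Callable[[], bool],
    timeout_ms: int,
    poll_ms: int = 10,
    now_ms: Callable[[], float] = lambda: time.monotonic() * 1000,
    sleep_ms: Callable[[float], None] = lambda ms: time.sleep(ms / 1000),
) -> bool:
    """Poll is_ready() until it returns True or timeout_ms elapses.

    Returns True once a leader is ready, or False on timeout after logging
    a warning shaped like the one in the report.
    """
    start = now_ms()
    while not is_ready():
        elapsed = now_ms() - start
        if elapsed >= timeout_ms:
            log.warning("leader is still not ready after %dms", elapsed)
            return False
        sleep_ms(poll_ms)  # back off before polling the group again
    return True
```

If the caller simply loops on `wait_for_leader` while the new peer never obtains a ready leader (for example, while it is still far behind in log replication), the result is the same warning repeated indefinitely, matching what ip4 prints.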

Steps to reproduce:
1. Prepare the environment (8-core / 32 GB machines):
master_0718_967cde6
ip2/3/4/5/13/14/16: 7 DataNodes in total
ip3/4/5: 3 ConfigNodes in total
Ratis protocol, 3 replicas, 3C7D

schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus

data_replication_factor=3
schema_replication_factor=3

max_waiting_time_when_insert_blocked=3600000
query_timeout_threshold=3600000

ConfigNode:
MAX_HEAP_SIZE="4G"
DataNode:
MAX_HEAP_SIZE="16G"
wal_buffer_size_in_byte=1048576
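As a quick consistency check of the setup above, the sketch below parses the reported `key=value` settings and confirms that removing 1 of the 7 DataNodes still leaves enough nodes for a replication factor of 3, and converts the millisecond and byte values into readable units (3600000 ms is 1 hour; 1048576 bytes is 1 MiB). The helper names `parse_properties` and `check_shrink` are hypothetical and not part of IoTDB; the `MAX_HEAP_SIZE` lines are left out because they are environment-script variables rather than properties.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_shrink(props: dict, data_nodes: int, removed: int = 1) -> list:
    """Return a list of problems; an empty list means every replication
    factor still fits on the DataNodes left after the removal."""
    problems = []
    remaining = data_nodes - removed
    for key in ("data_replication_factor", "schema_replication_factor"):
        factor = int(props[key])
        if factor > remaining:
            problems.append(f"{key}={factor} exceeds {remaining} remaining DataNodes")
    return problems


# Settings copied from the report.
CONFIG = """
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_replication_factor=3
schema_replication_factor=3
max_waiting_time_when_insert_blocked=3600000
query_timeout_threshold=3600000
wal_buffer_size_in_byte=1048576
"""

props = parse_properties(CONFIG)
print(check_shrink(props, data_nodes=7))                       # []
print(int(props["query_timeout_threshold"]) / 3_600_000)       # 1.0 (hours)
print(int(props["wal_buffer_size_in_byte"]) / (1024 * 1024))   # 1.0 (MiB)
```

With 6 DataNodes remaining, the replica count alone does not block the shrink, so the failure has to come from the peer-addition step itself.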

ConfigNode: enable debug logging for DataNodeRemoveManager:
<logger level="debug" name="org.apache.iotdb.confignode.manager.DataNodeRemoveManager"/>
DataNode: enable debug logging for RegionMigrateService:
<logger level="debug" name="org.apache.iotdb.db.service.RegionMigrateService"/>

Benchmark client machine:
172.20.70.15
bm_0620_7ec96c1

Start 2 benchmark instances, connected to ip3 and ip4 respectively; see the attachments for their configuration.

2. Run for 1 hour. ip5 hosts 2 DataRegions, each of which has taken 1 snapshot.
When ip5 is removed, DataRegion 1 has roughly 600,000 log entries.

See the attachments for the ip5 logs.
ip4 keeps printing:
[Screenshot: image-2022-07-20-14-01-59-727.png]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)