Posted to notifications@iotdb.apache.org by "Jinrui Zhang (Jira)" <ji...@apache.org> on 2022/10/08 06:39:00 UTC

[jira] [Assigned] (IOTDB-4539) [ remove datanode ] After migrating some regions, the migration cannot continue

     [ https://issues.apache.org/jira/browse/IOTDB-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinrui Zhang reassigned IOTDB-4539:
-----------------------------------

    Assignee: Gaofei Cao  (was: Jinrui Zhang)

Let's look into this issue with high priority.

> [ remove datanode ] After migrating some regions, the migration cannot continue
> -------------------------------------------------------------------------------
>
>                 Key: IOTDB-4539
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4539
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Gaofei Cao
>            Priority: Major
>         Attachments: image-2022-09-27-15-00-29-918.png, image-2022-09-27-15-01-23-776.png, ip39_datanode_logs.tar.gz, ip40_datanode_logs.tar.gz, remove_datanode.conf
>
>
> master_0926_2bc3954
> schemaregion :  ratis
> dataregion : multiLeader
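> Both use 3 replicas. Start 3C3D (3 ConfigNodes, 3 DataNodes), start client writes (writes continue throughout the scale-in), add the ip40 DataNode, then scale in by removing ip39.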
> On ip39:
> succeed to remove region SchemaRegion[2] consensus group
> succeed to remove region SchemaRegion[1] consensus group
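> {color:#DE350B}After 2 SchemaRegions were migrated, the remaining SchemaRegions and DataRegions were not migrated; ip39 stays in the Removing state and is still accepting writes. ip40 is the new peer node; after receiving some of the migrated regions,
> no further migrated data arrives.{color} Regions received by ip40: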
> succeed to create new region SchemaRegion[2]
> succeed to create new region SchemaRegion[1]
> succeed to create new region DataRegion[4]
> After that, the log keeps printing:
> 2022-09-27 14:58:55,941 [java.util.concurrent.ThreadPoolExecutor$Worker@34924549[State = -1, empty queue]] WARN  o.a.r.g.s.GrpcLogAppender:239 - 172.20.70.40_50010@group-000200000001->172.20.70.37_50010-GrpcLogAppender: HEARTBEAT appendEntries Timeout, request=AppendEntriesRequest:cid=4621,entriesCount=0,lastEntry=null
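To see how often this heartbeat timeout repeats and between which peers, the WARN line quoted above can be tallied from the attached DataNode logs. A minimal Python sketch, assuming only the line format shown in this issue; the regex and `count_heartbeat_timeouts` are hypothetical triage helpers, not part of IoTDB or Apache Ratis:

```python
import re
from collections import Counter

# Hypothetical helper: matches the GrpcLogAppender heartbeat-timeout WARN
# line format quoted in this issue ("sender@group->target-GrpcLogAppender").
HEARTBEAT_TIMEOUT = re.compile(
    r"(?P<sender>\S+)@(?P<group>group-[0-9A-Fa-f]+)->(?P<target>\S+?)-GrpcLogAppender: "
    r"HEARTBEAT appendEntries Timeout, request=AppendEntriesRequest:cid=(?P<cid>\d+)"
)


def count_heartbeat_timeouts(lines):
    """Count heartbeat timeouts per (group, sender, target) triple."""
    counts = Counter()
    for line in lines:
        match = HEARTBEAT_TIMEOUT.search(line)
        if match:
            counts[(match["group"], match["sender"], match["target"])] += 1
    return counts
```

Feeding it the lines of a log file (for example `count_heartbeat_timeouts(open(path))`) shows whether the timeouts are confined to one consensus group and one peer pair.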
>  !image-2022-09-27-15-00-29-918.png! 
> Cluster and region information before scale-in (no regions are placed on ip40):
>  !image-2022-09-27-15-01-23-776.png! 
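> Test environment
> Overall procedure
> 1. 172.20.70.34...43: 10 machines, 8 cores / 32 GB each
> ConfigNodes on 34, 35, 36
> Benchmark (bm) on 35
> DataNodes on 37..43
> First start 3C3D: ConfigNodes on 34, 35, 36; DataNodes on 37, 38, 39
> Then start the benchmark
> Start the DataNode on ip40
> Scale in ip39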
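Because both region types use 3 replicas and the cluster starts with exactly 3 DataNodes (37, 38, 39), every region initially has a replica on each of them. When ip39 is removed, ip40 is therefore the only node that can take over any of ip39's replicas, which is consistent with ip40 appearing as the sole new peer in the logs. A small Python sketch of that reasoning; `migration_targets` is a hypothetical helper, not IoTDB's region-migration logic, and the three region names are taken from the log lines above (the real cluster has more regions):

```python
# Hypothetical illustration, not IoTDB's region-migration algorithm.
NODES = ["ip37", "ip38", "ip39", "ip40"]

# With replication factor 3 and only ip37/ip38/ip39 as DataNodes before
# ip40 joins, every replica set is exactly {ip37, ip38, ip39}.
PLACEMENT = {
    region: {"ip37", "ip38", "ip39"}
    for region in ("SchemaRegion[1]", "SchemaRegion[2]", "DataRegion[4]")
}


def migration_targets(placement, removed, all_nodes):
    """For each region with a replica on `removed`, list the nodes that
    could take over that replica (nodes not already holding one)."""
    plan = {}
    for region, replicas in placement.items():
        if removed in replicas:
            plan[region] = sorted(set(all_nodes) - replicas)
    return plan


print(migration_targets(PLACEMENT, "ip39", NODES))
```

Every region maps to `["ip40"]`, so all of ip39's replicas must land on ip40; the sketch only shows where they must go, not why migration stopped after the first two SchemaRegions.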
> Details:
> Cluster configuration
> ConfigNode
> MAX_HEAP_SIZE="8G"
> MAX_DIRECT_MEMORY_SIZE="4G"
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> time_partition_interval_for_routing=86400000
> schema_replication_factor=3
> data_replication_factor=3
> DataNode
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> wal_buffer_size_in_byte=1048576
> enable_timed_flush_seq_memtable=true
> seq_memtable_flush_interval_in_ms=3600000
> seq_memtable_flush_check_interval_in_ms=600000
> enable_timed_flush_unseq_memtable=true
> unseq_memtable_flush_interval_in_ms=3600000
> unseq_memtable_flush_check_interval_in_ms=600000
> query_timeout_threshold=36000000
> Benchmark configuration: see attachment
> ip39 logs: see attachment
> ip40 logs: see attachment
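The raw millisecond and byte values in the configuration above are easier to compare with the test timeline once converted. A small Python sketch with a hypothetical `humanize_ms` helper; the values are copied from the configuration, and treating `time_partition_interval_for_routing` and `query_timeout_threshold` as milliseconds is an assumption (the flush settings state the unit in their names):

```python
def humanize_ms(ms):
    """Render a millisecond count in the largest whole unit that divides it."""
    for unit, size in (("d", 86_400_000), ("h", 3_600_000), ("min", 60_000), ("s", 1_000)):
        if ms % size == 0:
            return f"{ms // size} {unit}"
    return f"{ms} ms"


# Values copied from the ConfigNode/DataNode settings in this issue.
# Assumption: all four are in milliseconds.
CONFIG_MS = {
    "time_partition_interval_for_routing": 86400000,
    "seq_memtable_flush_interval_in_ms": 3600000,
    "seq_memtable_flush_check_interval_in_ms": 600000,
    "query_timeout_threshold": 36000000,
}

for key, value in CONFIG_MS.items():
    print(f"{key} = {value} -> {humanize_ms(value)}")

# wal_buffer_size_in_byte is in bytes: 1048576 B = 1 MiB.
print(f"wal_buffer_size_in_byte = 1048576 -> {1048576 // 1024**2} MiB")
```

This gives 1 day for the routing partition interval, 1 hour for the memtable flush interval, 10 minutes for the flush check interval and 10 hours for the query timeout.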



--
This message was sent by Atlassian Jira
(v8.20.10#820010)