Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/11/22 02:59:00 UTC

[jira] [Reopened] (IOTDB-4556) [Remove-DataNode] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - The consensus group DataRegion[24] doesn't exist

     [ https://issues.apache.org/jira/browse/IOTDB-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

刘珍 reopened IOTDB-4556:
-----------------------

IP62 
2022-11-20 23:38:37,931 [MultiLeaderConsensusClientPool-selector-130] ERROR o.a.i.c.m.l.IndexController:111 - failed to flush sync index. cannot find previous version file. previous: 50500
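Test environment
master_1119_fd57958
1. Start a 3-replica 3C5D cluster and write data with BM (writes continue during the scale-in). After 40 minutes, perform step 2.
2. Remove the ip72 datanode. After the removal succeeds, delete the leftover data folder, then restart the node so it rejoins the cluster.
3. Remove ip68.
4. This ERROR appears on ip62.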

> [Remove-DataNode] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - The consensus group DataRegion[24] doesn't exist
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IOTDB-4556
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4556
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Haiming Zhu
>            Priority: Major
>         Attachments: 73to74.png, 73to75.png, 73to76.png, ip73_dataregion24.png, more_dev.conf
>
>
> m_0929_71d5f65
> SchemaRegion : ratis
> DataRegion : multiLeader
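> Both use 3 replicas, 3C5D.
> Start the client BM to write data; writes do not stop during the scale-in.
> After BM had run for 40 minutes, node 1 (ip72) was removed; the removal succeeded after 1 hour 38 minutes.
> mv node 1's data and logs, then bring it back online.
> Remove node 2 (ip73; the removal started at 09-29 14:10). This node does not hold DataRegion[24].
> *{color:#DE350B}DataRegion[24] is on ip74, ip75, ip76{color}*
> But ip73 reports these errors: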
> 2022-09-29 14:23:39,273 [pool-24-IoTDB-ClientRPC-Processor-2$20220929_062339_48081_4.1.0] {color:#DE350B}*ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:287 - The consensus group DataRegion[24] doesn't exist*{color}
> 2022-09-29 14:23:39,275 [MultiLeaderConsensusClientPool-selector-98] ERROR o.a.i.c.m.l.IndexController:111 - {color:#DE350B}*failed to flush sync index. cannot find previous version file. previous: 93500*{color}
> 2022-09-29 14:23:39,179 [pool-24-IoTDB-ClientRPC-Processor-45] WARN  o.a.i.d.u.ErrorHandlingUtils:62 - Status code: EXECUTE_STATEMENT_ERROR(400), operation: insertTablet failed
> java.lang.RuntimeException: org.apache.iotdb.commons.exception.IoTDBException: There are no available RegionGroups currently, please check the status of cluster DataNodes
>         at org.apache.iotdb.db.mpp.plan.analyze.ClusterPartitionFetcher.getOrCreateDataPartition(ClusterPartitionFetcher.java:280)
>         at org.apache.iotdb.db.mpp.plan.analyze.AnalyzeVisitor.visitInsertTablet(AnalyzeVisitor.java:1236)
>         at org.apache.iotdb.db.mpp.plan.analyze.AnalyzeVisitor.visitInsertTablet(AnalyzeVisitor.java:150)
>         at org.apache.iotdb.db.mpp.plan.statement.crud.InsertTabletStatement.accept(InsertTabletStatement.java:121)
>         at org.apache.iotdb.db.mpp.plan.statement.StatementVisitor.process(StatementVisitor.java:98)
>         at org.apache.iotdb.db.mpp.plan.analyze.Analyzer.analyze(Analyzer.java:40)
>         at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.analyze(QueryExecution.java:236)
>         at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.<init>(QueryExecution.java:138)
>         at org.apache.iotdb.db.mpp.plan.Coordinator.createQueryExecution(Coordinator.java:100)
>         at org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:133)
>         at org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:160)
>         at org.apache.iotdb.db.service.thrift.impl.ClientRPCServiceImpl.insertTablet(ClientRPCServiceImpl.java:996)
>         at org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:3512)
>         at org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:3492)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.iotdb.commons.exception.IoTDBException: There are no available RegionGroups currently, please check the status of cluster DataNodes
>         ... 20 common frames omitted
> Test environment
> 1. 192.168.10.72/73/74/75/76, 48 CPU, 384 GB
> 3C : 72,73,74
> 5D : 72 ,73,74,75,76
> Cluster configuration
> ConfigNode 
> MAX_HEAP_SIZE="8G"
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=3
> data_replication_factor=3
> connection_timeout_ms=120000
> DataNode
> MAX_HEAP_SIZE="256G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> connection_timeout_ms=120000
> max_connection_for_internal_service=200
> max_waiting_time_when_insert_blocked=600000
> query_timeout_threshold=36000000
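> 2. For the benchmark configuration, see the attachment.
> 3. After BM has run for 40 minutes, remove ip72.
> Wait for the ip72 removal to complete and the datanode process to exit.
> mv data logs
> Start ip72 again.
> 4. Remove ip73; the ERROR described in this issue appears.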
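
When repeating the reproduction steps, it can help to scan each DataNode log for the two ERROR messages quoted in this issue. The following is only a minimal sketch in Python: the message texts are copied from the log lines in the description, while the line-layout regex, the function name scan_log, and the labels "missing_group" and "sync_index" are assumptions made for illustration and are not part of IoTDB.

```python
import re

# Assumed log layout, inferred from the lines quoted in this issue:
#   <timestamp> [<thread>] <LEVEL> <logger>:<line> - <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>.*?)\] +(?P<level>[A-Z]+) +(?P<logger>\S+) - (?P<msg>.*)$"
)

# Message texts copied from the report.
MISSING_GROUP_RE = re.compile(
    r"The consensus group DataRegion\[(\d+)\] doesn't exist"
)
SYNC_INDEX_RE = re.compile(
    r"failed to flush sync index\. cannot find previous version file\. "
    r"previous: (\d+)"
)


def scan_log(lines):
    """Return (timestamp, kind, number) for each matching ERROR line.

    kind is "missing_group" (number = DataRegion id) or
    "sync_index" (number = the 'previous' value in the message).
    Both labels are hypothetical names used only in this sketch.
    """
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "ERROR":
            continue  # skip stack-trace lines, WARN lines, etc.
        msg = m.group("msg")
        group = MISSING_GROUP_RE.search(msg)
        if group:
            hits.append((m.group("ts"), "missing_group", int(group.group(1))))
            continue
        sync = SYNC_INDEX_RE.search(msg)
        if sync:
            hits.append((m.group("ts"), "sync_index", int(sync.group(1))))
    return hits
```

A file can be scanned with `scan_log(open(path, encoding="utf-8"))`, where path points at whichever DataNode log file is being inspected.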



--
This message was sent by Atlassian Jira
(v8.20.10#820010)