Posted to notifications@iotdb.apache.org by "Gaofei Cao (Jira)" <ji...@apache.org> on 2022/11/11 02:31:00 UTC

[jira] [Assigned] (IOTDB-4809) [Broadcast partitionCache on removed datanode] ConsensusGroupNotExistException: The consensus group DataRegion[11] doesn't exist

     [ https://issues.apache.org/jira/browse/IOTDB-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaofei Cao reassigned IOTDB-4809:
---------------------------------

    Assignee: Gaofei Cao  (was: 陈哲涵)

> [Broadcast partitionCache on removed datanode] ConsensusGroupNotExistException: The consensus group DataRegion[11] doesn't exist
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IOTDB-4809
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4809
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Gaofei Cao
>            Priority: Major
>         Attachments: after_remove_regions_info.out, before_remove_regions_info.out, more_dev.conf, screenshot-1.png, screenshot-2.png
>
>
> m_1031_76b947f
> 3 replicas, 3C5D (3 ConfigNodes, 5 DataNodes)
> schema region : ratis
> data region : multiLeader
> {color:#DE350B}*This issue contains 3 bugs*{color}
> After the benchmark had run for 1 hour, the remove-datanode operation was executed on the ip72 DataNode. The ip72 DataNode then flooded its log with ERROR entries:
> 2022-10-31 17:48:06,277 [pool-25-IoTDB-ClientRPC-Processor-85$20221031_094806_31457_3.1.0] ERROR o.a.i.d.m.p.s.FragmentInstanceDispatcherImpl:234 - write locally failed. TSStatus: TSStatus(code:412, message:org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group DataRegion[13] doesn't exist), message: org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group DataRegion[13] doesn't exist
> 2022-10-31 17:48:06,285 [pool-25-IoTDB-ClientRPC-Processor-93$20221031_094806_31458_3.1.0] ERROR o.a.i.d.m.e.e.RegionWriteExecutor$WritePlanNodeExecutionVisitor:235 - {color:#DE350B}*Something wrong happened while calling consensus layer's write API.
> org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group DataRegion[11] doesn't exist*{color}
>         at org.apache.iotdb.consensus.multileader.MultiLeaderConsensus.write(MultiLeaderConsensus.java:155)
>         at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.fireTriggerAndInsert(RegionWriteExecutor.java:101)
>         at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:215)
>         at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:163)
>         at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:117)
>         at org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1085)
>         at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:83)
>         at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchLocally(FragmentInstanceDispatcherImpl.java:232)
>         at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchOneInstance(FragmentInstanceDispatcherImpl.java:137)
>         at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchWriteSync(FragmentInstanceDispatcherImpl.java:119)
>         at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatch(FragmentInstanceDispatcherImpl.java:90)
>         at org.apache.iotdb.db.mpp.plan.scheduler.ClusterScheduler.start(ClusterScheduler.java:102)
>         at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.schedule(QueryExecution.java:283)
>         at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:201)
>         at org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:146)
>         at org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:160)
>         at org.apache.iotdb.db.service.thrift.impl.ClientRPCServiceImpl.insertTablet(ClientRPCServiceImpl.java:1198)
>         at org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:4078)
>         at org.apache.iotdb.service.rpc.thrift.IClientRPCService$Processor$insertTablet.getResult(IClientRPCService.java:4058)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> {color:#DE350B}**See the attachments for the region information before and after the remove operation.
> The following behavior is incorrect:**{color}
>  !screenshot-1.png! 
> {color:#DE350B}*During the removal, a new DataRegion was created, but it contains no data:*{color}
>  !screenshot-2.png! 
> Test ENV:
> 1. 192.168.10.72, 73, 74, 75, 76: 48 CPUs, 384 GB
> ConfigNode
> MAX_HEAP_SIZE="8G"
> cn_connection_timeout_ms=120000
> Common :
> connection_timeout_ms=120000
> max_connection_for_internal_service=200
> query_timeout_threshold=36000000
> multi_leader_throttle_threshold_in_byte=536870912000
> max_waiting_time_when_insert_blocked=120000
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=3
> data_replication_factor=3
> Datanode :
> MAX_HEAP_SIZE="256G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> 2. benchmark configuration
> See the attachment 
> 3. remove cmd :
> {color:#DE350B}*fit-72*{color}:/data/mpp_test/m_1031_76b947f$ cat rm.sh 
> #!/bin/bash
> sleep 1h
> ./sbin/start-cli.sh -h 192.168.10.76 -e "show cluster" >> bef_rm_info.out
> ./sbin/start-cli.sh -h 192.168.10.76 -e "show regions" >> bef_rm_info.out
> ./sbin/remove-datanode.sh  192.168.10.72:6667 > rm_ip72.out
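
To see which consensus groups the removed DataNode is still being asked to write to, the ERROR entries quoted above can be tallied per group. The following is only a minimal sketch: the function name `count_missing_groups` is hypothetical, and it assumes the DataNode log is available as a file whose path is passed as the first argument and that the exception message has the same form as in the log excerpt in this issue.

```shell
#!/bin/bash
# Hypothetical helper (not part of IoTDB): count log lines that report
# ConsensusGroupNotExistException, grouped by the consensus group named
# in the message, most frequent first.
# Assumption: $1 is the path to a DataNode log file.
count_missing_groups() {
  grep "ConsensusGroupNotExistException" "$1" \
    | sed -n "s/.*consensus group \([^ ]*\) doesn't exist.*/\1/p" \
    | sort | uniq -c | sort -rn
}
```

Each log line is counted once, even when the message is repeated within the line (as in the "write locally failed" entry). Comparing the resulting group IDs with the "show regions" output captured before and after the removal may help confirm whether the writes target regions that were migrated off the removed node.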



--
This message was sent by Atlassian Jira
(v8.20.10#820010)