Posted to notifications@iotdb.apache.org by "Yongzao Dan (Jira)" <ji...@apache.org> on 2022/07/07 02:03:00 UTC

[jira] [Commented] (IOTDB-3659) ERROR o.a.i.d.s.t.i.InternalServiceImpl:182 - Something wrong happened while calling consensus layer's write API.

    [ https://issues.apache.org/jira/browse/IOTDB-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563535#comment-17563535 ] 

Yongzao Dan commented on IOTDB-3659:
------------------------------------

This bug is caused by [IOTDB-3658|https://issues.apache.org/jira/browse/IOTDB-3658]. A DataNode successfully joined the cluster even though it failed to start its internal services, yet the ConfigNode still recorded it as "RUNNING". The leader ConfigNode then sent a createRegion request to that DataNode, which eventually failed. Finally, restarting this DataNode triggered this bug.

 

We have fixed this bug in [https://github.com/apache/iotdb/pull/6475], and we will continue to reinforce the DataNode startup process ([IOTDB-3765|https://issues.apache.org/jira/browse/IOTDB-3765]) and the cluster heartbeat mechanism.
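
To illustrate why the heartbeat mechanism matters here, below is a minimal sketch of the general idea and not the code from the PR above. It assumes a node counts as "RUNNING" only while it has sent a recent heartbeat, so a DataNode that joined the cluster but never started its internal services would not be chosen for createRegion. The class NodeStatusSketch, its methods, and the timeout value are all hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; this is NOT IoTDB's actual ConfigNode implementation. */
public class NodeStatusSketch {
    public enum Status { RUNNING, UNKNOWN }

    // Nodes that have joined the cluster (sorted for deterministic output).
    private final Set<Integer> registered = new TreeSet<>();
    // Timestamp (ms) of the last heartbeat received from each node.
    private final Map<Integer, Long> lastHeartbeatMs = new HashMap<>();
    private final long timeoutMs;

    public NodeStatusSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Joining the cluster only registers the node; it does not imply RUNNING. */
    public void register(int nodeId) {
        registered.add(nodeId);
    }

    /** Record a heartbeat from a node whose internal services are reachable. */
    public void onHeartbeat(int nodeId, long nowMs) {
        if (registered.contains(nodeId)) {
            lastHeartbeatMs.put(nodeId, nowMs);
        }
    }

    /** A node is RUNNING only if a heartbeat arrived within the timeout. */
    public Status statusOf(int nodeId, long nowMs) {
        Long last = lastHeartbeatMs.get(nodeId);
        boolean alive = last != null && nowMs - last <= timeoutMs;
        return alive ? Status.RUNNING : Status.UNKNOWN;
    }

    /** Only RUNNING nodes are candidates for a createRegion request. */
    public List<Integer> regionCandidates(long nowMs) {
        List<Integer> candidates = new ArrayList<>();
        for (int nodeId : registered) {
            if (statusOf(nodeId, nowMs) == Status.RUNNING) {
                candidates.add(nodeId);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        NodeStatusSketch tracker = new NodeStatusSketch(1000);
        tracker.register(1);
        tracker.register(2);       // joined, but its services never started
        tracker.onHeartbeat(1, 0); // only node 1 ever sends a heartbeat
        System.out.println(tracker.regionCandidates(500));
    }
}
```

In this sketch the node that joined without ever sending a heartbeat stays UNKNOWN, so no region would be placed on it.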

> ERROR o.a.i.d.s.t.i.InternalServiceImpl:182 - Something wrong happened while calling consensus layer's write API.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: IOTDB-3659
>                 URL: https://issues.apache.org/jira/browse/IOTDB-3659
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Yongzao Dan
>            Priority: Major
>         Attachments: image-2022-06-27-14-04-38-537.png
>
>
> StandAloneConsensus, 3C3D
> Startup order:
> node1="172.20.70.13"
> node2="172.20.70.16"
> node3="172.20.70.14"
> The DataRegion with RegionID=1 is on ip16, but ip16 reports:
> 2022-06-27 10:48:11,852 [pool-1-IoTDB-InternalServiceRPC-Client-93] {color:red}*ERROR o.a.i.d.s.t.i.InternalServiceImpl:182 - Something wrong happened while calling consensus layer's write API.
> org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group {color:red}DataRegion[1]{color} doesn't exist*{color}
>         at org.apache.iotdb.consensus.ratis.RatisConsensus.write(RatisConsensus.java:161)
>         at org.apache.iotdb.db.service.thrift.impl.InternalServiceImpl.sendPlanNode(InternalServiceImpl.java:170)
>         at org.apache.iotdb.mpp.rpc.thrift.InternalService$Processor$sendPlanNode.getResult(InternalService.java:1284)
>         at org.apache.iotdb.mpp.rpc.thrift.InternalService$Processor$sendPlanNode.getResult(InternalService.java:1264)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
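> Steps to reproduce
> At 2022-06-25 14:19:11,809 the cluster was started. On ip16, the DataNode failed to start but still joined the cluster successfully.
> The benchmark was then started, connecting to ip14 to run writes and queries (the regular long-running test configuration).
> ip14 kept trying to connect to ip16: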
> 2022-06-26 22:16:02,274 [20220626_141602_01235_5.1.0-72] WARN  o.a.i.c.c.ClientManager:60 - Borrow client from pool for node TEndPoint(ip:172.20.70.16, port:9003) failed.
> net.sf.cglib.core.CodeGenerationException: org.apache.thrift.transport.TTransportException-->java.net.ConnectException: Connection refused (Connection refused)
>         at net.sf.cglib.core.ReflectUtils.newInstance(ReflectUtils.java:235)
>         at net.sf.cglib.core.ReflectUtils.newInstance(ReflectUtils.java:220)
>         at net.sf.cglib.proxy.Enhancer.createUsingReflection(Enhancer.java:639)
>         at net.sf.cglib.proxy.Enhancer.firstInstance(Enhancer.java:538)
>         at net.sf.cglib.core.AbstractClassGenerator.create(AbstractClassGenerator.java:231)
>         at net.sf.cglib.proxy.Enhancer.createHelper(Enhancer.java:377)
>         at net.sf.cglib.proxy.Enhancer.create(Enhancer.java:304)
>         at org.apache.iotdb.commons.client.sync.SyncThriftClientWithErrorHandler.newErrorHandler(SyncThriftClientWithErrorHandler.java:48)
>         at org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$Factory.makeObject(SyncDataNodeInternalServiceClient.java:127)
>         at org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$Factory.makeObject(SyncDataNodeInternalServiceClient.java:105)
>         at org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:780)
>         at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:439)
>         at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:350)
>         at org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:50)
>         at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchRemote(FragmentInstanceDispatcherImpl.java:175)
>         at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchOneInstance(FragmentInstanceDispatcherImpl.java:163)
>         at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatchWriteSync(FragmentInstanceDispatcherImpl.java:144)
>         at org.apache.iotdb.db.mpp.plan.scheduler.FragmentInstanceDispatcherImpl.dispatch(FragmentInstanceDispatcherImpl.java:93)
>         at org.apache.iotdb.db.mpp.plan.scheduler.ClusterScheduler.start(ClusterScheduler.java:95)
>         at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.schedule(QueryExecution.java:215)
>         at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:169)
>         at org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:131)
>         at org.apache.iotdb.db.service.thrift.impl.DataNodeTSIServiceImpl.insertTablet(DataNodeTSIServiceImpl.java:911)
>         at org.apache.iotdb.service.rpc.thrift.TSIService$Processor$insertTablet.getResult(TSIService.java:3328)
>         at org.apache.iotdb.service.rpc.thrift.TSIService$Processor$insertTablet.getResult(TSIService.java:3308)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
>         at org.apache.thrift.transport.TSocket.open(TSocket.java:243)
>         at org.apache.iotdb.rpc.TElasticFramedTransport.open(TElasticFramedTransport.java:91)
>         at org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient.<init>(SyncDataNodeInternalServiceClient.java:63)
>         at org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$$EnhancerByCGLIB$$1eb1dc11.<init>(<generated>)
>         at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at net.sf.cglib.core.ReflectUtils.newInstance(ReflectUtils.java:228)
>         ... 30 common frames omitted
> Caused by: java.net.ConnectException: Connection refused (Connection refused)
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>         at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>         at java.net.Socket.connect(Socket.java:589)
>         at org.apache.thrift.transport.TSocket.open(TSocket.java:238)
>         ... 37 common frames omitted
> At around 2022-06-27 10:40, the DataNode on ip68 was killed and then restarted. ip68 reports:
> 2022-06-27 10:48:11,852 [pool-1-IoTDB-InternalServiceRPC-Client-93] ERROR o.a.i.d.s.t.i.InternalServiceImpl:182 -{color:red} Something wrong happened while calling consensus layer's write API.
> org.apache.iotdb.consensus.exception.ConsensusGroupNotExistException: The consensus group *DataRegion[1] *doesn't exist{color}
>         at org.apache.iotdb.consensus.ratis.RatisConsensus.write(RatisConsensus.java:161)
>         at org.apache.iotdb.db.service.thrift.impl.InternalServiceImpl.sendPlanNode(InternalServiceImpl.java:170)
>         at org.apache.iotdb.mpp.rpc.thrift.InternalService$Processor$sendPlanNode.getResult(InternalService.java:1284)
>         at org.apache.iotdb.mpp.rpc.thrift.InternalService$Processor$sendPlanNode.getResult(InternalService.java:1264)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Result of running show regions in the CLI on ip14:
>  !image-2022-06-27-14-04-38-537.png! 


