Posted to notifications@iotdb.apache.org by "Jianyun Cheng (Jira)" <ji...@apache.org> on 2021/08/24 09:07:00 UTC

[jira] [Commented] (IOTDB-1583) Raft log failed to be committed in cluster version

    [ https://issues.apache.org/jira/browse/IOTDB-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403692#comment-17403692 ] 

Jianyun Cheng commented on IOTDB-1583:
--------------------------------------

Appending more logs from the point where the error starts.
{code:java}
2021-08-18 14:03:58,027 [Data(10.228.72.71:9003, raftId=0)-SerialToParallel0] INFO  o.a.i.c.s.m.RaftMember:737 - Data(10.228.72.71:9003, raftId=0): Start to make Node(internalIp:10.228.72.135, metaPort:9003, nodeIdentifier:1621312527, dataPort:40010, clientPort:6667, clientIp:0.0.0.0) catch up 
2021-08-18 14:03:58,727 [DataClientThread-64] INFO  o.a.i.c.l.m.s.SyncLogDequeSerializer:284 - Raft log buffer overflow! 
2021-08-18 14:03:58,737 [Data(10.228.72.71:9003, raftId=0)-CatchUpThread24] INFO  o.a.i.c.l.c.CatchUpTask:97 - Data(10.228.72.71:9003, raftId=0): use 1 logs of [50000606, 50000607] to fix log inconsistency with node [Node(internalIp:10.228.72.135, metaPort:9003, nodeIdentifier:1621312527, dataPort:40010, clientPort:6667, clientIp:0.0.0.0)], local first index: 49998998 
2021-08-18 14:03:58,737 [DataClientThread-64] ERROR o.a.i.c.s.m.RaftMember:1571 - RuntimeException during executing org.apache.iotdb.db.qp.physical.sys.DeleteTimeSeriesPlan@65ef777e,term:1,index:50000606 
java.nio.BufferOverflowException: null
	at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:206)
	at org.apache.iotdb.cluster.log.manage.serializable.SyncLogDequeSerializer.putLogs(SyncLogDequeSerializer.java:290)
	at org.apache.iotdb.cluster.log.manage.serializable.SyncLogDequeSerializer.append(SyncLogDequeSerializer.java:243)
	at org.apache.iotdb.cluster.log.manage.RaftLogManager.commitTo(RaftLogManager.java:627)
	at org.apache.iotdb.cluster.server.member.RaftMember.commitLog(RaftMember.java:1533)
	at org.apache.iotdb.cluster.server.member.RaftMember.appendLogInGroup(RaftMember.java:1699)
	at org.apache.iotdb.cluster.server.member.RaftMember.processPlanLocally(RaftMember.java:1040)
	at org.apache.iotdb.cluster.server.member.DataGroupMember.executeNonQueryPlanWithKnownLeader(DataGroupMember.java:753)
	at org.apache.iotdb.cluster.server.member.DataGroupMember.executeNonQueryPlan(DataGroupMember.java:715)
	at org.apache.iotdb.cluster.server.member.RaftMember.executeNonQueryPlan(RaftMember.java:765)
	at org.apache.iotdb.cluster.server.service.BaseSyncService.executeNonQueryPlan(BaseSyncService.java:176)
	at org.apache.iotdb.cluster.server.DataClusterServer.executeNonQueryPlan(DataClusterServer.java:1036)
	at org.apache.iotdb.cluster.rpc.thrift.RaftService$Processor$executeNonQueryPlan.getResult(RaftService.java:918)
	at org.apache.iotdb.cluster.rpc.thrift.RaftService$Processor$executeNonQueryPlan.getResult(RaftService.java:898)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}
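
For readers unfamiliar with the exception at the top of this trace: a `put` on a fixed-capacity `ByteBuffer` throws `java.nio.BufferOverflowException` when the data does not fit in the buffer's remaining space. The sketch below only illustrates that JDK behaviour. The class name, the helper `tryPut`, and the 16-byte capacity are hypothetical and are not taken from the IoTDB code.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class BufferOverflowDemo {

    // Hypothetical helper: copies an entry into a fixed-size buffer and
    // reports whether it fit. Not part of IoTDB.
    public static boolean tryPut(ByteBuffer buffer, byte[] entry) {
        try {
            buffer.put(entry);
            return true;
        } catch (BufferOverflowException e) {
            // Thrown when entry.length exceeds buffer.remaining().
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed capacity, chosen only to keep the example small.
        ByteBuffer buffer = ByteBuffer.allocate(16);

        // First entry fits: 10 of 16 bytes are used afterwards.
        System.out.println("first put ok:  " + tryPut(buffer, new byte[10]));

        // Second entry needs 10 bytes but fewer remain, so put() throws.
        System.out.println("second put ok: " + tryPut(buffer, new byte[10]));

        System.out.println("remaining:     " + buffer.remaining());
    }
}
```

Checking `entry.length` against `buffer.remaining()` before calling `put` is one way to avoid the exception. Whether that applies to the serializer in the trace above depends on the root cause linked below.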

The root cause is analysed here: https://github.com/apache/iotdb/discussions/3784#discussioncomment-1226380

> Raft log failed to be committed in cluster version
> --------------------------------------------------
>
>                 Key: IOTDB-1583
>                 URL: https://issues.apache.org/jira/browse/IOTDB-1583
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: Cluster
>    Affects Versions: master branch
>            Reporter: lisijia
>            Priority: Major
>
> In master 199519dd8d1497f4c640affc8989ad0777b15188, with three nodes and three replicas, I have 20 storage groups and 100000 devices, and each device has 50 sensors. After two hours of uninterrupted writing, I tried to write again, but the client write was rejected. I found that the server log was reporting an error. It seems that the Raft log failed during the commit.
> {code:java}
> 2021-08-18 17:50:38,479 [DataClientThread-1100] ERROR o.a.i.c.l.m.RaftLogManager:648 - Node(internalIp: x.x.x.x, metaPort:9003, nodeIdentifier:1190416664, dataPort:40010, clientPort:6667, clientIp:0.0.0.0): Unexpected error:
> org.apache.iotdb.cluster.exception.TruncateCommittedEntryException: The committed entries cannot be truncated: parameter: 50000606, commitIndex : 50000606
>         at org.apache.iotdb.cluster.log.manage.CommittedEntryManager.append(CommittedEntryManager.java:246)
>         at org.apache.iotdb.cluster.log.manage.RaftLogManager.commitTo(RaftLogManager.java:625)
>         at org.apache.iotdb.cluster.server.member.RaftMember.commitLog(RaftMember.java:1533)
>         at org.apache.iotdb.cluster.server.member.RaftMember.appendLogInGroup(RaftMember.java:1699)
>         at org.apache.iotdb.cluster.server.member.RaftMember.processPlanLocally(RaftMember.java:1040)
>         at org.apache.iotdb.cluster.server.member.DataGroupMember.executeNonQueryPlanWithKnownLeader(DataGroupMember.java:753)
>         at org.apache.iotdb.cluster.server.member.DataGroupMember.executeNonQueryPlan(DataGroupMember.java:715)
>         at org.apache.iotdb.cluster.server.member.RaftMember.executeNonQueryPlan(RaftMember.java:765)
>         at org.apache.iotdb.cluster.server.service.BaseSyncService.executeNonQueryPlan(BaseSyncService.java:176)
>         at org.apache.iotdb.cluster.server.DataClusterServer.executeNonQueryPlan(DataClusterServer.java:1036)
>         at org.apache.iotdb.cluster.rpc.thrift.RaftService$Processor$executeNonQueryPlan.getResult(RaftService.java:918)
>         at org.apache.iotdb.cluster.rpc.thrift.RaftService$Processor$executeNonQueryPlan.getResult(RaftService.java:898)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)