You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/10/19 10:19:00 UTC

[jira] [Reopened] (IOTDB-4380) The log dispatcher is inconsistent with the search index of the wal node

     [ https://issues.apache.org/jira/browse/IOTDB-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

刘珍 reopened IOTDB-4380:
-----------------------

步骤4,启动ip74失败:
/data/mpp_test/m_1019_f2ffb49/datanode/logs

2022-10-19 18:00:32,314 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-1] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:80 - unexpected consensusGroupId DataRegion[5] for TSyncLogReq which size is 5
2022-10-19 18:00:34,900 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-2] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:80 - unexpected consensusGroupId DataRegion[4] for TSyncLogReq which size is 1
2022-10-19 18:00:37,609 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-3] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:80 - unexpected consensusGroupId DataRegion[3] for TSyncLogReq which size is 8
2022-10-19 18:10:00,112 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-43] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:00,337 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-43] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:00,763 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-43] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:01,589 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-43] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:06,303 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-6] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:06,305 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-43] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:06,532 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-43] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:06,962 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-43] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:07,960 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-43] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:09,530 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-43] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:09,582 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-43] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:11,740 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-44] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:11,962 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-44] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:12,510 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-44] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:12,934 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-44] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:13,334 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-44] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:14,960 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-44] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:16,042 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-44] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:18,185 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-44] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:18,837 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-45] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:19,062 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-45] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:19,357 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-45] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:19,627 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-45] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:20,450 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-45] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:22,073 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-45] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:24,612 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-45] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.
2022-10-19 18:10:25,296 [pool-23-IoTDB-MultiLeaderConsensusRPC-Processor-45] ERROR o.a.i.c.m.s.MultiLeaderRPCServiceProcessor:88 - fail to sync log because system is read-only.


> The log dispatcher is inconsistent with the search index of the wal node
> ------------------------------------------------------------------------
>
>                 Key: IOTDB-4380
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4380
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: 张洪胤
>            Priority: Major
>         Attachments: ip74_logs.tar.gz, more_metadata.conf
>
>
> m_0908_7915b3f。
> 问题描述
> {color:#DE350B}*log dispatcher与wal node的search index不一致 , datanode重启成功后日志一直刷*{color}:
> 2022-09-09 16:32:00,011 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO  o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL entry ready, execute rollWALFile. {color:#DE350B}*Current search index in wal buffer is 2959, and next target index is 2501 *{color}
> MultiLeaderConsensus,3副本3节点
> 1. 创建元数据过程中,kill ip74 的datanode PID
> benchmark配置文件见附件。
> 2. 清空ip74 的操作系统缓存,启动ip74的datanode
> 3. 再次重新运行benchmark同一配置,IS_DELETE_DATA=true
> 这个参数为true,会先执行delete storage group root.test.*;
> benchmark运行完成,stop ip74的datanode服务
> 备份data 为/data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test
> 4. 清ip74操作系统缓存,启动datanode服务
> 再次运行benchmark同一配置,benchmark运行完成,
> 查看ip74的日志,看到
> 2022-09-09 15:43:13,691 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-40] ERROR o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
> org.apache.thrift.TException: Source fragment instance not found. Fragment instance ID: TFragmentInstanceId(queryId:20220909_074205_19400_3, fragmentId:2, instanceId:0).
>         at org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:90)
>         at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
>         at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> 2022-09-09 15:43:15,312 [20220909_074205_19400_3.2.0.SinkHandle-3074] ERROR o.a.i.d.m.e.e.SinkHandle:281 - The TsBlock doesn't exist. Sequence ID is 1, remaining map is [0=<org.apache.iotdb.tsfile.read.common.block.TsBlock@5f617979,1048576>]
> 2022-09-09 15:43:17,119 [pool-23-IoTDB-MPPDataExchangeRPC-Processor-22] ERROR o.a.t.ProcessFunction:47 - Internal error processing getDataBlock
> java.lang.IllegalStateException: The data block doesn't exist. Sequence ID: 1
>         at org.apache.iotdb.db.mpp.execution.exchange.SinkHandle.getSerializedTsBlock(SinkHandle.java:285)
>         at org.apache.iotdb.db.mpp.execution.exchange.MPPDataExchangeManager$MPPDataExchangeServiceImpl.getDataBlock(MPPDataExchangeManager.java:97)
>         at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:326)
>         at org.apache.iotdb.mpp.rpc.thrift.MPPDataExchangeService$Processor$getDataBlock.getResult(MPPDataExchangeService.java:306)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> 5. 停止ip74的datanode服务
> 备份data 到/data/mpp_test/m_0908_7915b3f/datanode/data_for_recovery_Test_2
> 清ip74操作系统缓存,启动ip74的datanode ,可以成功,日志一直刷(此节点不能继续同步):
> 2022-09-09 16:44:00,039 [pool-33-IoTDB-LogDispatcher-DataRegion[12]-2] INFO  o.a.i.d.w.n.WALNode$PlanNodeIterator:695 - timeout when waiting for next WAL entry ready, execute rollWALFile. Current search index in wal buffer is 2959, and next target index is 2501 
> 机器 与 集群配置
> 1. 192.168.10.72/ 73/74 48核384G
> benchmark 在71
> 2. 集群参数
> confignode
> MAX_HEAP_SIZE="8G"
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=3
> data_replication_factor=3
> datanode
> MAX_HEAP_SIZE="256G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> max_connection_for_internal_service=300
> enable_timed_flush_seq_memtable=true
> seq_memtable_flush_interval_in_ms=600000
> seq_memtable_flush_check_interval_in_ms=300000
> enable_timed_flush_unseq_memtable=true
> unseq_memtable_flush_interval_in_ms=600000
> unseq_memtable_flush_check_interval_in_ms=300000
> max_waiting_time_when_insert_blocked=3600000
> query_timeout_threshold=3600000



--
This message was sent by Atlassian Jira
(v8.20.10#820010)