Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/10/14 08:15:00 UTC
[jira] [Created] (IOTDB-4652) [ MultiLeaderConsensus ] The data on the replicas is inconsistent
刘珍 created IOTDB-4652:
-------------------------
Summary: [ MultiLeaderConsensus ] The data on the replicas is inconsistent
Key: IOTDB-4652
URL: https://issues.apache.org/jira/browse/IOTDB-4652
Project: Apache IoTDB
Issue Type: Bug
Components: mpp-cluster
Affects Versions: 0.14.0-SNAPSHOT
Reporter: 刘珍
Assignee: Jinrui Zhang
Attachments: image-2022-10-14-16-04-28-847.png, image-2022-10-14-16-13-37-165.png
master_1013_00dc222
schema : ratis
data : multiLeader
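3 replicas, 3C3D (3 ConfigNodes, 3 DataNodes)
The benchmark (bm) write finished (all writes reported as successful), followed by a flush.
Querying the data shows that it is inconsistent across the replicas.
Query ip68 (final state: leader of this region):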
./sbin/start-cli.sh -h 192.168.10.68 -e "select count(s_0) from root.test.g_13.d_1013"
6 data points are missing.
!image-2022-10-14-16-04-28-847.png!
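Analysis of the data for device root.test.g_13.d_1013 on ip68/ip62/ip66:
ip68: 94 points, 6 points missing
ip62: 100 points, correct
ip66: 100 points, correct
ip66 was the leader at one point (relatively little data was written to it directly). When ip66 synced this region's data to ip68, it logged an ERROR: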
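
The per-replica comparison above can be sketched as a small check. This is only an illustration, not part of IoTDB: the helper name find_divergent_replicas is hypothetical, and it assumes the count shared by the majority of replicas is the expected one (which matches the 100 points the benchmark wrote here). The counts are the ones reported above, obtained by running the same count query against each node.

```python
from collections import Counter


def find_divergent_replicas(counts):
    """Return {host: shortfall} for replicas whose count differs from the majority.

    Assumption: the value reported by most replicas is treated as correct.
    """
    majority, _ = Counter(counts.values()).most_common(1)[0]
    return {host: majority - n for host, n in counts.items() if n != majority}


# Counts for root.test.g_13.d_1013 as reported in this issue.
counts = {"ip68": 94, "ip62": 100, "ip66": 100}
print(find_divergent_replicas(counts))  # {'ip68': 6}
```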
2022-10-14 10:55:02,593 [pool-96-IoTDB-LogDispatcher-DataRegion[66]-2] ERROR o.a.i.c.m.l.LogDispatcher$LogDispatcherThread:415 - Can not sync logs to peer Peer{groupId=DataRegion[66], endpoint=TEndPoint(ip:192.168.10.68, port:40010)} because
java.io.IOException: Borrow client from pool for node TEndPoint(ip:192.168.10.68, port:40010) failed.
at org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:61)
at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.sendBatchAsync(LogDispatcher.java:404)
at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.run(LogDispatcher.java:289)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object, borrowMaxWaitMillis=10000
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:453)
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:350)
at org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:50)
... 7 common frames omitted
Also note that ip66 logged an error in which Ratis detected an off-heap (direct) memory leak:
2022-10-14 10:39:26,022 [grpc-default-worker-ELG-3-40] ERROR o.a.r.t.i.n.u.ResourceLeakDetector:319 - LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
Created at:
org.apache.ratis.thirdparty.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:401)
org.apache.ratis.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
org.apache.ratis.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
org.apache.ratis.thirdparty.io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53)
org.apache.ratis.thirdparty.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:120)
org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollRecvByteAllocatorHandle.allocate(EpollRecvByteAllocatorHandle.java:75)
org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:780)
org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.lang.Thread.run(Thread.java:748)
Test environment
1. 192.168.10.62/66/68: physical machines, 72 CPUs, 256 GB
bm runs on ip64; see the attachment for its configuration
ConfigNode
MAX_HEAP_SIZE="16G"
MAX_DIRECT_MEMORY_SIZE="8G"
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
schema_replication_factor=3
data_replication_factor=3
connection_timeout_ms=1200000
DataNode
MAX_HEAP_SIZE="192G"
MAX_DIRECT_MEMORY_SIZE="32G"
connection_timeout_ms=1200000
max_waiting_time_when_insert_blocked=3600000
query_timeout_threshold=36000000
enable_auto_create_schema=false
2. bm write
See the attachment for the configuration
!image-2022-10-14-16-13-37-165.png!
3. Query, verify data correctness, analyze the results, and analyze the cluster logs.
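
For the log analysis in step 3, a rough sketch of how the sync failures could be tallied per peer. The regular expression is written against the single "Can not sync logs to peer" line quoted in this report, so the exact message format in other builds is an assumption, and count_sync_failures is a hypothetical helper name. In practice the lines would come from the DataNode log file, whose path is not stated here.

```python
import re
from collections import Counter

# Pattern derived from the ERROR line quoted in this issue (format assumed stable).
SYNC_FAILURE = re.compile(
    r"Can not sync logs to peer Peer\{groupId=(?P<group>[^,]+), "
    r"endpoint=TEndPoint\(ip:(?P<ip>[^,]+), port:(?P<port>\d+)\)\}"
)


def count_sync_failures(lines):
    """Count sync-failure log lines per (region group, target ip, target port)."""
    failures = Counter()
    for line in lines:
        match = SYNC_FAILURE.search(line)
        if match:
            key = (match["group"], match["ip"], int(match["port"]))
            failures[key] += 1
    return failures


sample = [
    "2022-10-14 10:55:02,593 [pool-96-IoTDB-LogDispatcher-DataRegion[66]-2] ERROR "
    "o.a.i.c.m.l.LogDispatcher$LogDispatcherThread:415 - Can not sync logs to peer "
    "Peer{groupId=DataRegion[66], endpoint=TEndPoint(ip:192.168.10.68, port:40010)} because",
]
for peer, n in count_sync_failures(sample).items():
    print(peer, n)
```

A non-zero count for a peer only shows that dispatch to that peer failed at some point; whether the affected log entries were retried successfully would still need to be checked separately.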
--
This message was sent by Atlassian Jira
(v8.20.10#820010)