Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/10/14 08:15:00 UTC

[jira] [Created] (IOTDB-4652) [ MultiLeaderConsensus ] The data on the replicas is inconsistent

刘珍 created IOTDB-4652:
-------------------------

             Summary: [ MultiLeaderConsensus ] The data on the replicas is inconsistent
                 Key: IOTDB-4652
                 URL: https://issues.apache.org/jira/browse/IOTDB-4652
             Project: Apache IoTDB
          Issue Type: Bug
          Components: mpp-cluster
    Affects Versions: 0.14.0-SNAPSHOT
            Reporter: 刘珍
            Assignee: Jinrui Zhang
         Attachments: image-2022-10-14-16-04-28-847.png, image-2022-10-14-16-13-37-165.png

master_1013_00dc222
schema : ratis
data : multiLeader
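3 replicas, 3C3D (3 ConfigNodes, 3 DataNodes)
The bm (benchmark) write completed (all writes reported as successful), followed by a flush.
Querying the data shows that the replicas are inconsistent.
Query ip68 (final state: leader of this region):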
./sbin/start-cli.sh -h 192.168.10.68 -e "select count(s_0) from root.test.g_13.d_1013"
6 data points are missing.
 !image-2022-10-14-16-04-28-847.png! 

Analysis of the data for device root.test.g_13.d_1013 on ip68/ip62/ip66:
ip68: 94 points, 6 points missing
ip62: 100 points, correct
ip66: 100 points, correct
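
The per-replica comparison above can be sketched as a small script. This is only an illustrative sketch, not part of the original test procedure: the helper names build_count_commands and missing_points are hypothetical, the commands are built but not executed (the CLI output format is not shown in this report), and treating the highest count as the reference is an assumption that happens to match the expected 100 points here.

```python
from typing import Dict, List


def build_count_commands(hosts: List[str], query: str,
                         cli: str = "./sbin/start-cli.sh") -> List[List[str]]:
    """Build one CLI invocation per replica host (argument lists, not executed here)."""
    return [[cli, "-h", host, "-e", query] for host in hosts]


def missing_points(counts: Dict[str, int]) -> Dict[str, int]:
    """Return, for each replica below the highest count, how many points it lacks.

    Assumption: the replica with the highest count is treated as the reference.
    """
    reference = max(counts.values())
    return {host: reference - n for host, n in counts.items() if n < reference}


if __name__ == "__main__":
    hosts = ["192.168.10.62", "192.168.10.66", "192.168.10.68"]
    query = "select count(s_0) from root.test.g_13.d_1013"
    # Show the same query addressed to each replica in turn.
    for cmd in build_count_commands(hosts, query):
        print(" ".join(cmd[:-1]), f'"{cmd[-1]}"')
    # Counts as reported in this issue.
    print(missing_points({"ip68": 94, "ip62": 100, "ip66": 100}))  # prints {'ip68': 6}
```

Running the same count query against every replica, rather than only the current leader, is what exposes the gap: a single query would return either 94 or 100 depending on which node answers.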

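ip66 was previously the leader (relatively little data was written to it directly). When ip66 synchronized this region's data to ip68, it logged an ERROR: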

2022-10-14 10:55:02,593 [pool-96-IoTDB-LogDispatcher-DataRegion[66]-2] ERROR o.a.i.c.m.l.LogDispatcher$LogDispatcherThread:415 - Can not sync logs to peer Peer{groupId=DataRegion[66], endpoint=TEndPoint(ip:192.168.10.68, port:40010)} because
java.io.IOException: Borrow client from pool for node TEndPoint(ip:192.168.10.68, port:40010) failed.
        at org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:61)
        at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.sendBatchAsync(LogDispatcher.java:404)
        at org.apache.iotdb.consensus.multileader.logdispatcher.LogDispatcher$LogDispatcherThread.run(LogDispatcher.java:289)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object, borrowMaxWaitMillis=10000
        at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:453)
        at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:350)
        at org.apache.iotdb.commons.client.ClientManager.borrowClient(ClientManager.java:50)
        ... 7 common frames omitted

Also note that ip66 logged an ERROR in which ratis detected an off-heap (direct) memory leak:
2022-10-14 10:39:26,022 [grpc-default-worker-ELG-3-40] ERROR o.a.r.t.i.n.u.ResourceLeakDetector:319 - LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
Created at:
        org.apache.ratis.thirdparty.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:401)
        org.apache.ratis.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
        org.apache.ratis.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
        org.apache.ratis.thirdparty.io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53)
        org.apache.ratis.thirdparty.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:120)
        org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollRecvByteAllocatorHandle.allocate(EpollRecvByteAllocatorHandle.java:75)
        org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:780)
        org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
        org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
        org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
        org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        java.lang.Thread.run(Thread.java:748)


Test environment
1. 192.168.10.62/66/68: physical machines, 72 CPUs, 256 GB
bm runs on ip64; see the attachment for its configuration

ConfigNode 
MAX_HEAP_SIZE="16G"
MAX_DIRECT_MEMORY_SIZE="8G"
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
schema_replication_factor=3
data_replication_factor=3
connection_timeout_ms=1200000

DataNode
MAX_HEAP_SIZE="192G"
MAX_DIRECT_MEMORY_SIZE="32G"

connection_timeout_ms=1200000
max_waiting_time_when_insert_blocked=3600000
query_timeout_threshold=36000000
enable_auto_create_schema=false

2. bm write
See the attachment for the configuration

 !image-2022-10-14-16-13-37-165.png! 

3. Query the data, verify its correctness, analyze the results, and analyze the cluster logs.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)