Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/10/17 05:37:00 UTC

[jira] [Reopened] (IOTDB-4334) No disk space, no load balancing to new datanode

     [ https://issues.apache.org/jira/browse/IOTDB-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

刘珍 reopened IOTDB-4334:
-----------------------

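master_1013_00dc222: the problem still exists.
3 replicas, 3C3D, clean environment. While bm was writing, I manually simulated a full disk on ip3,
and the node became read-only. I killed the bm process and started a new write with a different device_name. Creating the new region failed, and the ConfigNode reported this error: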
2022-10-17 11:29:41,112 [pool-4-IoTDB-ConfigNodeRPC-Processor-4] ERROR o.a.i.c.m.p.PartitionManager:298 - There are no available RegionGroups currently, please check the status of cluster DataNodes

See the attachment for the configuration of the second bm.

> No disk space, no load balancing to new datanode
> ------------------------------------------------
>
>                 Key: IOTDB-4334
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4334
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Yongzao Dan
>            Priority: Major
>         Attachments: cf_partition.conf, image-2022-09-05-16-46-15-829.png, image-2022-09-05-16-51-07-405.png, image-2022-09-05-16-52-07-143.png, image-2022-09-05-16-54-09-763.png, ip3_log.tar.gz, screenshot-1.png
>
>
> master_0904_2db66c6
> Time partitioning is enabled on the ConfigNode.
>  !image-2022-09-05-16-46-15-829.png! 
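> Start a clean 3-replica 3C3D cluster and start the benchmark to write data. After metadata creation has completed and some data has been written, add 2 DataNodes.
> The disk on ip3 (follower) is full, but {color:#DE350B}*new partitions are not routed to the new DataNodes and no load balancing takes place*{color}.
> {color:#DE350B}*After 20 minutes, writes on ip5 (leader) stop (multiLeader throttling, WAL at 51 GB)*{color} (the client bm does not stop writing).
> Error on ip3: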
> 2022-09-05 16:30:04,560 [pool-29-IoTDB-WAL-Sync(node-root.test.g0_0-1)-1] ERROR o.a.i.d.w.b.WALBuffer$SyncBufferTask:427 - Fail to sync wal node-root.test.g0_0-1's buffer, change system mode to error.
> java.io.IOException: No space left on device
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:51)
>         at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
>         at org.apache.iotdb.db.wal.io.LogWriter.write(LogWriter.java:58)
>         at org.apache.iotdb.db.wal.io.WALWriter.write(WALWriter.java:50)
>         at org.apache.iotdb.db.wal.buffer.WALBuffer$SyncBufferTask.run(WALBuffer.java:425)
>         at org.apache.iotdb.commons.concurrent.WrappedRunnable$1.runMayThrow(WrappedRunnable.java:44)
>         at org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2022-09-05 16:30:04,637 [pool-19-IoTDB-MultiLeaderConsensusRPC-Client-15] ERROR o.a.i.c.m.t.MultiLeaderConsensusIService$AsyncProcessor$syncLog$1:215 - Exception inside handler
> org.apache.iotdb.commons.exception.IoTDBException: Fail to sync log because system is read-only.
>         at org.apache.iotdb.consensus.multileader.service.MultiLeaderRPCServiceProcessor.syncLog(MultiLeaderRPCServiceProcessor.java:76)
>         at org.apache.iotdb.consensus.multileader.thrift.MultiLeaderConsensusIService$AsyncProcessor$syncLog.start(MultiLeaderConsensusIService.java:234)
>         at org.apache.iotdb.consensus.multileader.thrift.MultiLeaderConsensusIService$AsyncProcessor$syncLog.start(MultiLeaderConsensusIService.java:177)
>         at org.apache.thrift.TBaseAsyncProcessor.process(TBaseAsyncProcessor.java:103)
>         at org.apache.thrift.server.AbstractNonblockingServer$AsyncFrameBuffer.invoke(AbstractNonblockingServer.java:603)
>         at org.apache.thrift.server.Invocation.run(Invocation.java:18)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Test procedure:
> 1. 3C3D on 192.168.130.3/4/5, 16 cores, 32 GB
> 2. The benchmark runs on ip2; see the attachment for its configuration file.
> /home/benchmark/bm_0620_7ec96c1
> Metadata creation completes and data writing begins.
> Cluster and region information:
>  !image-2022-09-05-16-51-07-405.png! 
> 3. Add DataNodes ip1 and ip2.
> Cluster and region information:
>  !image-2022-09-05-16-52-07-143.png! 
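> 4. Copy some data onto the disk that holds the data on the ip3 machine, so that ip3 runs out of disk space.
> The disk on ip3 is full and the node becomes read-only (see the problem description for the error message); see the attachment for detailed logs.
> The cluster does not rebalance load. Cluster and region information: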
>  !image-2022-09-05-16-54-09-763.png! 
> The WAL size on ip5 (leader) reaches the throttling threshold:
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)