Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/11/28 01:59:00 UTC
[jira] [Created] (IOTDB-5063) [ start datanode ] Failed to start Grpc server
刘珍 created IOTDB-5063:
-------------------------
Summary: [ start datanode ] Failed to start Grpc server
Key: IOTDB-5063
URL: https://issues.apache.org/jira/browse/IOTDB-5063
Project: Apache IoTDB
Issue Type: Bug
Components: mpp-cluster
Affects Versions: 0.14.0-SNAPSHOT
Reporter: 刘珍
Assignee: Jinrui Zhang
Attachments: screenshot-1.png
master : 1127_4d7c15d
1. Start 3 ConfigNodes.
2. Start 21 DataNodes. One DataNode always fails to start ({color:#DE350B}reproduced in all 3 attempts{color}), with one of 2 error messages:
Error 1 (occurred 2 times):
2022-11-28 09:44:11,906 [main] ERROR o.a.ratis.util.ExitUtils:133 - Terminating with exit status 1: Failed to start Grpc server
java.io.IOException: Failed to bind to address 0.0.0.0/0.0.0.0:50010
at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:328)
at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:183)
at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:92)
at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:266)
at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270)
at org.apache.ratis.server.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:72)
at org.apache.ratis.server.impl.RaftServerProxy.startImpl(RaftServerProxy.java:394)
at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270)
at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:387)
at org.apache.iotdb.consensus.ratis.RatisConsensus.start(RatisConsensus.java:156)
at org.apache.iotdb.db.service.DataNode.active(DataNode.java:319)
at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:162)
at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:95)
at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:58)
at org.apache.iotdb.db.service.DataNode.main(DataNode.java:132)
Caused by: org.apache.ratis.thirdparty.io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use
2022-11-28 09:44:11,910 [Thread-0] ERROR o.a.ratis.util.ExitUtils:133 - Terminating with exit status -1: Thread[Thread-0,5,main] has thrown an uncaught exception
java.lang.NullPointerException: null
at org.apache.iotdb.db.service.IoTDBShutdownHook.run(IoTDBShutdownHook.java:60)
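Port information for the DataNode process on this node:
!image-2022-11-28-09-50-45-338.png!
Error 2 (occurred once):
!image-2022-11-28-09-51-17-256.png!
Port information for the DataNode process on this node:
!image-2022-11-28-09-51-40-357.png!
Port information for a DataNode that started successfully: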
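The port check above can be repeated from a shell on the failing node. The following is a minimal sketch and not part of IoTDB: the helper names `port_in_range`, `check_ephemeral` and `show_port_users` are hypothetical, and it assumes a Linux host with `ss` (iproute2) installed. It lists the sockets using a given port and, as one thing worth ruling out rather than a confirmed cause, checks whether the port lies inside the kernel's ephemeral port range, since a port in that range can be taken as the local port of an outgoing connection.

```shell
#!/bin/bash
# Hypothetical helpers for diagnosing "Address already in use"; not part of IoTDB.

# Print "in-range" if port $1 lies within [$2, $3], else "out-of-range".
port_in_range() {
  local port="$1" low="$2" high="$3"
  if [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]; then
    echo "in-range"
  else
    echo "out-of-range"
  fi
}

# Compare port $1 with the ephemeral range the Linux kernel uses for
# outgoing connections (two numbers stored in ip_local_port_range).
check_ephemeral() {
  local port="$1" low high
  local range_file="/proc/sys/net/ipv4/ip_local_port_range"
  if [ -r "$range_file" ]; then
    read -r low high < "$range_file"
    echo "port ${port} vs ephemeral range ${low}-${high}: $(port_in_range "$port" "$low" "$high")"
  else
    echo "cannot read ${range_file}"
  fi
}

# List TCP sockets (listening or connected) whose local port is $1.
# Owning process names are shown only when run with sufficient privileges.
show_port_users() {
  local port="$1"
  ss -tanp "sport = :${port}"
}
```

For example, `show_port_users 50010` followed by `check_ephemeral 50010` could be run on the node that failed, where 50010 is the port named in the stack trace.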
!image-2022-11-28-09-51-57-453.png!
Test environment: private cloud phase 1, 8 cores / 32 GB RAM, 24 machines
1. ConfigNode configuration
MAX_HEAP_SIZE="20G"
MAX_DIRECT_MEMORY_SIZE="6G"
2. DataNode configuration
MAX_HEAP_SIZE="20G"
MAX_DIRECT_MEMORY_SIZE="6G"
3. Common configuration
schema_replication_factor=3
data_replication_factor=3
4. Start 3 ConfigNodes (ip23, 24, 25).
5. Start 21 DataNodes with the script below (the 21 DataNode start commands are issued 1 second apart):
[root@i-66xazbht deploy_mpp_scripts]# cat 4_start_data_node.sh
#!/bin/bash
cluster_dir="/data/iotdb"
cur_cluster="m_1127_4d7c15d"
u_name="root"
exec 3<datanode.txt
while read line <&3
do
ssh ${u_name}@${line} "source /etc/profile;${cluster_dir}/${cur_cluster}/sbin/start-datanode.sh > /dev/null 2>&1 &"
sleep 1
done
6. Check the cluster information: one DataNode is always Unknown. Go to that node and check its log.
!image-2022-11-28-09-56-10-962.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)