Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/11/28 01:59:00 UTC

[jira] [Created] (IOTDB-5063) [ start datanode ] Failed to start Grpc server

刘珍 created IOTDB-5063:
-------------------------

             Summary: [ start datanode ] Failed to start Grpc server
                 Key: IOTDB-5063
                 URL: https://issues.apache.org/jira/browse/IOTDB-5063
             Project: Apache IoTDB
          Issue Type: Bug
          Components: mpp-cluster
    Affects Versions: 0.14.0-SNAPSHOT
            Reporter: 刘珍
            Assignee: Jinrui Zhang
         Attachments: screenshot-1.png

master : 1127_4d7c15d
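1. Start 3 ConfigNodes.
2. Start 21 DataNodes. One DataNode always fails to start ({color:#DE350B}reproduced in all 3 attempts{color}). Two different error messages were observed:
Error 1 (occurred 2 times):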
2022-11-28 09:44:11,906 [main] ERROR o.a.ratis.util.ExitUtils:133 - Terminating with exit status 1: Failed to start Grpc server
java.io.IOException: Failed to bind to address 0.0.0.0/0.0.0.0:50010
        at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:328)
        at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:183)
        at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:92)
        at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:266)
        at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270)
        at org.apache.ratis.server.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:72)
        at org.apache.ratis.server.impl.RaftServerProxy.startImpl(RaftServerProxy.java:394)
        at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270)
        at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:387)
        at org.apache.iotdb.consensus.ratis.RatisConsensus.start(RatisConsensus.java:156)
        at org.apache.iotdb.db.service.DataNode.active(DataNode.java:319)
        at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:162)
        at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:95)
        at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:58)
        at org.apache.iotdb.db.service.DataNode.main(DataNode.java:132)
Caused by: org.apache.ratis.thirdparty.io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use
2022-11-28 09:44:11,910 [Thread-0] ERROR o.a.ratis.util.ExitUtils:133 - Terminating with exit status -1: Thread[Thread-0,5,main] has thrown an uncaught exception
java.lang.NullPointerException: null
        at org.apache.iotdb.db.service.IoTDBShutdownHook.run(IoTDBShutdownHook.java:60)

Port information of the DataNode process on this node:
 !image-2022-11-28-09-50-45-338.png! 
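
A port check like the one in the screenshot can be approximated as follows. This is a minimal sketch, not the command the reporter ran: it assumes `ss` (iproute2) is available on the node, and `listeners_on_port` is a hypothetical helper name. Port 50010 is taken from the stack trace above.

```shell
#!/bin/bash
# Hypothetical helper (not part of IoTDB): reads `ss -ltnp` style output on
# stdin and prints the lines whose local address ends in ":<port>".
listeners_on_port() {
  local port="$1"
  # Column 4 of `ss -ltn` output is "Local Address:Port"; compare its suffix
  # with ":<port>" so that, for example, 50010 does not match 5001.
  awk -v p=":${port}" 'substr($4, length($4) - length(p) + 1) == p'
}

# Example usage on the failing node:
#   ss -ltnp | listeners_on_port 50010
```

If a line is printed, some process is already listening on the port the Ratis gRPC server tried to bind. The process column is only populated when `ss` is run with sufficient privileges.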

Error 2 (occurred once):
 !image-2022-11-28-09-51-17-256.png! 
Port information of the DataNode process on this node:
 !image-2022-11-28-09-51-40-357.png! 

Port information of a DataNode that started successfully:
 !image-2022-11-28-09-51-57-453.png! 

Test environment: private cloud (phase 1), 8C32GB, 24 machines
1. ConfigNode configuration
MAX_HEAP_SIZE="20G"
MAX_DIRECT_MEMORY_SIZE="6G"

2. DataNode configuration
MAX_HEAP_SIZE="20G"
MAX_DIRECT_MEMORY_SIZE="6G"

3. Common configuration
schema_replication_factor=3
data_replication_factor=3

4. Start 3 ConfigNodes (ip23,24,25)

5. Start 21 DataNodes with the startup script below (the 21 DataNode start commands are issued 1 second apart)
[root@i-66xazbht deploy_mpp_scripts]# cat 4_start_data_node.sh
#!/bin/bash

cluster_dir="/data/iotdb"
cur_cluster="m_1127_4d7c15d"
u_name="root"

exec 3<datanode.txt
while read line <&3
do
ssh ${u_name}@${line} "source /etc/profile;${cluster_dir}/${cur_cluster}/sbin/start-datanode.sh > /dev/null 2>&1 &"
sleep 1
done

6. Check the cluster information. One DataNode is always Unknown; check the log on that node.
 !image-2022-11-28-09-56-10-962.png! 
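
When checking the log on the Unknown node, the bind failure from Error 1 can be picked out with a small filter. This is only a sketch: `bind_failures` is a hypothetical helper, and the log file path has to be supplied by the caller because it is not given in this report.

```shell
#!/bin/bash
# Hypothetical helper: print the distinct addresses reported in
# "Failed to bind to address ..." lines of the given log files.
bind_failures() {
  grep -h 'Failed to bind to address' "$@" \
    | sed 's/.*Failed to bind to address //' \
    | sort -u
}

# Example usage (the path is a placeholder, not taken from the report):
#   bind_failures /path/to/datanode.log
```

For the Error 1 log above this would print `0.0.0.0/0.0.0.0:50010`, the address the Ratis gRPC server could not bind.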



--
This message was sent by Atlassian Jira
(v8.20.10#820010)