Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/11/15 08:25:00 UTC

[jira] [Assigned] (IOTDB-4942) [ remove datanode ] org.apache.iotdb.commons.exception.ConfigurationException: Conflict is detected in directory

     [ https://issues.apache.org/jira/browse/IOTDB-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

刘珍 reassigned IOTDB-4942:
-------------------------

           Attachment: iotdb_4936.conf
               Sprint: 2022-11-Cluster
    Affects Version/s: 0.14.0-SNAPSHOT
             Assignee: Gaofei Cao
          Description: 
master_1115_09ab7fa
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
schema_replication_factor=3
data_replication_factor=3
1. Start a 3-replica 3C5D cluster (3 ConfigNodes, 5 DataNodes)
2. Run the benchmark to write data
3. After 40 minutes, remove the datanode on ip72:
sleep 40m
./sbin/start-cli.sh -h 192.168.10.72 -e "show cluster" >> bef_rm.out
./sbin/start-cli.sh -h 192.168.10.72 -e "show regions" >> bef_rm.out
./sbin/remove-datanode.sh  192.168.10.72:6667 > 4556_rm_ip72.out

ip72 datanode ERROR :
{color:red}2022-11-15 15:57:30,628 [main] ERROR o.a.i.d.s.DataNodeServerCommandLine:79 - Meet error when doing start checking 
org.apache.iotdb.commons.exception.ConfigurationException: Conflict is detected in directory /data/liuzhen_test/master_1115_2_09ab7fa/./sbin/../data/datanode/data, which may be being used by another IoTDB (ProcessId=40012). Please check configuration and restart.{color}
	at org.apache.iotdb.db.conf.directories.DirectoryChecker.registerDirectory(DirectoryChecker.java:77)
	at org.apache.iotdb.db.conf.IoTDBStartCheck.checkDirectory(IoTDBStartCheck.java:226)
	at org.apache.iotdb.db.service.DataNode.serverCheckAndInit(DataNode.java:134)
	at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:77)
	at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:58)
	at org.apache.iotdb.db.service.DataNode.main(DataNode.java:128)

TEST ENV
1. Start a 1-replica 3C5D cluster (62, 66, 68: 72 CPUs, 256 GB; 72 and 73: 48 CPUs, 384 GB)
3C :192.168.10.62/66/68
5D : 192.168.10.62/66/68/72/73
ConfigNode :
MAX_HEAP_SIZE="8G"

Common :
max_connection_for_internal_service=200
query_timeout_threshold=3600000
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
schema_replication_factor=3
data_replication_factor=3
connection_timeout_ms=120000
max_waiting_time_when_insert_blocked=600000
ratis_first_election_timeout_min_ms=3000
ratis_first_election_timeout_max_ms=4000


DataNode :
MAX_HEAP_SIZE="192G"
MAX_DIRECT_MEMORY_SIZE="32G"

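2. The benchmark runs on ip71
/home/liuzhen/benchmark/bm_0620_7ec96c1
See the attachment for the configuration file.

3. After the benchmark has run for 40 minutes, remove (scale in) the datanode on ip72; the removal fails.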

  was:
master_1115_09ab7fa
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
schema_replication_factor=3
data_replication_factor=3
1. Start a 3-replica 3C5D cluster (3 ConfigNodes, 5 DataNodes)
2. Run the benchmark to write data
3. After 40 minutes, remove the datanode on ip72:
sleep 40m
./sbin/start-cli.sh -h 192.168.10.72 -e "show cluster" >> bef_rm.out
./sbin/start-cli.sh -h 192.168.10.72 -e "show regions" >> bef_rm.out
./sbin/remove-datanode.sh  192.168.10.72:6667 > 4556_rm_ip72.out

ip72 datanode ERROR :
{color:red}2022-11-15 15:57:30,628 [main] ERROR o.a.i.d.s.DataNodeServerCommandLine:79 - Meet error when doing start checking 
org.apache.iotdb.commons.exception.ConfigurationException: Conflict is detected in directory /data/liuzhen_test/master_1115_2_09ab7fa/./sbin/../data/datanode/data, which may be being used by another IoTDB (ProcessId=40012). Please check configuration and restart.{color}
	at org.apache.iotdb.db.conf.directories.DirectoryChecker.registerDirectory(DirectoryChecker.java:77)
	at org.apache.iotdb.db.conf.IoTDBStartCheck.checkDirectory(IoTDBStartCheck.java:226)
	at org.apache.iotdb.db.service.DataNode.serverCheckAndInit(DataNode.java:134)
	at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:77)
	at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:58)
	at org.apache.iotdb.db.service.DataNode.main(DataNode.java:128)

TEST ENV
1. Start a 1-replica 3C5D cluster (62, 66, 68: 72 CPUs, 256 GB; 72 and 73: 48 CPUs, 384 GB)
3C :192.168.10.62/66/68
5D : 192.168.10.62/66/68/72/73
ConfigNode :
MAX_HEAP_SIZE="8G"

Common :
max_connection_for_internal_service=200
query_timeout_threshold=3600000
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
schema_replication_factor=3
data_replication_factor=3
connection_timeout_ms=120000
max_waiting_time_when_insert_blocked=600000
ratis_first_election_timeout_min_ms=3000
ratis_first_election_timeout_max_ms=4000


DataNode :
MAX_HEAP_SIZE="192G"
MAX_DIRECT_MEMORY_SIZE="32G"

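2. The benchmark runs on ip71
/home/liuzhen/benchmark/bm_0620_7ec96c1
See the attachment for the configuration file.

3. After the benchmark has run for 40 minutes, remove (scale in) the datanode on ip72.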


> [ remove datanode ] org.apache.iotdb.commons.exception.ConfigurationException: Conflict is detected in directory
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: IOTDB-4942
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4942
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Gaofei Cao
>            Priority: Major
>         Attachments: iotdb_4936.conf
>
>
> master_1115_09ab7fa
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=3
> data_replication_factor=3
> 1. Start a 3-replica 3C5D cluster (3 ConfigNodes, 5 DataNodes)
> 2. Run the benchmark to write data
> 3. After 40 minutes, remove the datanode on ip72:
> sleep 40m
> ./sbin/start-cli.sh -h 192.168.10.72 -e "show cluster" >> bef_rm.out
> ./sbin/start-cli.sh -h 192.168.10.72 -e "show regions" >> bef_rm.out
> ./sbin/remove-datanode.sh  192.168.10.72:6667 > 4556_rm_ip72.out
> ip72 datanode ERROR :
> {color:red}2022-11-15 15:57:30,628 [main] ERROR o.a.i.d.s.DataNodeServerCommandLine:79 - Meet error when doing start checking 
> org.apache.iotdb.commons.exception.ConfigurationException: Conflict is detected in directory /data/liuzhen_test/master_1115_2_09ab7fa/./sbin/../data/datanode/data, which may be being used by another IoTDB (ProcessId=40012). Please check configuration and restart.{color}
> 	at org.apache.iotdb.db.conf.directories.DirectoryChecker.registerDirectory(DirectoryChecker.java:77)
> 	at org.apache.iotdb.db.conf.IoTDBStartCheck.checkDirectory(IoTDBStartCheck.java:226)
> 	at org.apache.iotdb.db.service.DataNode.serverCheckAndInit(DataNode.java:134)
> 	at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:77)
> 	at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:58)
> 	at org.apache.iotdb.db.service.DataNode.main(DataNode.java:128)
> TEST ENV
> 1. Start a 1-replica 3C5D cluster (62, 66, 68: 72 CPUs, 256 GB; 72 and 73: 48 CPUs, 384 GB)
> 3C :192.168.10.62/66/68
> 5D : 192.168.10.62/66/68/72/73
> ConfigNode :
> MAX_HEAP_SIZE="8G"
> Common :
> max_connection_for_internal_service=200
> query_timeout_threshold=3600000
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=3
> data_replication_factor=3
> connection_timeout_ms=120000
> max_waiting_time_when_insert_blocked=600000
> ratis_first_election_timeout_min_ms=3000
> ratis_first_election_timeout_max_ms=4000
> DataNode :
> MAX_HEAP_SIZE="192G"
> MAX_DIRECT_MEMORY_SIZE="32G"
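> 2. The benchmark runs on ip71
> /home/liuzhen/benchmark/bm_0620_7ec96c1
> See the attachment for the configuration file.
> 3. After the benchmark has run for 40 minutes, remove (scale in) the datanode on ip72; the removal fails.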



--
This message was sent by Atlassian Jira
(v8.20.10#820010)