Posted to notifications@iotdb.apache.org by "Yongzao Dan (Jira)" <ji...@apache.org> on 2023/01/10 10:33:00 UTC
[jira] [Commented] (IOTDB-4904) [ ConfigNode ] When dynamically extending DataNode resources online, you need to optimize schemaregion allocation policies

    [ https://issues.apache.org/jira/browse/IOTDB-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656563#comment-17656563 ] 

Yongzao Dan commented on IOTDB-4904:
------------------------------------

We can't fix this bug currently. 

*Reason*

I noticed that in the benchmark configuration each benchmark creates 3k devices, which means each benchmark creates 3k SchemaPartitions. However, the total number of SeriesPartitionSlots is only 10k. In this dynamic extension scenario, the first 3 DataNodes store all 3k SchemaPartitions created by the first benchmark. In the next step we have 6 DataNodes, but the SchemaRegionGroups won't be extended until the number of created SchemaPartitions reaches 5k (10000/(6/3)). Therefore, the earlier a DataNode is registered, the more SchemaPartitions it takes.
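
The arithmetic behind the 5k figure can be sketched as follows. This is only an illustration: the function name and the generalized formula are assumptions inferred from the single example 10000/(6/3), and the actual ConfigNode policy may differ.

```python
def extension_threshold(total_slots: int, data_nodes: int, replicas: int) -> float:
    """Partitions that must exist before the SchemaRegionGroups are extended.

    Assumption: generalizes the one example given, 10000 / (6 / 3).
    """
    max_groups = data_nodes / replicas
    return total_slots / max_groups


TOTAL_SLOTS = 10_000     # SeriesPartitionSlots, from the comment
REPLICAS = 3             # replication factor, from the issue description
PER_BENCHMARK = 3_000    # SchemaPartitions created by each benchmark

created = PER_BENCHMARK  # after the first benchmark, all on the first 3 DataNodes
threshold = extension_threshold(TOTAL_SLOTS, 6, REPLICAS)
print(f"created={created}, threshold={threshold:.0f}, extend={created >= threshold}")
# prints: created=3000, threshold=5000, extend=False
```

With 6 DataNodes registered, the 3k partitions already created stay below the 5k threshold, so the new DataNodes receive nothing until the second benchmark has created more partitions.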

*Solution*

To solve this unbalanced scenario, we should support load balancing at the Partition level, i.e., we should be able to migrate Partitions between RegionGroups. But this isn't our first priority at the moment.
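
As a rough sketch of what Partition-level balancing could compute, the snippet below plans moves that even out partition counts between RegionGroups. Everything here is hypothetical: the function, the group names, and the greedy strategy are illustrative only and are not taken from IoTDB.

```python
def plan_migrations(load: dict[str, int]) -> list[tuple[str, str, int]]:
    """Return (source, target, count) moves that even out partition counts.

    `load` maps a hypothetical RegionGroup name to its partition count.
    """
    load = dict(load)  # work on a copy so the caller's dict is untouched
    moves = []
    while True:
        src = max(load, key=load.get)  # most loaded group
        dst = min(load, key=load.get)  # least loaded group
        gap = load[src] - load[dst]
        if gap <= 1:                   # already as even as integers allow
            return moves
        count = gap // 2               # move half of the difference
        load[src] -= count
        load[dst] += count
        moves.append((src, dst, count))


# One group holds the first benchmark's 3k partitions, a new group holds none.
print(plan_migrations({"group-1": 3000, "group-2": 0}))
# prints: [('group-1', 'group-2', 1500)]
```

A real implementation would also have to move the data behind each Partition and keep replicas consistent, which this sketch ignores.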

> [ ConfigNode ] When dynamically extending DataNode resources online, you need to optimize schemaregion allocation policies
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IOTDB-4904
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4904
>             Project: Apache IoTDB
>          Issue Type: Improvement
>          Components: mpp-cluster
>            Reporter: 刘珍
>            Assignee: Yongzao Dan
>            Priority: Major
>         Attachments: all_online.sh, image-2022-11-10-15-06-01-931.png, image-2022-11-10-15-07-15-997.png, ip26.conf, ip27.conf, ip28.conf, ip29.conf, ip30.conf, ip31.conf, ip32.conf, online_exp_datanode.sh
>
>
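> m_1109_87a416e, 3 replicas
> 1. Start 3C3D with 1 Benchmark and write data for 1 hour.
> 2. Extend the cluster online by 3 DataNodes, then start 1 more Benchmark and write data for 1 hour.
> ..
> Continue until the cluster has been extended to 21 DataNodes with 7 Benchmark clients. Metadata creation then fails because the schema region allocation policy is unbalanced. (Control: start 3C21D all at once, then start 7 Benchmarks sequentially at 2s intervals; metadata creation succeeds.) Error log from the failed metadata creation: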
> 2022-11-10 11:24:43,533 [3@group-000200000000-StateMachineUpdater] ERROR o.a.i.d.m.v.SchemaExecutionVisitor:184 - IoTDB: MetaData error:
> org.apache.iotdb.db.exception.metadata.SeriesOverflowException: There are too many timeseries in memory, please increase MAX_HEAP_SIZE in datanode-env.sh/bat, restart and create timeseries again.
>         at org.apache.iotdb.db.metadata.schemaregion.SchemaRegionMemoryImpl.createTimeseries(SchemaRegionMemoryImpl.java:575)
>         at org.apache.iotdb.db.metadata.visitor.SchemaExecutionVisitor.executeInternalCreateTimeseries(SchemaExecutionVisitor.java:176)
>         at org.apache.iotdb.db.metadata.visitor.SchemaExecutionVisitor.visitInternalCreateTimeSeries(SchemaExecutionVisitor.java:150)
>         at org.apache.iotdb.db.metadata.visitor.SchemaExecutionVisitor.visitInternalCreateTimeSeries(SchemaExecutionVisitor.java:64)
>         at org.apache.iotdb.db.mpp.plan.planner.plan.node.metedata.write.InternalCreateTimeSeriesNode.accept(InternalCreateTimeSeriesNode.java:105)
>         at org.apache.iotdb.db.consensus.statemachine.SchemaRegionStateMachine.write(SchemaRegionStateMachine.java:73)
>         at org.apache.iotdb.consensus.ratis.ApplicationStateMachineProxy.applyTransaction(ApplicationStateMachineProxy.java:137)
>         at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1672)
>         at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:239)
>         at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:182)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> SchemaRegions in the final state after extending DataNodes online:
>  !image-2022-11-10-15-06-01-931.png! 
> SchemaRegions after the 7 Benchmarks finish, with no extension (all nodes started at once):
>  !image-2022-11-10-15-07-15-997.png! 
> Test environment: private cloud, phase 1
> 172.16.2.2 ~ 32 
> For the benchmark configuration files, see the attached ip*.conf
> For the online extension script, see online_exp_datanode.sh
> For the no-extension script, see all_online.sh



--
This message was sent by Atlassian Jira
(v8.20.10#820010)