Posted to dev@pegasus.apache.org by GitBox <gi...@apache.org> on 2022/05/17 03:18:26 UTC

[GitHub] [incubator-pegasus] foreverneverer opened a new pull request, #969: feat(admin-cli): support nodes capacity balance using admin-cli

foreverneverer opened a new pull request, #969:
URL: https://github.com/apache/incubator-pegasus/pull/969

   # Related-Issue
   https://github.com/apache/incubator-pegasus/issues/962
   
   # Problem
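   By default, the server updates node capacity every 10 minutes, so after a partition has been migrated, the tool cannot immediately obtain the latest capacity distribution. On clusters where this tool is used, please shorten the server's node capacity update interval to speed up balancing. The relevant configuration is as follows: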
   ```diff
   - disk_stat_interval_seconds = 600
   + disk_stat_interval_seconds = 60
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] foreverneverer commented on a diff in pull request #969: feat(admin-cli): support nodes capacity balance using admin-cli

Posted by GitBox <gi...@apache.org>.
foreverneverer commented on code in PR #969:
URL: https://github.com/apache/incubator-pegasus/pull/969#discussion_r884374617


##########
admin-cli/executor/toolkits/nodesbalancer/balancer.go:
##########
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package nodesbalancer
+
+import (
+	"fmt"
+	"time"
+
+	"github.com/XiaoMi/pegasus-go-client/session"
+	"github.com/apache/incubator-pegasus/admin-cli/executor"
+	"github.com/apache/incubator-pegasus/admin-cli/executor/toolkits"
+)
+
+// By default, the node capacity of the server needs to be updated every 10 minutes.

Review Comment:
   The config can't be queried yet, so we just add a hint for the user.





[GitHub] [incubator-pegasus] foreverneverer commented on a diff in pull request #969: feat(admin-cli): support nodes capacity balance using admin-cli

Posted by GitBox <gi...@apache.org>.
foreverneverer commented on code in PR #969:
URL: https://github.com/apache/incubator-pegasus/pull/969#discussion_r884373974


##########
admin-cli/executor/toolkits/diskbalancer/migrator.go:
##########
@@ -68,7 +68,7 @@ func changeDiskCleanerInterval(client *executor.Client, replicaServer string, cl
 }
 
 func getNextMigrateAction(client *executor.Client, replicaServer string, minSize int64) (*MigrateAction, error) {
-	disks, totalUsage, totalCapacity, err := queryDiskCapacityInfo(client, replicaServer)
+	disks, totalUsage, totalCapacity, err := QueryDiskCapacityInfo(client, replicaServer)

Review Comment:
   An uppercase first letter means `public` (exported) in Go, so the function can be used from other packages.





[GitHub] [incubator-pegasus] foreverneverer commented on a diff in pull request #969: feat(admin-cli): support nodes capacity balance using admin-cli

Posted by GitBox <gi...@apache.org>.
foreverneverer commented on code in PR #969:
URL: https://github.com/apache/incubator-pegasus/pull/969#discussion_r884376276


##########
admin-cli/executor/toolkits/nodesbalancer/balancer.go:
##########
@@ -0,0 +1,111 @@
+package nodesbalancer
+
+import (
+	"fmt"
+	"time"
+
+	"github.com/XiaoMi/pegasus-go-client/session"
+	"github.com/apache/incubator-pegasus/admin-cli/executor"
+	"github.com/apache/incubator-pegasus/admin-cli/executor/toolkits"
+)
+
+// By default, the node capacity of the server needs to be updated every 10 minutes.
+// Therefore, after a partition migration completes, the tool cannot immediately
+// obtain the latest capacity distribution. Please shorten the server's node
+// capacity update interval to speed up balancing. The relevant configurations
+// are as follows:
+//
+//- disk_stat_interval_seconds = 600
+//+ disk_stat_interval_seconds = 60 # or less
+//
+//- gc_memory_replica_interval_ms = 600000
+//+ gc_memory_replica_interval_ms = 60000 # or less
+
+func BalanceNodeCapacity(client *executor.Client, auto bool) error {
+	err := initClusterEnv(client)
+	if err != nil {
+		return err
+	}
+
+	balancer := &Migrator{}
+	for {
+		err := balancer.updateNodesLoad(client)
+		if err != nil {
+			toolkits.LogInfo(fmt.Sprintf("retry update load, err = %s", err.Error()))
+			time.Sleep(time.Second * 10)
+			continue
+		}
+
+		action, err := balancer.selectNextAction(client)
+		if err != nil {
+			return err
+		}
+
+		err = client.Meta.Balance(action.replica.Gpid, action.replica.Status, action.from.Node, action.to.Node)
+		if err != nil {
+			return fmt.Errorf("migrate action[%s] now is invalid: %s", action.toString(), err.Error())
+		}
+		err = waitCompleted(client, action)
+		if err != nil {
+			return fmt.Errorf("wait replica migrate err: %s", err.Error())
+		}
+		if !auto {
+			break
+		}
+		time.Sleep(time.Second * 10)
+	}
+	return nil
+}
+
+func initClusterEnv(client *executor.Client) error {
+	toolkits.LogWarn("This cluster will be balanced based on capacity; please don't enable count-balance later")
+	time.Sleep(time.Second * 3)
+
+	// set meta level as steady
+	err := executor.SetMetaLevel(client, "steady")
+	if err != nil {
+		return err
+	}
+	// disable replica migration based on `lively`
+	toolkits.LogInfo("set meta.lb.only_move_primary true")
+	err = executor.RemoteCommand(client, session.NodeTypeMeta, "", "meta.lb.only_move_primary", []string{"true"})
+	if err != nil {
+		return err
+	}
+	toolkits.LogInfo("set meta.lb.only_primary_balancer true")
+	err = executor.RemoteCommand(client, session.NodeTypeMeta, "", "meta.lb.only_primary_balancer", []string{"true"})
+	if err != nil {
+		return err
+	}
+	// reset garbage replica clear interval
+	toolkits.LogInfo("set gc_disk_error_replica_interval_seconds 10")

Review Comment:
   Yeah, I will restore them.





[GitHub] [incubator-pegasus] acelyc111 commented on a diff in pull request #969: feat(admin-cli): support nodes capacity balance using admin-cli

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on code in PR #969:
URL: https://github.com/apache/incubator-pegasus/pull/969#discussion_r883222683


##########
admin-cli/cmd/nodes_balancer.go:
##########
@@ -0,0 +1,40 @@
+package cmd
+
+import (
+	"github.com/apache/incubator-pegasus/admin-cli/executor/toolkits/nodesbalancer"
+	"github.com/apache/incubator-pegasus/admin-cli/shell"
+	"github.com/desertbit/grumble"
+)
+
+func init() {
+	shell.AddCommand(&grumble.Command{
+		Name: "nodes-balancer",
+		Help: "balance nodes capacity",

Review Comment:
   Could you add a more detailed description of this command? Frankly speaking, I didn't understand what it does.
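A capacity-based balancer moves replicas according to disk usage rather than replica counts. The sketch below shows one simple way to pick a source node and a target node by usage ratio. It is a generic illustration and not the PR's `selectNextAction` logic, which is not shown in this thread. The `nodeLoad` type, the `pickMigration` function, and the `minGap` threshold are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeLoad is a hypothetical summary of one replica server's disk usage.
type nodeLoad struct {
	Node          string
	UsageBytes    int64
	CapacityBytes int64
}

// ratio returns the fraction of capacity in use (0 if capacity is unknown).
func (n nodeLoad) ratio() float64 {
	if n.CapacityBytes == 0 {
		return 0
	}
	return float64(n.UsageBytes) / float64(n.CapacityBytes)
}

// pickMigration returns the most used node as the source and the least used
// node as the target. ok is false if there are fewer than two nodes or if the
// usage ratios already differ by less than minGap.
func pickMigration(nodes []nodeLoad, minGap float64) (from, to nodeLoad, ok bool) {
	if len(nodes) < 2 {
		return nodeLoad{}, nodeLoad{}, false
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]nodeLoad(nil), nodes...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ratio() > sorted[j].ratio() })
	from, to = sorted[0], sorted[len(sorted)-1]
	return from, to, from.ratio()-to.ratio() >= minGap
}

func main() {
	nodes := []nodeLoad{
		{"node-a", 800, 1000},
		{"node-b", 300, 1000},
		{"node-c", 500, 1000},
	}
	if from, to, ok := pickMigration(nodes, 0.1); ok {
		fmt.Printf("move a replica from %s to %s\n", from.Node, to.Node)
	}
}
```

A balancer built this way would repeat the pick after each migration until `ok` becomes false. That is also why stale capacity stats slow it down.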



##########
admin-cli/executor/toolkits/nodesbalancer/balancer.go:
##########
@@ -0,0 +1,111 @@
+package nodesbalancer
+
+import (
+	"fmt"
+	"time"
+
+	"github.com/XiaoMi/pegasus-go-client/session"
+	"github.com/apache/incubator-pegasus/admin-cli/executor"
+	"github.com/apache/incubator-pegasus/admin-cli/executor/toolkits"
+)
+
+// By default, the node capacity of the server needs to be updated every 10 minutes.

Review Comment:
   I think we'd better show this hint message in the help/usage text. Ordinary users don't read the comments in the code.
   
   Or we can get the config from the server first, then show a helpful hint message when the command is used.
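The second suggestion could look roughly like the sketch below. Once the interval has been fetched from the server, the command prints a hint if it exceeds the suggested value. How the value is fetched is left open, since querying this config is not currently supported. The `capacityHint` function and the threshold constant are hypothetical; the threshold mirrors the `60 # or less` recommendation in the code comment.

```go
package main

import "fmt"

// recommendedStatIntervalSeconds mirrors the "60 # or less" suggestion for
// disk_stat_interval_seconds; it is not read from any Pegasus API.
const recommendedStatIntervalSeconds = 60

// capacityHint returns a warning if the server's disk_stat_interval_seconds
// is larger than recommended, or "" otherwise. Fetching the value from the
// server is assumed to happen elsewhere (hypothetical).
func capacityHint(statIntervalSeconds int) string {
	if statIntervalSeconds <= recommendedStatIntervalSeconds {
		return ""
	}
	return fmt.Sprintf(
		"disk_stat_interval_seconds is %d; capacity stats may lag by up to %ds after each migration. "+
			"Consider lowering it to %d or less to speed up balancing.",
		statIntervalSeconds, statIntervalSeconds, recommendedStatIntervalSeconds)
}

func main() {
	if hint := capacityHint(600); hint != "" {
		fmt.Println(hint)
	}
}
```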



##########
admin-cli/executor/toolkits/nodesbalancer/balancer.go:
##########
@@ -0,0 +1,111 @@
+package nodesbalancer
+
+import (
+	"fmt"
+	"time"
+
+	"github.com/XiaoMi/pegasus-go-client/session"
+	"github.com/apache/incubator-pegasus/admin-cli/executor"
+	"github.com/apache/incubator-pegasus/admin-cli/executor/toolkits"
+)
+
+// By default, the node capacity of the server needs to be updated every 10 minutes.
+// Therefore, after a partition migration completes, the tool cannot immediately
+// obtain the latest capacity distribution. Please shorten the server's node
+// capacity update interval to speed up balancing. The relevant configurations
+// are as follows:
+//
+//- disk_stat_interval_seconds = 600
+//+ disk_stat_interval_seconds = 60 # or less
+//
+//- gc_memory_replica_interval_ms = 600000
+//+ gc_memory_replica_interval_ms = 60000 # or less
+
+func BalanceNodeCapacity(client *executor.Client, auto bool) error {
+	err := initClusterEnv(client)
+	if err != nil {
+		return err
+	}
+
+	balancer := &Migrator{}
+	for {
+		err := balancer.updateNodesLoad(client)
+		if err != nil {
+			toolkits.LogInfo(fmt.Sprintf("retry update load, err = %s", err.Error()))
+			time.Sleep(time.Second * 10)
+			continue
+		}
+
+		action, err := balancer.selectNextAction(client)
+		if err != nil {
+			return err
+		}
+
+		err = client.Meta.Balance(action.replica.Gpid, action.replica.Status, action.from.Node, action.to.Node)
+		if err != nil {
+			return fmt.Errorf("migrate action[%s] now is invalid: %s", action.toString(), err.Error())
+		}
+		err = waitCompleted(client, action)
+		if err != nil {
+			return fmt.Errorf("wait replica migrate err: %s", err.Error())
+		}
+		if !auto {
+			break
+		}
+		time.Sleep(time.Second * 10)
+	}
+	return nil
+}
+
+func initClusterEnv(client *executor.Client) error {
+	toolkits.LogWarn("This cluster will be balanced based on capacity; please don't enable count-balance later")
+	time.Sleep(time.Second * 3)
+
+	// set meta level as steady
+	err := executor.SetMetaLevel(client, "steady")
+	if err != nil {
+		return err
+	}
+	// disable replica migration based on `lively`
+	toolkits.LogInfo("set meta.lb.only_move_primary true")
+	err = executor.RemoteCommand(client, session.NodeTypeMeta, "", "meta.lb.only_move_primary", []string{"true"})
+	if err != nil {
+		return err
+	}
+	toolkits.LogInfo("set meta.lb.only_primary_balancer true")
+	err = executor.RemoteCommand(client, session.NodeTypeMeta, "", "meta.lb.only_primary_balancer", []string{"true"})
+	if err != nil {
+		return err
+	}
+	// reset garbage replica clear interval
+	toolkits.LogInfo("set gc_disk_error_replica_interval_seconds 10")

Review Comment:
   Should we restore these configs after this rebalance operation?
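A common Go pattern for this is to record the previous values when overriding them, then restore them with `defer`. The sketch below uses an in-memory map as a stand-in for the remote config commands. `configStore`, `overrideConfigs`, and the `false` starting values are hypothetical and do not reflect Pegasus defaults.

```go
package main

import "fmt"

// configStore is a hypothetical in-memory stand-in for server configs that a
// real tool would read and write through remote commands.
type configStore map[string]string

// savedValue remembers a key's value before it was overridden.
type savedValue struct {
	value   string
	existed bool
}

// overrideConfigs applies overrides to store and returns a function that puts
// every touched key back to its previous state (deleting keys that did not
// exist before), so callers can write `defer restore()`.
func overrideConfigs(store configStore, overrides map[string]string) (restore func()) {
	previous := make(map[string]savedValue, len(overrides))
	for key, value := range overrides {
		old, existed := store[key]
		previous[key] = savedValue{old, existed}
		store[key] = value
	}
	return func() {
		for key, saved := range previous {
			if saved.existed {
				store[key] = saved.value
			} else {
				delete(store, key)
			}
		}
	}
}

func main() {
	store := configStore{
		"meta.lb.only_move_primary":     "false",
		"meta.lb.only_primary_balancer": "false",
	}
	func() {
		restore := overrideConfigs(store, map[string]string{
			"meta.lb.only_move_primary":     "true",
			"meta.lb.only_primary_balancer": "true",
		})
		defer restore()
		fmt.Println("during balance:", store["meta.lb.only_move_primary"])
		// ... the balance loop would run here ...
	}()
	fmt.Println("after balance:", store["meta.lb.only_move_primary"])
}
```

A deferred restore does not run if the process exits through `os.Exit` or an unhandled signal. A real tool may therefore also want to handle interrupts.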



##########
admin-cli/executor/toolkits/diskbalancer/migrator.go:
##########
@@ -68,7 +68,7 @@ func changeDiskCleanerInterval(client *executor.Client, replicaServer string, cl
 }
 
 func getNextMigrateAction(client *executor.Client, replicaServer string, minSize int64) (*MigrateAction, error) {
-	disks, totalUsage, totalCapacity, err := queryDiskCapacityInfo(client, replicaServer)
+	disks, totalUsage, totalCapacity, err := QueryDiskCapacityInfo(client, replicaServer)

Review Comment:
   Why rename it? The other functions seem to begin with a lowercase letter.



##########
admin-cli/executor/toolkits/nodesbalancer/balancer.go:
##########
@@ -0,0 +1,111 @@
+package nodesbalancer
+
+import (
+	"fmt"
+	"time"
+
+	"github.com/XiaoMi/pegasus-go-client/session"

Review Comment:
   Needs updating.



##########
admin-cli/executor/toolkits/nodesbalancer/migrator.go:
##########
@@ -0,0 +1,253 @@
+package nodesbalancer
+
+import (
+	"fmt"
+	"math"
+	"time"
+
+	"github.com/XiaoMi/pegasus-go-client/idl/base"

Review Comment:
   Needs updating.





[GitHub] [incubator-pegasus] hycdong merged pull request #969: feat(admin-cli): support nodes capacity balance using admin-cli

Posted by GitBox <gi...@apache.org>.
hycdong merged PR #969:
URL: https://github.com/apache/incubator-pegasus/pull/969

