You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@uniffle.apache.org by GitBox <gi...@apache.org> on 2022/08/18 11:18:15 UTC

[GitHub] [incubator-uniffle] xianjingfeng opened a new pull request, #166: Support deploy multiple shuffle servers in a single node

xianjingfeng opened a new pull request, #166:
URL: https://github.com/apache/incubator-uniffle/pull/166

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://github.com/Tencent/Firestorm/blob/master/CONTRIBUTING.md
     2. Ensure you have added or run the appropriate tests for your PR
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]XXXX Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue.
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   1.Sufflle server with same ip will not be assigned to same partition
   2.Check whether port is in use in start script of shuffle server
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   If we have a lot of  memory(more than 1T) per host, so we need deploy multiple shuffle servers in a single node. #77 
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   already added


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] codecov-commenter commented on pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#issuecomment-1219477103

   # [Codecov](https://codecov.io/gh/apache/incubator-uniffle/pull/166?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#166](https://codecov.io/gh/apache/incubator-uniffle/pull/166?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (abe704f) into [master](https://codecov.io/gh/apache/incubator-uniffle/commit/425c02cb1ae482f555f162d7029d08fdb5a9bb55?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (425c02c) will **decrease** coverage by `1.27%`.
   > The diff coverage is `72.41%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master     #166      +/-   ##
   ============================================
   - Coverage     57.95%   56.68%   -1.28%     
   + Complexity     1223     1157      -66     
   ============================================
     Files           151      144       -7     
     Lines          8237     7766     -471     
     Branches        772      747      -25     
   ============================================
   - Hits           4774     4402     -372     
   + Misses         3217     3125      -92     
   + Partials        246      239       -7     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-uniffle/pull/166?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...oordinator/PartitionBalanceAssignmentStrategy.java](https://codecov.io/gh/apache/incubator-uniffle/pull/166/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29vcmRpbmF0b3Ivc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3VuaWZmbGUvY29vcmRpbmF0b3IvUGFydGl0aW9uQmFsYW5jZUFzc2lnbm1lbnRTdHJhdGVneS5qYXZh) | `95.52% <33.33%> (-2.94%)` | :arrow_down: |
   | [.../java/org/apache/uniffle/server/ShuffleServer.java](https://codecov.io/gh/apache/incubator-uniffle/pull/166/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL3NlcnZlci9TaHVmZmxlU2VydmVyLmphdmE=) | `70.40% <40.00%> (-0.34%)` | :arrow_down: |
   | [...e/uniffle/coordinator/BasicAssignmentStrategy.java](https://codecov.io/gh/apache/incubator-uniffle/pull/166/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29vcmRpbmF0b3Ivc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3VuaWZmbGUvY29vcmRpbmF0b3IvQmFzaWNBc3NpZ25tZW50U3RyYXRlZ3kuamF2YQ==) | `91.17% <50.00%> (-5.60%)` | :arrow_down: |
   | [...niffle/coordinator/AbstractAssignmentStrategy.java](https://codecov.io/gh/apache/incubator-uniffle/pull/166/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29vcmRpbmF0b3Ivc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3VuaWZmbGUvY29vcmRpbmF0b3IvQWJzdHJhY3RBc3NpZ25tZW50U3RyYXRlZ3kuamF2YQ==) | `91.66% <91.66%> (ø)` | |
   | [.../org/apache/uniffle/common/config/RssBaseConf.java](https://codecov.io/gh/apache/incubator-uniffle/pull/166/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL2NvbW1vbi9jb25maWcvUnNzQmFzZUNvbmYuamF2YQ==) | `92.18% <100.00%> (+0.31%)` | :arrow_up: |
   | [...e/uniffle/server/storage/SingleStorageManager.java](https://codecov.io/gh/apache/incubator-uniffle/pull/166/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL3NlcnZlci9zdG9yYWdlL1NpbmdsZVN0b3JhZ2VNYW5hZ2VyLmphdmE=) | `65.57% <0.00%> (-1.64%)` | :arrow_down: |
   | [.../org/apache/uniffle/server/ShuffleTaskManager.java](https://codecov.io/gh/apache/incubator-uniffle/pull/166/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL3NlcnZlci9TaHVmZmxlVGFza01hbmFnZXIuamF2YQ==) | `63.96% <0.00%> (-0.14%)` | :arrow_down: |
   | [...che/spark/shuffle/writer/BufferManagerOptions.java](https://codecov.io/gh/apache/incubator-uniffle/pull/166/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvQnVmZmVyTWFuYWdlck9wdGlvbnMuamF2YQ==) | | |
   | [...org/apache/spark/shuffle/RssSparkShuffleUtils.java](https://codecov.io/gh/apache/incubator-uniffle/pull/166/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9Sc3NTcGFya1NodWZmbGVVdGlscy5qYXZh) | | |
   | [.../org/apache/spark/shuffle/writer/WriterBuffer.java](https://codecov.io/gh/apache/incubator-uniffle/pull/166/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvV3JpdGVyQnVmZmVyLmphdmE=) | | |
   | ... and [8 more](https://codecov.io/gh/apache/incubator-uniffle/pull/166/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#discussion_r978221475


##########
coordinator/src/test/java/org/apache/uniffle/coordinator/PartitionBalanceAssignmentStrategyTest.java:
##########
@@ -282,4 +285,70 @@ public void testAssignmentShuffleNodesNum() {
             .size()
     );
   }
+
+
+  @Test
+  public void testAssignmentWithSameIP() throws Exception {
+    CoordinatorConf ssc = new CoordinatorConf();
+    ssc.setInteger(CoordinatorConf.COORDINATOR_SHUFFLE_NODES_MAX, shuffleNodesMax);
+    ssc.set(CoordinatorConf.COORDINATOR_ASSGINMENT_HOST_STRATEGY,
+        AbstractAssignmentStrategy.HostAssignmentStrategy.MUST_DIFF);
+    clusterManager = new SimpleClusterManager(ssc, new Configuration());
+    strategy = new PartitionBalanceAssignmentStrategy(clusterManager, ssc);
+
+    Set<String> serverTags = Sets.newHashSet("tag-1");
+
+    for (int i = 0; i < 5; ++i) {
+      clusterManager.add(new ServerNode("t1-" + i, "127.0.0." + i, 0, 0, 0,
+          20 - i, 0, serverTags, true));
+    }
+    for (int i = 0; i < 5; ++i) {
+      clusterManager.add(new ServerNode("t2-" + i, "127.0.0." + i, 1, 0, 0,
+          20 - i, 0, serverTags, true));
+    }
+    PartitionRangeAssignment pra = strategy.assign(100, 1, 5, serverTags, -1);
+    pra.getAssignments().values().forEach((nodeList) -> {
+      Map<String, ServerNode> nodeMap = new HashMap<>();
+      nodeList.forEach((node) -> {
+        ServerNode serverNode = nodeMap.get(node.getIp());
+        assertNull(serverNode);
+        nodeMap.put(node.getIp(), node);
+      });
+    });
+
+    pra = strategy.assign(100, 1, 6, serverTags, -1);
+    pra.getAssignments().values().forEach((nodeList) -> {
+      Map<String, ServerNode> nodeMap = new HashMap<>();
+      boolean hasSameHost = false;
+      for (ServerNode node : nodeList) {
+        ServerNode serverNode = nodeMap.get(node.getIp());
+        if (serverNode != null) {
+          hasSameHost = true;
+          break;
+        }
+        assertNull(serverNode);
+        nodeMap.put(node.getIp(), node);
+      }
+      assertTrue(hasSameHost);
+    });
+  }
+

Review Comment:
   We would better test the Strategy `PREFER_DIFF` and `NONE`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on code in PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#discussion_r977230410


##########
common/src/main/java/org/apache/uniffle/common/config/RssBaseConf.java:
##########
@@ -191,6 +191,12 @@ public class RssBaseConf extends RssConf {
       .defaultValue(60L)
       .withDescription("The kerberos authentication relogin interval. unit: sec");
 
+  public static final ConfigOption<String> RSS_SERVER_IP = ConfigOptions

Review Comment:
   > Could we reuse this environment value? [6937631](https://github.com/apache/incubator-uniffle/commit/6937631876052425b8d808d26caf78c79b24536a)
   
   Yes, but using environment variables may cause some problems if we create mutil ShuffleServer in ut, should we reuse?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on code in PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#discussion_r976250918


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AbstractAssignmentStrategy.java:
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.uniffle.coordinator;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+public abstract class AbstractAssignmentStrategy implements AssignmentStrategy {
+  protected List<ServerNode> getCandidateNodes(List<ServerNode> allNodes, int expectNum) {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi merged pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
jerqi merged PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#issuecomment-1254917501

   @leixm Could you help see this flaky test? https://github.com/apache/incubator-uniffle/actions/runs/3104642429/jobs/5029351283


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#issuecomment-1259266750

   @xianjingfeng  Sorry, I forget updating the document. If u have time, please raise a pr to update the related the doc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#issuecomment-1254940099

   > @leixm Could you help see this flaky test? https://github.com/apache/incubator-uniffle/actions/runs/3104642429/jobs/5029351283 error logs is as follow [test-reports-spark3.2.0.zip](https://github.com/apache/incubator-uniffle/files/9625399/test-reports-spark3.2.0.zip)
   
   I have fixed this in the pr https://github.com/apache/incubator-uniffle/pull/238


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#issuecomment-1255736038

   Should we add the documents for this feature?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#discussion_r982259583


##########
bin/start-shuffle-server.sh:
##########
@@ -48,7 +48,9 @@ MAIN_CLASS="org.apache.uniffle.server.ShuffleServer"
 HADOOP_DEPENDENCY="$("$HADOOP_HOME/bin/hadoop" classpath --glob)"
 
 echo "Check process existence"
-is_jvm_process_running "$JPS" $MAIN_CLASS
+RPC_PORT=`grep '^rss.rpc.server.port' $CONF_FILE |awk '{print $2}'`

Review Comment:
   `CONF_FILE` should be `SHUFFLE_SERVER_CONF_FILE`. I will raise a pr to fix this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng closed pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
xianjingfeng closed pull request #166: Support deploy multiple shuffle servers in a single node
URL: https://github.com/apache/incubator-uniffle/pull/166


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#discussion_r978221475


##########
coordinator/src/test/java/org/apache/uniffle/coordinator/PartitionBalanceAssignmentStrategyTest.java:
##########
@@ -282,4 +285,70 @@ public void testAssignmentShuffleNodesNum() {
             .size()
     );
   }
+
+
+  @Test
+  public void testAssignmentWithSameIP() throws Exception {
+    CoordinatorConf ssc = new CoordinatorConf();
+    ssc.setInteger(CoordinatorConf.COORDINATOR_SHUFFLE_NODES_MAX, shuffleNodesMax);
+    ssc.set(CoordinatorConf.COORDINATOR_ASSGINMENT_HOST_STRATEGY,
+        AbstractAssignmentStrategy.HostAssignmentStrategy.MUST_DIFF);
+    clusterManager = new SimpleClusterManager(ssc, new Configuration());
+    strategy = new PartitionBalanceAssignmentStrategy(clusterManager, ssc);
+
+    Set<String> serverTags = Sets.newHashSet("tag-1");
+
+    for (int i = 0; i < 5; ++i) {
+      clusterManager.add(new ServerNode("t1-" + i, "127.0.0." + i, 0, 0, 0,
+          20 - i, 0, serverTags, true));
+    }
+    for (int i = 0; i < 5; ++i) {
+      clusterManager.add(new ServerNode("t2-" + i, "127.0.0." + i, 1, 0, 0,
+          20 - i, 0, serverTags, true));
+    }
+    PartitionRangeAssignment pra = strategy.assign(100, 1, 5, serverTags, -1);
+    pra.getAssignments().values().forEach((nodeList) -> {
+      Map<String, ServerNode> nodeMap = new HashMap<>();
+      nodeList.forEach((node) -> {
+        ServerNode serverNode = nodeMap.get(node.getIp());
+        assertNull(serverNode);
+        nodeMap.put(node.getIp(), node);
+      });
+    });
+
+    pra = strategy.assign(100, 1, 6, serverTags, -1);
+    pra.getAssignments().values().forEach((nodeList) -> {
+      Map<String, ServerNode> nodeMap = new HashMap<>();
+      boolean hasSameHost = false;
+      for (ServerNode node : nodeList) {
+        ServerNode serverNode = nodeMap.get(node.getIp());
+        if (serverNode != null) {
+          hasSameHost = true;
+          break;
+        }
+        assertNull(serverNode);
+        nodeMap.put(node.getIp(), node);
+      }
+      assertTrue(hasSameHost);
+    });
+  }
+

Review Comment:
   We would better test the strategy `PREFER_DIFF` and `NONE`, too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on code in PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#discussion_r950314194


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AbstractAssignmentStrategy.java:
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.uniffle.coordinator;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+public abstract class AbstractAssignmentStrategy implements AssignmentStrategy {

Review Comment:
   Most cases need this logic, and strategys may have many same logics in the future.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#discussion_r949743151


##########
common/src/main/java/org/apache/uniffle/common/config/RssBaseConf.java:
##########
@@ -158,6 +158,12 @@ public class RssBaseConf extends RssConf {
       .defaultValue(true)
       .withDescription("The switch for jvm metrics verbose");
 
+  public static final ConfigOption<String> RSS_SERVER_IP = ConfigOptions

Review Comment:
   Is this config option is only used for test? We already have https://github.com/apache/incubator-uniffle/commit/6937631876052425b8d808d26caf78c79b24536a, could we use this?



##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AbstractAssignmentStrategy.java:
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.uniffle.coordinator;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+public abstract class AbstractAssignmentStrategy implements AssignmentStrategy {
+  protected List<ServerNode> getCandidateNodes(List<ServerNode> allNodes, int expectNum) {

Review Comment:
   For a scheduler, we usually provide an option for users to choose whether to allocate on a single node.  Because there are enough nodes to choose sometimes. To solve this problems, scheduler usually provide three semantics.
   First, don't consider this factor. When we can assign the servers, we don't consider whether there are partitions on the same node.
   Second, prefer considering this factor. When we can assign the servers, we try our best to avoid the partitions on the same node. But if we don't have enough the servers, we could assign the same node  to the partitions.
   Third, must considering this factor. We must avoid the partitions on the same node. If we don't have enough servers, we return the servers directly.
   You don't need to implement the every semantics, but we hope we can implement the other semantics easily in the future when there are users which need the semantics , so should we need some config options or implement abstraction?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on code in PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#discussion_r976250599


##########
common/src/main/java/org/apache/uniffle/common/config/RssBaseConf.java:
##########
@@ -158,6 +158,12 @@ public class RssBaseConf extends RssConf {
       .defaultValue(true)
       .withDescription("The switch for jvm metrics verbose");
 
+  public static final ConfigOption<String> RSS_SERVER_IP = ConfigOptions

Review Comment:
   I think use environment variable is not good in this case, because we need create mutil ShuffleServer in ut sometimes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#discussion_r977138987


##########
common/src/main/java/org/apache/uniffle/common/config/RssBaseConf.java:
##########
@@ -191,6 +191,12 @@ public class RssBaseConf extends RssConf {
       .defaultValue(60L)
       .withDescription("The kerberos authentication relogin interval. unit: sec");
 
+  public static final ConfigOption<String> RSS_SERVER_IP = ConfigOptions

Review Comment:
   Could we reuse this environment value? https://github.com/apache/incubator-uniffle/commit/6937631876052425b8d808d26caf78c79b24536a



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#issuecomment-1254942473

   > > @leixm Could you help see this flaky test? https://github.com/apache/incubator-uniffle/actions/runs/3104642429/jobs/5029351283 error logs is as follow [test-reports-spark3.2.0.zip](https://github.com/apache/incubator-uniffle/files/9625399/test-reports-spark3.2.0.zip)
   > 
   > I have fixed this in the pr #238
   
   Thank you. @jerqi  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] zuston commented on a diff in pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
zuston commented on code in PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#discussion_r950289516


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AbstractAssignmentStrategy.java:
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.uniffle.coordinator;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+public abstract class AbstractAssignmentStrategy implements AssignmentStrategy {

Review Comment:
   Why we need this? If u want to deploy multiple shuffle servers on one machine and hope avoid partitions assigned to same host. I think you could implement custom assignment strategy.  There is no need to change default strategy.
   
   Right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] zuston commented on a diff in pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
zuston commented on code in PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#discussion_r950289516


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AbstractAssignmentStrategy.java:
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.uniffle.coordinator;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+public abstract class AbstractAssignmentStrategy implements AssignmentStrategy {

Review Comment:
   Why we need this? If u want to deploy multiple shuffle servers on one machine and hope avoid partitions assigned to different host. I think you could implement custom assignment strategy. 
   
   Right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#discussion_r977247512


##########
common/src/main/java/org/apache/uniffle/common/config/RssBaseConf.java:
##########
@@ -191,6 +191,12 @@ public class RssBaseConf extends RssConf {
       .defaultValue(60L)
       .withDescription("The kerberos authentication relogin interval. unit: sec");
 
+  public static final ConfigOption<String> RSS_SERVER_IP = ConfigOptions

Review Comment:
   You can see the pr's test case, it's not a big problem for ut. We just need to set the environment variable for one server and start the server instead of starting all the servers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#discussion_r977138987


##########
common/src/main/java/org/apache/uniffle/common/config/RssBaseConf.java:
##########
@@ -191,6 +191,12 @@ public class RssBaseConf extends RssConf {
       .defaultValue(60L)
       .withDescription("The kerberos authentication relogin interval. unit: sec");
 
+  public static final ConfigOption<String> RSS_SERVER_IP = ConfigOptions

Review Comment:
   Could we reuse this environment? https://github.com/apache/incubator-uniffle/commit/6937631876052425b8d808d26caf78c79b24536a



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #166: Support deploy multiple shuffle servers in a single node

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #166:
URL: https://github.com/apache/incubator-uniffle/pull/166#issuecomment-1258928910

   Pending CI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org